Portrait of Irina Rish

Irina Rish

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Research Topics
Computational Neuroscience
Deep Learning
Generative Models
Multimodal Learning
Natural Language Processing
Online Learning
Reinforcement Learning

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

In addition to holding a Canada Excellence Research Chair (CERC) and a CIFAR Chair, she leads the U.S. Department of Energy’s INCITE project on Scalable Foundation Models on Summit & Frontier supercomputers at the Oak Ridge Leadership Computing Facility. She co-founded and serves as CSO of Nolano.ai.

Rish’s current research interests include neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization and robustness.

Before joining UdeM in 2019, she was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She was awarded the IBM Eminence & Excellence Award and IBM Outstanding Innovation Award (2018), IBM Outstanding Technical Achievement Award (2017) and IBM Research Accomplishment Award (2009).

She holds 64 patents and has published 120 research papers, several book chapters, three edited books and a monograph on sparse modeling.

Current Students

Research Intern
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Concordia University
Principal supervisor :
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Collaborating researcher
Master's Research - Concordia University
Principal supervisor :
Master's Research - Université de Montréal
Professional Master's - Université de Montréal
PhD - Concordia University
Principal supervisor :
Master's Research - Université de Montréal
Collaborating researcher
Postdoctorate - Université de Montréal
Principal supervisor :
Master's Research - Université de Montréal
Collaborating Alumni
PhD - Université de Montréal
Master's Research - Université de Montréal
Master's Research - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
PhD - McGill University
Principal supervisor :
Postdoctorate - Université de Montréal
Principal supervisor :
PhD - Concordia University
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
PhD - Université de Montréal
Master's Research - Université de Montréal

Publications

Towards Out-of-Distribution Adversarial Robustness
Adam Ibrahim
Charles Guille-Escuret
Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fail… (see more)s to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different
Dialogue System with Missing Observation
Djallel Bouneffouf
Mayank Agarwal
Within the domain of dialogue, the ability to orchestrate multiple independently trained dialogue agents to create a unified system is of pa… (see more)rticular importance. Where we define orchestration as the task of selecting a subset of skills which most appropriately answer a user input using features extracted from both the user input and the individual skills. In this work, we study the task of online dialogue orchestration where the user feedback associated with the dialogue agent may not always be observed. In order to address the missing feedback setting, we propose to combine the attentive contextual bandit approach with an unsupervised learning mechanism such as clustering. By leveraging clustering to estimate missing reward, we are able to learn from each incoming event, even those with missing rewards. Promising empirical results are obtained on proprietary conversational datasets.
Estimating individual minimum calibration for deep-learning with predictive performance recovery: An example case of gait surface classification from wearable sensor gait data.
Guillaume Lam
P. Dixon
Towards ethical multimodal systems
Alexis Roger
Esma Aimeur
A Survey on Compositional Generalization in Applications
Baihan Lin
Djallel Bouneffouf
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models&extra… (see more)polates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures&for each of various tasks within a large&diverse set of upstream&downstream tasks, in zero-shot, prompted,&finetuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, AI capabilities, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, OOD detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, fairness, molecules, computer programming/coding, math word problems,"emergent phase transitions", arithmetic, supervised learning, unsupervised/self-supervised learning,&reinforcement learning (single agent&multi-agent). When compared to other functional forms for neural scaling, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models&extrapolates scaling behavior that other functional forms are incapable of expressing such as the nonmonotonic transitions present in the scaling behavior of phenomena such as double descent&the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws
AI Agents Learn to Trust
Ardavan S. Nobandegani
T. Shultz
GOKU-UI: Ubiquitous Inference through Attention and Multiple Shooting for Continuous-time Generative Models
Germán Abrevaya
Mahta Ramezanian-Panahi
Jean-Christophe Gagnon-Audet
Pablo Polosecki
Silvina Ponce Dawson
Guillermo Cecchi
Scientific Machine Learning (SciML) is a burgeoning field that synergistically combines domain-aware and interpretable models with agnosti… (see more)c machine learning techniques. In this work, we introduce GOKU-UI, an evolution of the SciML generative model GOKU-nets. The GOKU-UI broadens the original model’s spectrum to incorporate other classes of differential equations, such as Stochastic Differential Equations (SDEs), and integrates a distributed, i.e. ubiquitous, inference through attention mechanisms and a novel multiple shooting training strategy in the latent space. These enhancements have led to a significant increase in its performance in both reconstruction and forecast tasks, as demonstrated by our evaluation of simulated and empirical data. Specifically, GOKU-UI outperformed all baseline models on synthetic datasets even with a training set 32-fold smaller, underscoring its remarkable data efficiency. Furthermore, when applied to empirical human brain data, while incorporating stochastic Stuart-Landau
Gradient Masked Averaging for Federated Learning
Irene Tenison
Sai Aravind Sreeramadas
Vaikkunth Mugunthan
Edouard Oyallon
Federated learning (FL) is an emerging paradigm that permits a large number of clients with heterogeneous data to coordinate learning of a u… (see more)nified global model without the need to share data amongst each other. A major challenge in federated learning is the heterogeneity of data across client, which can degrade the performance of standard FL algorithms. Standard FL algorithms involve averaging of model parameters or gradient updates to approximate the global model at the server. However, we argue that in heterogeneous settings, averaging can result in information loss and lead to poor generalization due to the bias induced by dominant client gradients. We hypothesize that to generalize better across non-i.i.d datasets, the algorithms should focus on learning the invariant mechanism that is constant while ignoring spurious mechanisms that differ across clients. Inspired from recent works in Out-of-Distribution generalization, we propose a gradient masked averaging approach for FL as an alternative to the standard averaging of client updates. This aggregation technique for client updates can be adapted as a drop-in replacement in most existing federated algorithms. We perform extensive experiments on multiple FL algorithms with in-distribution, real-world, feature-skewed out-of-distribution, and quantity imbalanced datasets and show that it provides consistent improvements, particularly in the case of heterogeneous clients.
Lag-Llama: Towards Foundation Models for Time Series Forecasting
Kashif Rasul
Arjun Ashok
Andrew Robert Williams
Arian Khorasani
George Adamopoulos
Rishika Bhagwatkar
Marin Biloš
Hena Ghonia
N. Hassen
Anderson Schneider
Sahil Garg
Yuriy Nevmyvaka
Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-… (see more)Llama , a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen “out-of-distribution” time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws [7] to fit and predict model scaling behavior. The open source code is made available at https://github
Towards Continual Reinforcement Learning: A Review and Perspectives
Continual Learning with Foundation Models: An Empirical Study of Latent Replay
Oleksiy Ostapenko
Timothee LESORT
Pau Rodriguez
Md Rifat Arefin
Arthur Douillard
Rapid development of large-scale pre-training has resulted in foundation models that can act as effective feature extractors on a variety of… (see more) downstream tasks and domains. Motivated by this, we study the efficacy of pre-trained vision models as a foundation for downstream continual learning (CL) scenarios. Our goal is twofold. First, we want to understand the compute-accuracy trade-off between CL in the raw-data space and in the latent space of pre-trained encoders. Second, we investigate how the characteristics of the encoder, the pre-training algorithm and data, as well as of the resulting latent space affect CL performance. For this, we compare the efficacy of various pre-trained models in large-scale benchmarking scenarios with a vanilla replay setting applied in the latent and in the raw-data space. Notably, this study shows how transfer, forgetting, task similarity and learning are dependent on the input data characteristics and not necessarily on the CL algorithms. First, we show that under some circumstances reasonable CL performance can readily be achieved with a non-parametric classifier at negligible compute. We then show how models pre-trained on broader data result in better performance for various replay sizes. We explain this with representational similarity and transfer properties of these representations. Finally, we show the effectiveness of self-supervised pre-training for downstream domains that are out-of-distribution as compared to the pre-training domain. We point out and validate several research directions that can further increase the efficacy of latent CL including representation ensembling. The diverse set of datasets used in this study can serve as a compute-efficient playground for further CL research. We will publish the code.