Portrait of Irina Rish

Irina Rish

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research Department
Research Topics
Computational Neuroscience
Deep Learning
Generative Models
Multimodal Learning
Natural Language Processing
Online Learning
Reinforcement Learning

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

In addition to holding a Canada Excellence Research Chair (CERC) and a CIFAR Chair, she leads the U.S. Department of Energy’s INCITE project on Scalable Foundation Models on Summit & Frontier supercomputers at the Oak Ridge Leadership Computing Facility. She co-founded and serves as CSO of Nolano.ai.

Rish’s current research interests include neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization and robustness.

Before joining UdeM in 2019, she was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She was awarded the IBM Eminence & Excellence Award and IBM Outstanding Innovation Award (2018), IBM Outstanding Technical Achievement Award (2017) and IBM Research Accomplishment Award (2009).

She holds 64 patents and has published 120 research papers, several book chapters, three edited books and a monograph on sparse modeling.

Current Students

Research Intern
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Concordia University
Principal supervisor :
PhD - Université de Montréal
Collaborating researcher - Université de Montréal
Master's Research - Concordia University
Principal supervisor :
PhD - Université de Montréal
Collaborating researcher
Co-supervisor :
Independent visiting researcher - -
Collaborating researcher - Université de Montréal
Collaborating Alumni - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Master's Research - Concordia University
Principal supervisor :
Collaborating researcher - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Concordia University
Principal supervisor :
Master's Research - Université de Montréal
Collaborating Alumni - Université de Montréal
Collaborating researcher
PhD - Université de Montréal
Collaborating researcher - Université de Montréal
Collaborating researcher - McGill University
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher
Co-supervisor :
Collaborating researcher - Polytechnique Montréal
PhD - McGill University
Principal supervisor :
Master's Research - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - McGill University
Principal supervisor :
Collaborating researcher
PhD - Concordia University
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating Alumni - Université de Montréal
Undergraduate - McGill University
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
PhD - McGill University

Publications

Maximum State Entropy Exploration using Predecessor and Successor Representations
Animals have a developed ability to explore that aids them in important tasks such as locating food, exploring for shelter, and finding misp… (see more)laced items. These exploration skills necessarily track where they have been so that they can plan for finding items with relative efficiency. Contemporary exploration algorithms often learn a less efficient exploration strategy because they either condition only on the current state or simply rely on making random open-loop exploratory moves. In this work, we propose
WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series
Beyond performance: the role of task demand, effort, and individual differences in ab initio pilots
Mohammad-Javad Darvishi-Bayazi
Andrew Law
Sergio Mejia Romero
Sion Jennings
Jocelyn Faubert
Aviation safety depends on the skill and expertise of pilots to meet the task demands of flying an aircraft in an effective and efficient ma… (see more)nner. During flight training, students may respond differently to imposed task demands based on individual differences in capacity, physiological arousal, and effort. To ensure that pilots achieve a common desired level of expertise, training programs should account for individual differences to optimize pilot performance. This study investigates the relationship between task performance and physiological correlates of effort in ab initio pilots. Twenty-four participants conducted a flight simulator task with three difficulty levels and were asked to rate their perceived demand and effort using the NASA TLX. We recorded heart rate, EEG brain activity, and pupil size to assess changes in the participants’ mental and physiological states across different task demands. We found that, despite group-level correlations between performance error and physiological responses, individual differences in physiological responses to task demands reflected different levels of participant effort and task efficiency. These findings suggest that physiological monitoring of student pilots might provide beneficial insights to flight instructors to optimize pilot training at the individual level.
Neural efficiency in an aviation task with different levels of difficulty: Assessing different biometrics during a performance task
Andrew Law
Sergio Mejia Romero
Sion Jennings
Jocelyn Faubert
Cognitive Models as Simulators: Using Cognitive Models to Tap into Implicit Human Feedback
Ardavan S. Nobandegani
Thomas Shultz
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes a… (see more)vailable. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch
Towards Out-of-Distribution Adversarial Robustness
Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fail… (see more)s to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different
Dialogue System with Missing Observation
Djallel Bouneffouf
Mayank Agarwal
Within the domain of dialogue, the ability to orchestrate multiple independently trained dialogue agents to create a unified system is of pa… (see more)rticular importance. Where we define orchestration as the task of selecting a subset of skills which most appropriately answer a user input using features extracted from both the user input and the individual skills. In this work, we study the task of online dialogue orchestration where the user feedback associated with the dialogue agent may not always be observed. In order to address the missing feedback setting, we propose to combine the attentive contextual bandit approach with an unsupervised learning mechanism such as clustering. By leveraging clustering to estimate missing reward, we are able to learn from each incoming event, even those with missing rewards. Promising empirical results are obtained on proprietary conversational datasets.
Estimating individual minimum calibration for deep-learning with predictive performance recovery: An example case of gait surface classification from wearable sensor gait data
Philippe C. Dixon
Towards ethical multimodal systems
Esma Aimeur
A Survey on Compositional Generalization in Applications
Djallel Bouneffouf
Broken Neural Scaling Laws
We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models &… (see more) extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures & for each of various tasks within a large & diverse set of upstream & downstream tasks, in zero-shot, prompted, & finetuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, AI capabilities, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, OOD detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, fairness, molecules, computer programming/coding, math word problems, "emergent phase transitions", arithmetic, supervised learning, unsupervised/self-supervised learning, & reinforcement learning (single agent & multi-agent). When compared to other functional forms for neural scaling, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models & extrapolates scaling behavior that other functional forms are incapable of expressing such as the nonmonotonic transitions present in the scaling behavior of phenomena such as double descent & the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws