Irina Rish

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

In addition to holding a Canada Excellence Research Chair (CERC) and a CIFAR Chair, she leads the U.S. Department of Energy’s INCITE project on Scalable Foundation Models on Summit & Frontier supercomputers at the Oak Ridge Leadership Computing Facility. She co-founded and serves as CSO of Nolano.ai.

Rish’s current research interests include neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization and robustness.

Before joining UdeM in 2019, she was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She was awarded the IBM Eminence & Excellence Award and IBM Outstanding Innovation Award (2018), IBM Outstanding Technical Achievement Award (2017) and IBM Research Accomplishment Award (2009).

She holds 64 patents and has published 120 research papers, several book chapters, three edited books and a monograph on sparse modeling.

Current Students

George Adamopoulos

Research Intern

Ivan Anokhin

PhD - Université de Montréal

Co-supervisor :

Samira Ebrahimi Kahou

Rifat Arefin

PhD - Université de Montréal

Arjun Ashok

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

PhD - McGill University

Principal supervisor :

Blake Richards

Mohammad Javad Darvishi Bayazi

Amin Darabi

PhD - Université de Montréal

Collaborating researcher - Université de Montréal

Wagner Drew

Master's Research - Concordia University

Principal supervisor :

Mirco Ravanelli

Mojtaba Faramarzi

PhD - Université de Montréal

Parviz Haggi Mani

Independent visiting researcher

Nadhir Hassen

Collaborating Alumni - Université de Montréal

Master's Research

Collaborating Alumni - Université de Montréal

Principal supervisor :

Ioannis Mitliagkas

Nizar Islah

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

PhD - Université de Montréal

Zafir Khalid

Master's Research - Concordia University

Principal supervisor :

Collaborating researcher - Université de Montréal

Neeraj Kumar

Collaborating Alumni - Université de Montréal

Gwen Legate

PhD - Concordia University

Principal supervisor :

David Lemay

Master's Research - Université de Montréal

Jonathan Lim

Collaborating researcher

Collaborating Alumni - Université de Montréal

Collaborating researcher

Andrei Mircea

PhD - Université de Montréal

Collaborating researcher - Université de Montréal

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Principal supervisor :

Doina Precup

Timothy Nest

PhD - Université de Montréal

Co-supervisor :

Eilif B. Muller

Mohammad Pezeshki

Collaborating researcher

Co-supervisor :

PhD - McGill University

Principal supervisor :

Pouya Bashivan

Mahta Ramezanian

Master's Research - Université de Montréal

Co-supervisor :

Guillaume Dumas

Matthew Riemer

PhD - Université de Montréal

Alexis Roger

PhD - McGill University

Principal supervisor :

Blake Richards

Munish Sathish Kumar

Collaborating researcher

Vaibhav Singh

PhD - Concordia University

Principal supervisor :

Gopeshh Subbaraj

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Collaborating Alumni - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

He Zhu

PhD - McGill University

Publications

Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

The ever-growing availability of unlabeled data presents both opportunities and challenges for training artificial intelligence systems. While self-supervised learning (SSL) has emerged as a powerful paradigm for extracting meaningful representations from vast amounts of unlabeled data, existing methods still struggle to adapt to the non-stationary, non-IID nature of real-world data streams without forgetting previously learned knowledge. Recent works have adopted a repeated cosine annealing schedule for large-scale continual pre-training; however, these schedules (1) inherently cause forgetting during the re-warming phase and (2) have not been systematically compared to existing continual SSL methods. In this work, we systematically compare the widely used cosine schedule with the recently proposed infinite learning rate schedule and empirically find the latter to be a more effective alternative. Our extensive empirical evaluation across diverse image and language datasets demonstrates that the infinite learning rate schedule consistently enhances continual pre-training performance compared to a repeated cosine decay without being restricted to a fixed iteration budget. For instance, in a small-scale MAE pre-training setup, it outperforms several strong baselines from the literature. We then scale up our experiments to larger MAE pre-training and autoregressive language model pre-training. Our results show that the infinite learning rate schedule remains effective at scale, surpassing repeated cosine decay for both MAE pre-training and zero-shot LM benchmarks.

2025-06-11

ICML.cc/2025/Workshop/ES-FoMo-III (published)

MuLoCo: Muon is a practical inner optimizer for DiLoCo

Benjamin Therien

Xiaolong Huang

2025-06-11

ICML.cc/2025/Workshop/ES-FoMo-III (published)

AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models

Jacob Chmura

Shahrad Mohammadzadeh

Taz Scott-Talib

Nishanth Anand

2025-06-09

ICML.cc/2025/Workshop/CODEML (published)

Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning

Andrei Mircea

Supriyo Chakraborty

Nima Chitsazan

Ekaterina Lobacheva

This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training; an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl

2025-06-05

ArXiv (preprint)

Artificial Neural Networks for Magnetoencephalography: A review of an emerging field

Vanessa Hadid

Magnetoencephalography (MEG) is a cutting-edge neuroimaging technique that measures the intricate brain dynamics underlying cognitive processes with an unparalleled combination of high temporal and spatial precision. MEG data analytics has always relied on advanced signal processing and mathematical and statistical tools for various tasks ranging from data cleaning to probing the signals' rich dynamics and estimating the neural sources underlying the surface-level recordings. Like in most domains, the surge in Artificial Intelligence (AI) has led to the increased use of Machine Learning (ML) methods for MEG data classification. More recently, an emerging trend in this field is using Artificial Neural Networks (ANNs) to address many MEG-related tasks. This review provides a comprehensive overview of how ANNs are being used with MEG data from three vantage points: First, we review work that employs ANNs for MEG signal classification, i.e., for brain decoding. Second, we report on work that has used ANNs as putative models of information processing in the human brain. Finally, we examine studies that use ANNs as techniques to tackle methodological questions in MEG, including artifact correction and source estimation. Furthermore, we assess the current strengths and limitations of using ANNs with MEG and discuss future challenges and opportunities in this field. Finally, by establishing a detailed portrait of the field and providing practical recommendations for the future, this review seeks to provide a helpful reference for both seasoned MEG researchers and newcomers to the field who are interested in using ANNs to enhance the exploration of the complex dynamics of the human brain with MEG.

2025-05-27

Journal of Neural Engineering (published)

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

Tianyu Zhang

Andrew Robert Williams

Phillip Wozny

Kai-Hendrik Cohrs

Koen Ponse

Marco Jiralerspong

Soham Rajesh Phade

Sunil Srinivasa

Li Li

Yang Zhang

Prateek Gupta

Erman Acar

Yoshua Bengio

Stephan Zheng

2025-05-01

ICML.cc/2025/Conference (poster)

Continual Pre-training of MoEs: How robust is your router?

Benjamin Therien

Charles-Etienne Joseph

Zain Sarwar

Ashwinee Panda

Anirban Das

Shi-Xiong Zhang

Stephen Rawls

Sambit Sahu

2025-03-06

ArXiv (preprint)

Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

2025-03-04

ArXiv (preprint)

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

2025-02-13

TMLR (accepted)

Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference

Matthew D Riemer

Gopeshh Subbaraj

Glen Berseth

Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectively minimize regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime reinforcement learning (RL) environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pokémon and Tetris.

2025-01-22

ICLR.cc/2025/Conference (poster)

Handling Delay in Real-Time Reinforcement Learning

Ivan Anokhin

Rishav

Matthew D Riemer

Stephen Chung

Samira Ebrahimi Kahou

Real-time reinforcement learning (RL) introduces several challenges. First, policies are constrained to a fixed number of actions per second due to hardware limitations. Second, the environment may change while the network is still computing an action, leading to observational delay. The first issue can partly be addressed with pipelining, leading to higher throughput and potentially better policies. However, the second issue remains: if each neuron operates in parallel with an execution time of

2025-01-22

ICLR.cc/2025/Conference (poster)