Irina Rish

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Computational Neuroscience
Deep Learning
Generative Models
Multimodal Learning
Natural Language Processing
Online Learning
Reinforcement Learning

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

In addition to holding a Canada Excellence Research Chair (CERC) and a CIFAR Chair, she leads the U.S. Department of Energy’s INCITE project on Scalable Foundation Models on Summit & Frontier supercomputers at the Oak Ridge Leadership Computing Facility. She co-founded and serves as CSO of Nolano.ai.

Rish’s current research interests include neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization and robustness.

Before joining UdeM in 2019, she was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She was awarded the IBM Eminence & Excellence Award and IBM Outstanding Innovation Award (2018), IBM Outstanding Technical Achievement Award (2017) and IBM Research Accomplishment Award (2009).

She holds 64 patents and has published 120 research papers, several book chapters, three edited books and a monograph on sparse modeling.

Current Students

Research Intern
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
PhD - McGill University
PhD - Université de Montréal
Master's Research - Concordia University
PhD - Université de Montréal
Independent visiting researcher
Collaborating Alumni - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Master's Research - Concordia University
Principal supervisor :
Master's Research - Université de Montréal
Collaborating Alumni - Université de Montréal
PhD - Concordia University
Principal supervisor :
Master's Research - Université de Montréal
Collaborating researcher
Master's Research - Université de Montréal
Collaborating researcher
PhD - Université de Montréal
Collaborating researcher - Université de Montréal
Collaborating researcher - McGill University
PhD - Université de Montréal
Collaborating researcher
PhD - McGill University
Principal supervisor :
Master's Research - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - McGill University
PhD - Concordia University
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating Alumni - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
PhD - McGill University

Publications

Context is Key: A Benchmark for Forecasting with Essential Textual Information
Andrew Robert Williams
Étienne Marcotte
Valentina Zantedeschi
Jithendaraa Subramanian
Roland Riachi
Alexandre Lacoste
Forecasting is a critical task in decision making across various domains. While numerical data provides a foundation, it often lacks crucial context necessary for accurate predictions. Human forecasters frequently rely on additional information, such as background knowledge or constraints, which can be efficiently communicated through natural language. However, the ability of existing forecasting models to effectively integrate this textual information remains an open question. To address this, we introduce "Context is Key" (CiK), a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context, requiring models to integrate both modalities. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters, and propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings. By presenting this benchmark, we aim to advance multimodal forecasting, promoting models that are both accurate and accessible to decision-makers with varied technical expertise. The benchmark can be visualized at https://servicenow.github.io/context-is-key-forecasting/v0/.
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Introducing Brain Foundation Models
Hena Ghonia
Roland Riachi
Bruno Aristimunha
Md Rifat Arefin
Sylvain Chevallier
Brain function represents one of the most complex systems driving our world. Decoding its signals poses significant challenges, particularly due to the limited availability of data and the high cost of recordings. The existence of large hospital datasets and laboratory collections partially mitigates this issue. However, the lack of standardized recording protocols, varying numbers of channels, diverse setups, scenarios, and recording devices further complicate the task. This work addresses these challenges by introducing the Brain Foundation Model (BFM), a suite of open-source models trained on brain signals. These models serve as foundational tools for various types of time-series neuroimaging tasks. This work presents the first model of the BFM series, which is trained on electroencephalogram signal data. Our results demonstrate that BFM-EEG can generate signals more accurately than other models. Upon acceptance, we will release the model weights and pipeline.
Language model scaling laws and zero-sum learning
Supriyo Chakraborty
Nima Chitsazan
This work aims to understand how, in terms of training dynamics, scaling up language model size yields predictable loss improvements. We find that these improvements can be tied back to loss deceleration, an abrupt transition in the rate of loss improvement, characterized by piece-wise linear behavior in log-log space. Notably, improvements from increased model size appear to be a result of (1) improving the loss at which this transition occurs; and (2) improving the rate of loss improvement after this transition. As an explanation for the mechanism underlying this transition (and the effect of model size on loss it mediates), we propose the zero-sum learning (ZSL) hypothesis. In ZSL, per-token gradients become systematically opposed, leading to degenerate training dynamics where the model can't improve loss on one token without harming it on another, bottlenecking the overall rate at which loss can improve. We find compelling evidence of ZSL, as well as unexpected results which shed light on other factors contributing to ZSL.
LLMs and Personalities: Inconsistencies Across Scales
This study investigates the application of human psychometric assessments to large language models (LLMs) to examine their consistency and malleability in exhibiting personality traits. We administered the Big Five Inventory (BFI) and the Eysenck Personality Questionnaire-Revised (EPQ-R) to various LLMs across different model sizes and persona prompts. Our results reveal substantial variability in responses due to question order shuffling, challenging the notion of a stable LLM "personality." Larger models demonstrated more consistent responses, while persona prompts significantly influenced trait scores. Notably, the assistant persona led to more predictable scaling, with larger models exhibiting more socially desirable and less variable traits. In contrast, non-conventional personas displayed unpredictable behaviors, sometimes extending personality trait scores beyond the typical human range. These findings have important implications for understanding LLM behavior under different conditions and reflect on the consequences of scaling.
RedPajama: an Open Dataset for Training Large Language Models
Maurice Weber
Daniel Y Fu
Quentin Gregory Anthony
Yonatan Oren
Shane Adams
Anton Alexandrov
Xiaozhong Lyu
Huu Nguyen
Xiaozhe Yao
Virginia Adams
Ben Athiwaratkun
Rahul Chalamala
Kezhen Chen
Max Ryabinin
Tri Dao
Percy Liang
Christopher Re
Ce Zhang
Using Unity to Help Solve Reinforcement Learning
Connor Brennan
Andrew Robert Williams
Vedant Vyas
Leveraging the depth and flexibility of XLand as well as the rapid prototyping features of the Unity engine, we present the United Unity Universe, an open-source toolkit designed to accelerate the creation of innovative reinforcement learning environments. This toolkit includes a robust implementation of XLand 2.0 complemented by a user-friendly interface which allows users to modify the details of procedurally generated terrains and task rules with ease. Additionally, we provide a curated selection of terrains and rule sets, accompanied by implementations of reinforcement learning baselines to facilitate quick experimentation with novel architectural designs for adaptive agents. Furthermore, we illustrate how the United Unity Universe serves as a high-level language that enables researchers to develop diverse and endlessly variable 3D environments within a unified framework. This functionality establishes the United Unity Universe (U3) as an essential tool for advancing the field of reinforcement learning, especially in the development of adaptive and generalizable learning systems.
When Machines Outshine Humans in Object Recognition, Benchmarking Dilemma
Md Rifat Arefin
Jocelyn Faubert
Knowledge Distillation in Federated Learning: A Practical Guide
Alessio Mora
Irene Tenison
Paolo Bellavista
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
Tejas Pandey
Aaryan Bhagat