Irina Rish

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

In addition to holding a Canada Excellence Research Chair (CERC) and a CIFAR Chair, she leads the U.S. Department of Energy’s INCITE project on Scalable Foundation Models on Summit & Frontier supercomputers at the Oak Ridge Leadership Computing Facility. She co-founded and serves as CSO of Nolano.ai.

Rish’s current research interests include neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization and robustness.

Before joining UdeM in 2019, she was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She was awarded the IBM Eminence & Excellence Award and IBM Outstanding Innovation Award (2018), IBM Outstanding Technical Achievement Award (2017) and IBM Research Accomplishment Award (2009).

She holds 64 patents and has published 120 research papers, several book chapters, three edited books and a monograph on sparse modeling.

Current Students

George Adamopoulos

Research Intern

Ivan Anokhin

PhD - Université de Montréal

Co-supervisor :

Samira Ebrahimi Kahou

Rifat Arefin

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

PhD - McGill University

Principal supervisor :

Blake Richards

Mohammad Javad Darvishi Bayazi

Amin Darabi

PhD - Université de Montréal

Collaborating researcher - Université de Montréal

Wagner Drew

Master's Research - Concordia University

Principal supervisor :

Mirco Ravanelli

Mojtaba Faramarzi

PhD - Université de Montréal

Kshitij Gupta

Collaborating researcher

Co-supervisor :

Sarath Chandar

Parviz Haggi Mani

Independent visiting researcher - -

Nadhir Hassen

Collaborating researcher - Université de Montréal

Master's Research

Collaborating Alumni - Université de Montréal

Principal supervisor :

Ioannis Mitliagkas

Nizar Islah

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

PhD - Université de Montréal

Zafir Khalid

Master's Research - Concordia University

Principal supervisor :

Collaborating researcher - Université de Montréal

Neeraj Kumar

Collaborating Alumni - Université de Montréal

Gwen Legate

PhD - Concordia University

Principal supervisor :

David Lemay

Master's Research - Université de Montréal

Amin Mansouri

Collaborating Alumni - Université de Montréal

Collaborating researcher

Andrei Mircea

PhD - Université de Montréal

Collaborating researcher - Université de Montréal

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Principal supervisor :

Doina Precup

Timothy Nest

PhD - Université de Montréal

Co-supervisor :

Eilif B. Muller

Mohammad Pezeshki

Collaborating researcher

Co-supervisor :

PhD - McGill University

Principal supervisor :

Pouya Bashivan

Mahta Ramezanian

Master's Research - Université de Montréal

Co-supervisor :

Guillaume Dumas

Matthew Riemer

PhD - Université de Montréal

Alexis Roger

PhD - McGill University

Principal supervisor :

Blake Richards

Munish Sathish Kumar

Collaborating researcher

Vaibhav Singh

PhD - Concordia University

Principal supervisor :

Gopeshh Subbaraj

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Collaborating Alumni - Université de Montréal

Sihui Wei

Undergraduate - McGill University

Andrew Williams

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

He Zhu

PhD - McGill University

Publications

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Andrew Robert Williams

Étienne Marcotte

Valentina Zantedeschi

Jithendaraa Subramanian

Alexandre Lacoste

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Andrew Robert Williams

Étienne Marcotte

Valentina Zantedeschi

Jithendaraa Subramanian

Alexandre Lacoste

Forecasting is a critical task in decision-making across numerous domains. While historical numerical data provide a start, they fail to con… (see more)vey the complete context for reliable and accurate predictions. Human forecasters frequently rely on additional information, such as background knowledge and constraints, which can efficiently be communicated through natural language. However, in spite of recent progress with LLM-based forecasters, their ability to effectively integrate this textual information remains an open question. To address this, we introduce "Context is Key" (CiK), a time-series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context, requiring models to integrate both modalities; crucially, every task in CiK requires understanding textual context to be solved successfully. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters, and propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings. This benchmark aims to advance multimodal forecasting by promoting models that are both accurate and accessible to decision-makers with varied technical expertise. The benchmark can be visualized at https://servicenow.github.io/context-is-key-forecasting/v0.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

Tianyu Zhang

Andrew Robert Williams

Phillip Wozny

Kai-Hendrik Cohrs

Koen Ponse

Marco Jiralerspong

Soham Rajesh Phade

Sunil Srinivasa

Li Li

Yang Zhang

Prateek Gupta

Erman Acar

Yoshua Bengio

Stephan Zheng

Global cooperation on climate change mitigation is essential to limit temperature increases while supporting long-term, equitable economic g… (see more)rowth and sustainable development. Achieving such cooperation among diverse regions, each with different incentives, in a dynamic environment shaped by complex geopolitical and economic factors, without a central authority, is a profoundly challenging game-theoretic problem. This article introduces RICE-N, a multi-region integrated assessment model that simulates the global climate, economy, and climate negotiations and agreements. RICE-N uses multi-agent reinforcement learning (MARL) to encourage agents to develop strategic behaviors based on the environmental dynamics and the actions of the others. We present two negotiation protocols: (1) Bilateral Negotiation, an exemplary protocol and (2) Basic Club, inspired from Climate Clubs and the carbon border adjustment mechanism (Nordhaus, 2015; Comissions, 2022). We compare their impact against a no-negotiation baseline with various mitigation strategies, showing that both protocols significantly reduce temperature growth at the cost of a minor drop in production while ensuring a more equitable distribution of the emission reduction costs.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

Kevin Kasa

Graham W. Taylor

Krishnamurthy Dj Dvijotham

Alexandre Lacoste

AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cau… (see more)se unintended or harmful behavior. Inspired by the well-established concept of firewalls, we show that a simple, modular and model-agnostic defense operating at the agent--tool interface achieves perfect security (0% or the lowest possible attack success rate) with high utility (task success rate) across four public benchmarks: AgentDojo, Agent Security Bench, InjecAgent and tau-Bench, while achieving a state-of-the-art security-utility tradeoff compared to prior results. Specifically, we employ a defense based on two firewalls: a Tool-Input Firewall (Minimizer) and a Tool-Output Firewall (Sanitizer). Unlike prior complex approaches, this firewall defense makes minimal assumptions on the agent and can be deployed out-of-the-box, while maintaining strong performance without compromising utility. However, our analysis also reveals critical limitations in these existing benchmarks, including flawed success metrics, implementation bugs, and most importantly, weak attacks, hindering significant progress in the field. To foster more meaningful progress, we present targeted fixes to these issues for AgentDojo and Agent Security Bench while proposing best-practices for more robust benchmark design. Further, we demonstrate that although these firewalls push the state-of-the-art on existing benchmarks, it is still possible to bypass them in practice, underscoring the need to incorporate stronger attacks in security benchmarks. Overall, our work shows that existing agentic security benchmarks are easily saturated by a simple approach and highlights the need for stronger agentic security benchmarks with carefully chosen evaluation metrics and strong adaptive attacks.

2025-10-06

ArXiv (preprint)

arxiv.org

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

Kevin Kasa

Graham W. Taylor

Krishnamurthy Dj Dvijotham

Alexandre Lacoste

2025-10-06

ArXiv (preprint)

arxiv.org

A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy

Maxime Heuillet

Rishika Bhagwatkar

Jonas Ngnawe

Yann Batiste Pequignot

Ola Ahmad

Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations … (see more)was pursued by training models from scratch (i.e., with random initializations) using specialized loss ob- jectives. Recently, robust fine-tuning has emerged as a more efficient alternative: instead of training from scratch, pretrained models are adapted to maximize pre- dictive performance and robustness. To conduct robust fine-tuning, practitioners design an optimization strategy that includes the model update protocol (e.g., full or partial) and the specialized loss objective. Additional design choices include the architecture type and size, and the pretrained representation. These design choices affect robust generalization, which is the model’s ability to maintain performance when exposed to new and unseen perturbations at test time. Understanding how these design choices influence generalization remains an open question with signif- icant practical implications. In response, we present an empirical study spanning 6 datasets, 40 pretrained architectures, 2 specialized losses, and 3 adaptation proto- cols — yielding 1, 440 training configurations and 7, 200 robustness measurements across five perturbation types. To our knowledge, this is the most diverse and comprehensive benchmark of robust fine-tuning to date. While attention-based architectures and robust pretrained representations are increasingly popular, we find that convolutional neural networks pretrained in a supervised manner on large datasets often perform best. Our analysis both confirms and challenges prior design assumptions, highlighting promising research directions and offering practical guidance.

2025-09-29

NeurIPS.cc/2025/Workshop/Reliable_ML (published)

A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy

Maxime Heuillet

Rishika Bhagwatkar

Jonas Ngnawe

Yann Batiste Pequignot

Ola Ahmad

Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations … (see more)was pursued by training models from scratch (i.e., with random initializations) using specialized loss objectives. Recently, robust fine-tuning has emerged as a more efficient alternative: instead of training from scratch, pretrained models are adapted to maximize predictive performance and robustness. To conduct robust fine-tuning, practitioners design an optimization strategy that includes the model update protocol (e.g., full or partial) and the specialized loss objective. Additional design choices include the architecture type and size, and the pretrained representation. These design choices affect robust generalization, which is the model's ability to maintain performance when exposed to new and unseen perturbations at test time. Understanding how these design choices influence generalization remains an open question with significant practical implications. In response, we present an empirical study spanning 6 datasets, 40 pretrained architectures, 2 specialized losses, and 3 adaptation protocols, yielding 1,440 training configurations and 7,200 robustness measurements across five perturbation types. To our knowledge, this is the most diverse and comprehensive benchmark of robust fine-tuning to date. While attention-based architectures and robust pretrained representations are increasingly popular, we find that convolutional neural networks pretrained in a supervised manner on large datasets often perform best. Our analysis both confirms and challenges prior design assumptions, highlighting promising research directions and offering practical guidance.

2025-09-29

NeurIPS.cc/2025/Workshop/Reliable_ML (published)

Continual Pre-training of MoEs: How robust is your router?

Benjamin Therien

Charles-Etienne Joseph

Zain Sarwar

Ashwinee Panda

Anirban Das

Shi-Xiong Zhang

Stephen Rawls

Sambit Sahu

Sparsely-activated Mixture of Experts (MoE) transformers are promising architectures for foundation models. Compared to dense transformers t… (see more)hat require the same amount of floating-point operations (FLOPs) per forward pass, MoEs benefit from improved sample efficiency at training time and achieve much stronger performance. Many closed-source and open-source frontier language models have thus adopted an MoE architecture. Naturally, practitioners will want to extend the capabilities of these models with large amounts of newly collected data without completely re-training them. Prior work has shown that a simple combination of replay, learning rate re-warming, and re-decaying can enable the continual pre-training (CPT) of dense decoder-only transformers with minimal performance degradation compared to full re-training. In the case of decoder-only MoE transformers, however, it is unclear how the routing algorithm will impact continual pre-training performance: 1) *do the MoE transformer's routers exacerbate forgetting relative to a dense model?*; 2) *do the routers maintain a balanced load on previous distributions after CPT?*; 3) *are the same strategies applied to dense models sufficient to continually pre-train MoE LLMs?* In what follows, we conduct a large-scale study training a 500M parameter dense transformer and four 500M-active/2B-total parameter MoE transformers, following the Switch Transformer architecture and a granular DeepSeek-inspired architecture. Each model is trained for 600B tokens. Our results establish a surprising robustness to distribution shifts for MoEs using both Sinkhorn-Balanced and Z-and-Aux-loss-balanced routing algorithms, even in MoEs continually pre-trained without replay. Moreover, we show that MoE LLMs maintain their sample efficiency (relative to a FLOP-matched dense model) during CPT and that they can match the performance of a fully re-trained MoE at a fraction of the cost.

2025-09-26

TMLR (accepted)

Continual Pre-training of MoEs: How robust is your router?

Benjamin Therien

Charles-Etienne Joseph

Zain Sarwar

Ashwinee Panda

Anirban Das

Shi-Xiong Zhang

Stephen Rawls

Sambit Sahu

2025-09-26

TMLR (accepted)

Beyond Naïve Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs

Andrew Robert Williams

Vincent Zhihao Zheng

Nicolas Chapados

Étienne Marcotte

Valentina Zantedeschi

Alexandre Drouin

Forecasting in real-world settings requires models to integrate not only historical data but also relevant contextual information, often ava… (see more)ilable in textual form. While recent work has shown that large language models (LLMs) can be effective context-aided forecasters via naïve direct prompting, their full potential remains underexplored. We address this gap with 4 strategies, providing new insights into the zero-shot capabilities of LLMs in this setting. ReDP improves interpretability by eliciting explicit reasoning traces, allowing us to assess the model’s reasoning over the context independently from its forecast accuracy. CorDP leverages LLMs solely to refine existing forecasts with context, enhancing their applicability in real-world forecasting pipelines. IC-DP proposes embedding historical examples of context-aided forecasting tasks in the prompt, substantially improving accuracy even for the largest models. Finally, RouteDP optimizes resource efficiency by using LLMs to estimate task difficulty, and routing the most challenging tasks to larger models. Evaluated on different kinds of context-aided forecasting tasks from the CiK benchmark, our strategies demonstrate distinct benefits over naïve prompting across LLMs of different sizes and families. These results open the door to further simple yet effective improvements in LLM-based context-aided forecasting.

2025-09-23

NeurIPS.cc/2025/Workshop/BERT2S (published)

Beyond Na\"ive Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs

Andrew Robert Williams

Vincent Zhihao Zheng

Nicolas Chapados

Étienne Marcotte

Valentina Zantedeschi

Alexandre Drouin

Forecasting in real-world settings requires models to integrate not only historical data but also relevant contextual information, often ava… (see more)ilable in textual form. While recent work has shown that large language models (LLMs) can be effective context-aided forecasters via na\"ive direct prompting, their full potential remains underexplored. We address this gap with 4 strategies, providing new insights into the zero-shot capabilities of LLMs in this setting. ReDP improves interpretability by eliciting explicit reasoning traces, allowing us to assess the model's reasoning over the context independently from its forecast accuracy. CorDP leverages LLMs solely to refine existing forecasts with context, enhancing their applicability in real-world forecasting pipelines. IC-DP proposes embedding historical examples of context-aided forecasting tasks in the prompt, substantially improving accuracy even for the largest models. Finally, RouteDP optimizes resource efficiency by using LLMs to estimate task difficulty, and routing the most challenging tasks to larger models. Evaluated on different kinds of context-aided forecasting tasks from the CiK benchmark, our strategies demonstrate distinct benefits over na\"ive prompting across LLMs of different sizes and families. These results open the door to further simple yet effective improvements in LLM-based context-aided forecasting.

2025-09-23

NeurIPS.cc/2025/Workshop/BERT2S (published)