Irina Rish

Biographie

Irina Rish est professeure titulaire à l'Université de Montréal (UdeM), où elle dirige le Laboratoire d'IA autonome. Membre du corps professoral de Mila – Institut québécois d’intelligence artificielle, elle est titulaire d'une chaire d'excellence en recherche du Canada (CERC) et d'une chaire en IA Canada-CIFAR. Irina dirige le projet INCITE du ministère américain de l'Environnement au sujet des modèles de fondation évolutifs sur les superordinateurs Summit et Frontier à l'Oak Ridge Leadership Computing Facility (OLCF). Elle est cofondatrice et directrice scientifique de Nolano.ai.

Ses recherches actuelles portent sur les lois de mise à l'échelle neuronale et les comportements émergents (capacités et alignement) dans les modèles de fondation, ainsi que sur l'apprentissage continu, la généralisation hors distribution et la robustesse. Avant de se joindre à l'UdeM en 2019, Irina était chercheuse au Centre de recherche IBM Thomas J. Watson, où elle a travaillé sur divers projets à l'intersection des neurosciences et de l'IA, et dirigé le défi NeuroAI. Elle a reçu plusieurs prix IBM : ceux de l’excellence et de l’innovation exceptionnelle (2018), celui de la réalisation technique exceptionnelle (2017), et celui de l’accomplissement en recherche (2009). Elle détient 64 brevets et a écrit plus de 120 articles de recherche, plusieurs chapitres de livres, trois livres publiés et une monographie sur la modélisation éparse.

Étudiants actuels

George Adamopoulos

Stagiaire de recherche

Ivan Anokhin

Doctorat - UdeM

Co-superviseur⋅e :

Samira Ebrahimi Kahou

Doctorat - UdeM

Arjun Ashok

Doctorat - UdeM

Co-superviseur⋅e :

Maîtrise recherche - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Doctorat - Concordia

Superviseur⋅e principal⋅e :

Mohammad Javad Darvishi Bayazi

Amin Darabi

Doctorat - UdeM

Collaborateur·rice de recherche - UdeM

Wagner Drew

Maîtrise recherche - Concordia

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Collaborateur·rice de recherche

Co-superviseur⋅e :

Sarath Chandar

Parviz Haggi Mani

Visiteur de recherche indépendant - -

Nadhir Hassen

Collaborateur·rice de recherche - UdeM

Maîtrise recherche

Collaborateur·rice alumni - UdeM

Superviseur⋅e principal⋅e :

Ioannis Mitliagkas

Nizar Islah

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Doctorat - UdeM

Maîtrise recherche - Concordia

Superviseur⋅e principal⋅e :

Collaborateur·rice de recherche - UdeM

Neeraj Kumar

Collaborateur·rice alumni - UdeM

Gwen Legate

Doctorat - Concordia

Superviseur⋅e principal⋅e :

David Lemay

Maîtrise recherche - UdeM

Amin Mansouri

Collaborateur·rice alumni - UdeM

Collaborateur·rice de recherche

Doctorat - UdeM

Collaborateur·rice de recherche - UdeM

Gabriela Moisescu-Pareja

Collaborateur·rice de recherche - McGill

Superviseur⋅e principal⋅e :

Doina Precup

Timothy Nest

Doctorat - UdeM

Co-superviseur⋅e :

Eilif B. Muller

Mohammad Pezeshki

Collaborateur·rice de recherche

Co-superviseur⋅e :

Collaborateur·rice de recherche - Polytechnique

Motahareh Pourrahimi

Doctorat - McGill

Superviseur⋅e principal⋅e :

Pouya Bashivan

Mahta Ramezanian

Maîtrise recherche - UdeM

Co-superviseur⋅e :

Guillaume Dumas

Matthew Riemer

Doctorat - UdeM

Alexis Roger

Doctorat - McGill

Superviseur⋅e principal⋅e :

Blake Richards

Munish Sathish Kumar

Collaborateur·rice de recherche

Vaibhav Singh

Doctorat - Concordia

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Collaborateur·rice alumni - UdeM

Sihui Wei

Baccalauréat - McGill

Andrew Williams

Doctorat - UdeM

Co-superviseur⋅e :

Maîtrise recherche - UdeM

He Zhu

Doctorat - McGill

Publications

$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

Charles-Etienne Joseph

Learned optimizers (LOs) have the potential to significantly reduce the wall-clock training time of neural networks. However, they can strug… (voir plus)gle to optimize unseen tasks (*meta-generalize*), especially when training networks wider than those seen during meta-training. To address this, we derive the Maximal Update Parametrization (

2025-12-31

International Conference on Learning Representations (Accept (Poster))

Critical Role of EEG Signals in Assessment of Sex-Specific Insights in Neurological Diagnostics via Machine Learning Approach

Mohammad-Javad Darvishi-Bayazi

Mohammad Sajjad Ghaemi

Jocelyn Faubert

Abstract

Early detection and diagnosis of pathology are essential for efficient treatment and therapeutic … (voir plus)interventions. The emergence of Artificial Intelligence (AI) and deep machine learning techniques have demonstrated the promising capability of brain imaging data to predict various pathological diseases. However, plenty of diseases have imbalanced distribution across different sexes. Furthermore, the impact of sex-specific patterns and biomarkers in predicting diseases has remained unexplored as a fundamental subject matter to inform the treatment paradigms. This paper underscored the generalization and transferability of sex-related patterns in functional data, specifically Electroencephalogram (EEG) signals through Artificial Deep Neural Networks. We conducted training on a broad spectrum of EEG recordings involving participants ranging from 221 to 12,000, including healthy and pathological subjects. Our evaluation leveraged datasets from various sources and participant groups, featuring distribution shifts. While the artificial models demonstrated accurate sex detection on datasets without fine-tuning, their performance declined with significant distribution shifts. Furthermore, we explored the relationship between sex and pathology by visualizing salient features for target detection in distinct subgroups. Our findings revealed unprecedented insights into the negligible role of sex-specific patterns in pathology detection despite the presence of prominent and consistent patterns within sex groups. These results are essential for developing more robust and unbiased AI models for disease prediction and informing the treatment paradigms.

2025-12-09

Scientific Reports (publié)

Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

Kevin Kasa

Graham W. Taylor

Krishnamurthy Dj Dvijotham

Alexandre Lacoste

AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cau… (voir plus)se unintended or harmful behavior. Inspired by the well-established concept of firewalls, we show that a simple, modular and model-agnostic defense operating at the agent--tool interface achieves perfect security (0% or the lowest possible attack success rate) with high utility (task success rate) across four public benchmarks: AgentDojo, Agent Security Bench, InjecAgent and tau-Bench, while achieving a state-of-the-art security-utility tradeoff compared to prior results. Specifically, we employ a defense based on two firewalls: a Tool-Input Firewall (Minimizer) and a Tool-Output Firewall (Sanitizer). Unlike prior complex approaches, this firewall defense makes minimal assumptions on the agent and can be deployed out-of-the-box, while maintaining strong performance without compromising utility. However, our analysis also reveals critical limitations in these existing benchmarks, including flawed success metrics, implementation bugs, and most importantly, weak attacks, hindering significant progress in the field. To foster more meaningful progress, we present targeted fixes to these issues for AgentDojo and Agent Security Bench while proposing best-practices for more robust benchmark design. Further, we demonstrate that although these firewalls push the state-of-the-art on existing benchmarks, it is still possible to bypass them in practice, underscoring the need to incorporate stronger attacks in security benchmarks. Overall, our work shows that existing agentic security benchmarks are easily saturated by a simple approach and highlights the need for stronger agentic security benchmarks with carefully chosen evaluation metrics and strong adaptive attacks.

2025-09-30

arXiv (publié)

arxiv.org

A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy

Ola Ahmad

Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations … (voir plus)was pursued by training models from scratch (i.e., with random initializations) using specialized loss objectives. Recently, robust fine-tuning has emerged as a more efficient alternative: instead of training from scratch, pretrained models are adapted to maximize predictive performance and robustness. To conduct robust fine-tuning, practitioners design an optimization strategy that includes the model update protocol (e.g., full or partial) and the specialized loss objective. Additional design choices include the architecture type and size, and the pretrained representation. These design choices affect robust generalization, which is the model's ability to maintain performance when exposed to new and unseen perturbations at test time. Understanding how these design choices influence generalization remains an open question with significant practical implications. In response, we present an empirical study spanning 6 datasets, 40 pretrained architectures, 2 specialized losses, and 3 adaptation protocols, yielding 1,440 training configurations and 7,200 robustness measurements across five perturbation types. To our knowledge, this is the most diverse and comprehensive benchmark of robust fine-tuning to date. While attention-based architectures and robust pretrained representations are increasingly popular, we find that convolutional neural networks pretrained in a supervised manner on large datasets often perform best. Our analysis both confirms and challenges prior design assumptions, highlighting promising research directions and offering practical guidance.

2025-09-28

NeurIPS.cc/2025/Workshop/Reliable_ML (publié)

Continual Pre-training of MoEs: How robust is your router?

Charles-Etienne Joseph

Zain Sarwar

Ashwinee Panda

Anirban Das

Shi-Xiong Zhang

Stephen Rawls

Sambit Sahu

2025-09-25

TMLR (accepté)

Beyond Naive Prompting: Strategies for Improved Zero-shot Context-aided Forecasting with LLMs

Arjun Ashok

Andrew Robert Williams

Vincent Zhihao Zheng

Nicolas Chapados

Étienne Marcotte

Valentina Zantedeschi

Alexandre Drouin

Forecasting in real-world settings requires models to integrate not only historical data but also relevant contextual information, often ava… (voir plus)ilable in textual form. While recent work has shown that large language models (LLMs) can be effective context-aided forecasters via naïve direct prompting, their full potential remains underexplored. We address this gap with 4 strategies, providing new insights into the zero-shot capabilities of LLMs in this setting. ReDP improves interpretability by eliciting explicit reasoning traces, allowing us to assess the model's reasoning over the context independently from its forecast accuracy. CorDP leverages LLMs solely to refine existing forecasts with context, enhancing their applicability in real-world forecasting pipelines. IC-DP proposes embedding historical examples of context-aided forecasting tasks in the prompt, substantially improving accuracy even for the largest models. Finally, RouteDP optimizes resource efficiency by using LLMs to estimate task difficulty, and routing the most challenging tasks to larger models. Evaluated on different kinds of context-aided forecasting tasks from the CiK benchmark, our strategies demonstrate distinct benefits over naïve prompting across LLMs of different sizes and families. These results open the door to further simple yet effective improvements in LLM-based context-aided forecasting.

2025-09-22

NeurIPS.cc/2025/Workshop/BERT2S (poster)

Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models

Istabrak Abbes

Gopeshh Subbaraj

Matthew D Riemer

Nizar Islah

Tsuguchika Tabaru

Hiroaki Kingetsu

A. Chandar

2025-09-21

NeurIPS.cc/2025/Workshop/WiML (publié)

Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients

Gwen Legate

Federated learning enables collaborative model training across numerous edge devices without requiring participants to share data; however, … (voir plus)memory and communication constraints on these edge devices may preclude their participation in training. We consider a setting in which a subset of edge devices are below a critical memory or communication threshold required to conduct model updates. Under typical federated optimization algorithms, these devices are excluded from training which renders their data inaccessible and increases system induced bias. We are inspired by MeZO, a zeroth-order method used for memory-efficient fine-tuning. The increased variance inherent to zeroth-order gradient approximations has relegated previous zeroth-order optimizers exclusively to the domain of fine tuning; a limitation we seek to correct. We devise a federated, memory-efficient zeroth-order optimizer, ZOWarmUp that permits zeroth-order training from a random initialization. ZOWarmUp leverages differing client capabilities and careful variance reduction techniques to facilitate participation of under-represented, low-resource clients in model training. Like other federated zeroth-order methods, ZOWarmUp eliminates the need for edge devices to transmit their full gradients to the server and instead relies on only a small set of random seeds, rendering the up-link communication cost negligible. We present experiments using various datasets and model architectures to show that ZOWarmUp is a robust algorithm that can can be applied under a wide variety of circumstances. For systems with a high proportion of edge devices that would otherwise be excluded from training, this algorithm provides access to a greater volume and diversity of data, thus improving training outcomes.

2025-09-02

ArXiv (prépublication)

arxiv.org

Persistent Instability in LLM's Personality Measurements: Effects of Scale, Reasoning, and Conversation History

Tommaso Tosato

Saskia Helbling

Yorguin-Jose Mantilla-Ramos

Mahmood Hegazy

Alberto Tosato

D. Lemay

Guillaume Dumas

2025-08-05

ArXiv (prépublication)

arxiv.org

Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

The ever-growing availability of unlabeled data presents both opportunities and challenges for training artificial intelligence systems. Whi… (voir plus)le self-supervised learning (SSL) has emerged as a powerful paradigm for extracting meaningful representations from vast amounts of unlabeled data, existing methods still struggle to adapt to the non-stationary, non-IID nature of real-world data streams without forgetting previously learned knowledge. Recent works have adopted a repeated cosine annealing schedule for large-scale continual pre-training; however, these schedules (1) inherently cause forgetting during the re-warming phase and (2) have not been systematically compared to existing continual SSL methods. In this work, we systematically compare the widely used cosine schedule with the recently proposed infinite learning rate schedule and empirically find the latter to be a more effective alternative. Our extensive empirical evaluation across diverse image and language datasets demonstrates that the infinite learning rate schedule consistently enhances continual pre-training performance compared to a repeated cosine decay without being restricted to a fixed iteration budget. For instance, in a small-scale MAE pre-training setup, it outperforms several strong baselines from the literature. We then scale up our experiments to larger MAE pre-training and autoregressive language model pre-training. Our results show that the infinite learning rate schedule remains effective at scale, surpassing repeated cosine decay for both MAE pre-training and zero-shot LM benchmarks.

2025-06-10

ICML.cc/2025/Workshop/ES-FoMo-III (publié)

MuLoCo: Muon is a practical inner optimizer for DiLoCo

Xiaolong Huang

2025-06-10

ICML.cc/2025/Workshop/ES-FoMo-III (publié)