Irina Rish

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

In addition to holding a Canada Excellence Research Chair (CERC) and a CIFAR Chair, she leads the U.S. Department of Energy’s INCITE project on Scalable Foundation Models on Summit & Frontier supercomputers at the Oak Ridge Leadership Computing Facility. She co-founded and serves as CSO of Nolano.ai.

Rish’s current research interests include neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization and robustness.

Before joining UdeM in 2019, she was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She was awarded the IBM Eminence & Excellence Award and IBM Outstanding Innovation Award (2018), IBM Outstanding Technical Achievement Award (2017) and IBM Research Accomplishment Award (2009).

She holds 64 patents and has published 120 research papers, several book chapters, three edited books and a monograph on sparse modeling.

Current Students

George Adamopoulos

Research Intern

Ivan Anokhin

PhD - Université de Montréal

Co-supervisor :

Samira Ebrahimi Kahou

Rifat Arefin

PhD - Université de Montréal

Arjun Ashok

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

PhD - McGill University

Principal supervisor :

Blake Richards

Mohammad Javad Darvishi Bayazi

Amin Darabi

PhD - Université de Montréal

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Karim Jerbi

Wagner Drew

Master's Research - Concordia University

Principal supervisor :

Mirco Ravanelli

Mojtaba Faramarzi

PhD - Université de Montréal

Nadhir Hassen

Collaborating Alumni - Université de Montréal

Master's Research

Collaborating Alumni - Université de Montréal

Principal supervisor :

Ioannis Mitliagkas

Nizar Islah

PhD - Université de Montréal

Principal supervisor :

Eilif Benjamin Muller

PhD - Université de Montréal

Collaborating researcher

Zafir Khalid

Master's Research - Concordia University

Principal supervisor :

Master's Research - Université de Montréal

Neeraj Kumar

Collaborating Alumni - Université de Montréal

Gwen Legate

PhD - Concordia University

Principal supervisor :

Eugene Belilovsky

David Lemay

Master's Research - Université de Montréal

Jonathan Lim

Collaborating researcher

amin.mansouri@mila.quebec

Baihan Lin

Independent visiting researcher - Mt. Sinai

Master's Research - Université de Montréal

Collaborating researcher

Andrei Mircea

PhD - Université de Montréal

Master's Research - Université de Montréal

Diganta Misra

Master's Research - Université de Montréal

Timothy Nest

PhD - Université de Montréal

Co-supervisor :

Eilif Benjamin Muller

Mohammad Pezeshki

Collaborating researcher

Co-supervisor :

PhD - McGill University

Principal supervisor :

Pouya Bashivan

Mahta Ramezanian

Master's Research - Université de Montréal

Co-supervisor :

Guillaume Dumas

Roland Riachi

Collaborating researcher - Université de Montréal

Matthew Riemer

PhD - Université de Montréal

Alexis Roger

PhD - McGill University

Principal supervisor :

Blake Richards

Vaibhav Singh

PhD - Concordia University

Principal supervisor :

Eugene Belilovsky

Gopeshh Subbaraj

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

Publications

The Effect of Data Corruption on Multimodal Long Form Responses

Daniel Z Kaplan

Alexis Roger

Mohamed Osman

Despite significant progress, Vision-Language Models (VLMs) still struggle with hallucinations, especially in long-form responses. Existing strategies have had limited success in specific cases, and long-form generation remains problematic. In this work, we attempt to establish the link between the data used to train the model and the hallucinations in the model's output. To this end, we examine hallucinations through data corruption. We develop a method to corrupt training data and then train models with this data to see the effect on performance. We show that corrupting only a small portion of the long-form training data significantly impairs the performance of the model on long-form tasks, while leaving simpler tasks like visual question-answering and multiple choice relatively intact. All training code and models are released for reproducibility and future research.

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

TriLM vs FloatLM: Ternary LLMs are more Performant than Quantized FP16 LLMs

Ayush Kaushal

Tejas Vaidhya

Tejas Pandey

Aaryan Bhagat

Ternary LLMs offer significantly better performance for their size (measured in bits) than the models trained and deployed in FP16/BF16. Given the widespread usage of quantization before deployment and advancements in Post Training Quantization of LLMs, a pivotal question arises: do ternary LLMs indeed provide any discernible benefits? To address this, we first build an open family of pre-trained ternary Large Language Models (TriLM). Additionally, we include their counterparts pre-trained in FP16 (FloatLM) and quantized versions of FloatLM (QuantLM) with parameters across almost two orders of magnitude, from 99M to 3.9B parameters. We demonstrate that TriLMs with 3B+ parameters start to offer competitive performance compared to FloatLMs with the same parameter count, while providing significantly better performance for their size. Specifically, TriLM 3.9B, with fewer bits than FloatLM 830M, ranks between FloatLM 2.4B and FloatLM 3.9B when averaged across 6 popular commonsense and reasoning benchmarks. TriLMs also outperform quantized models, with TriLM 3.9B surpassing the larger QuantLM-3bit 3.9B. Furthermore, across knowledge-based benchmarks, TriLM maintains a superiority for its size, but lags for its parameter count. TriLM 3.9B falls halfway between FloatLM 1.5B and 2.4B, close to QuantLM-4bit 2.4B. To advance research on Ternary LMs, we open source over 500 checkpoints across the model families.

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

VFA: Vision Frequency Analysis of Foundation Models and Human

Mohammad Javad Darvishi Bayazi

Md Rifat Arefin

Jocelyn Faubert

Machine learning models often struggle with distribution shifts in real-world scenarios, whereas humans exhibit robust adaptation. Models that better align with human perception may achieve higher out-of-distribution generalization. In this study, we investigate how various characteristics of large-scale computer vision models influence their alignment with human capabilities and robustness. Our findings indicate that increasing model and data size, along with incorporating rich semantic information and multiple modalities, significantly enhances models' alignment with human perception and their overall robustness. Our empirical analysis demonstrates a strong correlation between out-of-distribution accuracy and human alignment.

2024-07-03

ICML.cc/2024/Workshop/FM-Wild (poster)

Adversarial Training with Synthesized Data: A Path to Robust and Generalizable Neural Networks

Reza Bayat

Adversarial Training (AT) is a well-known framework designed to mitigate adversarial vulnerabilities in neural networks. Recent research indicates that incorporating adversarial examples (AEs) in training can enhance models' generalization capabilities. To understand the impact of AEs on learning dynamics, we study AT through the lens of sample difficulty methodologies. Our findings show that AT leads to more stable learning dynamics compared to Natural Training (NT), resulting in gradual performance improvements and less overconfident predictions. This suggests that AT steers training away from learning easy, perturbable spurious features toward more resilient and generalizable ones. However, a trade-off exists between adversarial robustness and generalization gains, due to robust overfitting, limiting practical deployment. To address this, we propose using synthesized data to bridge this gap. Our results demonstrate that AT benefits significantly from synthesized data, whereas NT does not, enhancing generalization without compromising robustness and offering new avenues for developing robust and generalizable models.

2024-06-28

ICML.cc/2024/Workshop/NextGenAISafety (poster)

Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques

Rishika Bhagwatkar

Shravan Nayak

Reza Bayat

Alexis Roger

Daniel Z Kaplan

Pouya Bashivan

Vision-Language Models (VLMs) have witnessed a surge in both research and real-world applications. However, as they become increasingly prevalent, ensuring their robustness against adversarial attacks is paramount. This work systematically investigates the impact of model design choices on the adversarial robustness of VLMs against image-based attacks. Additionally, we introduce novel, cost-effective approaches to enhance robustness through prompt formatting. By rephrasing questions and suggesting potential adversarial perturbations, we demonstrate substantial improvements in model robustness against strong image-based attacks such as Auto-PGD. Our findings provide important guidelines for developing more robust VLMs, particularly for deployment in safety-critical environments.

2024-06-28

ICML.cc/2024/Workshop/NextGenAISafety (poster)

Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

Karolis Jucys

George Adamopoulos

Mehrab Hamidi

Stephanie Milani

Mohammad Reza Samsami

Artem Zholus

Sonia Joseph

Blake Richards

Özgür Şimşek

Understanding the mechanisms behind decisions taken by large foundation models in sequential tasks is critical to ensuring that such systems operate transparently and safely. However, interpretability methods have not yet been applied extensively to large-scale agents based on reinforcement learning. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft playing agent, one of the largest open-source vision-based agents. We try to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task: crafting a diamond pickaxe. The agent seems to pay attention to the last four frames and several key-frames further back. This provides clues as to how it maintains coherence in a task that takes 3-10 minutes, despite the agent's short memory span of only six seconds. Second, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakenly identifies a villager wearing brown clothes as a tree trunk and punches it to death, when positioned stationary under green tree leaves. We demonstrate similar misbehavior in a related agent (STEVE-1), which motivates the use of VPT as a model organism for large-scale vision-based agent interpretability.

2024-06-24

ICML.cc/2024/Workshop/MI (poster)

Handling Delay in Reinforcement Learning Caused by Parallel Computations of Neurons

Ivan Anokhin

Rishav

Stephen Chung

Samira Ebrahimi Kahou

Biological neural networks operate in parallel, a feature that sets them apart from artificial neural networks and can significantly enhance inference speed. However, this parallelism introduces challenges: when each neuron operates asynchronously with a fixed execution time, an

2024-06-19

ICML.cc/2024/Workshop/ARLET (poster)

Realtime Reinforcement Learning: Towards Rapid Asynchronous Deployment of Large Models

Matthew D Riemer

Gopeshh Subbaraj

Glen Berseth

Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectively minimize long-term regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pokemon and Tetris.

2024-06-19

ICML.cc/2024/Workshop/ARLET (poster)

Scalable Approaches for a Theory of Many Minds

Maximilian Puelma Touzel

Amin Memarian

Matthew D Riemer

Andrei Mircea

Andrew Robert Williams

Elin Ahlstrand

Lucas Lehnert

Rupali Bhati

Guillaume Dumas

A major challenge as we move towards building agents for real-world problems, which could involve a massive number of human and/or machine agents, is that we must learn to reason about the behavior of these many other agents. In this paper, we consider the problem of scaling a predictive Theory of Mind (ToM) model to a very large number of interacting agents with a fixed computational budget. Motivated by the limited diversity of agent types, existing approaches to scalable ToM learn versatile single-agent representations for quickly adapting to new agents encountered sequentially. We consider the more general setting that many agents are observed in parallel and formulate the corresponding Theory of Many Minds (ToMM) problem of estimating the joint policy. We frame the scaling behavior of solutions in terms of parameter sharing schemes and in particular propose two parameter-free architectural features that endow models with the ability to exploit action correlations: encoding a multi-agent context, and decoding through an abstracted joint action space. The increased predictive capabilities that have come with foundation models have made it easier to imagine the possibility of using these models to make simulations that imitate the behavior of many agents within complex real-world systems. Being able to perform these simulations in a general-purpose way would not only help make more capable agents, it also would be a very useful capability for applications in social science, political science, and economics.

2024-06-18

ICML.cc/2024/Workshop/Agentic_Markets (poster)

Is a Good Description Worth a Thousand Pictures? Reducing Multimodal Alignment to Text-Based, Unimodal Alignment

Amin Memarian

Touraj Laleh

Ardavan S. Nobandegani

Generative AI systems (ChatGPT, Llama, etc.) are increasingly adopted across a range of high-stakes domains, including healthcare and the criminal justice system. This rapid adoption raises moral and ethical concerns. The emerging field of AI alignment aims to make AI systems that respect human values. In this work, we focus on evaluating the ethics of multimodal AI systems involving both text and images, a relatively under-explored area, as most alignment work is currently focused on language models. Specifically, we investigate whether the multimodal alignment problem (i.e., the problem of aligning a multimodal system) could be effectively reduced to the (text-based) unimodal alignment problem, wherein a language model would make a moral judgment purely based on a description of an image. Focusing on GPT-4 and LLaVA as two prominent examples of multimodal systems, we demonstrate, rather surprisingly, that this reduction can be achieved with a relatively small loss in moral judgment performance in the case of LLaVA, and virtually no loss in the case of GPT-4.

2024-06-17

ICML.cc/2024/Workshop/MFHAIA (poster)

Lost in Translation: The Algorithmic Gap Between LMs and the Brain

Tosato Tommaso

Tikeng Notsawo Pascal Junior

Helbling Saskia

Guillaume Dumas

Language Models (LMs) have achieved impressive performance on various linguistic tasks, but their relationship to human language processing in the brain remains unclear. This paper examines the gaps and overlaps between LMs and the brain at different levels of analysis, emphasizing the importance of looking beyond input-output behavior to examine and compare the internal processes of these systems. We discuss how insights from neuroscience, such as sparsity, modularity, internal states, and interactive learning, can inform the development of more biologically plausible language models. Furthermore, we explore the role of scaling laws in bridging the gap between LMs and human cognition, highlighting the need for efficiency constraints analogous to those in biological systems. By developing LMs that more closely mimic brain function, we aim to advance both artificial intelligence and our understanding of human cognition.

2024-06-17

ICML.cc/2024/Workshop/LLMs_and_Cognition (poster)