Irina Rish

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab, and a core academic member of Mila – Quebec Artificial Intelligence Institute.

In addition to holding a Canada Excellence Research Chair (CERC) and a CIFAR Chair, she leads the U.S. Department of Energy’s INCITE project on Scalable Foundation Models on Summit & Frontier supercomputers at the Oak Ridge Leadership Computing Facility. She co-founded and serves as CSO of Nolano.ai.

Rish’s current research interests include neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization and robustness.

Before joining UdeM in 2019, she was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She was awarded the IBM Eminence & Excellence Award and IBM Outstanding Innovation Award (2018), IBM Outstanding Technical Achievement Award (2017) and IBM Research Accomplishment Award (2009).

She holds 64 patents and has published 120 research papers, several book chapters, three edited books and a monograph on sparse modeling.

Current Students

George Adamopoulos

Research Intern

Ivan Anokhin

PhD - Université de Montréal

Co-supervisor :

Samira Ebrahimi Kahou

Rifat Arefin

PhD - Université de Montréal

Arjun Ashok

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

PhD - McGill University

Principal supervisor :

Blake Richards

Mohammad Javad Darvishi Bayazi

Amin Darabi

PhD - Université de Montréal

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Karim Jerbi

Wagner Drew

Master's Research - Concordia University

Principal supervisor :

Mirco Ravanelli

Mojtaba Faramarzi

PhD - Université de Montréal

Parviz Haggi Mani

Independent visiting researcher

parviz.haggi@gmail.com

Nadhir Hassen

Collaborating Alumni - Université de Montréal

Master's Research

Collaborating Alumni - Université de Montréal

Principal supervisor :

Ioannis Mitliagkas

Nizar Islah

PhD - Université de Montréal

Principal supervisor :

Eilif Benjamin Muller

PhD - Université de Montréal

Collaborating researcher

Zafir Khalid

Master's Research - Concordia University

Principal supervisor :

Master's Research - Université de Montréal

Neeraj Kumar

Collaborating Alumni - Université de Montréal

Gwen Legate

PhD - Concordia University

Principal supervisor :

David Lemay

Master's Research - Université de Montréal

amin.mansouri@mila.quebec

Jonathan Lim

Collaborating researcher

Master's Research - Université de Montréal

Collaborating researcher

Andrei Mircea

PhD - Université de Montréal

Collaborating researcher - Université de Montréal

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Principal supervisor :

Doina Precup

Timothy Nest

PhD - Université de Montréal

Co-supervisor :

Eilif Benjamin Muller

Mohammad Pezeshki

Collaborating researcher

Co-supervisor :

PhD - McGill University

Principal supervisor :

Pouya Bashivan

Mahta Ramezanian

Master's Research - Université de Montréal

Co-supervisor :

Guillaume Dumas

Roland Riachi

Collaborating researcher - Université de Montréal

Matthew Riemer

PhD - Université de Montréal

Alexis Roger

PhD - McGill University

Principal supervisor :

Blake Richards

Vaibhav Singh

PhD - Concordia University

Principal supervisor :

Gopeshh Subbaraj

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Collaborating Alumni - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Master's Research - Université de Montréal

Publications

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Therien

Kshitij Gupta

Mats Leon Richter

Quentin Anthony

Timothee LESORT

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English

2024-03-13

ArXiv (preprint)

Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

Tikeng Notsawo Pascal Junior

Pascal Notsawo

2024-03-04

ICLR.cc/2024/Workshop/ME-FoMo (poster)

Effective Latent Differential Equation Models via Attention and Multiple Shooting

Germán Abrevaya

Mahta Ramezanian-Panahi

Jean-Christophe Gagnon-Audet

Pablo Polosecki

Silvina Ponce Dawson

Guillermo Cecchi

Guillaume Dumas

2024-02-26

TMLR (accepted)

Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning

Mohammad Javad Darvishi Bayazi

Mohammad S. Ghaemi

Timothee LESORT

Md Rifat Arefin

Jocelyn Faubert

2024-02-01

Computers in Biology and Medicine (published)

Dance of the Neurons: Unraveling Sex from Brain Signals (short paper).

Mohammad-Javad Darvishi Bayazi

Mohammad S. Ghaemi

Jocelyn Faubert

2024-01-01

ML4CMH@AAAI (published)

dblp.uni-trier.de

Improving Adversarial Robustness in Vision-Language Models with Architecture and Prompt Design.

2024-01-01

EMNLP (Findings) (published)

Towards Machines that Trust: AI Agents Learn to Trust in the Trust Game

Ardavan S. Nobandegani

Thomas Shultz

Widely considered a cornerstone of human morality, trust shapes many aspects of human social interactions. In this work, we present a theoretical analysis of the

2023-12-20

ArXiv (preprint)

Challenging Common Assumptions about Catastrophic Forgetting and Knowledge Accumulation

Timothee LESORT

Pau Rodriguez

Md Rifat Arefin

2023-11-20

Proceedings of The 2nd Conference on Lifelong Learning Agents (published)

proceedings.mlr.press

Lag-Llama: Towards Foundation Models for Time Series Forecasting

Kashif Rasul

Arjun Ashok

Andrew Robert Williams

Arian Khorasani

George Adamopoulos

Rishika Bhagwatkar

Marin Biloš

Hena Ghonia

Nadhir Hassen

Anderson Schneider

Sahil Garg

Alexandre Drouin

Nicolas Chapados

Yuriy Nevmyvaka

Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen "out-of-distribution" time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws to fit and predict model scaling behavior. The open source code is made available at https://github.com/kashif/pytorch-transformer-ts.

2023-11-01

NeurIPS.cc/2023/Workshop/R0-FoMo (poster)

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

Kashif Rasul

Arjun Ashok

Andrew Robert Williams

Arian Khorasani

George Adamopoulos

Rishika Bhagwatkar

Marin Biloš

Hena Ghonia

N. Hassen

Anderson Schneider

Sahil Garg

Alexandre Drouin

Nicolas Chapados

Yuriy Nevmyvaka

Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities compared to a wide range of forecasting models on downstream datasets across domains. Moreover, when fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance, outperforming prior deep learning approaches, emerging as the best general-purpose model on average. Lag-Llama serves as a strong contender to the current state-of-art in time series forecasting and paves the way for future advancements in foundation models tailored to time series data.

2023-10-12

ArXiv (preprint)