Irina Rish

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab. A faculty member of Mila, the Quebec Artificial Intelligence Institute, she holds a Canada Excellence Research Chair (CERC) and a Canada CIFAR AI Chair. Irina leads the U.S. Department of Energy's INCITE project on scalable foundation models, which runs on the Summit and Frontier supercomputers at the Oak Ridge Leadership Computing Facility (OLCF). She is a co-founder and the chief scientific officer of Nolano.ai.

Her current research focuses on neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization, and robustness. Before joining UdeM in 2019, Irina was a researcher at the IBM Thomas J. Watson Research Center, where she worked on a range of projects at the intersection of neuroscience and AI and led the NeuroAI challenge. She has received several IBM awards: the Excellence Award and the Outstanding Innovation Award (2018), the Outstanding Technical Achievement Award (2017), and the Research Accomplishment Award (2009). She holds 64 patents and has written more than 120 research papers, several book chapters, three published books, and a monograph on sparse modeling.

Current Students

Research Intern

Ivan Anokhin

PhD - UdeM

Co-supervisor:

Samira Ebrahimi Kahou

PhD - UdeM

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

PhD - McGill

Principal supervisor:

Blake Richards

Mohammad Javad Darvishi Bayazi

Amin Darabi

PhD - UdeM

Research Collaborator - UdeM

Wagner Drew

Research Master's - Concordia

Principal supervisor:

PhD - UdeM

Independent Visiting Researcher

Nadhir Hassen

Alumni Collaborator - UdeM

Research Master's

Alumni Collaborator - UdeM

Principal supervisor:

Ioannis Mitliagkas

Nizar Islah

PhD - UdeM

Principal supervisor:

PhD - UdeM

PhD - UdeM

Research Master's - Concordia

Principal supervisor:

Research Master's - UdeM

Neeraj Kumar

Alumni Collaborator - UdeM

Gwen Legate

PhD - Concordia

Principal supervisor:

Eugene Belilovsky

David Lemay

Research Master's - UdeM

Jonathan Lim

Research Collaborator

Research Master's - UdeM

Research Collaborator

PhD - UdeM

Research Collaborator - UdeM

Gabriela Moisescu-Pareja

Research Collaborator - McGill

Principal supervisor:

Doina Precup

Timothy Nest

PhD - UdeM

Co-supervisor:

Eilif B. Muller

Mohammad Pezeshki

Research Collaborator

Co-supervisor:

PhD - McGill

Principal supervisor:

Pouya Bashivan

Mahta Ramezanian

Research Master's - UdeM

Co-supervisor:

Guillaume Dumas

Matthew Riemer

PhD - UdeM

Alexis Roger

PhD - McGill

Principal supervisor:

Blake Richards

Munish Sathish Kumar

Research Collaborator

Vaibhav Singh

PhD - Concordia

Principal supervisor:

PhD - UdeM

PhD - UdeM

Co-supervisor:

Alumni Collaborator - UdeM

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

He Zhu

PhD - McGill

Publications

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Quentin Gregory Anthony

Timothee LESORT

Eugene Belilovsky

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English

2024-03-13

arXiv (preprint)
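The three ingredients named in the abstract above (LR re-warming, LR re-decaying, and replay of previous data) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the linear-warmup-plus-cosine shape, and the fixed `replay_fraction` are assumptions chosen for illustration only.

```python
import math
import random


def cosine_with_warmup(step, total_steps, warmup_steps, max_lr, min_lr):
    """Linear warmup from min_lr to max_lr, then cosine decay back to min_lr."""
    if step < warmup_steps:
        return min_lr + (max_lr - min_lr) * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def continual_lr(step, stage_steps, warmup_steps, max_lr, min_lr):
    """Hypothetical helper: restart the schedule at every new data stage.

    Each stage of `stage_steps` steps re-warms the LR and then re-decays it.
    """
    local_step = step % stage_steps
    return cosine_with_warmup(local_step, stage_steps, warmup_steps, max_lr, min_lr)


def sample_batch(new_data, old_data, batch_size, replay_fraction, rng):
    """Hypothetical helper: mix a fixed fraction of previous data into each batch."""
    n_old = round(batch_size * replay_fraction)
    replayed = rng.choices(old_data, k=n_old)
    fresh = rng.choices(new_data, k=batch_size - n_old)
    return replayed + fresh
```

For example, `continual_lr(step, stage_steps=100, warmup_steps=10, max_lr=1e-3, min_lr=1e-5)` climbs to `max_lr` over the first 10 steps of every 100-step stage and decays toward `min_lr` for the rest of it, while `sample_batch(..., replay_fraction=0.25, ...)` draws a quarter of each batch from the earlier dataset.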

Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

Tikeng Notsawo Pascal Junior

Pascal Notsawo

2024-03-04

ICLR.cc/2024/Workshop/ME-FoMo (poster)

Jean-Christophe Gagnon-Audet

Effective Latent Differential Equation Models via Attention and Multiple Shooting

Germán Abrevaya

Mahta Ramezanian-Panahi

Pablo Polosecki

Silvina Ponce Dawson

Guillermo Cecchi

Guillaume Dumas

2024-02-26

TMLR (accepted)

Mohammad Javad Darvishi Bayazi

Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning

Mohammad S. Ghaemi

Timothee LESORT

Md Rifat Arefin

Jocelyn Faubert

2024-02-01

Computers in Biology and Medicine (published)

Dance of the Neurons: Unraveling Sex from Brain Signals (short paper).

Mohammad-Javad Darvishi Bayazi

Mohammad S. Ghaemi

Jocelyn Faubert

2024-01-01

ML4CMH@AAAI (published)

dblp.uni-trier.de

Improving Adversarial Robustness in Vision-Language Models with Architecture and Prompt Design.

2024-01-01

EMNLP (Findings) (published)

Towards Machines that Trust: AI Agents Learn to Trust in the Trust Game

Ardavan S. Nobandegani

Thomas Shultz

Widely considered a cornerstone of human morality, trust shapes many aspects of human social interactions. In this work, we present a theoretical analysis of the

2023-12-20

arXiv (preprint)

Challenging Common Assumptions about Catastrophic Forgetting and Knowledge Accumulation

Timothee LESORT

Pau Rodriguez

Md Rifat Arefin

2023-11-20

Proceedings of The 2nd Conference on Lifelong Learning Agents (published)

proceedings.mlr.press

Lag-Llama: Towards Foundation Models for Time Series Forecasting

Kashif Rasul

Andrew Robert Williams

Arian Khorasani

Rishika Bhagwatkar

Marin Biloš

Hena Ghonia

Nadhir Hassen

Anderson Schneider

Sahil Garg

Alexandre Drouin

Nicolas Chapados

Yuriy Nevmyvaka

Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen "out-of-distribution" time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws to fit and predict model scaling behavior. The open source code is made available at https://github.com/kashif/pytorch-transformer-ts.

2023-11-01

NeurIPS.cc/2023/Workshop/R0-FoMo (poster)

Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

Kashif Rasul

Andrew Robert Williams

Arian Khorasani

Rishika Bhagwatkar

Marin Biloš

Hena Ghonia

N. Hassen

Anderson Schneider

Sahil Garg

Alexandre Drouin

Nicolas Chapados

Yuriy Nevmyvaka

Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities compared to a wide range of forecasting models on downstream datasets across domains. Moreover, when fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance, outperforming prior deep learning approaches, emerging as the best general-purpose model on average. Lag-Llama serves as a strong contender to the current state of the art in time series forecasting and paves the way for future advancements in foundation models tailored to time series data.

2023-10-12

arXiv (preprint)
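The idea of using "lags as covariates", mentioned in the Lag-Llama abstract above, can be illustrated with a small Python sketch. It is not taken from the Lag-Llama code base: the function name `lag_features` and the lag set used in the example are hypothetical, and the actual model's lag indices and tokenization are not specified in this listing.

```python
def lag_features(series, lags):
    """Build lagged covariates for a univariate series.

    For each time step t with enough history, the covariate vector is
    [series[t - lag] for lag in lags] and the target is series[t].
    `lags` is an illustrative choice, not the lag set used by Lag-Llama.
    """
    start = max(lags)  # first index with a full set of lagged values
    targets = series[start:]
    covariates = [
        [series[t - lag] for lag in lags]
        for t in range(start, len(series))
    ]
    return targets, covariates


# Usage: each target is paired with its values 1 and 2 steps earlier.
targets, covariates = lag_features([1, 2, 3, 4, 5], lags=[1, 2])
```

In a forecasting model, each covariate vector would be fed alongside the corresponding position so that the network sees past values at fixed offsets directly.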