Irina Rish

Biographie

Irina Rish est professeure titulaire à l'Université de Montréal (UdeM), où elle dirige le Laboratoire d'IA autonome. Membre du corps professoral de Mila – Institut québécois d’intelligence artificielle, elle est titulaire d'une chaire d'excellence en recherche du Canada (CERC) et d'une chaire en IA Canada-CIFAR. Irina dirige le projet INCITE du ministère américain de l'Environnement au sujet des modèles de fondation évolutifs sur les superordinateurs Summit et Frontier à l'Oak Ridge Leadership Computing Facility (OLCF). Elle est cofondatrice et directrice scientifique de Nolano.ai.

Ses recherches actuelles portent sur les lois de mise à l'échelle neuronale et les comportements émergents (capacités et alignement) dans les modèles de fondation, ainsi que sur l'apprentissage continu, la généralisation hors distribution et la robustesse. Avant de se joindre à l'UdeM en 2019, Irina était chercheuse au Centre de recherche IBM Thomas J. Watson, où elle a travaillé sur divers projets à l'intersection des neurosciences et de l'IA, et dirigé le défi NeuroAI. Elle a reçu plusieurs prix IBM : ceux de l’excellence et de l’innovation exceptionnelle (2018), celui de la réalisation technique exceptionnelle (2017), et celui de l’accomplissement en recherche (2009). Elle détient 64 brevets et a écrit plus de 120 articles de recherche, plusieurs chapitres de livres, trois livres publiés et une monographie sur la modélisation éparse.

Étudiants actuels

George Adamopoulos

Stagiaire de recherche

Ivan Anokhin

Doctorat - UdeM

Co-superviseur⋅e :

Samira Ebrahimi Kahou

Doctorat - UdeM

Arjun Ashok

Doctorat - UdeM

Co-superviseur⋅e :

Maîtrise recherche - UdeM

Doctorat - McGill

Superviseur⋅e principal⋅e :

Blake Richards

Mohammad Javad Darvishi Bayazi

Amin Darabi

Doctorat - UdeM

Doctorat - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Karim Jerbi

Wagner Drew

Maîtrise recherche - Concordia

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Visiteur de recherche indépendant - -

parviz.haggi@gmail.com

Nadhir Hassen

Collaborateur·rice alumni - UdeM

Maîtrise recherche

Collaborateur·rice alumni - UdeM

Superviseur⋅e principal⋅e :

Ioannis Mitliagkas

Nizar Islah

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Eilif Benjamin Muller

Doctorat - UdeM

Collaborateur·rice de recherche

Zafir Khalid

Maîtrise recherche - Concordia

Superviseur⋅e principal⋅e :

Maîtrise recherche - UdeM

Neeraj Kumar

Collaborateur·rice alumni - UdeM

Gwen Legate

Doctorat - Concordia

Superviseur⋅e principal⋅e :

Eugene Belilovsky

David Lemay

Maîtrise recherche - UdeM

Jonathan Lim

Collaborateur·rice de recherche

amin.mansouri@mila.quebec

Baihan Lin

Visiteur de recherche indépendant - Mt. Sinai

Maîtrise recherche - UdeM

Collaborateur·rice de recherche

Doctorat - UdeM

Maîtrise recherche - UdeM

Diganta Misra

Maîtrise recherche - UdeM

Gabriela Moisescu-Pareja

Collaborateur·rice de recherche - McGill

Superviseur⋅e principal⋅e :

Doina Precup

Timothy Nest

Doctorat - UdeM

Co-superviseur⋅e :

Eilif Benjamin Muller

Mohammad Pezeshki

Collaborateur·rice de recherche

Co-superviseur⋅e :

Doctorat - McGill

Superviseur⋅e principal⋅e :

Pouya Bashivan

Mahta Ramezanian

Maîtrise recherche - UdeM

Co-superviseur⋅e :

Guillaume Dumas

Roland Riachi

Collaborateur·rice de recherche - UdeM

Matthew Riemer

Doctorat - UdeM

Alexis Roger

Doctorat - McGill

Superviseur⋅e principal⋅e :

Blake Richards

Vaibhav Singh

Doctorat - Concordia

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Collaborateur·rice alumni - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Maîtrise recherche - UdeM

Publications

Maximum State Entropy Exploration using Predecessor and Successor Representations

Arnav Kumar Jain

Lucas Lehnert

Glen Berseth

Animals have a developed ability to explore that aids them in important tasks such as locating food, exploring for shelter, and finding misp… (voir plus)laced items. These exploration skills necessarily track where they have been so that they can plan for finding items with relative efficiency. Contemporary exploration algorithms often learn a less efficient exploration strategy because they either condition only on the current state or simply rely on making random open-loop exploratory moves. In this work, we propose

WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series

Jean-Christophe Gagnon-Audet

Kartik Ahuja

Mohammad Javad Darvishi Bayazi

Pooneh Mousavi

Guillaume Dumas

2023-09-01

TMLR (accepté)

Beyond performance: the role of task demand, effort, and individual differences in ab initio pilots

Mohammad Javad Darvishi Bayazi

Andrew Law

Sergio Mejia Romero

Sion Jennings

Jocelyn Faubert

2023-08-28

Scientific Reports (publié)

Neural efficiency in an aviation task with different levels of difficulty: Assessing different biometrics during a performance task

Mohammad Javad Darvishi Bayazi

Andrew Law

Sergio Mejia Romero

Sion Jennings

Jocelyn Faubert

2023-08-01

Journal of Vision (publié)

Cognitive Models as Simulators: Using Cognitive Models to Tap into Implicit Human Feedback

Ardavan S. Nobandegani

Thomas Shultz

2023-06-20

ICML.cc/2023/Workshop/ILHF (publié)

Continual Pre-Training of Large Language Models: How to (re)warm your model?

Kshitij Gupta

Benjamin Thérien

Adam Ibrahim

Mats Leon Richter

Quentin Gregory Anthony

Eugene Belilovsky

Timothee LESORT

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes a… (voir plus)vailable. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch

2023-06-20

ICML.cc/2023/Workshop/ES-FoMO (poster)

Towards Out-of-Distribution Adversarial Robustness

Adam Ibrahim

Charles Guille-Escuret

Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fail… (voir plus)s to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different

2023-06-20

ICML.cc/2023/Workshop/AdvML-Frontiers (publié)

Dialogue System with Missing Observation

Djallel Bouneffouf

Mayank Agarwal

Within the domain of dialogue, the ability to orchestrate multiple independently trained dialogue agents to create a unified system is of pa… (voir plus)rticular importance. Where we define orchestration as the task of selecting a subset of skills which most appropriately answer a user input using features extracted from both the user input and the individual skills. In this work, we study the task of online dialogue orchestration where the user feedback associated with the dialogue agent may not always be observed. In order to address the missing feedback setting, we propose to combine the attentive contextual bandit approach with an unsupervised learning mechanism such as clustering. By leveraging clustering to estimate missing reward, we are able to learn from each incoming event, even those with missing rewards. Promising empirical results are obtained on proprietary conversational datasets.

2023-06-04

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (publié)

Estimating individual minimum calibration for deep-learning with predictive performance recovery: An example case of gait surface classification from wearable sensor gait data.

Guillaume Lam

P. Dixon

2023-06-01

Journal of Biomechanics (publié)

Towards ethical multimodal systems

Alexis Roger

Esma Aimeur

2023-04-26

ArXiv (prépublication)

arxiv.org

A Survey on Compositional Generalization in Applications

Baihan Lin

Djallel Bouneffouf

2023-02-02

ArXiv (prépublication)

arxiv.org

Broken Neural Scaling Laws

Ethan Caballero

Kshitij Gupta

David Scott Krueger

We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models&extra… (voir plus)polates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures&for each of various tasks within a large&diverse set of upstream&downstream tasks, in zero-shot, prompted,&finetuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, AI capabilities, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, OOD detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, fairness, molecules, computer programming/coding, math word problems,"emergent phase transitions", arithmetic, supervised learning, unsupervised/self-supervised learning,&reinforcement learning (single agent&multi-agent). When compared to other functional forms for neural scaling, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models&extrapolates scaling behavior that other functional forms are incapable of expressing such as the nonmonotonic transitions present in the scaling behavior of phenomena such as double descent&the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws

2023-02-01

ICLR.cc/2023/Conference (poster)