Irina Rish

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab. A faculty member of Mila – Quebec Artificial Intelligence Institute, she holds a Canada Excellence Research Chair (CERC) and a Canada CIFAR AI Chair. Irina leads the U.S. Department of Energy's INCITE project on scalable foundation models on the Summit and Frontier supercomputers at the Oak Ridge Leadership Computing Facility (OLCF). She is a co-founder and the chief science officer of Nolano.ai.

Her current research focuses on neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization, and robustness. Before joining UdeM in 2019, Irina was a research scientist at the IBM Thomas J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI and led the NeuroAI challenge. She has received several IBM awards: the Excellence and Outstanding Innovation awards (2018), the Outstanding Technical Achievement Award (2017), and the Research Accomplishment Award (2009). She holds 64 patents and has written more than 120 research papers, several book chapters, three published books, and a monograph on sparse modeling.

Current Students

PhD - Université de Montréal
Principal supervisor:
Research Master's - Université de Montréal
PhD - Université de Montréal
Independent visiting researcher
Research Master's - Université de Montréal
Research Master's - Université de Montréal
PhD - Université de Montréal
Co-supervisor:
Research collaborator
PhD - Université de Montréal
Co-supervisor:
Research collaborator - Université de Montréal
Research intern - Technical University of Munich
Research Master's - Université de Montréal
Research Master's - Université de Montréal
PhD - McGill University
Principal supervisor:
Independent visiting researcher - Université de Montréal
Co-supervisor:
PhD - Concordia University
Principal supervisor:
PhD - Université de Montréal
Co-supervisor:
Alumni collaborator - Université de Montréal
Co-supervisor:
Research Master's - Université de Montréal
Co-supervisor:
PhD - Université de Montréal
PhD - Université de Montréal
Research collaborator
PhD - Université de Montréal
PhD - McGill University
Principal supervisor:
Research intern - Université de Montréal
Professional Master's - Université de Montréal
PhD - Université de Montréal
Principal supervisor:
Research intern - Université de Montréal
Research collaborator - Politecnico di Milano
PhD - Université de Montréal
Co-supervisor:
Research Master's - Université de Montréal
Research Master's - Université de Montréal
Co-supervisor:
Research Master's - Université de Montréal
Research collaborator - Université de Montréal
PhD - Université de Montréal
Research Master's - Université de Montréal
Research Master's - Université de Montréal
PhD - Université de Montréal
Co-supervisor:
PhD - Concordia University
Principal supervisor:
Postdoctorate - Université de Montréal
Principal supervisor:

Publications

A Survey on Compositional Generalization in Applications
Baihan Lin
Djallel Bouneffouf
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models & extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures & for each of various tasks within a large & diverse set of upstream & downstream tasks, in zero-shot, prompted, & finetuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, AI capabilities, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, OOD detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, fairness, molecules, computer programming/coding, math word problems, "emergent phase transitions", arithmetic, supervised learning, unsupervised/self-supervised learning, & reinforcement learning (single agent & multi-agent). When compared to other functional forms for neural scaling, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models & extrapolates scaling behavior that other functional forms are incapable of expressing such as the nonmonotonic transitions present in the scaling behavior of phenomena such as double descent & the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws
AI Agents Learn to Trust
Ardavan S. Nobandegani
T. Shultz
GOKU-UI: Ubiquitous Inference through Attention and Multiple Shooting for Continuous-time Generative Models
Germán Abrevaya
Mahta Ramezanian-Panahi
Jean-Christophe Gagnon-Audet
Pablo Polosecki
Silvina Ponce Dawson
Guillermo Cecchi
Scientific Machine Learning (SciML) is a burgeoning field that synergistically combines domain-aware and interpretable models with agnostic machine learning techniques. In this work, we introduce GOKU-UI, an evolution of the SciML generative model GOKU-nets. The GOKU-UI broadens the original model’s spectrum to incorporate other classes of differential equations, such as Stochastic Differential Equations (SDEs), and integrates a distributed, i.e. ubiquitous, inference through attention mechanisms and a novel multiple shooting training strategy in the latent space. These enhancements have led to a significant increase in its performance in both reconstruction and forecast tasks, as demonstrated by our evaluation of simulated and empirical data. Specifically, GOKU-UI outperformed all baseline models on synthetic datasets even with a training set 32-fold smaller, underscoring its remarkable data efficiency. Furthermore, when applied to empirical human brain data, while incorporating stochastic Stuart-Landau
Lag-Llama: Towards Foundation Models for Time Series Forecasting
Kashif Rasul
Arjun Ashok
Andrew Robert Williams
Arian Khorasani
George Adamopoulos
Rishika Bhagwatkar
Marin Biloš
Hena Ghonia
N. Hassen
Anderson Schneider
Sahil Garg
Yuriy Nevmyvaka
Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen “out-of-distribution” time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws [7] to fit and predict model scaling behavior. The open source code is made available at https://github
Towards Continual Reinforcement Learning: A Review and Perspectives
Continual Learning with Foundation Models: An Empirical Study of Latent Replay
Oleksiy Ostapenko
Timothee LESORT
Pau Rodriguez
Md Rifat Arefin
Arthur Douillard
Rapid development of large-scale pre-training has resulted in foundation models that can act as effective feature extractors on a variety of downstream tasks and domains. Motivated by this, we study the efficacy of pre-trained vision models as a foundation for downstream continual learning (CL) scenarios. Our goal is twofold. First, we want to understand the compute-accuracy trade-off between CL in the raw-data space and in the latent space of pre-trained encoders. Second, we investigate how the characteristics of the encoder, the pre-training algorithm and data, as well as of the resulting latent space affect CL performance. For this, we compare the efficacy of various pre-trained models in large-scale benchmarking scenarios with a vanilla replay setting applied in the latent and in the raw-data space. Notably, this study shows how transfer, forgetting, task similarity and learning are dependent on the input data characteristics and not necessarily on the CL algorithms. First, we show that under some circumstances reasonable CL performance can readily be achieved with a non-parametric classifier at negligible compute. We then show how models pre-trained on broader data result in better performance for various replay sizes. We explain this with representational similarity and transfer properties of these representations. Finally, we show the effectiveness of self-supervised pre-training for downstream domains that are out-of-distribution as compared to the pre-training domain. We point out and validate several research directions that can further increase the efficacy of latent CL including representation ensembling. The diverse set of datasets used in this study can serve as a compute-efficient playground for further CL research. We will publish the code.
APP: Anytime Progressive Pruning
Diganta Misra
Bharat Runwal
Tianlong Chen
Zhangyang Wang
With the latest advances in deep learning, several methods have been investigated for optimal learning settings in scenarios where the data stream is continuous over time. However, training sparse networks in such settings has often been overlooked. In this paper, we explore the problem of training a neural network with a target sparsity in a particular case of online learning: the anytime learning at macroscale paradigm (ALMA). We propose a novel way of progressive pruning, referred to as Anytime Progressive Pruning (APP); the proposed approach significantly outperforms the baseline dense and Anytime OSP models across multiple architectures and datasets under short, moderate, and long-sequence training. Our method, for example, shows an improvement in accuracy of
Knowledge Distillation for Federated Learning: a Practical Guide
Alessio Mora
Irene Tenison
Paolo Bellavista
Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data. This paves the way for stronger privacy guarantees when building predictive models. The most used algorithms for FL are parameter-averaging based schemes (e.g., Federated Averaging) that, however, have well known limits: (i) Clients must implement the same model architecture; (ii) Transmitting model weights and model updates implies high communication cost, which scales up with the number of model parameters; (iii) In presence of non-IID data distributions, parameter-averaging aggregation schemes perform poorly due to client model drifts. Federated adaptations of regular Knowledge Distillation (KD) can solve and/or mitigate the weaknesses of parameter-averaging FL algorithms while possibly introducing other trade-offs. In this article, we provide a review of KD-based algorithms tailored for specific FL issues.
Aligning MAGMA by Few-Shot Learning and Finetuning
Jean-Charles Layoun
Alexis Roger
Generative Models of Brain Dynamics
Mahta Ramezanian-Panahi
Germán Abrevaya
Jean-Christophe Gagnon-Audet
Vikram Voleti
Challenging Common Assumptions about Catastrophic Forgetting
Timothee LESORT
Oleksiy Ostapenko
Pau Rodriguez
Md Rifat Arefin
Diganta Misra
Building learning agents that can progressively learn and accumulate knowledge is the core goal of the continual learning (CL) research field. Unfortunately, training a model on new data usually compromises the performance on past data. In the CL literature, this effect is referred to as catastrophic forgetting (CF). CF has been largely studied, and a plethora of methods have been proposed to address it on short sequences of non-overlapping tasks. In such setups, CF always leads to a quick and significant drop in performance in past tasks. Nevertheless, despite CF, recent work showed that SGD training on linear models accumulates knowledge in a CL regression setup. This phenomenon becomes especially visible when tasks reoccur. We might then wonder if DNNs trained with SGD or any standard gradient-based optimization accumulate knowledge in such a way. Such phenomena would have interesting consequences for applying DNNs to real continual scenarios. Indeed, standard gradient-based optimization methods are significantly less computationally expensive than existing CL algorithms. In this paper, we study the progressive knowledge accumulation (KA) in DNNs trained with gradient-based algorithms in long sequences of tasks with data re-occurrence. We propose a new framework, SCoLe (Scaling Continual Learning), to investigate KA and discover that catastrophic forgetting has a limited effect on DNNs trained with SGD. When trained on long sequences with data sparsely re-occurring, the overall accuracy improves, which might be counter-intuitive given the CF phenomenon. We empirically investigate KA in DNNs under various data occurrence frequencies and propose simple and scalable strategies to increase knowledge accumulation in DNNs.