Irina Rish

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab. A faculty member of Mila – Quebec Artificial Intelligence Institute, she holds a Canada Excellence Research Chair (CERC) and a Canada CIFAR AI Chair. Irina leads the U.S. Department of Energy INCITE project on scalable foundation models on the Summit and Frontier supercomputers at the Oak Ridge Leadership Computing Facility (OLCF). She is co-founder and chief science officer of Nolano.ai.

Her current research focuses on neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as continual learning, out-of-distribution generalization, and robustness. Before joining UdeM in 2019, Irina was a research scientist at the IBM Thomas J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI and led the NeuroAI challenge. She has received several IBM awards: the Excellence Award and the Outstanding Innovation Award (2018), the Outstanding Technical Achievement Award (2017), and the Research Accomplishment Award (2009). She holds 64 patents and has written more than 120 research papers, several book chapters, three published books, and a monograph on sparse modeling.

Current students

Ivan Anokhin

PhD - UdeM

Co-supervisor:

Samira Ebrahimi Kahou

PhD - UdeM

Arjun Ashok

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

PhD - UdeM

PhD - Concordia

Principal supervisor:

Eugene Belilovsky

Mohammad Javad Darvishi Bayazi

Amin Darabi

PhD - UdeM

Research collaborator - UdeM

Wagner Drew

Research Master's - Concordia

Mojtaba Faramarzi

PhD - UdeM

Parviz Haggi Mani

Independent visiting researcher

Nadhir Hassen

Research collaborator - UdeM

Niklas Herbster

Research intern - UdeM

Co-supervisor:

Research Master's

Alumni collaborator - UdeM

Nizar Islah

PhD - UdeM

PhD - UdeM

PhD - UdeM

Research collaborator - UdeM

Neeraj Kumar

Alumni collaborator - UdeM

Gwen Legate

PhD - Concordia

Principal supervisor:

Eugene Belilovsky

David Lemay

Research Master's - UdeM

Amin Mansouri

Alumni collaborator - UdeM

Research collaborator

PhD - UdeM

Research collaborator - UdeM

Gabriela Moisescu-Pareja

Research collaborator - McGill

Timothy Nest

PhD - UdeM

Jonathan Pilault

Research collaborator - Polytechnique

Motahareh Pourrahimi

PhD - McGill

Principal supervisor:

Pouya Bashivan

Mahta Ramezanian

Research Master's - UdeM

Co-supervisor:

PhD - UdeM

PhD - McGill

Research collaborator

Zibo Shang

PhD - UdeM

Vaibhav Singh

PhD - Concordia

Principal supervisor:

PhD - UdeM

PhD - UdeM

Co-supervisor:

Research collaborator - UdeM

Jean-Christophe Gagnon-Audet

Sihui Wei

Bachelor's - McGill

Andrew Williams

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

He Zhu

PhD - McGill

Publications

A Remedy For Distributional Shifts Through Expected Domain Translation.

Soroosh Shahtalebi

Frank Rudzicz

Machine learning models often fail to generalize to unseen domains due to distributional shifts. A family of such shifts, “correlation shifts,” is caused by spurious correlations in the data. It is studied under the overarching topic of “domain generalization.” In this work, we employ multi-modal translation networks to tackle the correlation shifts that appear when data is sampled out-of-distribution. Learning a generative model from training domains enables us to translate each training sample under the special characteristics of other possible domains. We show that by training a predictor solely on the generated samples, the spurious correlations in training domains average out, and the invariant features corresponding to true correlations emerge. Our proposed technique, Expected Domain Translation (EDT), is benchmarked on the Colored MNIST dataset and drastically improves the state-of-the-art classification accuracy by 38% with train-domain validation model selection.

2022-05-22

IEEE International Conference on Acoustics, Speech, and Signal Processing (published)

Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal

Sharath Chandra Raparthy

Yoshua Bengio

Guillaume Lajoie

Multi-head, key-value attention is the backbone of the widely successful Transformer model and its variants. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interactions, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Importantly, standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner through an additional soft competition stage between the query-key combination and value pairing. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval, and can easily be implemented in lieu of standard attention heads in any network architecture.

2022-04-24

International Conference on Learning Representations (Accept (Spotlight))

Jean-Christophe Gagnon-Audet

WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series Tasks

Kartik Ahuja

Mohammad Javad Darvishi Bayazi

Guillaume Dumas

2022-03-17

ArXiv (preprint)

Cognitive Models as Simulators: The Case of Moral Decision-Making

Ardavan S. Nobandegani

T. Shultz

2021-12-31

CogSci (published)

Continual Learning In Environments With Polynomial Mixing Times

Matthew Riemer

Sharath Chandra Raparthy

Ignacio Cases

Gopeshh Subbaraj

Maximilian Puelma Touzel

The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we theoretically establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches, which suffer from myopic bias and stale bootstrapped estimates. To validate our theory, we study the empirical scaling behavior of mixing times with respect to the number of tasks and task duration for high performing policies deployed across multiple Atari games. Our analysis demonstrates both that polynomial mixing times do emerge in practice and how their existence may lead to unstable learning behavior like catastrophic forgetting in continual learning settings.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

Optimizing deep learning for Magnetoencephalography (MEG): From sensory perception to sex prediction and brain fingerprinting

Arthur Dehgan

Karim Jerbi

2021-12-31

2022 Conference on Cognitive Computational Neuroscience (published)

PRACTICAL GUIDE

Paolo Bellavista

2021-12-31

(published)

www.semanticscholar.org

Summarizing Societies: Agent Abstraction in Multi-Agent Reinforcement Learning

Maximilian Puelma Touzel

Matthew D Riemer

Rupali Bhati

Agents cannot make sense of many-agent societies through direct consideration of small-scale, low-level agent identities, but instead must recognize emergent collective identities. Here, we take a first step towards a framework for recognizing this structure in large groups of low-level agents so that they can be modeled as a much smaller number of high-level agents, a process that we call agent abstraction. We illustrate this process by extending bisimulation metrics for state abstraction in reinforcement learning to the setting of multi-agent reinforcement learning and analyze a straightforward, if crude, abstraction based on experienced joint actions. It addresses non-stationarity due to other learning agents by improving minimax regret by an intuitive factor. To test if this compression factor provides signal for higher-level agency, we applied it to a large dataset of human play of the popular social dilemma game Diplomacy. We find that it correlates strongly with the degree of ground-truth abstraction of low-level units into the human players.

2021-12-31

(published)

Jean-Christophe Gagnon-Audet

Generative Models of Brain Dynamics -- A review

Mahta Ramezanian Panahi

Germán Abrevaya

Vikram Voleti

Guillaume Dumas

The principled design and discovery of biologically- and physically-informed models of neuronal dynamics has been advancing since the mid-twentieth century. Recent developments in artificial intelligence (AI) have accelerated this progress. This review article gives a high-level overview of the approaches across different scales of organization and levels of abstraction. The studies covered in this paper include fundamental models in computational neuroscience, nonlinear dynamics, data-driven methods, as well as emergent practices. While not all of these models span the intersection of neuroscience, AI, and system dynamics, all of them do or can work in tandem as generative models, which, as we argue, provide superior properties for the analysis of neuroscientific data. We discuss the limitations and unique dynamical traits of brain data and the complementary need for hypothesis- and data-driven modeling. By way of conclusion, we present several hybrid generative models from recent literature in scientific machine learning, which can be efficiently deployed to yield interpretable models of neural dynamics.

2021-12-21

ArXiv (preprint)

Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers

A. Chandar

Empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning, particularly in the light of recent breakthroughs achieved by large-scale pre-trained models such as GPT-3, CLIP and DALL-E. Accurately predicting the neural network performance with increasing resources such as data, compute and model size provides a more comprehensive evaluation of different approaches across multiple scales, as opposed to traditional point-wise comparisons of fixed-size models on fixed-size benchmarks, and, most importantly, allows for focus on the best-scaling, and thus most promising in the future, approaches. In this work, we consider a challenging problem of few-shot learning in image classification, especially when the target data distribution in the few-shot phase is different from the source, training, data distribution, in a sense that it includes new image classes not encountered during training. Our current main goal is to investigate how the amount of pre-training data affects the few-shot generalization performance of standard image classifiers. Our key observations are that (1) such performance improvements are well-approximated by power laws (linear log-log plots) as the training set size increases, (2) this applies to both cases of target data coming from either the same or from a different domain (i.e., new classes) as the training data, and (3) few-shot performance on new classes converges at a faster rate than the standard classification performance on previously seen classes. Our findings shed new light on the relationship between scale and generalization.

2021-10-12

ArXiv (preprint)

Approximate Bayesian Optimisation for Neural Networks

Nadhir Hassen

2021-08-26

ArXiv (preprint)

Toward Optimal Solution for the Context-Attentive Bandit Problem

Djallel Bouneffouf

Raphael Feraud

Sohini Upadhyay

Yasaman Khazaeni

2021-08-18

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (published)