Irina Rish

Biography

Irina Rish is a full professor at the Université de Montréal (UdeM), where she leads the Autonomous AI Lab. A faculty member of Mila – Quebec Artificial Intelligence Institute, she holds a Canada Excellence Research Chair (CERC) and a Canada CIFAR AI Chair. Irina leads the U.S. Department of Energy INCITE project on scalable foundation models on the Summit and Frontier supercomputers at the Oak Ridge Leadership Computing Facility (OLCF). She is co-founder and scientific director of Nolano.ai.

Her current research focuses on neural scaling laws and emergent behaviors (capabilities and alignment) in foundation models, as well as on continual learning, out-of-distribution generalization, and robustness. Before joining UdeM in 2019, Irina was a research scientist at the IBM Thomas J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI and led the NeuroAI challenge. She has received several IBM awards: the Excellence and Outstanding Innovation awards (2018), the Outstanding Technical Achievement award (2017), and the Research Accomplishment award (2009). She holds 64 patents and has written more than 120 research papers, several book chapters, three published books, and a monograph on sparse modeling.

Current Students

George Adamopoulos

Research Intern

Ivan Anokhin

PhD - UdeM

Co-supervisor:

Samira Ebrahimi Kahou

PhD - UdeM

Arjun Ashok

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

PhD - McGill

Principal supervisor:

Blake Richards

Mohammad Javad Darvishi Bayazi

Amin Darabi

PhD - UdeM

PhD - UdeM

PhD - UdeM

Co-supervisor:

Karim Jerbi

Wagner Drew

Research Master's - Concordia

Principal supervisor:

PhD - UdeM

Alumni collaborator - UdeM

Research Master's

Alumni collaborator - UdeM

Principal supervisor:

Ioannis Mitliagkas

Nizar Islah

PhD - UdeM

Principal supervisor:

Eilif Benjamin Muller

PhD - UdeM

Research Collaborator

Zafir Khalid

Research Master's - Concordia

Principal supervisor:

Research Master's - UdeM

Neeraj Kumar

Alumni collaborator - UdeM

Gwen Legate

PhD - Concordia

Principal supervisor:

Eugene Belilovsky

David Lemay

Research Master's - UdeM

Jonathan Lim

Research Collaborator

amin.mansouri@mila.quebec

Baihan Lin

Independent Visiting Researcher - Mt. Sinai

Research Master's - UdeM

Research Collaborator

PhD - UdeM

Research Master's - UdeM

Diganta Misra

Research Master's - UdeM

Timothy Nest

PhD - UdeM

Co-supervisor:

Eilif Benjamin Muller

Mohammad Pezeshki

Research Collaborator

Co-supervisor:

PhD - McGill

Principal supervisor:

Pouya Bashivan

Mahta Ramezanian

Research Master's - UdeM

Co-supervisor:

Guillaume Dumas

Roland Riachi

Research Collaborator - UdeM

Matthew Riemer

PhD - UdeM

Alexis Roger

PhD - McGill

Principal supervisor:

Blake Richards

Vaibhav Singh

PhD - Concordia

Principal supervisor:

PhD - UdeM

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

Publications

Parametric Scattering Networks

Shanel Gauthier

Benjamin Thérien

Laurent Alséne-Racicot

Eugene Belilovsky

Michael Eickenberg

Guy Wolf

The wavelet scattering transform creates geometric invariants and deformation stability. In multiple signal domains, it has been shown to yield more discriminative representations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is optimal. Focusing on Morlet wavelets, we propose to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform. We show that our learned versions of the scattering transform yield significant performance gains in small-sample classification settings over the standard scattering transform. Moreover, our empirical results suggest that traditional filterbank constructions may not always be necessary for scattering transforms to extract effective representations.

2021-07-20

ArXiv (preprint)
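As a rough illustration of the filter family this abstract refers to, the sketch below builds a single 2D Morlet filter from a scale (`sigma`), an orientation (`theta`), an aspect ratio (`slant`), and a carrier frequency (`xi`); in a learned scattering network these values would become trainable parameters. The construction (a Gabor filter minus a scaled Gaussian envelope so that the filter has zero mean) is a common textbook form. The function name and parameter names are hypothetical, and the paper's exact parameterization and normalization may differ.

```python
import cmath
import math


def morlet_2d(size, sigma, theta, slant, xi):
    """Return a size x size complex 2D Morlet filter as a list of rows.

    sigma: envelope width (scale); theta: orientation in radians;
    slant: aspect ratio of the elliptical envelope; xi: carrier frequency.
    In a learned scattering network these values would be trainable.
    """
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    gabor, envelope = [], []
    for row in range(size):
        y = row - half
        g_row, e_row = [], []
        for col in range(size):
            x = col - half
            # Rotate the grid so the carrier oscillates along direction theta.
            xr = x * cos_t + y * sin_t
            yr = -x * sin_t + y * cos_t
            env = math.exp(-(xr ** 2 + (slant * yr) ** 2) / (2 * sigma ** 2))
            e_row.append(env)
            g_row.append(env * cmath.exp(1j * xi * xr))
        gabor.append(g_row)
        envelope.append(e_row)
    # Subtract a scaled copy of the envelope so the filter sums to zero.
    k = sum(map(sum, gabor)) / sum(map(sum, envelope))
    return [[g - k * e for g, e in zip(g_row, e_row)]
            for g_row, e_row in zip(gabor, envelope)]
```

For example, `morlet_2d(15, 2.0, math.pi / 4, 0.5, 3 * math.pi / 4)` returns a 15 x 15 filter whose entries sum to zero up to rounding. Learning the filterbank would mean optimizing `sigma`, `theta`, and `slant` per filter instead of fixing them on a predefined grid.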

Double-Linear Thompson Sampling for Context-Attentive Bandits

Djallel Bouneffouf

Raphael Feraud

Sohini Upadhyay

Yasaman Khazaeni

In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent has the freedom to choose which variables to observe. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating advantages of the proposed approach over several baseline methods on a variety of real-life datasets.

2021-06-06

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)
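CATS builds on Linear Thompson Sampling. The following is a minimal pure-Python sketch of that baseline only, not of CATS itself (which additionally decides which context features to observe). It assumes a Gaussian posterior with an identity prior and a hypothetical exploration scale `v`; the class and helper names are illustrative. Posterior samples are drawn through a Cholesky factor of `B`, so the inverse is never formed explicitly.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L @ L.T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward(low, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def backward(low, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


class LinearThompsonSampling:
    def __init__(self, dim, v=0.5, seed=0):
        self.B = [[float(i == j) for j in range(dim)] for i in range(dim)]  # prior precision
        self.f = [0.0] * dim  # running sum of reward * context
        self.v = v
        self.rng = random.Random(seed)

    def select(self, contexts):
        low = cholesky(self.B)
        mu = backward(low, forward(low, self.f))  # posterior mean B^{-1} f
        z = [self.rng.gauss(0.0, 1.0) for _ in mu]
        noise = backward(low, z)  # Gaussian noise with covariance B^{-1}
        theta = [m + self.v * e for m, e in zip(mu, noise)]
        scores = [sum(t * c for t, c in zip(theta, ctx)) for ctx in contexts]
        return max(range(len(contexts)), key=scores.__getitem__)

    def update(self, context, reward):
        d = len(context)
        for i in range(d):
            for j in range(d):
                self.B[i][j] += context[i] * context[j]
            self.f[i] += reward * context[i]
```

Usage: with two arms whose one-hot contexts `[1.0, 0.0]` and `[0.0, 1.0]` pay rewards 0 and 1, repeatedly calling `select` and `update` should concentrate the pulls on the second arm after a few rounds.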

Toward Skills Dialog Orchestration with Online Learning

Djallel Bouneffouf

Raphael Feraud

Sohini Upadhyay

Mayank Agarwal

Yasaman Khazaeni

Building multi-domain AI agents is a challenging task and an open problem in the area of AI. Within the domain of dialog, the ability to orchestrate multiple independently trained dialog agents, or skills, to create a unified system is of particular significance. In this work, we study the task of online posterior dialog orchestration, where we define posterior orchestration as the task of selecting a subset of skills which most appropriately answer a user input using features extracted from both the user input and the individual skills. To account for the various costs associated with extracting skill features, we consider online posterior orchestration under a skill execution budget. We formalize this setting as Context Attentive Bandit with Observations (CABO), a variant of context attentive bandits, and evaluate it on proprietary conversational datasets.

2021-06-06

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

SAND-mask: An Enhanced Gradient Masking Strategy for the Discovery of Invariances in Domain Generalization

Soroosh Shahtalebi

Jean-Christophe Gagnon-Audet

Touraj Laleh

Mojtaba Faramarzi

Kartik Ahuja

A major bottleneck in the real-world applications of machine learning models is their failure to generalize to unseen domains whose data distribution is not i.i.d. with respect to the training domains. This failure often stems from learning non-generalizable features in the training domains that are spuriously correlated with the label of the data. To address this shortcoming, there has been a growing surge of interest in learning good explanations that are hard to vary, which is studied under the notion of out-of-distribution (OOD) generalization. The search for good explanations that are invariant across different domains can be seen as finding local (global) minima in the loss landscape that hold true across all of the training domains. In this paper, we propose a masking strategy that determines a continuous weight based on the agreement of the gradients that flow in each edge of the network, in order to control the amount of update received by the edge in each step of optimization. In particular, our proposed technique, referred to as "Smoothed-AND (SAND)-masking", not only validates the agreement in the direction of gradients but also promotes agreement among their magnitudes to further ensure the discovery of invariances across training domains. SAND-mask is validated on the DomainBed benchmark for domain generalization and significantly improves the state-of-the-art accuracy on the Colored MNIST dataset while providing competitive results on other domain generalization datasets.

2021-06-04

ArXiv (preprint)
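The abstract describes a per-parameter mask that scales each update by how strongly the gradients from different training domains agree. The sketch below is a hypothetical smoothed variant in that spirit, not the paper's exact formula: `tanh(k * g)` serves as a soft sign so that tiny gradients count for less than large ones, and agreement above an assumed threshold `tau` is mapped to a weight in [0, 1). The names `smoothed_and_mask` and `masked_step` and all default values are illustrative assumptions.

```python
import math


def smoothed_and_mask(env_grads, k=100.0, tau=0.5, scale=10.0):
    """env_grads: one gradient vector (list of floats) per training domain.

    Returns one mask weight per parameter (hypothetical smoothed AND-mask).
    """
    n_env = len(env_grads)
    mask = []
    for per_param in zip(*env_grads):
        # tanh(k * g) is a smooth stand-in for sign(g): tiny gradients
        # contribute less than large gradients of the same sign.
        agreement = abs(sum(math.tanh(k * g) for g in per_param)) / n_env
        # Agreement above tau gives a positive weight; below it, zero.
        mask.append(max(0.0, math.tanh(scale * (agreement - tau))))
    return mask


def masked_step(params, env_grads, lr=0.1, **mask_kwargs):
    """One gradient step on the domain-averaged gradient, scaled by the mask."""
    mask = smoothed_and_mask(env_grads, **mask_kwargs)
    n_env = len(env_grads)
    mean_grad = [sum(g) / n_env for g in zip(*env_grads)]
    return [p - lr * m * g for p, m, g in zip(params, mask, mean_grad)]
```

With gradients `[[0.5, 0.5, 0.001], [0.4, -0.4, 0.001]]` from two domains, the first parameter (large gradients, same sign) keeps a weight near 1, while the second (conflicting signs) and the third (agreeing but negligible) are masked to 0.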

Continual Learning in Deep Networks: an Analysis of the Last Layer

Timothée Lesort

Thomas George

We study how different output layers in a deep neural network learn and forget in continual learning settings. The following three factors can affect catastrophic forgetting in the output layer: (1) weight modifications, (2) interference, and (3) projection drift. In this paper, our goal is to provide more insight into how changing the output layer may address (1) and (2). Some potential solutions to those issues are proposed and evaluated here in several continual learning scenarios. We show that the best-performing type of output layer depends on the data distribution drifts and/or the amount of data available. In particular, in some cases where a standard linear layer would fail, changing the parameterization turns out to be sufficient to achieve significantly better performance, without introducing a continual-learning algorithm and instead using standard SGD to train the model. Our analysis and results shed light on the dynamics of the output layer in continual learning scenarios and suggest a way of selecting the best type of output layer for a given scenario.

2021-06-03

ArXiv (preprint)

Learning Brain Dynamics With Coupled Low-Dimensional Nonlinear Oscillators and Deep Recurrent Networks

Germán Abrevaya

Guillaume Dumas

Aleksandr Y. Aravkin

Peng Zheng

Jean-Christophe Gagnon-Audet

James Kozloski

Pablo Polosecki

Guillaume Lajoie

David Cox

Silvina Ponce Dawson

Guillermo Cecchi

Many natural systems, especially biological ones, exhibit complex multivariate nonlinear dynamical behaviors that can be hard to capture by linear autoregressive models. On the other hand, generic nonlinear models such as deep recurrent neural networks often require large amounts of training data, not always available in domains such as brain imaging; also, they often lack interpretability. Domain knowledge about the types of dynamics typically observed in such systems, such as a certain type of dynamical systems models, could complement purely data-driven techniques by providing a good prior. In this work, we consider a class of ordinary differential equation (ODE) models known as van der Pol (VDP) oscillators and evaluate their ability to capture a low-dimensional representation of neural activity measured by different brain imaging modalities, such as calcium imaging (CaI) and fMRI, in different living organisms: larval zebrafish, rat, and human. We develop a novel and efficient approach to the nontrivial problem of parameter estimation for a network of coupled dynamical systems from multivariate data and demonstrate that the resulting VDP models are both accurate and interpretable, as VDP's coupling matrix reveals anatomically meaningful excitatory and inhibitory interactions across different brain subsystems. VDP outperforms linear autoregressive models (VAR) in terms of both the data fit accuracy and the quality of insight provided by the coupling matrices and often tends to generalize better to unseen data when predicting future brain activity, being comparable to and sometimes better than recurrent neural networks (LSTMs). Finally, we demonstrate that our (generative) VDP model can also serve as a data-augmentation tool leading to marked improvements in predictive accuracy of recurrent neural networks.
Thus, our work contributes to both basic and applied dimensions of neuroimaging: gaining scientific insights and improving brain-based predictive models, an area of potentially high practical importance in clinical diagnosis and neurotechnology.

2021-05-26

Neural Computation (published)
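For readers unfamiliar with van der Pol dynamics, here is a minimal forward-Euler sketch of a network of coupled van der Pol oscillators. The additive linear coupling `sum_j coupling[i][j] * x_j` on the velocity equation is an assumption made for illustration; the paper's exact coupling form and its parameter-estimation procedure are not reproduced here, and the function name is hypothetical.

```python
def simulate_coupled_vdp(x0, y0, mu, coupling, dt=0.001, steps=50_000):
    """Forward-Euler simulation of n coupled van der Pol oscillators.

    dx_i/dt = y_i
    dy_i/dt = mu_i * (1 - x_i**2) * y_i - x_i + sum_j coupling[i][j] * x_j
    Returns the trajectory of x as a list with one row per time step.
    """
    x, y = list(x0), list(y0)
    n = len(x)
    traj = []
    for _ in range(steps):
        # Input each oscillator receives from the others via the coupling matrix.
        drive = [sum(coupling[i][j] * x[j] for j in range(n)) for i in range(n)]
        dx = y[:]
        dy = [mu[i] * (1 - x[i] ** 2) * y[i] - x[i] + drive[i] for i in range(n)]
        x = [xi + dt * d for xi, d in zip(x, dx)]
        y = [yi + dt * d for yi, d in zip(y, dy)]
        traj.append(x)
    return traj
```

A single uncoupled oscillator (`mu = [1.0]`, `coupling = [[0.0]]`) started near the origin settles onto the van der Pol limit cycle, whose amplitude is close to 2. In a fitted model, the signs of the off-diagonal coupling entries are what would be read as excitatory or inhibitory interactions.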

Gradient Masked Federated Optimization

Irene Tenison

Sreya Francis

2021-04-21

ArXiv (preprint)

Towards Causal Federated Learning For Enhanced Robustness and Privacy

Sreya Francis

Irene Tenison

Federated learning is an emerging privacy-preserving distributed machine learning approach to building a shared model by performing distributed training locally on participating devices (clients) and aggregating the local models into a global one. As this approach prevents data collection and aggregation, it helps reduce the associated privacy risks to a great extent. However, the data samples across all participating clients are usually not independent and identically distributed (non-iid), and out-of-distribution (OOD) generalization for the learned models can be poor. Besides this challenge, federated learning also remains vulnerable to various attacks on security wherein a few malicious participating entities work towards inserting backdoors, degrading the generated aggregated model, as well as inferring the data owned by participating entities. In this paper, we propose an approach for learning invariant (causal) features common to all participating clients in a federated learning setup and analyze empirically how it enhances the out-of-distribution (OOD) accuracy as well as the privacy of the final learned model.

2021-04-14

ArXiv (preprint)
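The abstract summarizes federated learning as local training on each client followed by aggregation into a global model. A minimal sketch of that loop is below, assuming a linear model trained with plain SGD on squared loss and FedAvg-style averaging weighted by client dataset size. It does not implement the paper's invariant (causal) feature learning, and all function names are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by its
    number of local samples (FedAvg-style aggregation)."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * size for w, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def local_update(weights, data, lr=0.1, epochs=5):
    """A few epochs of SGD on squared loss for a linear model y ~ w . x,
    run on one client's private (x, y) pairs, which never leave the client."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def federated_round(global_w, clients, **kwargs):
    """One round: every client trains locally; only weights are aggregated."""
    local = [local_update(global_w, data, **kwargs) for data in clients]
    return federated_average(local, [len(data) for data in clients])
```

With two clients holding samples of `y = 2x`, repeated calls to `federated_round` starting from `[0.0]` drive the global weight toward 2.0, although neither client's data is ever pooled.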

Understanding Continual Learning Settings with Data Distribution Drift Analysis

Timothée Lesort

Massimo Caccia

Classical machine learning algorithms often assume that the data are drawn i.i.d. from a stationary probability distribution. Recently, continual learning emerged as a rapidly growing area of machine learning where this assumption is relaxed, i.e. where the data distribution is non-stationary and changes over time. This paper represents the state of data distribution by a context variable

2021-04-04

ArXiv (preprint)

Predicting Infectiousness for Proactive Contact Tracing

Yoshua Bengio

Prateek Gupta

Tegan Maharaj

Nasim Rahaman

Martin Weiss

Tristan Deleu

Eilif Benjamin Muller

Meng Qu

Victor Schmidt

Pierre-Luc St-Charles

Hannah Alsdurf

Olexa Bilaniuk

David Buckeridge

Gaetan Caron

Pierre-Luc Carrier

Joumana Ghosn

Satya Ortiz Gagne

Chris Pal

Bernhard Schölkopf

Abhinav Sharma

Jian Tang

Andrew Williams

The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning based PCT method which improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.

2021-01-12

ICLR.cc/2021/Conference (spotlight)

Adversarial Feature Desensitization

Pouya Bashivan

Reza Bayat

Adam Ibrahim

Kartik Ahuja

Mojtaba Faramarzi

Touraj Laleh

Blake Richards

Neural networks are known to be vulnerable to adversarial attacks: slight but carefully constructed perturbations of the inputs which can drastically impair the network's performance. Many defense methods have been proposed for improving robustness of deep networks by training them on adversarially perturbed inputs. However, these models often remain vulnerable to new types of attacks not seen during training, and even to slightly stronger versions of previously seen attacks. In this work, we propose a novel approach to adversarial robustness, which builds upon the insights from the domain adaptation field. Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs. This is achieved through a game where we learn features that are both predictive and robust (insensitive to adversarial attacks), i.e. cannot be used to discriminate between natural and adversarial data. Empirical results on several benchmarks demonstrate the effectiveness of the proposed approach against a wide range of attack types and attack strengths. Our code is available at https://github.com/BashivanLab/afd.
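To make "slight but carefully constructed perturbations" concrete, the sketch below applies the fast gradient sign method (FGSM), a standard attack and not AFD itself, to a hand-set logistic-regression model. The weights, inputs, `eps`, and the function name are illustrative values chosen for this example.

```python
import math


def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def fgsm_logistic(x, y, w, b, eps):
    """Fast-gradient-sign perturbation of input x for a logistic-regression
    model p(y=1 | x) = sigmoid(w . x + b), with label y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # Gradient of binary cross-entropy with respect to the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    # Move each coordinate by eps in the direction that increases the loss.
    return [xi + eps * math.copysign(1.0, g) if g != 0 else xi
            for xi, g in zip(x, grad)]
```

With `w = [1.0, -1.0]`, `b = 0.0`, and a correctly classified input `x = [0.3, 0.1]` (label 1), an `eps = 0.2` perturbation flips the sign of the decision score even though no coordinate moves by more than 0.2. Defenses such as AFD aim to learn features that such perturbations cannot move.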

Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

Kartik Ahuja

Ethan Caballero

Dinghuai Zhang

Jean-Christophe Gagnon-Audet

Yoshua Bengio

Ioannis Mitliagkas

The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance? Or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to the linear regression tasks, we show that for linear classification tasks we need much stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not. We propose an approach that incorporates both of these principles and demonstrate its effectiveness in several experiments.
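A toy sketch of an objective that combines the two principles is shown below: per-environment risk, the IRMv1 invariance penalty (the squared derivative of the risk with respect to a dummy classifier scale, evaluated at 1), and the variance of the representation as a stand-in for an entropy-based information bottleneck term. Using a scalar representation directly as the logit, using variance as the bottleneck proxy, the function names, and the weights `lam_irm` and `lam_ib` are all assumptions for illustration. The code only evaluates the objective; it does not train a representation.

```python
import math


def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def softplus(t):
    # Stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def env_terms(samples):
    """samples: (phi, y) pairs for one environment, where phi is the learned
    (here scalar) representation used directly as the logit, y in {0, 1}."""
    n = len(samples)
    # Binary cross-entropy with logits: softplus(phi) - y * phi.
    risk = sum(softplus(phi) - y * phi for phi, y in samples) / n
    # IRMv1: d/ds of the risk of (s * phi) at s = 1 is mean((sigmoid(phi) - y) * phi).
    grad_s = sum((sigmoid(phi) - y) * phi for phi, y in samples) / n
    mean_phi = sum(phi for phi, _ in samples) / n
    var_phi = sum((phi - mean_phi) ** 2 for phi, _ in samples) / n
    return risk, grad_s ** 2, var_phi


def ib_irm_objective(envs, lam_irm=1.0, lam_ib=0.1):
    """Sum over environments of risk + invariance penalty + bottleneck penalty."""
    total = 0.0
    for samples in envs:
        risk, irm_pen, var_pen = env_terms(samples)
        total += risk + lam_irm * irm_pen + lam_ib * var_pen
    return total
```

The closed-form IRMv1 gradient can be checked against a finite difference of the risk in the dummy scale; setting `lam_ib = 0` recovers a plain IRMv1-style objective, which is the configuration the abstract argues is insufficient on its own.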