Irina Rish

sayed.mansouri-tehrani@mila.quebec

Amin Darabi

PhD - Université de Montréal

amin.darabi@mila.quebec

Amin Memarian

Independent visiting researcher

memariaa@mila.quebec

Amin Mansouri

Master's Research - Université de Montréal

andrew.williams@mila.quebec

Andrei Mircea Romascanu

PhD - Université de Montréal

PhD - Université de Montréal

arian.khorasani@mila.quebec

Arian Khorasani

Master's Research - Université de Montréal

arnav-kumar.jain@mila.quebec

Arjun Ashok

PhD

Co-supervisor :

Alexandre Drouin

arjun.ashok@mila.quebec

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher

ayush.kaushal@mila.quebec

Benjamin Therien

PhD - Université de Montréal

Co-supervisor :

benjamin.therien@mila.quebec

Collaborating researcher - Université de Montréal

connor.brennan@mila.quebec

Daria Yasafova

Research Intern - Technical University of Munich

daria.yasafova@mila.quebec

Dave Whipps

Master's Research - Université de Montréal

whippsda@mila.quebec

diganta.misra@mila.quebec

Diganta Misra

Master's Research - Université de Montréal

Postdoctorate

Principal supervisor :

Nicolas Le Roux

ekaterina.lobacheva@mila.quebec

PhD - McGill University

Principal supervisor :

Blake Richards

ethan.caballero@mila.quebec

george.adamopoulos@mila.quebec

George Adamopoulos

Research Intern

gopeshh.subbaraj@mila.quebec

Germán Abrevaya

Independent visiting researcher - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Gwen Legate

PhD - Concordia University

Principal supervisor :

gwendolyne.legate@mila.quebec

Ivan Anokhin

PhD - Université de Montréal

Co-supervisor :

Samira Ebrahimi Kahou

ivan.anokhin@mila.quebec

juan.mayor-torres@mila.quebec

Juan Manuel Mayor-Torres

Collaborating researcher

Collaborating Alumni - Université de Montréal

Co-supervisor :

Sarath Chandar Anbil Parthipan

kshitij.gupta@mila.quebec

Mahta Ramezanian

Master's Research - Université de Montréal

Co-supervisor :

mahta.ramezanian@mila.quebec

Matthew Riemer

PhD - Université de Montréal

matthew.riemer@mila.quebec

Maximilian Puelma Touzel

Collaborating researcher

PhD - Université de Montréal

arefinmr@mila.quebec

Mohammad Pezeshki

Collaborating researcher

pezeshki@mila.quebec

Mohammad-Javad Darvishi Bayazi

PhD - Université de Montréal

mohammad-javad.darvishi-bayasi@mila.quebec

PhD - Université de Montréal

faramarm@mila.quebec

Motahareh Pourrahimi

PhD - McGill University

Principal supervisor :

Pouya Bashivan

motahareh.pourrahimi@mila.quebec

nadhir.hassen@mila.quebec

Nadhir Hassen

Research Intern - Université de Montréal

Neeraj Kumar

Professional Master's - Université de Montréal

neeraj.kumar@mila.quebec

Nizar Islah

PhD - Université de Montréal

Principal supervisor :

Eilif Benjamin Muller

nizar.islah@mila.quebec

paolo.cudrano@mila.quebec

Omar Younis

Research Intern - Université de Montréal

omar.younis@mila.quebec

Collaborating researcher - Politecnico di Milano

Pascal Tikeng Notsawo

PhD - Université de Montréal

Co-supervisor :

pascal.tikeng@mila.quebec

Collaborating researcher

prateek.humane@mila.quebec

Master's Research - Université de Montréal

remus.mocanu@mila.quebec

Reza Bayat

Master's Research - Université de Montréal

Co-supervisor :

Pouya Bashivan

reza.bayat@mila.quebec

rishika.bhagwatkar@mila.quebec

Rishika Bhagwatkar

Master's Research - Université de Montréal

Collaborating researcher - Université de Montréal

roland.riachi@mila.quebec

Simon Dufort-Labbé

PhD - Université de Montréal

simon.dufort-labbe@mila.quebec

Sparsha Mishra

Master's Research - Université de Montréal

sparsha.mishra@mila.quebec

Tejas Vaidhya

Master's Research - Université de Montréal

tejas.vaidhya@mila.quebec

PhD - Université de Montréal

Co-supervisor :

Eilif Benjamin Muller

timothy.nest@mila.quebec

Vaibhav Singh

PhD - Concordia University

Principal supervisor :

vaibhav.singh@mila.quebec

Zahra Sheikhbahaee

Postdoctorate - Université de Montréal

Principal supervisor :

zahra.sheikhbahaee@mila.quebec

Publications

Towards Scaling Difference Target Propagation by Learning Backprop Targets

Maxence Ernoult

Fabrice Normandin

Abhinav Moudgil

Sean Spinney

The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to scale up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32

2022-06-28

Proceedings of the 39th International Conference on Machine Learning (published)

proceedings.mlr.press

Parametric Scattering Networks

Shanel Gauthier

Benjamin Thérien

Laurent Alséne-Racicot

Muawiz Chaudhary

Michael Eickenberg

Guy Wolf

The wavelet scattering transform creates geometric invariants and deformation stability. In multiple signal domains, it has been shown to yield more discriminative representations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is optimal. Focusing on Morlet wavelets, we propose to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform. We show that our learned versions of the scattering transform yield significant performance gains in small-sample classification settings over the standard scattering transform. Moreover, our empirical results suggest that traditional filterbank constructions may not always be necessary for scattering transforms to extract effective representations.

2022-06-18

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

A Remedy For Distributional Shifts Through Expected Domain Translation

Jean-Christophe Gagnon-Audet

Soroosh Shahtalebi

Frank Rudzicz

Machine learning models often fail to generalize to unseen domains due to distributional shifts. A family of such shifts, “correlation shifts,” is caused by spurious correlations in the data. It is studied under the overarching topic of “domain generalization.” In this work, we employ multi-modal translation networks to tackle the correlation shifts that appear when data is sampled out-of-distribution. Learning a generative model from training domains enables us to translate each training sample under the special characteristics of other possible domains. We show that by training a predictor solely on the generated samples, the spurious correlations in training domains average out, and the invariant features corresponding to true correlations emerge. Our proposed technique, Expected Domain Translation (EDT), is benchmarked on the Colored MNIST dataset and drastically improves the state-of-the-art classification accuracy by 38% with train-domain validation model selection.

2022-05-23

IEEE International Conference on Acoustics, Speech, and Signal Processing (published)

Summarizing Societies: Agent Abstraction in Multi-Agent Reinforcement Learning

Amin Memarian

Maximilian Puelma Touzel

Matthew D Riemer

Rupali Bhati

Agents cannot make sense of many-agent societies through direct consideration of small-scale, low-level agent identities, but instead must recognize emergent collective identities. Here, we take a first step towards a framework for recognizing this structure in large groups of low-level agents so that they can be modeled as a much smaller number of high-level agents, a process that we call agent abstraction. We illustrate this process by extending bisimulation metrics for state abstraction in reinforcement learning to the setting of multi-agent reinforcement learning and analyze a straightforward, if crude, abstraction based on experienced joint actions. It addresses non-stationarity due to other learning agents by improving minimax regret by an intuitive factor. To test if this compression factor provides signal for higher-level agency, we applied it to a large dataset of human play of the popular social dilemma game Diplomacy. We find that it correlates strongly with the degree of ground-truth abstraction of low-level units into the human players.

2022-04-21

ICLR.cc/2022/Workshop/Cells2Societies (poster)

openreview.net

WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series Tasks

Jean-Christophe Gagnon-Audet

Kartik Ahuja

Mohammad-Javad Darvishi-Bayazi

2022-03-18

ArXiv (preprint)

Cognitive Models as Simulators: The Case of Moral Decision-Making

Ardavan S. Nobandegani

T. Shultz

2022-01-01

CogSci (published)

Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal

Sharath Chandra Raparthy

Yoshua Bengio

Guillaume Lajoie

Multi-head, key-value attention is the backbone of transformer-like model architectures which have proven to be widely successful in recent years. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interaction, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval and is easy to implement in a variety of established network architectures.

2022-01-01

International Conference on Learning Representations (published)

openreview.net

Continual Learning In Environments With Polynomial Mixing Times

Matthew D Riemer

Sharath Chandra Raparthy

Ignacio Cases

Gopeshh Raaj Subbaraj

Maximilian Puelma Touzel

The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we theoretically establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches that suffer from myopic bias and stale bootstrapped estimates. To validate the proposed theory, we study the empirical scaling behavior of mixing times with respect to the number of tasks and task switching frequency for pretrained high performing policies on seven Atari games. Our analysis demonstrates both that polynomial mixing times do emerge in practice and how their existence may lead to unstable learning behavior like catastrophic forgetting in continual learning settings.

openreview.net

Continual Learning with Foundation Models: An Empirical Study of Latent Replay

Oleksiy Ostapenko

Timothee LESORT

Pau Rodriguez

Md Rifat Arefin

Arthur Douillard

Laurent Charlin

2022-01-01

CoLLAs (published)

Optimizing deep learning for Magnetoencephalography (MEG): From sensory perception to sex prediction and brain fingerprinting

Arthur Dehgan

Karim Jerbi

2022-01-01

2022 Conference on Cognitive Computational Neuroscience (published)

Scaling the Number of Tasks in Continual Learning

Timothee LESORT

Oleksiy Ostapenko

Diganta Misra

Md Rifat Arefin

Pau Rodriguez

Laurent Charlin

2022-01-01

arXiv.org (preprint)

Generative Models of Brain Dynamics -- A review

Mahta Ramezanian Panahi

Germán Abrevaya

Jean-Christophe Gagnon-Audet

Vikram Voleti

The principled design and discovery of biologically- and physically-informed models of neuronal dynamics has been advancing since the mid-twentieth century. Recent developments in artificial intelligence (AI) have accelerated this progress. This review article gives a high-level overview of the approaches across different scales of organization and levels of abstraction. The studies covered in this paper include fundamental models in computational neuroscience, nonlinear dynamics, data-driven methods, as well as emergent practices. While not all of these models span the intersection of neuroscience, AI, and system dynamics, all of them do or can work in tandem as generative models, which, as we argue, provide superior properties for the analysis of neuroscientific data. We discuss the limitations and unique dynamical traits of brain data and the complementary need for hypothesis- and data-driven modeling. By way of conclusion, we present several hybrid generative models from recent literature in scientific machine learning, which can be efficiently deployed to yield interpretable models of neural dynamics.

2021-12-22

ArXiv (preprint)