Irina Rish

sayed.mansouri-tehrani@mila.quebec

Amin Mansouri

Master's Research - Université de Montréal

andrew.williams@mila.quebec

Amin Darabi

PhD - Université de Montréal

amin.darabi@mila.quebec

Amin Memarian

Independent visiting researcher

memariaa@mila.quebec

Andrei Mircea Romascanu

PhD - Université de Montréal

PhD - Université de Montréal

arian.khorasani@mila.quebec

Arian Khorasani

Master's Research - Université de Montréal

arnav-kumar.jain@mila.quebec

Arjun Ashok

PhD

Co-supervisor :

Alexandre Drouin

arjun.ashok@mila.quebec

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher

ayush.kaushal@mila.quebec

Benjamin Therien

PhD - Université de Montréal

Co-supervisor :

Eugene Belilovsky

benjamin.therien@mila.quebec

Collaborating researcher - Université de Montréal

connor.brennan@mila.quebec

Daria Yasafova

Research Intern - Technical University of Munich

daria.yasafova@mila.quebec

Dave Whipps

Master's Research - Université de Montréal

whippsda@mila.quebec

diganta.misra@mila.quebec

Diganta Misra

Master's Research - Université de Montréal

Postdoctorate

Principal supervisor :

Nicolas Le Roux

ekaterina.lobacheva@mila.quebec

PhD - McGill University

Principal supervisor :

Blake Richards

ethan.caballero@mila.quebec

george.adamopoulos@mila.quebec

George Adamopoulos

Research Intern

gopeshh.subbaraj@mila.quebec

Germán Abrevaya

Independent visiting researcher - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

gwendolyne.legate@mila.quebec

Gwen Legate

PhD - Concordia University

Principal supervisor :

Eugene Belilovsky

Ivan Anokhin

PhD - Université de Montréal

Co-supervisor :

Samira Ebrahimi Kahou

ivan.anokhin@mila.quebec

juan.mayor-torres@mila.quebec

Juan Manuel Mayor-Torres

Collaborating researcher

Collaborating Alumni - Université de Montréal

Co-supervisor :

Sarath Chandar Anbil Parthipan

kshitij.gupta@mila.quebec

Mahta Ramezanian

Master's Research - Université de Montréal

Co-supervisor :

mahta.ramezanian@mila.quebec

Matthew Riemer

PhD - Université de Montréal

matthew.riemer@mila.quebec

Maximilian Puelma Touzel

Collaborating researcher

PhD - Université de Montréal

arefinmr@mila.quebec

Mohammad Pezeshki

Collaborating researcher

pezeshki@mila.quebec

Mohammad-Javad Darvishi Bayazi

PhD - Université de Montréal

mohammad-javad.darvishi-bayasi@mila.quebec

PhD - Université de Montréal

faramarm@mila.quebec

Motahareh Pourrahimi

PhD - McGill University

Principal supervisor :

Pouya Bashivan

motahareh.pourrahimi@mila.quebec

nadhir.hassen@mila.quebec

Nadhir Hassen

Research Intern - Université de Montréal

Neeraj Kumar

Professional Master's - Université de Montréal

neeraj.kumar@mila.quebec

Nizar Islah

PhD - Université de Montréal

Principal supervisor :

Eilif Benjamin Muller

nizar.islah@mila.quebec

paolo.cudrano@mila.quebec

Omar Younis

Research Intern - Université de Montréal

omar.younis@mila.quebec

Collaborating researcher - Politecnico di Milano

Pascal Tikeng Notsawo

PhD - Université de Montréal

Co-supervisor :

pascal.tikeng@mila.quebec

Collaborating researcher

prateek.humane@mila.quebec

Master's Research - Université de Montréal

remus.mocanu@mila.quebec

Reza Bayat

Master's Research - Université de Montréal

Co-supervisor :

Pouya Bashivan

reza.bayat@mila.quebec

rishika.bhagwatkar@mila.quebec

Rishika Bhagwatkar

Master's Research - Université de Montréal

Collaborating researcher - Université de Montréal

roland.riachi@mila.quebec

Simon Dufort-Labbé

PhD - Université de Montréal

simon.dufort-labbe@mila.quebec

Sparsha Mishra

Master's Research - Université de Montréal

sparsha.mishra@mila.quebec

Tejas Vaidhya

Master's Research - Université de Montréal

tejas.vaidhya@mila.quebec

PhD - Université de Montréal

Co-supervisor :

Eilif Benjamin Muller

timothy.nest@mila.quebec

Vaibhav Singh

PhD - Concordia University

Principal supervisor :

Eugene Belilovsky

vaibhav.singh@mila.quebec

Zahra Sheikhbahaee

Postdoctorate - Université de Montréal

Principal supervisor :

zahra.sheikhbahaee@mila.quebec

Publications

A Survey on Compositional Generalization in Applications

Baihan Lin

Djallel Bouneffouf

2023-02-02

ArXiv (preprint)

Broken Neural Scaling Laws

Ethan Caballero

Kshitij Gupta

David Scott Krueger

We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models & extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures & for each of various tasks within a large & diverse set of upstream & downstream tasks, in zero-shot, prompted, & finetuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, AI capabilities, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, OOD detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, fairness, molecules, computer programming/coding, math word problems, "emergent phase transitions", arithmetic, supervised learning, unsupervised/self-supervised learning, & reinforcement learning (single agent & multi-agent). When compared to other functional forms for neural scaling, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models & extrapolates scaling behavior that other functional forms are incapable of expressing such as the nonmonotonic transitions present in the scaling behavior of phenomena such as double descent & the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws

2023-02-01

ICLR.cc/2023/Conference (poster)

openreview.net

AI Agents Learn to Trust

Ardavan S. Nobandegani

T. Shultz

2023-01-01

Annual Meeting of the Cognitive Science Society (published)

dblp.uni-trier.de

GOKU-UI: Ubiquitous Inference through Attention and Multiple Shooting for Continuous-time Generative Models

Germán Abrevaya

Mahta Ramezanian-Panahi

Jean-Christophe Gagnon-Audet

Pablo Polosecki

Silvina Ponce Dawson

Guillermo Cecchi

Scientific Machine Learning (SciML) is a burgeoning field that synergistically combines domain-aware and interpretable models with agnostic machine learning techniques. In this work, we introduce GOKU-UI, an evolution of the SciML generative model GOKU-nets. The GOKU-UI broadens the original model’s spectrum to incorporate other classes of differential equations, such as Stochastic Differential Equations (SDEs), and integrates a distributed, i.e. ubiquitous, inference through attention mechanisms and a novel multiple shooting training strategy in the latent space. These enhancements have led to a significant increase in its performance in both reconstruction and forecast tasks, as demonstrated by our evaluation of simulated and empirical data. Specifically, GOKU-UI outperformed all baseline models on synthetic datasets even with a training set 32-fold smaller, underscoring its remarkable data efficiency. Furthermore, when applied to empirical human brain data, while incorporating stochastic Stuart-Landau

2023-01-01

arXiv.org (preprint)

Lag-Llama: Towards Foundation Models for Time Series Forecasting

Kashif Rasul

Arjun Ashok

Andrew Robert Williams

Arian Khorasani

George Adamopoulos

Rishika Bhagwatkar

Marin Biloš

Hena Ghonia

N. Hassen

Anderson Schneider

Sahil Garg

Alexandre Drouin

Nicolas Chapados

Yuriy Nevmyvaka

Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen “out-of-distribution” time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws [7] to fit and predict model scaling behavior. The open source code is made available at https://github

2023-01-01

arXiv.org (preprint)

Towards Continual Reinforcement Learning: A Review and Perspectives

Khimya Khetarpal

Matthew D Riemer

Doina Precup

2022-12-22

Journal of Artificial Intelligence Research (published)

Continual Learning with Foundation Models: An Empirical Study of Latent Replay

Oleksiy Ostapenko

Timothee Lesort

Pau Rodriguez

Md Rifat Arefin

Arthur Douillard

Laurent Charlin

Rapid development of large-scale pre-training has resulted in foundation models that can act as effective feature extractors on a variety of downstream tasks and domains. Motivated by this, we study the efficacy of pre-trained vision models as a foundation for downstream continual learning (CL) scenarios. Our goal is twofold. First, we want to understand the compute-accuracy trade-off between CL in the raw-data space and in the latent space of pre-trained encoders. Second, we investigate how the characteristics of the encoder, the pre-training algorithm and data, as well as of the resulting latent space affect CL performance. For this, we compare the efficacy of various pre-trained models in large-scale benchmarking scenarios with a vanilla replay setting applied in the latent and in the raw-data space. Notably, this study shows how transfer, forgetting, task similarity and learning are dependent on the input data characteristics and not necessarily on the CL algorithms. First, we show that under some circumstances reasonable CL performance can readily be achieved with a non-parametric classifier at negligible compute. We then show how models pre-trained on broader data result in better performance for various replay sizes. We explain this with representational similarity and transfer properties of these representations. Finally, we show the effectiveness of self-supervised pre-training for downstream domains that are out-of-distribution as compared to the pre-training domain. We point out and validate several research directions that can further increase the efficacy of latent CL including representation ensembling. The diverse set of datasets used in this study can serve as a compute-efficient playground for further CL research. We will publish the code.

2022-11-28

Proceedings of The 1st Conference on Lifelong Learning Agents (published)

proceedings.mlr.press

APP: Anytime Progressive Pruning

Diganta Misra

Bharat Runwal

Tianlong Chen

Zhangyang Wang

With the latest advances in deep learning, several methods have been investigated for optimal learning settings in scenarios where the data stream is continuous over time. However, training sparse networks in such settings has often been overlooked. In this paper, we explore the problem of training a neural network with a target sparsity in a particular case of online learning: the anytime learning at macroscale paradigm (ALMA). We propose a novel way of progressive pruning, referred to as Anytime Progressive Pruning (APP); the proposed approach significantly outperforms the baseline dense and Anytime OSP models across multiple architectures and datasets under short, moderate, and long-sequence training. Our method, for example, shows an improvement in accuracy of

2022-11-17

ACML.org/2022/Workshop/CLL (published)

openreview.net

Knowledge Distillation for Federated Learning: a Practical Guide

Alessio Mora

Irene Tenison

Paolo Bellavista

Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data. This paves the way for stronger privacy guarantees when building predictive models. The most used algorithms for FL are parameter-averaging based schemes (e.g., Federated Averaging) that, however, have well known limits: (i) Clients must implement the same model architecture; (ii) Transmitting model weights and model updates implies high communication cost, which scales up with the number of model parameters; (iii) In presence of non-IID data distributions, parameter-averaging aggregation schemes perform poorly due to client model drifts. Federated adaptations of regular Knowledge Distillation (KD) can solve and/or mitigate the weaknesses of parameter-averaging FL algorithms while possibly introducing other trade-offs. In this article, we provide a review of KD-based algorithms tailored for specific FL issues.

2022-11-09

ArXiv (preprint)

Aligning MAGMA by Few-Shot Learning and Finetuning

Jean-Charles Layoun

Alexis Roger

2022-10-18

ArXiv (preprint)

Generative Models of Brain Dynamics

Mahta Ramezanian-Panahi

Germán Abrevaya

Jean-Christophe Gagnon-Audet

Vikram Voleti

2022-07-15

Frontiers in Artificial Intelligence (published)

Challenging Common Assumptions about Catastrophic Forgetting

Timothee Lesort

Oleksiy Ostapenko

Pau Rodriguez

Md Rifat Arefin

Diganta Misra

Laurent Charlin