Publications

A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix

Thang Doan

Mehdi Abbana Bennani

Pierre Alquier

Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data during its entire lifetime. Although major advances have been made in the field, one recurring problem that remains unsolved is Catastrophic Forgetting (CF). While the issue has been extensively studied empirically, little attention has been paid to it from a theoretical angle. In this paper, we show that the impact of CF increases as two tasks increasingly align. We introduce a measure of task similarity called the NTK overlap matrix, which is at the core of CF. We analyze common projected gradient algorithms and demonstrate how they mitigate forgetting. Then, we propose a variant of Orthogonal Gradient Descent (OGD) that leverages the structure of the data through Principal Component Analysis (PCA). Experiments support our theoretical findings and show how our method reduces CF on classical CL datasets.

2020-10-07

ArXiv (preprint)

arxiv.org
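The catastrophic-forgetting abstract above refers to projected gradient algorithms such as Orthogonal Gradient Descent (OGD). As a rough illustration only, the sketch below shows the basic OGD projection: gradient directions stored from earlier tasks are orthonormalized, and each new gradient has its components along them removed before the update, so the step lies in their orthogonal complement. It is not the authors' code and does not implement their PCA variant; the plain-list vectors, the helper names (`orthonormalize`, `project_orthogonal`, `ogd_step`) and the learning rate are hypothetical choices made for this example.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(directions: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: turn stored gradient directions into an orthonormal basis.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    basis: List[Vector] = []
    for d in directions:
        v = list(d)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:  # skip directions already spanned by the basis
            basis.append([vi / norm for vi in v])
    return basis


def project_orthogonal(grad: Vector, basis: List[Vector]) -> Vector:
    """Remove the components of grad that lie in span(basis) (basis must be orthonormal)."""
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


def ogd_step(params: Vector, grad: Vector, basis: List[Vector], lr: float = 0.1) -> Vector:
    """One gradient step using only the part of grad orthogonal to earlier tasks."""
    g = project_orthogonal(grad, basis)
    return [p - lr * gi for p, gi in zip(params, g)]
```

For example, with stored directions `[1, 0, 0]` and `[1, 1, 0]`, a new gradient `[3, 4, 5]` projects to `[0, 0, 5]`: only the component orthogonal to the earlier task directions survives, so the update does not move the parameters along those directions.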

Contact Graph Epidemic Modelling of COVID-19 for Transmission and Intervention Strategies

Abby Leung

Xiaoye Ding

Shenyang Huang

Reihaneh Rabbany

The coronavirus disease 2019 (COVID-19) pandemic has quickly become a global public health crisis unseen in recent years. It is known that the structure of the human contact network plays an important role in the spread of transmissible diseases. In this work, we study CGEM, a structure-aware model of COVID-19. This model becomes similar to the classical compartment-based models in epidemiology if we assume the contact network is an Erdős-Rényi (ER) graph, i.e. everyone comes into contact with everyone else with the same probability. In contrast, CGEM is more expressive and allows for plugging in the actual contact networks, or more realistic proxies for them. Moreover, CGEM enables more precise modelling of enforcing and releasing different non-pharmaceutical intervention (NPI) strategies. Through a set of extensive experiments, we demonstrate significant differences between the epidemic curves when assuming different underlying structures. More specifically, we demonstrate that the compartment-based models overestimate the spread of the infection by a factor of 3 and, under some realistic assumptions on the compliance factor, underestimate the effectiveness of some NPIs, mischaracterize others (e.g. predicting a later peak), and underestimate the scale of the second peak after reopening.

2020-10-06

ArXiv (preprint)

arxiv.org
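The contact-graph abstract above contrasts homogeneous mixing (an ER graph) with realistic contact structure. A minimal sketch of that idea is shown below, assuming a discrete-time stochastic SIR process in which each infected node infects each susceptible neighbour with probability `beta` per step and recovers with probability `gamma`. This is not CGEM: the compartments, parameters, and function names are hypothetical. Passing a graph from `erdos_renyi` approximates the homogeneous-mixing setting, while any other adjacency structure (an actual contact network or a proxy for it) can be passed to `simulate_sir` unchanged.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def erdos_renyi(n: int, p: float, rng: random.Random) -> Graph:
    """Undirected ER graph: each pair of nodes is linked independently with probability p."""
    g: Graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                g[i].add(j)
                g[j].add(i)
    return g


def simulate_sir(graph: Graph, beta: float, gamma: float, initial: List[int],
                 steps: int, rng: random.Random) -> List[Dict[str, int]]:
    """Hypothetical discrete-time SIR on a contact graph; returns S/I/R counts per step."""
    state = {v: "S" for v in graph}
    for v in initial:
        state[v] = "I"
    history: List[Dict[str, int]] = []
    for _ in range(steps):
        infected = [v for v, s in state.items() if s == "I"]
        newly_infected = set()
        # Transmission only happens along edges of the contact graph.
        for v in infected:
            for u in graph[v]:
                if state[u] == "S" and rng.random() < beta:
                    newly_infected.add(u)
        # Nodes infected before this step may recover; new infections start next step.
        for v in infected:
            if rng.random() < gamma:
                state[v] = "R"
        for u in newly_infected:
            state[u] = "I"
        history.append({k: sum(1 for s in state.values() if s == k) for k in "SIR"})
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    g = erdos_renyi(200, 0.03, rng)
    for t, counts in enumerate(simulate_sir(g, beta=0.05, gamma=0.1, initial=[0], steps=10, rng=rng)):
        print(t, counts)
```

Because the graph is an explicit input in this sketch, an intervention such as limiting contacts could be represented by deleting edges before or during the simulation.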

COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing

Prateek Gupta

Tegan Maharaj

Martin Weiss

Nasim Rahaman

Hannah Alsdurf

Abhinav Sharma

Nanor Minoyan

Soren Harnois-Leblanc

Victor Schmidt

Pierre-Luc St-Charles

Tristan Deleu

Andrew Williams

Akshay Patel

Meng Qu

Olexa Bilaniuk

Gaetan Caron

Pierre Luc Carrier

Satya Ortiz Gagne

Marc-Andre Rousseau

David Buckeridge

Joumana Ghosn

Yang Zhang

Bernhard Schölkopf

Jian Tang

Irina Rish

Chris Pal

Joanna Merckx

Eilif Benjamin Muller

Yoshua Bengio

2020-10-02

OpenReview.net/Anonymous_Preprint (unknown)

openreview.net

NutriQuébec: a unique web-based prospective cohort study to monitor the population’s eating and other lifestyle behaviours in the province of Québec

Annie Lapointe

Catherine Laramée

Ariane Belanger-Gravel

David Buckeridge

Sophie Desroches

Didier Garriguet

Lise Gauvin

Simone Lemieux

Céline Plante

Benoit Lamarche

2020-10-01

BMJ Open (published)

doi.org

Deep discriminant analysis for task-dependent compact network search

Qing Tian

Tal Arbel

James J. Clark

Most of today's popular deep architectures are hand-engineered for general-purpose applications. However, this design procedure usually leads to massive numbers of redundant, useless, or even harmful features for specific tasks. Such unnecessarily high complexities render deep nets impractical for many real-world applications, especially those without powerful GPU support. In this paper, we attempt to derive task-dependent compact models from a deep discriminant analysis perspective. We propose an iterative and proactive approach for classification tasks which alternates between (1) a pushing step, with an objective to simultaneously maximize class separation, penalize covariances, and push deep discriminants into alignment with a compact set of neurons, and (2) a pruning step, which discards less useful or even interfering neurons. Deconvolution is adopted to reverse the effects of 'unimportant' filters and recover useful contributing sources. A simple network-growing strategy based on the basic Inception module is proposed for challenging tasks requiring larger capacity than the base net can offer. Experiments on the MNIST, CIFAR10, and ImageNet datasets demonstrate our approach's efficacy. On ImageNet, by pushing and pruning our grown Inception-88 model, we achieve better-performing models than smaller grown deep Inception nets, residual nets, and well-known compact nets of similar size. We also show that our grown deep Inception nets (without hard-coded dimension alignment) can beat residual nets of similar complexity.

2020-09-29

arXiv.org (preprint)

dblp.uni-trier.de

Learning to Summarize Long Texts with Memory Compression and Transfer

Jaehong Park

Jonathan Pilault

Chris Pal

2020-09-28

ArXiv (preprint)

arxiv.org

Optimal Control of Network-Coupled Subsystems: Spectral Decomposition and Low-Dimensional Solutions

Shuang Gao

Aditya Mahajan

In this article, we investigate the optimal control of network-coupled subsystems with coupled dynamics and costs. The dynamics coupling may be represented by the adjacency matrix, the Laplacian matrix, or any other symmetric matrix corresponding to an underlying weighted undirected graph. Cost couplings are represented by two coupling matrices that have the same eigenvectors as the coupling matrix in the dynamics. We use the spectral decomposition of these three coupling matrices to decompose the overall system into

2020-09-25

ArXiv (preprint)

doi.org

arxiv.org

Generating Multiscale Amorphous Molecular Structures Using Deep Learning: A Study in 2D.

Michael Kilgour

Nicolas Gastellu

David Y. T. Hui

Yoshua Bengio

Lena Simine

Amorphous molecular assemblies appear in a vast array of systems: from living cells to chemical plants and from everyday items to new devices. The absence of long-range order in amorphous materials implies that precise knowledge of their underlying structures throughout is needed to rationalize and control their properties at the mesoscale. Standard computational simulations suffer from exponentially unfavorable scaling of the required compute with system size. We present a method based on deep learning that leverages the finite range of structural correlations for an autoregressive generation of disordered molecular aggregates up to arbitrary size from small-scale computational or experimental samples. We benchmark performance on self-assembled nanoparticle aggregates and proceed to simulate monolayer amorphous carbon with atomistic resolution. This method bridges the gap between the nanoscale and mesoscale simulations of amorphous molecular systems.

2020-09-24

Journal of Physical Chemistry Letters (published)

doi.org

Preface

Tal Arbel

Ismail Ben Ayed

Marleen de Bruijne

Maxime Descoteaux

Hervé Lombaert

Chris Pal

2020-09-21

Proceedings of the Third Conference on Medical Imaging with Deep Learning (published)

proceedings.mlr.press

A learning-based algorithm to quickly compute good primal solutions for Stochastic Integer Programs

Yoshua Bengio

Emma Frejinger

Andrea Lodi

Rahul Anuj Patel

Sriram Sankaranarayanan

2020-09-19

Integration of Constraint Programming, Artificial Intelligence, and Operations Research (published)

doi.org

arxiv.org

Practical Dynamic SC-Flip Polar Decoders: Algorithm and Implementation

Furkan Ercan

Thibaud Tonnellier

Nghia Doan

Warren Gross

SC-Flip (SCF) is a low-complexity polar code decoding algorithm with improved performance, and is an alternative to high-complexity CRC-aided SC-List (CA-SCL) decoding. However, the performance improvement of SCF is limited since it can correct at most one channel error (

2020-09-17

ArXiv (preprint)

doi.org

arxiv.org

A normative modelling approach reveals age-atypical cortical thickness in a subgroup of males with autism spectrum disorder

Richard A.I. Bethlehem

Jakob Seidlitz

Rafael Romero-Garcia

Stavros Trakoshis

Guillaume Dumas

Michael V. Lombardo

2020-09-04

Communications Biology (published)

doi.org

Mila AI Policy Fellowship

The Development of the UN Scientific Panel on AI
