Min Lin

Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning

Massimo Caccia

Pau Rodriguez

Oleksiy Ostapenko

Fabrice Normandin

Min Lin

Lucas Caccia

Issam Hadj Laradji

Irina Rish

Alexandre Lacoste

David Vázquez

Laurent Charlin

2020-03-12

ArXiv (preprint)

arxiv.org

Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning.

Massimo Caccia

Pau Rodriguez

Oleksiy Ostapenko

Fabrice Normandin

Min Lin

Lucas Caccia

Issam Hadj Laradji

Irina Rish

Alexandre Lacoste

David Vázquez

Laurent Charlin

Online Continual Learning with Maximally Interfered Retrieval

Lucas Caccia

Massimo Caccia

Tinne Tuytelaars

Continual learning, the setting where a learning agent is faced with a never ending stream of data, continues to be a great challenge for modern machine learning systems. In particular the online or "single-pass through the data" setting has gained attention recently as a natural setting that is difficult to tackle. Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks. These approaches typically rely on randomly selecting samples from the replay memory or from a generative model, which is suboptimal. In this work, we consider a controlled sampling of memories for replay. We retrieve the samples which are most interfered, i.e. whose prediction will be most negatively impacted by the foreseen parameters update. We show a formulation for this sampling criterion in both the generative replay and the experience replay setting, producing consistent gains in performance and greatly reduced forgetting. We release an implementation of our method at this https URL.

2019-08-11

ArXiv (preprint)

arxiv.org
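The retrieval criterion in the abstract above (replay the stored samples whose loss would rise most under the foreseen parameter update) can be illustrated with a small sketch. This is not the authors' released implementation. It assumes a linear model with squared-error loss and a single plain gradient step as the "foreseen update", and it scores the whole memory. The function names and the `lr` parameter are hypothetical.

```python
import numpy as np

def per_sample_loss(w, X, y):
    """Squared error of a linear model, one value per row of X."""
    return (X @ w - y) ** 2

def mean_loss_grad(w, X, y):
    """Gradient of the mean squared error with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def retrieve_most_interfered(w, X_new, y_new, X_mem, y_mem, k, lr=0.1):
    """Return indices of the k memory samples most hurt by the virtual update.

    Assumption: the foreseen update is one SGD step on the incoming batch.
    """
    # Virtual update: the step the incoming batch would cause (not applied to w).
    w_virtual = w - lr * mean_loss_grad(w, X_new, y_new)
    # Interference score: loss increase on each memory sample after that step.
    increase = per_sample_loss(w_virtual, X_mem, y_mem) - per_sample_loss(w, X_mem, y_mem)
    # Largest increases first.
    return np.argsort(-increase)[:k]

# Toy usage: the incoming sample pulls the first weight from 1.0 toward 0.
w = np.array([1.0, 0.0])
X_new, y_new = np.array([[1.0, 0.0]]), np.array([0.0])
X_mem = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [1.0, 0.0]])
y_mem = np.array([1.0, 0.0, 2.0, 0.0])
idx = retrieve_most_interfered(w, X_new, y_new, X_mem, y_mem, k=2)
```

In this toy case the memory samples that rely on the first weight staying at 1.0 are retrieved, while the sample that agrees with the incoming data (its loss decreases) is ranked last.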

Conditional Computation for Continual Learning

Min Lin

Jie Fu

Yoshua Bengio

Catastrophic forgetting of connectionist neural networks is caused by the global sharing of parameters among all training examples. In this study, we analyze parameter sharing under the conditional computation framework where the parameters of a neural network are conditioned on each input example. At one extreme, if each input example uses a disjoint set of parameters, there is no sharing of parameters thus no catastrophic forgetting. At the other extreme, if the parameters are the same for every example, it reduces to the conventional neural network. We then introduce a clipped version of maxout networks which lies in the middle, i.e. parameters are shared partially among examples. Based on the parameter sharing analysis, we can locate a limited set of examples that are interfered when learning a new example. We propose to perform rehearsal on this set to prevent forgetting, which is termed as conditional rehearsal. Finally, we demonstrate the effectiveness of the proposed method in an online non-stationary setup, where updates are made after each new example and the distribution of the received example shifts over time.

2019-06-16

ArXiv (preprint)

arxiv.org

Online continual learning with no task boundaries

Continual learning is the ability of an agent to learn online with a non-stationary and never-ending stream of data. A key component for such never-ending learning process is to overcome the catastrophic forgetting of previously seen data, a problem that neural networks are well known to suffer from. The solutions developed so far often relax the problem of continual learning to the easier task-incremental setting, where the stream of data is divided into tasks with clear boundaries. In this paper, we break the limits and move to the more challenging online setting where we assume no information of tasks in the data stream. We start from the idea that each learning step should not increase the losses of the previously learned examples through constraining the optimization process. This means that the number of constraints grows linearly with the number of examples, which is a serious limitation. We develop a solution to select a fixed number of constraints that we use to approximate the feasible region defined by the original constraints. We compare our approach against the methods that rely on task boundaries to select a fixed set of examples, and show comparable or even better results, especially when the boundaries are blurry or when the data distributions are imbalanced.

2019-03-20

arXiv.org (preprint)

dblp.uni-trier.de

Gradient based sample selection for online continual learning

A continual learning agent learns online with a non-stationary and never-ending stream of data. The key to such learning process is to overcome the catastrophic forgetting of previously seen data, which is a well known problem of neural networks. To prevent forgetting, a replay buffer is usually employed to store the previous data for the purpose of rehearsal. Previous works often depend on task boundary and i.i.d. assumptions to properly select samples for the replay buffer. In this work, we formulate sample selection as a constraint reduction problem based on the constrained optimization view of continual learning. The goal is to select a fixed subset of constraints that best approximate the feasible region defined by the original constraints. We show that it is equivalent to maximizing the diversity of samples in the replay buffer with parameters gradient as the feature. We further develop a greedy alternative that is cheap and efficient. The advantage of the proposed method is demonstrated by comparing to other alternatives under the continual learning setting. Further comparisons are made against state of the art methods that rely on task boundaries which show comparable or even better results for our method.

arxiv.org
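The abstract above frames buffer selection as maximizing the diversity of stored samples, with each sample's parameter gradient as its feature. The sketch below illustrates that idea only. It swaps in a simple greedy farthest-point heuristic over cosine similarity, which is not the paper's procedure or its greedy alternative. It assumes per-sample gradients are already available as rows of a matrix, that none of them is zero, and that `k` does not exceed the number of samples. The function name is hypothetical.

```python
import numpy as np

def select_diverse(grads, k):
    """Greedily pick k rows of `grads` whose directions are mutually dissimilar.

    Assumptions: every row is a nonzero per-sample gradient and k <= len(grads).
    """
    # Normalize rows so that dot products are cosine similarities.
    unit = grads / np.linalg.norm(grads, axis=1, keepdims=True)
    sim = unit @ unit.T
    chosen = [0]  # arbitrary starting sample
    while len(chosen) < k:
        # For each candidate: its highest similarity to anything already chosen.
        closest = sim[:, chosen].max(axis=1)
        closest[chosen] = np.inf  # never pick the same sample twice
        # Add the candidate that is least similar to the current selection.
        chosen.append(int(np.argmin(closest)))
    return chosen

# Toy usage: rows 0 and 1 point almost the same way, so only one of them is kept.
grads = np.array([[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [-1.0, 0.0]])
picked = select_diverse(grads, k=3)
```

With a budget of three, the near-duplicate gradient direction is the one left out, which is the intuition behind using gradient diversity as the selection criterion.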

On the Spectral Bias of Deep Neural Networks

Nasim Rahaman

Felix Draxler

Fred Hamprecht

It is well known that over-parametrized deep neural networks (DNNs) are an overly expressive class of functions that can memorize even random data with

2018-06-22

arXiv.org (preprint)

dblp.uni-trier.de

On the Spectral Bias of Neural Networks

Nasim Rahaman

Felix Draxler

Fred Hamprecht