Frederik Träuble

Low Compute Unlearning via Sparse Representations

Vedant Shah

Frederik Träuble

Ashish Malik

Hugo Larochelle

Michael Curtis Mozer

Sanjeev Arora

Yoshua Bengio

Anirudh Goyal

Machine unlearning, which involves erasing knowledge about a \emph{forget set} from a trained model, can be costly or even infeasible with existing techniques. We propose a low-compute unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the dataset. We evaluate the proposed technique on the problem of class unlearning using four datasets: CIFAR-10, CIFAR-100, LACUNA-100 and ImageNet-1k. We compare the proposed technique to SCRUB, a state-of-the-art approach that uses knowledge distillation for unlearning. Across all four datasets, the proposed technique performs as well as, if not better than, SCRUB while incurring almost no computational cost.
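To illustrate why unlearning through a discrete bottleneck can be nearly free, here is a minimal sketch. It is not the authors' procedure. It assumes that each key-value pair is associated with one class, and that unlearning a class means masking out every key the forget-set inputs select, so no gradient steps are needed. All names (`keys_to_unlearn`, `predict`, `key_labels`) and the toy 2-D "encoder outputs" are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_key(z, keys, masked=frozenset()):
    """Index of the key closest to z, ignoring keys that have been masked out."""
    candidates = [i for i in range(len(keys)) if i not in masked]
    return min(candidates, key=lambda i: squared_distance(z, keys[i]))


def keys_to_unlearn(forget_inputs, keys, min_count=1):
    """Hypothetical rule: mask every key selected by at least min_count forget inputs."""
    counts = Counter(nearest_key(z, keys) for z in forget_inputs)
    return {i for i, c in counts.items() if c >= min_count}


def predict(z, keys, key_labels, masked=frozenset()):
    """Class of the nearest unmasked key (a stand-in for value lookup plus decoding)."""
    return key_labels[nearest_key(z, keys, masked)]


# Toy codebook: two keys per class for classes 0 and 1, one key for class 2.
keys = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]]
key_labels = [0, 0, 1, 1, 2]

# Encoder outputs of the forget set (class 1).
forget_inputs = [[5.0, 5.2], [5.1, 5.9]]

# "Unlearning" is a counting pass plus a mask: no retraining of any weights.
masked = keys_to_unlearn(forget_inputs, keys)
```

After masking, forget-set inputs can no longer reach the class-1 keys, while inputs from retained classes still select exactly the same keys as before, which is the sense in which damage to the rest of the dataset stays small in this toy setting.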

2023-11-25

arXiv (preprint)

doi.org

openreview.net

Discrete Key-Value Bottleneck

Frederik Träuble

Anirudh Goyal

Nasim Rahaman

Michael Mozer

Kenji Kawaguchi

Yoshua Bengio

Bernhard Schölkopf

Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant. Challenges emerge with non-stationary training data streams, as in continual learning. One powerful approach to this challenge is to pre-train large encoders on large volumes of readily available data and then tune them for specific tasks. Given a new task, however, updating the weights of these encoders is challenging because a large number of weights must be fine-tuned, and as a result the encoders forget information about previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes. Our paradigm is to encode, process the representation via a discrete bottleneck, and decode. Here, the input is fed to the pre-trained encoder, the output of the encoder is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and re-use a sparse subset of these key-value pairs during inference, enabling localized and context-dependent model updates. We theoretically investigate the ability of the discrete key-value bottleneck to minimize the effect of learning under distribution shifts and show that it reduces the complexity of the hypothesis class. We empirically verify the proposed method under challenging class-incremental learning scenarios and show that the proposed model, without any task boundaries, reduces catastrophic forgetting across a wide variety of pre-trained models, outperforming relevant baselines on this task.
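The encode, look up, decode path described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the paper's implementation: a single codebook, keys supplied up front and kept frozen, each value stored directly as a vector of class logits, and a softmax over the averaged fetched values standing in for the decoder. The class and method names (`KeyValueBottleneck`, `lookup`, `train_step`) are hypothetical, and the encoder is assumed to have already produced the vector `z`.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


class KeyValueBottleneck:
    """Hypothetical single-codebook bottleneck: frozen keys, learnable values."""

    def __init__(self, keys, num_classes, top_k=1):
        self.keys = [list(k) for k in keys]  # frozen: never updated below
        self.values = [[0.0] * num_classes for _ in keys]  # learnable
        self.top_k = top_k

    def lookup(self, z):
        """Indices of the top_k keys nearest to the encoder output z."""
        order = sorted(range(len(self.keys)),
                       key=lambda i: squared_distance(z, self.keys[i]))
        return order[: self.top_k]

    def predict_proba(self, z):
        """Average the fetched values and decode them with a softmax."""
        idx = self.lookup(z)
        n = len(self.values[0])
        avg = [sum(self.values[i][c] for i in idx) / len(idx) for c in range(n)]
        return softmax(avg)

    def train_step(self, z, label, lr=0.5):
        """One cross-entropy gradient step applied ONLY to the fetched values."""
        idx = self.lookup(z)
        probs = self.predict_proba(z)
        for i in idx:
            for c in range(len(probs)):
                target = 1.0 if c == label else 0.0
                # dCE/d(avg logit c) = probs[c] - target; each fetched value
                # contributes 1/len(idx) to the average.
                self.values[i][c] -= lr * (probs[c] - target) / len(idx)
        return idx


# Usage: four frozen keys in a 2-D "encoder space", two classes.
kvb = KeyValueBottleneck([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                         num_classes=2, top_k=1)
for _ in range(20):
    kvb.train_step([0.1, 0.0], label=0)  # inputs that select key 0
    kvb.train_step([0.9, 1.0], label=1)  # inputs that select key 3
```

Because updates reach only the values of the fetched keys, the values of keys 1 and 2 stay exactly at their initial zeros after training; this is the localized-update property the abstract attributes to the bottleneck, shown here in its simplest form.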

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

A General-Purpose Neural Architecture for Geospatial Systems

Nasim Rahaman

Martin Weiss

Frederik Träuble

Francesco Locatello

Alexandre Lacoste

Yoshua Bengio

Chris Pal

Li Erran Li

Bernhard Schölkopf

2022-11-01

OpenReview (unknown)

doi.org

openreview.net

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Manuel Wuthrich

Bernhard Schölkopf

Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a given set of blocks, inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.
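The idea of building a curriculum by interpolating between an initial and a target task can be made concrete with a short sketch. It assumes each task is described by a flat dictionary of numeric causal variables and that linear interpolation is an acceptable schedule. This is not CausalWorld's API: the function names and the variables `block_mass` and `block_size`, along with their values, are hypothetical and purely illustrative.

```python
def interpolate_task(initial, target, alpha):
    """Blend every numeric causal variable: alpha=0 gives initial, alpha=1 gives target."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if initial.keys() != target.keys():
        raise ValueError("both tasks must define the same causal variables")
    return {name: (1.0 - alpha) * initial[name] + alpha * target[name]
            for name in initial}


def linear_curriculum(initial, target, num_stages):
    """Evenly spaced sequence of tasks from the initial task to the target task."""
    if num_stages < 2:
        raise ValueError("a curriculum needs at least two stages")
    return [interpolate_task(initial, target, s / (num_stages - 1))
            for s in range(num_stages)]


# Hypothetical task specs: a light, large block at first; a heavy, small one at the end.
initial_task = {"block_mass": 0.1, "block_size": 0.08}
target_task = {"block_mass": 0.5, "block_size": 0.04}
stages = linear_curriculum(initial_task, target_task, num_stages=5)
```

Because every task shares the same parametrization, a curriculum reduces to a path through that parameter space; a linear path is simply the easiest choice, and a non-uniform schedule could be substituted by changing how `alpha` is chosen per stage.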

2021-01-11

ICLR.cc/2021/Conference (poster)

doi.org

openreview.net
