
Ross Goroshin

Core Industry Member
Adjunct professor, Université de Montréal, Department of Computer Science and Operations Research
Google DeepMind
Research Topics
Applied AI
Computer Vision
Deep Learning
Dynamical Systems
Representation Learning

Biography

Ross Goroshin is a Research Scientist at Google DeepMind in Montreal and a Core Industry Member at Mila - Quebec Artificial Intelligence Institute. He holds a PhD in Computer Science from NYU, where he was advised by Yann LeCun. He also earned a B.Eng. in Electrical Engineering from Concordia University and an M.S. in Electrical Engineering from Georgia Tech. His research focuses on computer vision, self-supervised learning, and optimal control.

In addition to his roles at Google DeepMind and Mila, Ross serves as an adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal.

Publications

Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
Jesse Farebrother
Joshua Greaves
Charline Le Lan
Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well-understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent’s network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)’s proto-value functions to deep reinforcement learning – accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment’s reward function.
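As a rough illustration of the auxiliary-task setup the abstract describes, the sketch below pairs a shared encoder with one linear value head per auxiliary task and trains all heads with a one-step TD loss. All names, layer sizes, and the way auxiliary rewards are produced are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PVNSketch(nn.Module):
    """Minimal sketch of auxiliary-task representation learning in the
    spirit of proto-value networks. Sizes and architecture are
    illustrative assumptions."""

    def __init__(self, obs_dim: int, feat_dim: int, num_tasks: int):
        super().__init__()
        self.torso = nn.Sequential(          # shared representation
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )
        # One linear value head per auxiliary task.
        self.heads = nn.Linear(feat_dim, num_tasks)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.heads(self.torso(obs))   # (batch, num_tasks) values


def td_loss(model, obs, next_obs, aux_rewards, gamma=0.99):
    """One-step TD loss summed over auxiliary tasks. `aux_rewards` stands
    in for the procedurally defined auxiliary rewards (e.g. indicator
    functions tied to the successor measure); how they are generated is
    left out of this sketch."""
    with torch.no_grad():
        target = aux_rewards + gamma * model(next_obs)
    return ((model(obs) - target) ** 2).mean()
```

The representation of interest is the output of the shared torso; the many value heads exist only to shape it, which is why a single linear layer per task suffices in this sketch.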
Block-State Transformers
Jonathan Pilault
Mahan Fathi
Orhan Firat
Learned Image Compression for Machine Perception
Felipe Codevilla
Jean Gabriel Simard
Impact of Aliasing on Generalization in Deep Convolutional Networks
Cristina Vasconcelos
Vincent Dumoulin
Rob Romijnders
We investigate the impact of aliasing on generalization in Deep Convolutional Networks and show that data augmentation schemes alone are unable to prevent it due to structural limitations in widely used architectures. Drawing insights from frequency analysis theory, we take a closer look at ResNet and EfficientNet architectures and review the trade-off between aliasing and information loss in each of their major components. We show how to mitigate aliasing by inserting non-trainable low-pass filters at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in generalization on i.i.d. and even more on out-of-distribution conditions, such as image classification under natural corruptions on ImageNet-C [11] and few-shot learning on Meta-Dataset [26]. State-of-the-art results are achieved on both datasets without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.
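To make the architectural change concrete, here is a minimal sketch of a fixed (non-trainable) low-pass filter applied before subsampling, in the spirit of the abstract above. The 3x3 binomial kernel and the PyTorch framing are assumptions; the paper's exact filters and placement may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Fixed low-pass (binomial) filter followed by strided subsampling.

    Anti-aliased downsampling sketch: blur first so high frequencies are
    attenuated before the signal is subsampled. The kernel is registered
    as a buffer, so no trainable parameters are added."""

    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        kernel = torch.outer(k, k)           # 3x3 binomial kernel
        kernel = kernel / kernel.sum()       # normalize to preserve mean
        # One copy of the kernel per channel for depthwise filtering.
        self.register_buffer(
            "kernel", kernel[None, None].repeat(channels, 1, 1, 1)
        )
        self.stride = stride
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Depthwise convolution: blur each channel independently,
        # then subsample via the stride.
        return F.conv2d(x, self.kernel, stride=self.stride,
                        padding=1, groups=self.channels)
```

In a ResNet-style block, a stride-2 convolution would then be replaced by the same convolution at stride 1 followed by BlurPool2d, leaving the overall downsampling factor unchanged.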
Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark
Vincent Dumoulin
Neil Houlsby
Utku Evci
Xiaohua Zhai
Sylvain Gelly
Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we perform a cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. In performing this study, we reveal a number of discrepancies in evaluation norms and study some of these in light of the performance gap. We hope that this work facilitates sharing of insights from each community, and accelerates progress on few-shot learning.
A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches
Vincent Dumoulin
Neil Houlsby
Utku Evci
Xiaohua Zhai
Sylvain Gelly
Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we introduce a few-shot classification evaluation protocol named VTAB+MD with the explicit goal of facilitating sharing of insights from each community. We demonstrate its accessibility in practice by performing a cross-family study of the best transfer and meta learners which report on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. We hope that this work contributes to accelerating progress on few-shot learning research.
An Effective Anti-Aliasing Approach for Residual Networks
Cristina Vasconcelos
Vincent Dumoulin
Image pre-processing in the frequency domain has traditionally played a vital role in computer vision and was even part of the standard pipeline in the early days of deep learning. However, with the advent of large datasets, many practitioners concluded that this was unnecessary due to the belief that these priors can be learned from the data itself. Frequency aliasing is a phenomenon that may occur when sub-sampling any signal, such as an image or feature map, causing distortion in the sub-sampled output. We show that we can mitigate this effect by placing non-trainable blur filters and using smooth activation functions at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in out-of-distribution generalization on both image classification under natural corruptions on ImageNet-C [10] and few-shot learning on Meta-Dataset [17], without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.
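A minimal sketch of how the two changes combine, reusing the BlurPool2d sketch from the earlier entry. The choice of SiLU as the smooth activation is an assumption; the abstract only specifies "smooth activation functions".

```python
import torch.nn as nn

def anti_aliased_downsample(channels: int) -> nn.Sequential:
    """Sketch: stride-1 conv, smooth activation, then a fixed blur plus
    stride-2 subsampling (BlurPool2d as defined in the earlier sketch).
    SiLU stands in for "a smooth activation"; this is an assumption."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        nn.SiLU(),                # smooth replacement for ReLU
        BlurPool2d(channels),     # non-trainable low-pass filter + stride 2
    )
```

Both ingredients target the same failure mode: non-smooth activations and strided operations each introduce high-frequency content that aliases under subsampling, so smoothing before every downsampling step keeps the feature maps closer to band-limited.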