Publications

Scaling Laws Do Not Scale

Michael Madaio

Recent work has proposed a power law relationship, referred to as ``scaling laws,'' between the performance of artificial intelligence (AI) … (voir plus)models and aspects of those models' design (e.g., dataset size). In other words, as the size of a dataset (or model parameters, etc) increases, the performance of a given model trained on that dataset will correspondingly increase. However, while compelling in the aggregate, this scaling law relationship overlooks the ways that metrics used to measure performance may be precarious and contested, or may not correspond with how different groups of people may perceive the quality of models' output. In this paper, we argue that as the size of datasets used to train large AI models grows, the number of distinct communities (including demographic groups) whose data is included in a given dataset is likely to grow, each of whom may have different values. As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by (or in the worst case, at odds with) the metrics used to evaluate model performance for scaling laws. We end the paper with implications for AI scaling laws -- that models may not, in fact, continue to improve as the datasets get larger -- at least not for all people or communities impacted by those models.

2023-07-05

ArXiv (prépublication)

doi.org

arxiv.org

Generative Flow Networks: a Markov Chain Perspective

Tristan Deleu

Yoshua Bengio

2023-07-04

ArXiv (prépublication)

doi.org

arxiv.org

Better Training of GFlowNets with Local Credit and Incomplete Trajectories

Ling Pan

Nikolay Malkin

Dinghuai Zhang

Yoshua Bengio

Generative Flow Networks or GFlowNets are related to Monte-Carlo Markov chain methods (as they sample from a distribution specified by an en… (voir plus)ergy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution) and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood). They are trained to generate an object

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (publié)

doi.org

openreview.net

Bidirectional Learning for Offline Model-based Biological Sequence Design

Can Chen

Yingxue Zhang

Xue (Steve) Liu

Mark Coates

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (publié)

doi.org

openreview.net

Bigger, Better, Faster: Human-level Atari with human-level efficiency

Max Schwarzer

Johan Samir Obando Ceron

Aaron Courville

Marc Gendron-Bellemare

Rishabh Agarwal

Pablo Samuel Castro

We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on sca… (voir plus)ling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (publié)

doi.org

openreview.net

Bootstrapped Representations in Reinforcement Learning

Charline Le Lan

Stephen Tu

Mark Rowland

Anna Harutyunyan

Rishabh Agarwal

Marc Gendron-Bellemare

Will Dabney

In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of… (voir plus) deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated into the learning process and help shape the learnt state representation. Bootstrapping methods are today's method of choice to make these additional predictions. Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). Surprisingly, we find that this representation differs from the features learned by Monte Carlo and residual gradient algorithms for most transition structures of the environment in the policy evaluation setting. We describe the efficacy of these representations for policy evaluation, and use our theoretical analysis to design new auxiliary learning rules. We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al, 1999) and Mountain Car (Moore, 1990).

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (publié)

doi.org

openreview.net

Can Forward Gradient Match Backpropagation?

Louis Fournier

Stephane Rivaud

Eugene Belilovsky

Michael Eickenberg

Edouard Oyallon

Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable fo… (voir plus)r neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements. The cost is the requirement to guess the step direction, which is hard in high dimensions. While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias our gradient guesses in directions that are much more promising, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (publié)

doi.org

openreview.net

Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

Boris Knyazev

DOHA HWANG

Simon Lacoste-Julien

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (publié)

doi.org

openreview.net

Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity

Eduard Gorbunov

Adrien Taylor

Samuel Horváth

Gauthier Gidel

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (publié)

proceedings.mlr.press

openreview.net

CrossSplit: Mitigating Label Noise Memorization through Data Splitting

Jihye Kim

Aristide Baratin

Yan Zhang

Simon Lacoste-Julien

We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label cor… (voir plus)rection and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labeled dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios. The project page is at https://rlawlgul.github.io/.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (publié)

doi.org

openreview.net

Discovering Object-Centric Generalized Value Functions From Pixels

Somjit Nath

Gopeshh Raaj Subbaraj

Khimya Khetarpal

Samira Ebrahimi Kahou

Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs albeit using ha… (voir plus)nd-crafted auxiliary tasks and pseudo rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent"question"functions and leveraging the subsequent learned general value functions for control. We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and through qualitative analysis show that the learned representations are not only interpretable but also, centered around objects that are invariant to changes across tasks facilitating fast adaptation.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (publié)

doi.org

openreview.net

Discrete Key-Value Bottleneck

Frederik Träuble

Anirudh Goyal

Nasim Rahaman

Michael Curtis Mozer

Kenji Kawaguchi

Yoshua Bengio

Bernhard Schölkopf