Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation
Sébastien Lachapelle
Divyat Mahajan
We tackle the problems of latent variables identification and ``out-of-support'' image generation in representation learning. We show that b… (voir plus)oth are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.
Generative Flow Networks: a Markov Chain Perspective
Tristan Deleu
Better Training of GFlowNets with Local Credit and Incomplete Trajectories
Ling Pan
Nikolay Malkin
Dinghuai Zhang
Generative Flow Networks or GFlowNets are related to Monte-Carlo Markov chain methods (as they sample from a distribution specified by an en… (voir plus)ergy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution) and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood). They are trained to generate an object
Bidirectional Learning for Offline Model-based Biological Sequence Design
Can Chen
Yingxue Zhang
Bigger, Better, Faster: Human-level Atari with human-level efficiency
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on sca… (voir plus)ling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Bootstrapped Representations in Reinforcement Learning
Charline Le Lan
Stephen Tu
Mark Rowland
Anna Harutyunyan
Will Dabney
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of… (voir plus) deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated into the learning process and help shape the learnt state representation. Bootstrapping methods are today's method of choice to make these additional predictions. Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). Surprisingly, we find that this representation differs from the features learned by Monte Carlo and residual gradient algorithms for most transition structures of the environment in the policy evaluation setting. We describe the efficacy of these representations for policy evaluation, and use our theoretical analysis to design new auxiliary learning rules. We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al, 1999) and Mountain Car (Moore, 1990).
Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Boris Knyazev
DOHA HWANG
Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity
Eduard Gorbunov
Adrien Taylor
Samuel Horváth
CrossSplit: Mitigating Label Noise Memorization through Data Splitting
Jihye Kim
Aristide Baratin
Yan Zhang
We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label cor… (voir plus)rection and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labeled dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios. The project page is at https://rlawlgul.github.io/.
Discovering Object-Centric Generalized Value Functions From Pixels
Somjit Nath
Gopeshh Subbaraj
Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs albeit using ha… (voir plus)nd-crafted auxiliary tasks and pseudo rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent"question"functions and leveraging the subsequent learned general value functions for control. We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and through qualitative analysis show that the learned representations are not only interpretable but also, centered around objects that are invariant to changes across tasks facilitating fast adaptation.
Discrete Key-Value Bottleneck
Frederik Träuble
Anirudh Goyal
Nasim Rahaman
Michael Curtis Mozer
Kenji Kawaguchi
Bernhard Schölkopf
Equivariance with Learned Canonicalization Functions
Sékou-Oumar Kaba
Arnab Kumar Mondal
Yan Zhang