Publications

Bridging State and History Representations: Understanding Self-Predictive RL

Benjamin Eysenbach

Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially obse… (voir plus)rvable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. Furthermore, we provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations. These findings together yield a minimalist algorithm to learn self-predictive representations for states and histories. We validate our theories by applying our algorithm to standard MDPs, MDPs with distractors, and POMDPs with sparse rewards. These findings culminate in a set of preliminary guidelines for RL practitioners.

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Closing the Gap between TD Learning and Supervised Learning - A Generalisation Point of View

Raj Ghugare

Matthieu Geist

Glen Berseth

Benjamin Eysenbach

Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training. This oft-soug… (voir plus)ht property is one of the few ways in which RL methods based on dynamic-programming differ from RL methods based on supervised-learning (SL). Yet, certain RL methods based on off-the-shelf SL algorithms achieve excellent results without an explicit mechanism for stitching; it remains unclear whether those methods forgo this important stitching property. This paper studies this question for the problems of achieving a target goal state and achieving a target return value. Our main result is to show that the stitching property corresponds to a form of combinatorial generalization: after training on a distribution of (state, goal) pairs, one would like to evaluate on (state, goal) pairs not seen together in the training data. Our analysis shows that this sort of generalization is different from i.i.d. generalization. This connection between stitching and generalisation reveals why we should not expect SL-based RL methods to perform stitching, even in the limit of large datasets and models. Based on this analysis, we construct new datasets to explicitly test for this property, revealing that SL-based methods lack this stitching property and hence fail to perform combinatorial generalization. Nonetheless, the connection between stitching and combinatorial generalisation also suggests a simple remedy for improving generalisation in SL: data augmentation. We propose a temporal data augmentation and demonstrate that adding it to SL-based methods enables them to successfully complete tasks not seen together during training. On a high level, this connection illustrates the importance of combinatorial generalization for data efficiency in time-series data beyond tasks beyond RL, like audio, video, or text.

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

Mingde Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstracti… (voir plus)ons to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies on the extraction of an abstracted proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end from hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper's significant advantage in zero-shot generalization, compared to some existing state-of-the-art hierarchical planning methods.

2024-01-15

International Conference on Learning Representations (poster)

doi.org

openreview.net

Course Correcting Koopman Representations

Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space. Theor… (voir plus)etically, such features can be used to simplify many problems in modeling and control of NLDS. In this work we study autoencoder formulations of this problem, and different ways they can be used to model dynamics, specifically for future state prediction over long horizons. We discover several limitations of predicting future states in the latent space and propose an inference-time mechanism, which we refer to as Periodic Reencoding, for faithfully capturing long term dynamics. We justify this method both analytically and empirically via experiments in low and high dimensional NLDS.

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Cycle Consistency Driven Object Discovery

Aniket Didolkar

Anirudh Goyal

Yoshua Bengio

Developing deep learning models that effectively learn object-centric representations, akin to human cognition, remains a challenging task. … (voir plus)Existing approaches facilitate object discovery by representing objects as fixed-size vectors, called ``slots'' or ``object files''. While these approaches have shown promise in certain scenarios, they still exhibit certain limitations. First, they rely on architectural priors which can be unreliable and usually require meticulous engineering to identify the correct objects. Second, there has been a notable gap in investigating the practical utility of these representations in downstream tasks. To address the first limitation, we introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot. We formalize this constraint by introducing consistency objectives which are cyclic in nature. By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance. These enhancements consistently hold true across both synthetic and real-world scenes, underscoring the effectiveness and adaptability of the proposed approach. To tackle the second limitation, we apply the learned object-centric representations from the proposed method to two downstream reinforcement learning tasks, demonstrating considerable performance enhancements compared to conventional slot-based and monolithic representation learning methods. Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

A Data-Driven Measure of Relative Uncertainty for Misclassification Detection

Eduardo Dadalto Câmara Gomes

Marco Romanelli

Georg Pichler

Pablo Piantanida

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Decoupling regularization from the action space

Sobhan Mohammadpour

Emma Frejinger

Pierre-Luc Bacon

Regularized reinforcement learning (RL), particularly the entropy-regularized kind, has gained traction in optimal control and inverse RL. W… (voir plus)hile standard unregularized RL methods remain unaffected by changes in the number of actions, we show that it can severely impact their regularized counterparts. This paper demonstrates the importance of decoupling the regularizer from the action space: that is, to maintain a consistent level of regularization regardless of how many actions are involved to avoid over-regularization. Whereas the problem can be avoided by introducing a task-specific temperature parameter, it is often undesirable and cannot solve the problem when action spaces are state-dependent. In the state-dependent action context, different states with varying action spaces are regularized inconsistently. We introduce two solutions: a static temperature selection approach and a dynamic counterpart, universally applicable where this problem arises. Implementing these changes improves performance on the DeepMind control suite in static and dynamic temperature regimes and a biological design task.

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Delta-AI: Local Objectives for Amortized Inference in Sparse Graphical Models

Jean-Pierre Falet

Hae Beom Lee

Nikolay Malkin

Chen Sun

We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs), which we call …

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Diffusion Generative Flow Samplers: Improving Learning Signals Through Partial Trajectory Optimization

Dinghuai Zhang

Ricky T. Q. Chen

Cheng-Hao Liu

Aaron Courville

Yoshua Bengio

We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine lear… (voir plus)ning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajectories to compute, resulting in sluggish credit assignment issues due to use of entire trajectories and a learning signal present only at the terminal time. In this work, we present Diffusion Generative Flow Samplers (DGFS), a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments, via parameterizing an additional "flow function". Our method takes inspiration from the theory developed for generative flow networks (GFlowNets), allowing us to make use of intermediate learning signals. Through various challenging experiments, we demonstrate that DGFS achieves more accurate estimates of the normalization constant than closely-related prior methods.

2024-01-15

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Efficient Dynamics Modeling in Interactive Environments with Koopman Theory

Arnab Kumar Mondal

Siba Smarak Panigrahi

Sai Rajeswar

Kaleem Siddiqi

Siamak Ravanbakhsh

The accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could adva… (voir plus)nce Reinforcement Learning (RL) and Planning algorithms, but achieving it is challenging. Inaccuracies in model estimates can compound, resulting in increased errors over long horizons. We approach this problem from the lens of Koopman theory, where the nonlinear dynamics of the environment can be linearized in a high-dimensional latent space. This allows us to efficiently parallelize the sequential problem of long-range prediction using convolution while accounting for the agent's action at every time step. Our approach also enables stability analysis and better control over gradients through time. Taken together, these advantages result in significant improvement over the existing approaches, both in the efficiency and the accuracy of modeling dynamics over extended horizons. We also show that this model can be easily incorporated into dynamics modeling for model-based planning and model-free RL and report promising experimental results.

2024-01-15

International Conference on Learning Representations (poster)

doi.org

openreview.net

Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation

Divyat Mahajan

Ioannis Mitliagkas

Brady Neal

Vasilis Syrgkanis

2024-01-15

ICLR.cc/2024/Conference (spotlight)

openreview.net