Portrait de Bogdan Mazoure

Bogdan Mazoure

Membre affilié
Chercheur scientifique, Apple
Sujets de recherche
Apprentissage par renforcement
Grands modèles de langage (LLM)
Modèles de diffusion
Modèles génératifs

Publications

On the benefits of pixel-based hierarchical policies for task generalization
T. Cristea-Platon
Josh Susskind
Walter Talbott
Reinforcement learning practitioners often avoid hierarchical policies, especially in image-based observation spaces. Typically, the single-… (voir plus)task performance improvement over flat-policy counterparts does not justify the additional complexity associated with implementing a hierarchy. However, by introducing multiple decision-making levels, hierarchical policies can compose lower-level policies to more effectively generalize between tasks, highlighting the need for multi-task evaluations. We analyze the benefits of hierarchy through simulated multi-task robotic control experiments from pixels. Our results show that hierarchical policies trained with task conditioning can (1) increase performance on training tasks, (2) lead to improved reward and state-space generalizations in similar tasks, and (3) decrease the complexity of fine tuning required to solve novel tasks. Thus, we believe that hierarchical policies should be considered when building reinforcement learning architectures capable of generalizing between tasks.
Grounding Multimodal Large Language Models in Actions
Andrew Szot
Harsh Agrawal
Zsolt Kira
Alexander T Toshev
Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this … (voir plus)work, we study how to best ground a MLLM into different embodiments and their associated action spaces, with the goal of leveraging the multimodal world knowledge of the MLLM. We first generalize a number of methods through a unified architecture and the lens of action space adaptors. For continuous actions, we show that a learned tokenization allows for sufficient modeling precision, yielding the best performance on downstream tasks. For discrete actions, we demonstrate that semantically aligning these actions with the native output token space of the MLLM leads to the strongest performance. We arrive at these lessons via a thorough study of seven action space adapters on five different environments, encompassing over 114 embodied tasks.
Grounding Multimodal Large Language Models in Actions
Andrew Szot
Harsh Agrawal
Zsolt Kira
Alexander T Toshev
Generative Models for Decision Making
Lisa Lee
Roberta Raileanu
Yilun Du
Walter Talbott
Katherine Metcalf
Alexander T Toshev
Generative Artificial Intelligence (AI) has made significant advancements in recent years, particularly with the development of large langua… (voir plus)ge and diffusion models. These generative models have demonstrated impressive capabilities in various tasks, such as text generation and image and audio synthesis. Concurrently, Reinforcement Learning (RL) has made significant strides in solving complex sequential decision-making problems with the help of external knowledge sources . However, there remains untapped potential in combining generative models with RL algorithms to tackle real-world challenges, particularly to improve sample efficiency of tabula rasa training by introducing priors from related domains such as visual question-answering, image captioning and image generation. This workshop aims to bring together researchers and practitioners from the fields of generative AI and reinforcement learning to explore the latest advances, methodologies, and applications. By fostering collaborations between these two domains, we intend to unlock new opportunities for addressing complex problems that lie at the intersection of both fields.
Large Language Models as Generalizable Policies for Embodied Tasks
Andrew Szot
Harsh Agrawal
Walter Talbott
Rin Metcalf
Natalie Mackraz
Alexander T Toshev
Accelerating exploration and representation learning with offline pre-training
Jake Bruce
Rob Fergus
Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement lea… (voir plus)rning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward separately from a single collection of human demonstrations can significantly improve the sample efficiency on the challenging NetHack benchmark. We also ablate various components of our experimental setting and highlight crucial insights.
Value function estimation using conditional diffusion models for control
Walter Talbott
Miguel Ángel Bautista
Alexander T Toshev
Joshua M. Susskind
Recurrent Real-valued Neural Autoregressive Density Estimator for Online Density Estimation and Classification of Streaming Data
In contrast with the traditional offline learning, where complete data accessibility is assumed, many modern applications involve processing… (voir plus) data in a streaming fashion. This online learning setting raises various challenges, including concept drift, hardware memory constraints, etc. In this paper, we propose the Recurrent Real-valued Neural Autoregressive Density Estimator (RRNADE), a flexible density-based model for online classification and density estimation. RRNADE combines a neural Gaussian mixture density module with a recurrent module. This combination allows RRNADE to exploit possible sequential correlations in the streaming task, which are often ignored in the classical streaming setting where each input is assumed to be independent from the previous ones. We showcase the ability of RRNADE to adapt to concept drifts on synthetic density estimation tasks. We also apply RRNADE to online classification tasks on both real world and synthetic datasets and compare it with multiple density based as well as nondensity based online classification methods. In almost all of these tasks, RRNADE outperforms the other methods. Lastly, we conduct an ablation study demonstrating the complementary benefits of the density and the recurrent modules.
Low-Rank Representation of Reinforcement Learning Policies
We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional… (voir plus) embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.
Sequential Density Estimation via NCWFAs Sequential Density Estimation via Nonlinear Continuous Weighted Finite Automata
Weighted finite automata (WFAs) have been widely applied in many fields. One of the classic problems for WFAs is probability distribution es… (voir plus)timation over sequences of discrete symbols. Although WFAs have been extended to deal with continuous input data, namely continuous WFAs (CWFAs), it is still unclear how to approximate density functions over sequences of continuous random variables using WFA-based models, due to the limitation on the expressiveness of the model as well as the tractability of approximating density functions via CWFAs. In this paper, we propose a nonlinear extension to the CWFA model to first improve its expressiveness, we refer to it as the nonlinear continuous WFAs (NCWFAs). Then we leverage the so-called RNADE method, which is a well-known density estimator based on neural networks, and propose the RNADE-NCWFA model. The RNADE-NCWFA model computes a density function by design. We show that this model is strictly more expressive than the Gaussian HMM model, which CWFA cannot approximate. Empirically, we conduct a synthetic experiment using Gaussian HMM generated data. We focus on evaluating the model's ability to estimate densities for sequences of varying lengths (longer length than the training data). We observe that our model performs the best among the compared baseline methods.
Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL
Paul Mineiro
Pavithra Srinath
Reza Sharifi Sedeh
Adith Swaminathan
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their lo… (voir plus)ng-term utility. Optimizing a long-term metric is challenging because the learning signal (whether the recommendations achieved their desired goals) is delayed and confounded by other user interactions with the system. Immediately measurable proxies such as clicks can lead to suboptimal recommendations due to misalignment with the long-term metric. Many works have applied episodic reinforcement learning (RL) techniques for session-based recommendation but these methods do not account for policy-induced drift in user intent across sessions. We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions. By varying the horizon hyper-parameter in SHPI, we recover well-known policy improvement schemes in the RL literature. Empirical results on four recommendation tasks show that SHPI can outperform matrix factorization, offline bandits, and offline RL baselines. We also provide a stable and computationally efficient implementation using weighted regression oracles.
A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix
Thang Doan
Mehdi Abbana Bennani
Pierre Alquier