Pietro Mazzaglia

Representing Positional Information in Generative World Models for Object Manipulation

Stefano Ferraro

Tim Verbelen

Bart Dhoedt

Sai Rajeswar

Object manipulation capabilities are essential skills that set apart embodied agents engaging with the world, especially in the realm of robotics. The ability to predict outcomes of interactions with objects is paramount in this setting. While model-based control methods have started to be employed for tackling manipulation tasks, they have faced challenges in accurately manipulating objects. As we analyze the causes of this limitation, we identify the cause of underperformance in the way current world models represent crucial positional information, especially about the target's goal specification for object positioning tasks. We introduce a general approach that empowers world model-based agents to effectively solve object-positioning tasks. We propose two declinations of this approach for generative world models: position-conditioned (PCP) and latent-conditioned (LCP) policy learning. In particular, LCP employs object-centric latent representations that explicitly capture object positional information for goal specification. This naturally leads to the emergence of multimodal capabilities, enabling the specification of goals through spatial coordinates or a visual goal. Our methods are rigorously evaluated across several manipulation environments, showing favorable performance compared to current model-based control approaches.

2024-09-18

arXiv (preprint)

doi.org

arxiv.org

Multimodal foundation world models for generalist embodied agents

Tim Verbelen

Bart Dhoedt

Sai Rajeswar

2024-01-01

NeurIPS (published)

doi.org

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

Sai Rajeswar

Tim Verbelen

Alexandre Piché

Bart Dhoedt

Alexandre Lacoste

2023-04-24

ICML.cc/2023/Conference (published)

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

Sai Rajeswar

Tim Verbelen

Alexandre Piché

Bart Dhoedt

Alexandre Lacoste

Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed but require large amounts of interactions between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning, for adapting faster to future tasks. Yet, as shown in the Unsupervised RL Benchmark (URLB; Laskin et al. 2021), whether current unsupervised strategies can improve generalization capabilities is still unclear, especially in visual control settings. In this work, we study the URLB and propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent, and a task-aware fine-tuning strategy combined with a new proposed hybrid planner, Dyna-MPC, to adapt the agent for downstream tasks. On URLB, our method obtains 93.59% overall normalized performance, surpassing previous baselines by a staggering margin. The approach is empirically evaluated through a large-scale empirical study, which we use to validate our design choices and analyze our models. We also show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation. Project website: https://masteringurlb.github.io/

2023-04-24

ICML.cc/2023/Conference (poster)

proceedings.mlr.press

Unsupervised Model-based Pre-training for Data-efficient Reinforcement Learning from Pixels

Sai Rajeswar

Tim Verbelen

Alexandre Piché

Bart Dhoedt

Alexandre Lacoste

Reinforcement learning (RL) aims at autonomously performing complex tasks. To this end, a reward signal is used to steer the learning process. While successful in many circumstances, the approach is typically data hungry, requiring large amounts of task-specific interaction between agent and environment to learn efficient behaviors. To alleviate this, unsupervised RL proposes to collect data through self-supervised interaction to accelerate task-specific adaptation. However, whether current unsupervised strategies lead to improved generalization capabilities is still unclear, more so when the input observations are high-dimensional. In this work, we advance the field by closing the performance gap in the Unsupervised RL Benchmark, a collection of tasks to be solved in a data-efficient manner, after interacting with the environment in a self-supervised way. Our approach uses unsupervised exploration for collecting experience to pre-train a world model. Then, when fine-tuning for downstream tasks, the agent leverages the learned model and a hybrid planner to efficiently adapt for the given tasks, achieving comparable results to task-specific baselines, while using 20x less data. We extensively evaluate our work, comparing several exploration methods and improving the fine-tuning process by studying the interactions between the learned components. Furthermore, we investigate the limitations of the pre-trained agent, gaining insights into how these influence the decision process and shedding light on new research directions.

2022-06-14

ICML.cc/2022/Workshop/DARL (accepted)