Bogdan Mazoure

ClustRecNet: A Novel End-to-End Deep Learning Framework for Clustering Algorithm Recommendation

Mohammadreza Bakhtyari

Renato Cordeiro De Amorim

Guillaume Rabusseau

Vladimir Makarenkov

2025-09-28

ArXiv (preprint)

Scaling Synthetic Task Generation for Agents via Exploration

Ram Ramrakhya

Andrew Szot

Omar Attia

Yuhao Yang

Anh Nguyen

Zhe Gan

Harsh Agrawal

Alexander T Toshev

Post-training Multimodal Large Language Models (MLLMs) to build interactive agents holds promise across domains such as computer-use, web navigation, and robotics. A key challenge in scaling such post-training is the lack of high-quality downstream agentic task datasets with tasks that are diverse, feasible, and verifiable. Existing approaches for task generation rely heavily on human annotation or on prompting an MLLM with limited downstream environment information, which is either costly or poorly scalable as it yields tasks with limited coverage. To remedy this, we present AutoPlay, a scalable pipeline for task generation that explicitly explores interactive environments to discover possible interactions and current state information to synthesize environment-grounded tasks. AutoPlay operates in two stages: (i) an exploration phase, where an MLLM explorer agent systematically uncovers novel environment states and functionalities, and (ii) a task generation phase, where a task generator leverages exploration trajectories and a set of task guideline prompts as context to synthesize diverse, executable, and verifiable tasks. We show AutoPlay generates 20k tasks across 20 Android applications and 10k tasks across 13 Ubuntu applications to train mobile-use and computer-use agents. AutoPlay-generated tasks enable large-scale task demonstration synthesis without human annotation by employing an MLLM task executor and verifier. This data enables training MLLM-based UI agents that improve success rates up to

2025-09-28

ArXiv (preprint)

On the Modeling Capabilities of Large Language Models for Sequential Decision Making

Martin Klissarov

R Devon Hjelm

Alexander T Toshev

2025-01-21

ICLR.cc/2025/Conference (poster)

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

Andrew Szot

Omar Attia

Aleksei Timofeev

Harsh Agrawal

R Devon Hjelm

Zhe Gan

Zsolt Kira

Alexander T Toshev

We examine the capability of Multimodal Large Language Models (MLLMs) to tackle diverse domains that extend beyond the traditional language and vision tasks these models are typically trained on. Specifically, our focus lies in areas such as Embodied AI, Games, UI Control, and Planning. To this end, we introduce a process of adapting an MLLM to a Generalist Embodied Agent (GEA). GEA is a single unified model capable of grounding itself across these varied domains through a multi-embodiment action tokenizer. GEA is trained with supervised learning on a large dataset of embodied experiences and with online RL in interactive simulators. We explore the data and algorithmic choices necessary to develop such a model. Our findings reveal the importance of training with cross-domain data and online RL for building generalist agents. The final GEA model achieves strong generalization performance to unseen tasks across diverse benchmarks compared to other generalist models and benchmark-specific approaches.

2024-12-31

Computer Vision and Pattern Recognition (published)

Grounding Multimodal Large Language Models in Actions

Andrew Szot

Harsh Agrawal

R Devon Hjelm

Zsolt Kira

Alexander T Toshev

Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground an MLLM into different embodiments and their associated action spaces, including both continuous and discrete actions. For continuous actions, a set of learned tokenizations that capture an action at various resolutions allows for sufficient modeling precision, yielding the best performance on downstream tasks. For discrete actions, semantically aligning these actions with the native output token space of the MLLM leads to the strongest performance. We arrive at these lessons via a thorough study of seven action grounding approaches on five different environments, encompassing over 114 embodied tasks.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

On the benefits of pixel-based hierarchical policies for task generalization

T. Cristea-Platon

Josh Susskind

Walter Talbott

Reinforcement learning practitioners often avoid hierarchical policies, especially in image-based observation spaces. Typically, the single-task performance improvement over flat-policy counterparts does not justify the additional complexity associated with implementing a hierarchy. However, by introducing multiple decision-making levels, hierarchical policies can compose lower-level policies to more effectively generalize between tasks, highlighting the need for multi-task evaluations. We analyze the benefits of hierarchy through simulated multi-task robotic control experiments from pixels. Our results show that hierarchical policies trained with task conditioning can (1) increase performance on training tasks, (2) lead to improved reward and state-space generalizations in similar tasks, and (3) decrease the complexity of fine tuning required to solve novel tasks. Thus, we believe that hierarchical policies should be considered when building reinforcement learning architectures capable of generalizing between tasks.

2024-07-26

ArXiv (preprint)

Generative Models for Decision Making

Lisa Lee

Roberta Raileanu

Yilun Du

Walter Talbott

Katherine Metcalf

R Devon Hjelm

Alexander T Toshev

Generative Artificial Intelligence (AI) has made significant advancements in recent years, particularly with the development of large language and diffusion models. These generative models have demonstrated impressive capabilities in various tasks, such as text generation and image and audio synthesis. Concurrently, Reinforcement Learning (RL) has made significant strides in solving complex sequential decision-making problems with the help of external knowledge sources. However, there remains untapped potential in combining generative models with RL algorithms to tackle real-world challenges, particularly to improve sample efficiency of tabula rasa training by introducing priors from related domains such as visual question-answering, image captioning and image generation. This workshop aims to bring together researchers and practitioners from the fields of generative AI and reinforcement learning to explore the latest advances, methodologies, and applications. By fostering collaborations between these two domains, we intend to unlock new opportunities for addressing complex problems that lie at the intersection of both fields.

2024-03-07

ICLR.cc/2024/Workshop_Proposals (published)

Large Language Models as Generalizable Policies for Embodied Tasks

Andrew Szot

Max Schwarzer

Harsh Agrawal

Walter Talbott

Rin Metcalf

Natalie Mackraz

R Devon Hjelm

Alexander T Toshev

2024-01-15

ICLR.cc/2024/Conference (poster)

Accelerating exploration and representation learning with offline pre-training

Jacob Bruce

Doina Precup

Rob Fergus

Ankit Anand

Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward separately from a single collection of human demonstrations can significantly improve the sample efficiency on the challenging NetHack benchmark. We also ablate various components of our experimental setting and highlight crucial insights.

2023-06-19

ICML.cc/2023/Workshop/ILHF (accepted)

Value function estimation using conditional diffusion models for control

Walter Talbott

Miguel Ángel Bautista

R Devon Hjelm

Alexander T Toshev

Joshua M. Susskind

2023-06-08

ArXiv (preprint)

Sequential Density Estimation via Nonlinear Continuous Weighted Finite Automata

Tianyu Li

Guillaume Rabusseau

Weighted finite automata (WFAs) have been widely applied in many fields. One of the classic problems for WFAs is probability distribution estimation over sequences of discrete symbols. Although WFAs have been extended to deal with continuous input data, namely continuous WFAs (CWFAs), it is still unclear how to approximate density functions over sequences of continuous random variables using WFA-based models, due to the limitation on the expressiveness of the model as well as the tractability of approximating density functions via CWFAs. In this paper, we first propose a nonlinear extension to the CWFA model to improve its expressiveness; we refer to it as the nonlinear continuous WFA (NCWFA). Then we leverage the so-called RNADE method, which is a well-known density estimator based on neural networks, and propose the RNADE-NCWFA model. The RNADE-NCWFA model computes a density function by design. We show that this model is strictly more expressive than the Gaussian HMM model, which CWFA cannot approximate. Empirically, we conduct a synthetic experiment using Gaussian HMM generated data. We focus on evaluating the model's ability to estimate densities for sequences of varying lengths (longer than those in the training data). We observe that our model performs the best among the compared baseline methods.

2022-06-07

ArXiv (preprint)