Marc-Alexandre Côté

Marc-Alexandre Cot'e

Xingdi Yuan

2025-10-22

ArXiv (preprint)

Learning to Solve Complex Problems via Dataset Decomposition

Wanru Zhao

Lucas Caccia

Zhengyan Shi

Minseon Kim

Xingdi Yuan

Weijia Xu

Curriculum learning is a class of training strategies that organizes the data being exposed to a model by difficulty, gradually from simpler… (see more) to more complex examples. This research explores a reverse curriculum generation approach that recursively decomposes complex datasets into simpler, more learnable components. We propose a teacher-student framework where the teacher is equipped with the ability to reason step-by-step, which is used to recursively generate easier versions of examples, enabling the student model to progressively master difficult tasks. We propose a novel scoring system to measure data difficulty based on its structural complexity and conceptual depth, allowing curriculum construction over decomposed data. Experiments on math datasets (MATH and AIME) demonstrate that models trained with curricula generated by our approach exhibit superior performance compared to standard training on original datasets.

2025-07-09

ICML.cc/2025/Workshop/AI4MATH (poster)

debug-gym: A Text-Based Environment for Interactive Debugging

Xingdi Yuan

Morgane M Moss

Charbel Feghali

Chinmay Singh

Darya Moldavskaya

Drew MacPhee

Lucas Caccia

Matheus Pereira

Minseon Kim

2025-03-27

ArXiv (preprint)

debug-gym: A Text-Based Environment for Interactive Debugging

Xingdi Yuan

Morgane M Moss

Charbel Feghali

Chinmay Singh

Darya Moldavskaya

Drew MacPhee

Lucas Caccia

Matheus Pereira

Minseon Kim

2025-03-27

ArXiv (preprint)

Sub-goal Distillation: A Method to Improve Small Language Agents

Maryam Hashemzadeh

Elias Stengel-Eskin

Sarath Chandar

While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational req… (see more)uirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. In detail, we leverage an LLM to annotate an oracle path with a sequence of sub-goals towards completing a goal. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Our code and annotated data for distillation can be found on GitHub.

2025-02-17

Proceedings of The 3rd Conference on Lifelong Learning Agents (published)

Sub-goal Distillation: A Method to Improve Small Language Agents

Maryam Hashemzadeh

Elias Stengel-Eskin

Sarath Chandar

While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational req… (see more)uirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. In detail, we leverage an LLM to annotate an oracle path with a sequence of sub-goals towards completing a goal. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Our code and annotated data for distillation can be found on GitHub.

2024-05-04

ArXiv (preprint)

Language-guided Skill Learning with Temporal Variational Inference

Haotian Fu

Pratyusha Sharma

Elias Stengel-Eskin

George Konidaris

Nicolas Le Roux

Xingdi Yuan

2024-05-01

ICML.cc/2024/Conference (poster)

Language-guided Skill Learning with Temporal Variational Inference

Haotian Fu

Pratyusha Sharma

Elias Stengel-Eskin

George Konidaris

Nicolas Le Roux

Xingdi Yuan

We present an algorithm for skill discovery from expert demonstrations. The algorithm first utilizes Large Language Models (LLMs) to propose… (see more) an initial segmentation of the trajectories. Following that, a hierarchical variational inference framework incorporates the LLM-generated segmentation information to discover reusable skills by merging trajectory segments. To further control the trade-off between compression and reusability, we introduce a novel auxiliary objective based on the Minimum Description Length principle that helps guide this skill discovery process. We test our system on BabyAI, a grid world navigation environment, as well as ALFRED, a household simulation environment.Our results demonstrate that agents equipped with our method can discover skills that help accelerate learning and outperform baseline skill learning approaches on new long-horizon tasks.

2024-03-11

ICLR.cc/2024/Workshop/LLMAgents (poster)

Joint Prompt Optimization of Stacked LLMs using Variational Inference

Eric Yuan

Xingdi Yuan

Matheus Pereira

Adam Trischler

Ziang Xiao

Arian Hosseini

Friederike Niedtner

Nicolas Le Roux

Large language models (LLMs) can be seen as atomic units of computation mapping sequences to a distribution over sequences. Thus, they can b… (see more)e seen as stochastic language layers in a language network, where the learnable parameters are the natural language prompts at each layer. By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN). We first show how to effectively perform prompt optimization for a 1-Layer language network (DLN-1). Then, we present an extension that applies to 2-layer DLNs (DLN-2), where two prompts must be learned. The key idea is to consider the output of the first layer as a latent variable, which requires inference, and prompts to be learned as the parameters of the generative distribution. We first test the effectiveness of DLN-1 in multiple reasoning and natural language understanding tasks. Then, we show that DLN-2 can reach higher performance than a single layer, showing promise that we might reach comparable performance to GPT-4, even when each LLM in the network is smaller and less powerful.

From Words to Blocks: Building Objects by Grounding Language Models with Reinforcement Learning

Michael Ahn

Anthony Brohan

Noah Brown

liang Dai

Dan Su

Holy Lovenia Ziwei Bryan Wilie

Tiezheng Yu

Willy Chung

Quyet V. Do

Paul Barde

Tristan Karch

Derek Nowrouzezahrai

C. Bonial

Mitchell Abrams

David R. Traum

Hyung Won

Le Hou

Shayne Longpre

Yi Zoph

William Tay … (see 32 more)

Eric Fedus

Xuezhi Li

Lasse Espeholt

Hubert Soyer

Remi Munos

Karen Si-801

Vlad Mnih

Tom Ward

Yotam Doron

Wenlong Huang

Pieter Abbeel

Deepak Pathak

Julia Kiseleva

Ziming Li

Mohammad Aliannejadi

Shrestha Mohanty

Maartje Ter Hoeve

Mikhail Burtsev

Alexey Skrynnik

Artem Zholus

A. Panov

Kavya Srinet

A. Szlam

Yuxuan Sun

Katja Hofmann

Ahmed Hamid Awadallah

Linar Abdrazakov

Igor Churin

Putra Manggala

Kata Naszádi

Michiel Van Der Meer

Leveraging pre-trained language models to gen-001 erate action plans for embodied agents is an 002 emerging research direction. However, exe… (see more)-003 cuting instructions in real or simulated envi-004 ronments necessitates verifying the feasibility 005 of actions and their relevance in achieving a 006 goal. We introduce a novel method that in-007 tegrates a language model and reinforcement 008 learning for constructing objects in a Minecraft-009 like environment, based on natural language 010 instructions. Our method generates a set of 011 consistently achievable sub-goals derived from 012 the instructions and subsequently completes the 013 associated sub-tasks using a pre-trained RL pol-014 icy. We employ the IGLU competition, which 015 is based on the Minecraft-like simulator, as our 016 test environment, and compare our approach 017 to the competition’s top-performing solutions. 018 Our approach outperforms existing solutions in 019 terms of both the quality of the language model 020 and the quality of the structures built within the 021 IGLU environment. 022

Interactive Machine Comprehension with Information Seeking Agents

Xingdi Yuan

Jie Fu

Yi Tay

Chris Pal

Adam Trischler

Existing machine reading comprehension (MRC) models do not scale effectively to real-world applications like web-level information retrieval… (see more) and question answering (QA). We argue that this stems from the nature of MRC datasets: most of these are static environments wherein the supporting documents and all necessary information are fully observed. In this paper, we propose a simple method that reframes existing MRC datasets as interactive, partially observable environments. Specifically, we “occlude” the majority of a document’s text and add context-sensitive commands that reveal “glimpses” of the hidden text to a model. We repurpose SQuAD and NewsQA as an initial case study, and then show how the interactive corpora can be used to train a model that seeks relevant information through sequential decision making. We believe that this setting can contribute in scaling models to web-level QA scenarios.

2020-07-01

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (published)

Interactive Language Learning by Question Answering

Xingdi Yuan

Jie Fu

Zhouhan Lin

Chris Pal

Yoshua Bengio

Adam Trischler

Humans observe and interact with the world to acquire knowledge. However, most existing machine reading comprehension (MRC) tasks miss the i… (see more)nteractive, information-seeking component of comprehension. Such tasks present models with static documents that contain all necessary information, usually concentrated in a single short substring. Thus, models can achieve strong performance through simple word- and phrase-based pattern matching. We address this problem by formulating a novel text-based question answering task: Question Answering with Interactive Text (QAit). In QAit, an agent must interact with a partially observable text-based environment to gather information required to answer questions. QAit poses questions about the existence, location, and attributes of objects found in the environment. The data is built using a text-based game generator that defines the underlying dynamics of interaction with the environment. We propose and evaluate a set of baseline models for the QAit task that includes deep reinforcement learning agents. Experiments show that the task presents a major challenge for machine reading systems, while humans solve it with relative ease.

2019-11-01

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (published)