Michael Rabbat

Associate Industry Member
Associate professor, McGill University, Department of Electrical and Computer Engineering
Director of Research Science, Fundamental AI Research (FAIR), Meta
Research Topics
Distributed Systems
Optimization
Representation Learning

Biography

Mike Rabbat is an associate industry member of Mila – Quebec Artificial Intelligence Institute and director of research science in the Fundamental AI Research (FAIR) team at Meta.

Rabbat’s research interests include efficient and robust representation learning, in particular self-supervised learning. He is also interested in optimization for efficient model training.

Publications

Towards General-Purpose Model-Free Reinforcement Learning
Scott Fujimoto
Pierluca D'Oro
Amy Zhang
Yuandong Tian
Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.
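A minimal sketch of the core idea in this abstract: learn an embedding phi(s, a) with dense model-based targets (reward and latent-dynamics prediction) so that the value function is approximately linear in phi. The network sizes, loss weights, and head designs below are illustrative assumptions, not the authors' MR.Q implementation.

```python
import torch
import torch.nn as nn

class LinearizedQ(nn.Module):
    """Embedding in which Q(s, a) is approximately linear: Q = w . phi(s, a)."""
    def __init__(self, state_dim, action_dim, embed_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim),
        )
        self.value_head = nn.Linear(embed_dim, 1, bias=False)  # linear value readout
        self.reward_head = nn.Linear(embed_dim, 1)              # dense model-based target
        self.dynamics_head = nn.Linear(embed_dim, embed_dim)    # predicts next embedding

    def forward(self, state, action):
        phi = self.encoder(torch.cat([state, action], dim=-1))
        return self.value_head(phi), phi

def representation_loss(model, s, a, r, s_next, a_next):
    """Shape phi with dense reward and latent-dynamics targets, rather than
    relying only on sparse TD errors, as the abstract describes."""
    _, phi = model(s, a)
    with torch.no_grad():
        _, phi_next = model(s_next, a_next)  # target embedding, no gradient
    reward_loss = nn.functional.mse_loss(model.reward_head(phi).squeeze(-1), r)
    dynamics_loss = nn.functional.mse_loss(model.dynamics_head(phi), phi_next)
    return reward_loss + dynamics_loss
```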
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong
David Fan
Jiachen Zhu
Yunyang Xiong
Xinlei Chen
Koustuv Sinha
Yann LeCun
Saining Xie
Zhuang Liu
In this work, we propose Visual-Predictive Instruction Tuning (VPiT) - a simple and effective extension to visual instruction tuning that enables a pretrained LLM to quickly morph into a unified autoregressive model capable of generating both text and visual tokens. VPiT teaches an LLM to predict discrete text tokens and continuous visual tokens from any input sequence of image and text data curated in an instruction-following format. Our empirical investigation reveals several intriguing properties of VPiT: (1) visual generation ability emerges as a natural byproduct of improved visual understanding, and can be unlocked efficiently with a small amount of generation data; (2) while we find understanding and generation to be mutually beneficial, understanding data contributes to both capabilities more effectively than generation data. Building upon these findings, we train our MetaMorph model and achieve competitive performance on both visual understanding and generation. In visual generation, MetaMorph can leverage the world knowledge and reasoning abilities gained from LLM pretraining, and overcome common failure modes exhibited by other generation models. Our results suggest that LLMs may have strong "prior" vision capabilities that can be efficiently adapted to both visual understanding and generation with a relatively simple instruction tuning process.
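A hedged sketch of the mixed objective described above: one autoregressive model is supervised with cross-entropy on discrete text tokens and a regression loss on continuous visual tokens. The tensor shapes, mask convention, and choice of MSE are illustrative assumptions, not the released MetaMorph code.

```python
import torch
import torch.nn.functional as F

def vpit_loss(text_logits, text_targets, visual_preds, visual_targets,
              is_visual_mask, visual_weight=1.0):
    """text_logits:   (B, T, vocab)  logits at positions predicting text tokens
    text_targets:     (B, T)         discrete token ids
    visual_preds:     (B, T, D)      continuous predictions at visual positions
    visual_targets:   (B, T, D)      target visual embeddings
    is_visual_mask:   (B, T) bool    True where the next token is a visual token
    """
    text_mask = ~is_visual_mask
    # Standard next-token cross-entropy on the text positions.
    ce = F.cross_entropy(text_logits[text_mask], text_targets[text_mask])
    # Regression on the continuous visual tokens (MSE here for simplicity).
    reg = F.mse_loss(visual_preds[is_visual_mask], visual_targets[is_visual_mask])
    return ce + visual_weight * reg
```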
EvalGIM: A Library for Evaluating Generative Image Models
Melissa Hall
Oscar Mañas
Reyhane Askari Hemmat
Mark Ibrahim
Candace Ross
Pietro Astolfi
Tariq Berrada
Marton Havasi
Yohann Benchetrit
Karen Ullrich
Carolina Braga
Abhishek Charnalia
Maeve Ryan
Michal Drozdzal
Jakob Verbeek
As the use of text-to-image generative models increases, so does the adoption of automatic benchmarking methods used in their evaluation. However, while metrics and datasets abound, there are few unified benchmarking libraries that provide a framework for performing evaluations across many datasets and metrics. Furthermore, the rapid introduction of increasingly robust benchmarking methods requires that evaluation libraries remain flexible to new datasets and metrics. Finally, there remains a gap in synthesizing evaluations in order to deliver actionable takeaways about model performance. To enable unified, flexible, and actionable evaluations, we introduce EvalGIM (pronounced "EvalGym"), a library for evaluating generative image models. EvalGIM contains broad support for datasets and metrics used to measure quality, diversity, and consistency of text-to-image generative models. In addition, EvalGIM is designed with flexibility for user customization as a top priority and contains a structure that allows plug-and-play additions of new datasets and metrics. To enable actionable evaluation insights, we introduce "Evaluation Exercises" that highlight takeaways for specific evaluation questions. The Evaluation Exercises contain easy-to-use and reproducible implementations of two state-of-the-art evaluation methods of text-to-image generative models: consistency-diversity-realism Pareto Fronts and disaggregated measurements of performance disparities across groups. EvalGIM also contains Evaluation Exercises that introduce two new analysis methods for text-to-image generative models: robustness analyses of model rankings and balanced evaluations across different prompt styles. We encourage text-to-image model exploration with EvalGIM and invite contributions at https://github.com/facebookresearch/EvalGIM/.
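The plug-and-play structure mentioned above can be illustrated with a generic metric registry. This is not EvalGIM's actual API (see the linked repository for that); every name below is hypothetical and only shows how new metrics can drop in without touching the evaluation loop.

```python
from typing import Callable, Dict, List

METRICS: Dict[str, Callable] = {}

def register_metric(name: str):
    """Decorator registering a metric under a name, so adding a metric
    never requires editing the evaluation loop itself."""
    def wrap(fn: Callable) -> Callable:
        METRICS[name] = fn
        return fn
    return wrap

@register_metric("dummy_consistency")
def dummy_consistency(images: List, prompts: List[str]) -> float:
    # Placeholder: a real metric would score image-text alignment.
    return 0.0

def evaluate(generate: Callable, prompts: List[str], metric_names: List[str]):
    """Generate one image per prompt, then score with each requested metric."""
    images = [generate(p) for p in prompts]
    return {m: METRICS[m](images, prompts) for m in metric_names}
```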
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
DiJia Su
Sainbayar Sukhbaatar
Yuandong Tian
Qinqing Zheng
In human cognition theory, human thinking is governed by two systems: the fast and intuitive System 1 and the slower but more deliberative System 2. Recent studies have shown that incorporating the System 2 process into Transformers, including large language models (LLMs), significantly enhances their reasoning capabilities. Nevertheless, models that purely resemble System 2 thinking require substantially higher computational costs and are much slower to respond. To address this challenge, we present Dualformer, a single Transformer model that seamlessly integrates both the fast and slow reasoning modes. Dualformer is obtained by training on data with randomized reasoning traces, where different parts of the traces are dropped during training. The dropping strategies are specifically tailored according to the trace structure, analogous to analyzing our thinking process and creating shortcuts with patterns. At inference time, our model can be configured to output only the solutions (fast mode), both the reasoning chain and the final solution (slow mode), or to decide automatically which mode to engage (auto mode). In all cases, Dualformer outperforms the corresponding baseline models in both performance and computational efficiency: (1) in slow mode, Dualformer optimally solves unseen 30 x 30 maze navigation tasks 97.6% of the time, surpassing the Searchformer baseline (trained on data with complete reasoning traces) performance of 93.3%, while using 45.5% fewer reasoning steps; (2) in fast mode, Dualformer completes those tasks with an 80% optimal rate, significantly outperforming the Solution-Only model (trained on solution-only data), which has an optimal rate of only 30%. For math problems, our techniques also achieve improved performance with LLM fine-tuning, demonstrating generalization beyond task-specific models.
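A minimal sketch of the randomized trace-dropping idea described above (not the authors' implementation): each training example keeps a random subset of structured reasoning steps, so the model sees everything from full traces (slow mode) to solution-only examples (fast mode). The probabilities and data layout are assumptions for illustration.

```python
import random

def randomize_trace(trace_steps, drop_prob=0.5, drop_all_prob=0.1):
    """trace_steps: list of token lists, one per structured reasoning step.
    Returns a randomly thinned trace, sometimes empty (solution-only)."""
    if random.random() < drop_all_prob:
        return []  # no trace at all: teaches the fast mode
    return [step for step in trace_steps if random.random() >= drop_prob]

def make_training_sequence(prompt, trace_steps, solution):
    """Flatten prompt + (randomly dropped) trace + solution into one
    token sequence for standard next-token training."""
    trace = [tok for step in randomize_trace(trace_steps) for tok in step]
    return prompt + trace + solution
```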
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
Ouail Kitouni
Niklas Nolte
Adina Williams
Diane Bouchacourt
Mark Ibrahim
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Quentin Garrido
Jean Ponce
Xinlei Chen
Yann LeCun
Mahmoud Assran
Nicolas Ballas
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Lucas Lehnert
Sainbayar Sukhbaatar
DiJia Su
Paul McVay
Qinqing Zheng
Yuandong Tian
While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision-making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the A* search algorithm. The resulting model, Searchformer, optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard A* search.
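A sketch of what "search dynamics" training data could look like, assuming a standard A* implementation; the event names and tuple format below are illustrative, not the paper's exact tokenization. An encoder-decoder Transformer would then be trained to map a task prompt to such a trace followed by the plan.

```python
import heapq

def astar_trace(start, goal, neighbors, h):
    """Run A* and log its execution as (event, node, g, h) tokens in
    visitation order. Nodes must be hashable and comparable (e.g. tuples);
    neighbors(node) yields (neighbor, step_cost); h is the heuristic."""
    trace, closed, g = [], set(), {start: 0}
    open_heap = [(h(start), 0, start)]
    trace.append(("create", start, 0, h(start)))
    while open_heap:
        _, g_n, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # skip stale heap entries
        closed.add(node)
        trace.append(("close", node, g_n, h(node)))
        if node == goal:
            return trace
        for nb, cost in neighbors(node):
            g_nb = g_n + cost
            if nb not in closed and g_nb < g.get(nb, float("inf")):
                g[nb] = g_nb
                trace.append(("create", nb, g_nb, h(nb)))
                heapq.heappush(open_heap, (g_nb + h(nb), g_nb, nb))
    return trace
```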
DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
Jonathan Lebensold
Maziar Sanjabi
Pietro Astolfi
Kamalika Chaudhuri
Chuan Guo