Michael Rabbat

Associate Industry Member
Associate professor, McGill University, Department of Electrical and Computer Engineering
Director of Research Science, Fundamental AI Research (FAIR), Meta
Research Topics
Distributed Systems
Optimization
Representation Learning

Biography

Mike Rabbat is an associate industry member of Mila – Quebec Artificial Intelligence Institute and director of research science in the Fundamental AI Research (FAIR) team at Meta.

Rabbat’s research interests include efficient and robust representation learning, in particular self-supervised learning. He is also interested in optimization for efficient model training.

Publications

Towards General-Purpose Model-Free Reinforcement Learning
Scott Fujimoto
Pierluca D'Oro
Amy Zhang
Yuandong Tian
Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.
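A minimal sketch of the core idea in this abstract: learn an embedding phi(s, a) with dense model-based targets (reward and latent-dynamics prediction) so that the value function is approximately linear in phi. The network sizes, loss weights, and head designs below are illustrative assumptions, not the authors' MR.Q implementation.

```python
import torch
import torch.nn as nn

class LinearizedQ(nn.Module):
    """Embedding in which Q(s, a) is approximately linear: Q = w . phi(s, a)."""
    def __init__(self, state_dim, action_dim, embed_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 512), nn.ReLU(),
            nn.Linear(512, embed_dim),
        )
        self.value_head = nn.Linear(embed_dim, 1, bias=False)  # linear value readout
        self.reward_head = nn.Linear(embed_dim, 1)              # dense model-based target
        self.dynamics_head = nn.Linear(embed_dim, embed_dim)    # predicts next embedding

    def forward(self, state, action):
        phi = self.encoder(torch.cat([state, action], dim=-1))
        return self.value_head(phi), phi

def representation_loss(model, s, a, r, s_next, a_next):
    """Shape phi with dense reward and latent-dynamics targets, rather than
    relying only on sparse TD errors, as the abstract describes."""
    _, phi = model(s, a)
    with torch.no_grad():
        _, phi_next = model(s_next, a_next)  # target embedding, no gradient
    reward_loss = nn.functional.mse_loss(model.reward_head(phi).squeeze(-1), r)
    dynamics_loss = nn.functional.mse_loss(model.dynamics_head(phi), phi_next)
    return reward_loss + dynamics_loss
```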
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong
David Fan
Jiachen Zhu
Yunyang Xiong
Xinlei Chen
Koustuv Sinha
Yann LeCun
Saining Xie
Zhuang Liu
In this work, we propose Visual-Predictive Instruction Tuning (VPiT) - a simple and effective extension to visual instruction tuning that enables a pretrained LLM to quickly morph into a unified autoregressive model capable of generating both text and visual tokens. VPiT teaches an LLM to predict discrete text tokens and continuous visual tokens from any input sequence of image and text data curated in an instruction-following format. Our empirical investigation reveals several intriguing properties of VPiT: (1) visual generation ability emerges as a natural byproduct of improved visual understanding, and can be unlocked efficiently with a small amount of generation data; (2) while we find understanding and generation to be mutually beneficial, understanding data contributes to both capabilities more effectively than generation data. Building upon these findings, we train our MetaMorph model and achieve competitive performance on both visual understanding and generation. In visual generation, MetaMorph can leverage the world knowledge and reasoning abilities gained from LLM pretraining, and overcome common failure modes exhibited by other generation models. Our results suggest that LLMs may have strong "prior" vision capabilities that can be efficiently adapted to both visual understanding and generation with a relatively simple instruction tuning process.
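A hedged sketch of the mixed objective described above: one autoregressive model is supervised with cross-entropy on discrete text tokens and a regression loss on continuous visual tokens. The tensor shapes, mask convention, and choice of MSE are illustrative assumptions, not the released MetaMorph code.

```python
import torch
import torch.nn.functional as F

def vpit_loss(text_logits, text_targets, visual_preds, visual_targets,
              is_visual_mask, visual_weight=1.0):
    """text_logits:   (B, T, vocab)  logits at positions predicting text tokens
    text_targets:     (B, T)         discrete token ids
    visual_preds:     (B, T, D)      continuous predictions at visual positions
    visual_targets:   (B, T, D)      target visual embeddings
    is_visual_mask:   (B, T) bool    True where the next token is a visual token
    """
    text_mask = ~is_visual_mask
    # Standard next-token cross-entropy on the text positions.
    ce = F.cross_entropy(text_logits[text_mask], text_targets[text_mask])
    # Regression on the continuous visual tokens (MSE here for simplicity).
    reg = F.mse_loss(visual_preds[is_visual_mask], visual_targets[is_visual_mask])
    return ce + visual_weight * reg
```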
EvalGIM: A Library for Evaluating Generative Image Models
Melissa Hall
Oscar Mañas
Reyhane Askari Hemmat
Mark Ibrahim
Candace Ross
Pietro Astolfi
Tariq Berrada
Marton Havasi
Yohann Benchetrit
Karen Ullrich
Carolina Braga
Abhishek Charnalia
Maeve Ryan
Michal Drozdzal
Jakob Verbeek
As the use of text-to-image generative models increases, so does the adoption of automatic benchmarking methods used in their evaluation. However, while metrics and datasets abound, there are few unified benchmarking libraries that provide a framework for performing evaluations across many datasets and metrics. Furthermore, the rapid introduction of increasingly robust benchmarking methods requires that evaluation libraries remain flexible to new datasets and metrics. Finally, there remains a gap in synthesizing evaluations in order to deliver actionable takeaways about model performance. To enable unified, flexible, and actionable evaluations, we introduce EvalGIM (pronounced "EvalGym"), a library for evaluating generative image models. EvalGIM contains broad support for datasets and metrics used to measure quality, diversity, and consistency of text-to-image generative models. In addition, EvalGIM is designed with flexibility for user customization as a top priority and contains a structure that allows plug-and-play additions of new datasets and metrics. To enable actionable evaluation insights, we introduce "Evaluation Exercises" that highlight takeaways for specific evaluation questions. The Evaluation Exercises contain easy-to-use and reproducible implementations of two state-of-the-art evaluation methods of text-to-image generative models: consistency-diversity-realism Pareto Fronts and disaggregated measurements of performance disparities across groups. EvalGIM also contains Evaluation Exercises that introduce two new analysis methods for text-to-image generative models: robustness analyses of model rankings and balanced evaluations across different prompt styles. We encourage text-to-image model exploration with EvalGIM and invite contributions at https://github.com/facebookresearch/EvalGIM/.
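The plug-and-play structure mentioned above can be illustrated with a generic metric registry. This is not EvalGIM's actual API (see the linked repository for that); every name below is hypothetical and only shows how new metrics can drop in without touching the evaluation loop.

```python
from typing import Callable, Dict, List

METRICS: Dict[str, Callable] = {}

def register_metric(name: str):
    """Decorator registering a metric under a name, so adding a metric
    never requires editing the evaluation loop itself."""
    def wrap(fn: Callable) -> Callable:
        METRICS[name] = fn
        return fn
    return wrap

@register_metric("dummy_consistency")
def dummy_consistency(images: List, prompts: List[str]) -> float:
    # Placeholder: a real metric would score image-text alignment.
    return 0.0

def evaluate(generate: Callable, prompts: List[str], metric_names: List[str]):
    """Generate one image per prompt, then score with each requested metric."""
    images = [generate(p) for p in prompts]
    return {m: METRICS[m](images, prompts) for m in metric_names}
```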
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
DiJia Su
Sainbayar Sukhbaatar
Yuandong Tian
Qinqing Zheng
In human cognition theory, human thinking is governed by two systems: the fast and intuitive System 1 and the slower but more deliberative System 2. Recent studies have shown that incorporating the System 2 process into Transformers, including large language models (LLMs), significantly enhances their reasoning capabilities. Nevertheless, models that purely resemble System 2 thinking require substantially higher computational costs and are much slower to respond. To address this challenge, we present Dualformer, a single Transformer model that seamlessly integrates both the fast and slow reasoning modes. Dualformer is obtained by training on data with randomized reasoning traces, where different parts of the traces are dropped during training. The dropping strategies are specifically tailored according to the trace structure, analogous to analyzing our thinking process and creating shortcuts with patterns. At inference time, our model can be configured to output only the solutions (fast mode), both the reasoning chain and the final solution (slow mode), or to decide automatically which mode to engage (auto mode). In all cases, Dualformer outperforms the corresponding baseline models in both performance and computational efficiency: (1) in slow mode, Dualformer optimally solves unseen 30 x 30 maze navigation tasks 97.6% of the time, surpassing the Searchformer baseline (trained on data with complete reasoning traces) performance of 93.3%, while using 45.5% fewer reasoning steps; (2) in fast mode, Dualformer completes those tasks with an 80% optimal rate, significantly outperforming the Solution-Only model (trained on solution-only data), which has an optimal rate of only 30%. For math problems, our techniques also achieve improved performance with LLM fine-tuning, demonstrating generalization beyond task-specific models.
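A minimal sketch of the randomized trace-dropping idea described above (not the authors' implementation): each training example keeps a random subset of structured reasoning steps, so the model sees everything from full traces (slow mode) to solution-only examples (fast mode). The probabilities and data layout are assumptions for illustration.

```python
import random

def randomize_trace(trace_steps, drop_prob=0.5, drop_all_prob=0.1):
    """trace_steps: list of token lists, one per structured reasoning step.
    Returns a randomly thinned trace, sometimes empty (solution-only)."""
    if random.random() < drop_all_prob:
        return []  # no trace at all: teaches the fast mode
    return [step for step in trace_steps if random.random() >= drop_prob]

def make_training_sequence(prompt, trace_steps, solution):
    """Flatten prompt + (randomly dropped) trace + solution into one
    token sequence for standard next-token training."""
    trace = [tok for step in randomize_trace(trace_steps) for tok in step]
    return prompt + trace + solution
```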
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
Ouail Kitouni
Niklas Nolte
Adina Williams
Diane Bouchacourt
Mark Ibrahim
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Quentin Garrido
Jean Ponce
Xinlei Chen
Yann LeCun
Mahmoud Assran
Nicolas Ballas
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Lucas Lehnert
Sainbayar Sukhbaatar
DiJia Su
Paul McVay
Qinqing Zheng
Yuandong Tian
While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision-making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the search dynamics of the A* search algorithm. The resulting model, Searchformer, optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard A* search.
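A sketch of what "search dynamics" training data could look like, assuming a standard A* implementation; the event names and tuple format below are illustrative, not the paper's exact tokenization. An encoder-decoder Transformer would then be trained to map a task prompt to such a trace followed by the plan.

```python
import heapq

def astar_trace(start, goal, neighbors, h):
    """Run A* and log its execution as (event, node, g, h) tokens in
    visitation order. Nodes must be hashable and comparable (e.g. tuples);
    neighbors(node) yields (neighbor, step_cost); h is the heuristic."""
    trace, closed, g = [], set(), {start: 0}
    open_heap = [(h(start), 0, start)]
    trace.append(("create", start, 0, h(start)))
    while open_heap:
        _, g_n, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # skip stale heap entries
        closed.add(node)
        trace.append(("close", node, g_n, h(node)))
        if node == goal:
            return trace
        for nb, cost in neighbors(node):
            g_nb = g_n + cost
            if nb not in closed and g_nb < g.get(nb, float("inf")):
                g[nb] = g_nb
                trace.append(("create", nb, g_nb, h(nb)))
                heapq.heappush(open_heap, (g_nb + h(nb), g_nb, nb))
    return trace
```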
DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
Jonathan Lebensold
Maziar Sanjabi
Pietro Astolfi
Kamalika Chaudhuri
Chuan Guo