
Michael Rabbat

Associate Industry Member
Associate Professor, McGill University, Department of Electrical and Computer Engineering
Research Scientist, Facebook AI Research

Biography

Mike Rabbat is an associate industry member of Mila – Quebec Artificial Intelligence Institute and director of research science in the Fundamental AI Research (FAIR) team at Meta.

Rabbat’s research interests include efficient and robust representation learning, particularly self-supervised learning, as well as optimization methods for efficient model training.

Publications

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Lucas Lehnert
Sainbayar Sukhbaatar
Paul McVay
Yuandong Tian
While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision-making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard A* search.
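A toy sketch of the search-dynamics-bootstrapping idea described in this abstract: run A* on a small grid task, serialize each node creation and expansion as tokens, and pair that trace with the final plan as a sequence-to-sequence training target. The grid task, token names, and trace format below are illustrative stand-ins, not the paper's exact tokenization.

```python
# Illustrative sketch: log A* search dynamics on a toy grid as a token sequence
# that a seq2seq Transformer could be trained to predict (token scheme is made up).
import heapq

def astar_trace(grid, start, goal):
    """Run A* on a 4-connected grid and return (plan, token_trace)."""
    def h(p):  # Manhattan-distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]
    seen = set()
    trace = []  # serialized search dynamics
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node in seen:
            continue
        seen.add(node)
        trace += ["close", f"x{node[0]}", f"y{node[1]}", f"c{g}"]
        if node == goal:
            plan = [f"plan x{x} y{y}" for x, y in path]
            return plan, trace
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in seen):
                trace += ["create", f"x{nxt[0]}", f"y{nxt[1]}", f"c{g + 1}"]
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None, trace

grid = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]  # 1 = wall
plan, trace = astar_trace(grid, (0, 0), (2, 0))
print(len(trace), "trace tokens;", len(plan), "plan steps")
```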
Revisiting Feature Prediction for Learning Visual Representations from Video
Adrien Bardes
Quentin Garrido
Jean Ponce
Xinlei Chen
Yann LeCun
Mahmoud Assran
Nicolas Ballas
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy V. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russell Howes
Po-Yao Huang
Shang-Wen Li
Ishan Misra
Vasu Sharma
Gabriel Synnaeve
Hu Xu
Huijiao Xu
Herve Jegou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP, on most of the benchmarks at image and pixel levels.
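For reference, the distilled DINOv2 backbones are released through torch.hub and can be used as frozen, all-purpose feature extractors. A minimal sketch, assuming network access to the public facebookresearch/dinov2 hub entry:

```python
# Minimal sketch: extract global image features with a released DINOv2 backbone.
import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")  # distilled ViT-S/14
model.eval()

# ViT-S/14 expects image side lengths that are multiples of the 14-pixel patch size.
x = torch.randn(1, 3, 224, 224)  # stand-in for a normalized RGB image batch
with torch.no_grad():
    feats = model(x)             # one global (CLS) feature vector per image
print(feats.shape)               # torch.Size([1, 384]) for ViT-S/14
```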
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Mahmoud Assran
Quentin Duval
Ishan Misra
Piotr Bojanowski
Yann LeCun
Nicolas Ballas
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.
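A toy sketch of the training signal described here: an EMA "target" encoder produces representations of sampled target blocks, and a predictor must regress them from the encoded context block, with no pixel-level reconstruction. The stand-in MLP encoders, block indices, and EMA rate below are illustrative; the paper uses Vision Transformers and a predictor conditioned on positional mask tokens.

```python
# Toy sketch of the I-JEPA objective: regress EMA-target-encoder representations
# of target blocks from a context block, in representation space.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, n_patches, patch_dim = 64, 16, 48
context_encoder = nn.Sequential(nn.Linear(patch_dim, dim), nn.GELU(), nn.Linear(dim, dim))
predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
target_encoder = copy.deepcopy(context_encoder)           # updated by EMA, not by gradients
for p in target_encoder.parameters():
    p.requires_grad_(False)

patches = torch.randn(2, n_patches, patch_dim)            # (batch, patches, patch_dim)
target_idx = torch.tensor([4, 5, 8, 9])                   # a sampled target block
context_idx = torch.tensor([0, 1, 2, 3, 12, 13, 14, 15])  # context block excludes targets

ctx = context_encoder(patches[:, context_idx]).mean(1)    # pooled context representation
pred = predictor(ctx).unsqueeze(1).expand(-1, len(target_idx), -1)
with torch.no_grad():
    tgt = target_encoder(patches[:, target_idx])          # target-block representations

loss = F.mse_loss(pred, tgt)                              # regression in latent space
loss.backward()

# EMA update of the target encoder after each optimizer step
with torch.no_grad():
    for tp, cp in zip(target_encoder.parameters(), context_encoder.parameters()):
        tp.lerp_(cp, 0.004)
```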
Benchmarking Neural Network Training Algorithms
George Edward Dahl
Frank Schneider
Zachary Nado
Naman Agarwal
Chandramouli Shama Sastry
Philipp Hennig
Sourabh Medapati
Runa Eschenhagen
Priya Kasimbeg
Daniel Suo
Juhan Bae
Justin M. Gilmer
A. L. Peirson
Bilal Muhammad Khan
Rohan Anil
Shankar Krishnan
Daniel Snider
Ehsan Amid
Kongtao Chen
Chris J. Maddison
R. Vasudev
Michal Badura
Ankush Garg
Peter Mattson
Green Federated Learning
Ashkan Yousefpour
Sheng Guo
Ashish V. Shenoy
Sayan Ghosh
Pierre Stock
Kiwan Maeng
Schalk-Willem Kruger
Carole-Jean Wu
Ilya Mironov
The rapid progress of AI is fueled by increasingly large and computationally intensive machine learning models and datasets. As a consequence, the amount of compute used in training state-of-the-art models is exponentially increasing (doubling every 10 months between 2015 and 2022), resulting in a large carbon footprint. Federated Learning (FL) - a collaborative machine learning technique for training a centralized model using data of decentralized entities - can also be resource-intensive and have a significant carbon footprint, particularly when deployed at scale. Unlike centralized AI that can reliably tap into renewables at strategically placed data centers, cross-device FL may leverage as many as hundreds of millions of globally distributed end-user devices with diverse energy sources. Green AI is a novel and important research area where carbon footprint is regarded as an evaluation criterion for AI, alongside accuracy, convergence speed, and other metrics. In this paper, we propose the concept of Green FL, which involves optimizing FL parameters and making design choices to minimize carbon emissions consistent with competitive performance and training time. The contributions of this work are two-fold. First, we adopt a data-driven approach to quantify the carbon emissions of FL by directly measuring real-world at-scale FL tasks running on millions of phones. Second, we present challenges, guidelines, and lessons learned from studying the trade-off between energy efficiency, performance, and time-to-train in a production FL system. Our findings offer valuable insights into how FL can reduce its carbon footprint, and they provide a foundation for future research in the area of Green AI.
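A back-of-the-envelope sketch of the kind of accounting this abstract motivates: total emissions come from per-device energy and server-side energy per round, scaled by the grid's carbon intensity. All constants below are made-up placeholders, not measurements from the paper's production system.

```python
# Illustrative carbon accounting for a cross-device FL job (all numbers are placeholders).
ROUNDS = 1_000
DEVICES_PER_ROUND = 200
DEVICE_WH_PER_ROUND = 0.5    # on-device compute + radio energy per participant (Wh)
SERVER_WH_PER_ROUND = 50.0   # aggregation-server energy per round (Wh)
CARBON_G_PER_KWH = 400.0     # grid carbon intensity (gCO2e/kWh), varies by region

device_kwh = ROUNDS * DEVICES_PER_ROUND * DEVICE_WH_PER_ROUND / 1_000
server_kwh = ROUNDS * SERVER_WH_PER_ROUND / 1_000
total_kg_co2e = (device_kwh + server_kwh) * CARBON_G_PER_KWH / 1_000
print(f"~{total_kg_co2e:.1f} kg CO2e for {ROUNDS} rounds")
```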
The Hidden Uniform Cluster Prior in Self-Supervised Learning
Mahmoud Assran
Randall Balestriero
Quentin Duval
Florian Bordes
Ishan Misra
Piotr Bojanowski
Nicolas Ballas
A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN). We show that the formulation of all these methods contains an overlooked prior to learn features that enable uniform clustering of the data. While this prior has led to remarkably semantic representations when pretraining on class-balanced data, such as ImageNet, we demonstrate that it can hamper performance when pretraining on class-imbalanced data. By moving away from conventional uniformity priors and instead preferring power-law distributed feature clusters, we show that one can improve the quality of the learned representations on real-world class-imbalanced datasets. To demonstrate this, we develop an extension of the Masked Siamese Networks (MSN) method to support the use of arbitrary feature priors.
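A rough sketch of swapping the implicit uniformity prior for a power-law prior, in the spirit of the MSN extension mentioned above: regularize the batch-mean prototype assignment toward a power-law target distribution instead of pushing it toward uniform. The prototype count, batch size, and exponent tau below are illustrative, not the paper's settings.

```python
# Sketch: penalize divergence of the mean cluster assignment from a power-law prior
# instead of encouraging a uniform assignment.
import torch
import torch.nn.functional as F

num_prototypes, tau = 1024, 0.25

# mean assignment probability over the batch (rows of `probs` are per-sample softmaxes)
probs = F.softmax(torch.randn(256, num_prototypes), dim=-1)
mean_probs = probs.mean(dim=0)

# power-law target: p(k) proportional to k^(-tau)
ranks = torch.arange(1, num_prototypes + 1, dtype=torch.float)
power_law = ranks.pow(-tau)
power_law = power_law / power_law.sum()

# KL(mean assignment || power-law prior) replaces the uniformity-encouraging term
reg = torch.sum(mean_probs * (mean_probs.log() - power_law.log()))
print(float(reg))
```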
Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning
John Nguyen
Jianyu Wang
Kshitiz Malik
Maziar Sanjabi
lo-fi: distributed fine-tuning without communication
Mitchell Wortsman
Suchin Gururangan
Shen Li
Ali Farhadi
Ludwig Schmidt
Ari S. Morcos
When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node fine-tunes independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step. We also observe that lo-fi matches the baseline's performance when fine-tuning OPT language models (up to 1.3B parameters) on Common Crawl. By removing the communication requirement, lo-fi reduces resource barriers for fine-tuning large models and enables fine-tuning in settings with prohibitive communication cost.
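A minimal sketch of the lo-fi recipe as described: each node fine-tunes an identical copy of the model with no gradient communication, and the weights are averaged once at the end. The local "fine-tuning" step below is a random-perturbation stand-in for brevity.

```python
# Sketch: independent local fine-tuning followed by one-shot weight averaging.
import copy
import torch
import torch.nn as nn

base = nn.Linear(16, 4)
node_models = [copy.deepcopy(base) for _ in range(4)]

# (each replica would be fine-tuned here on its own data, with no communication)
for m in node_models:
    for p in m.parameters():
        p.data.add_(0.01 * torch.randn_like(p))  # stand-in for local fine-tuning updates

# one-shot weight averaging at the conclusion of fine-tuning
avg_state = {
    k: torch.stack([m.state_dict()[k] for m in node_models]).mean(dim=0)
    for k in base.state_dict()
}
merged = copy.deepcopy(base)
merged.load_state_dict(avg_state)
```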