Portrait of Michael Rabbat is unavailable

Michael Rabbat

Associate Industry Member
Associate professor, McGill University, Department of Electrical and Computer Engineering
Research Scientist, Facebook AI Research
Research Topics
Distributed Systems
Optimization
Representation Learning

Biography

Mike Rabbat is an associate industry member of Mila – Quebec Artificial Intelligence Institute and director of research science in the Fundamental AI Research (FAIR) team at Meta.

Rabbat’s research interests include efficient and robust representation learning, in particular self-supervised learning. He is also interested in optimization for efficient model training.

Publications

Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity
Kiwan Maeng
Haiyu Lu
Luca Melis
John Nguyen
Carole-Jean Wu
Privacy-Aware Compression for Federated Data Analysis
Kamalika Chaudhuri
Chuan Guo
Federated data analytics is a framework for distributed data analysis where a server compiles noisy responses from a group of distributed lo… (see more)w-bandwidth user devices to estimate aggregate statistics. Two major challenges in this framework are privacy, since user data is often sensitive, and compression, since the user devices have low network bandwidth. Prior work has addressed these challenges separately by combining standard compression algorithms with known privacy mechanisms. In this work, we take a holistic look at the problem and design a family of privacy-aware compression mechanisms that work for any given communication budget. We first propose a mechanism for transmitting a single real number that has optimal variance under certain conditions. We then show how to extend it to metric differential privacy for location privacy use-cases, as well as vectors, for application to federated learning. Our experiments illustrate that our mechanism can lead to better utility vs. compression trade-offs for the same privacy loss in a number of settings.
Masked Siamese Networks for Label-Efficient Learning
Mahmoud Assran
Mathilde Caron
Ishan Misra
Piotr Bojanowski
Armand Joulin
We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the … (see more)representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark. Our code is publicly available.
Masked Siamese Networks for Label-Efficient Learning
Mahmoud Assran
Mathilde Caron
Ishan Misra
Piotr Bojanowski
Armand Joulin
We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the … (see more)representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark. Our code is publicly available.
Privacy-aware compression for federated data analysis
Kamalika Chaudhuri
Chuan Guo
Federated data analytics is a framework for distributed data analysis where a server compiles noisy responses from a group of distributed lo… (see more)w-bandwidth user devices to estimate aggregate statistics. Two major challenges in this framework are privacy, since user data is often sensitive, and compression, since the user devices have low network bandwidth. Prior work has addressed these challenges separately by combining standard compression algorithms with known privacy mechanisms. In this work, we take a holistic look at the problem and design a family of privacy-aware compression mechanisms that work for any given communication budget. We first propose a mechanism for transmitting a single real number that has optimal variance under certain conditions. We then show how to extend it to metric differential privacy for location privacy use-cases, as well as vectors, for application to federated learning. Our experiments illustrate that our mechanism can lead to better utility vs. compression trade-offs for the same privacy loss in a number of settings.
Learning with Gradient Descent and Weakly Convex Losses
Dominic Richards
We study the learning performance of gradient descent when the empirical risk is weakly convex, namely, the smallest negative eigenvalue of … (see more)the empirical risk's Hessian is bounded in magnitude. By showing that this eigenvalue can control the stability of gradient descent, generalisation error bounds are proven that hold under a wider range of step sizes compared to previous work. Out of sample guarantees are then achieved by decomposing the test error into generalisation, optimisation and approximation errors, each of which can be bounded and traded off with respect to algorithmic parameters, sample size and magnitude of this eigenvalue. In the case of a two layer neural network, we demonstrate that the empirical risk can satisfy a notion of local weak convexity, specifically, the Hessian's smallest eigenvalue during training can be controlled by the normalisation of the layers, i.e., network scaling. This allows test error guarantees to then be achieved when the population risk minimiser satisfies a complexity assumption. By trading off the network complexity and scaling, insights are gained into the implicit bias of neural network scaling, which are further supported by experimental findings.
Learning with Gradient Descent and Weakly Convex Losses
Dominic Richards
We study the learning performance of gradient descent when the empirical risk is weakly convex, namely, the smallest negative eigenvalue of … (see more)the empirical risk's Hessian is bounded in magnitude. By showing that this eigenvalue can control the stability of gradient descent, generalisation error bounds are proven that hold under a wider range of step sizes compared to previous work. Out of sample guarantees are then achieved by decomposing the test error into generalisation, optimisation and approximation errors, each of which can be bounded and traded off with respect to algorithmic parameters, sample size and magnitude of this eigenvalue. In the case of a two layer neural network, we demonstrate that the empirical risk can satisfy a notion of local weak convexity, specifically, the Hessian's smallest eigenvalue during training can be controlled by the normalisation of the layers, i.e., network scaling. This allows test error guarantees to then be achieved when the population risk minimiser satisfies a complexity assumption. By trading off the network complexity and scaling, insights are gained into the implicit bias of neural network scaling, which are further supported by experimental findings.
A Multisensor Multi-Bernoulli Filter
Augustin-Alexandru Saucan
In this paper, we derive a multisensor multi-Bernoulli (MS-MeMBer) filter for multitarget tracking. Measurements from multiple sensors are e… (see more)mployed by the proposed filter to update a set of tracks modeled as a multi-Bernoulli random finite set. An exact implementation of the MS-MeMBer update procedure is computationally intractable. We propose an efficient approximate implementation by using a greedy measurement partitioning mechanism. The proposed filter allows for Gaussian mixture or particle filter implementations. Numerical simulations conducted for both linear-Gaussian and nonlinear models highlight the improved accuracy of the MS-MeMBer filter and its reduced computational load with respect to the multisensor cardinalized probability hypothesis density filter and the iterated-corrector cardinality-balanced multi-Bernoulli filter especially for low probabilities of detection.
Time-Varying Mixtures of Markov Chains: An Application to Road Traffic Modeling
Sean F. Lawlor
Time-varying mixture models are useful for representing complex, dynamic distributions. Components in the mixture model can appear and disap… (see more)pear, and persisting components can evolve. This allows great flexibility in streaming data applications where the model can be adjusted as new data arrives. Fitting a mixture model with computational guarantees which can meet real-time requirements is challenging with existing algorithms, especially when the model order can vary with time. Existing approximate inference methods may require multiple restarts to search for a good local solution. Monte-Carlo methods can be used to jointly estimate the model order and model parameters, but when the distribution of each mixand has a high-dimensional parameter space, they suffer from the curse of dimensionality and and from slow convergence. This paper proposes a generative model for time-varying mixture models, tailored for mixtures of discrete-time Markov chains. A novel, deterministic inference procedure is introduced and is shown to be suitable for applications requiring real-time estimation, and the method is guaranteed to converge at each time step. As a motivating application, we model and predict traffic patterns in a transportation network. Experiments illustrate the performance of the scheme and offer insights regarding tuning of the algorithm parameters. The experiments also investigate the predictive power of the proposed model compared to less complex models and demonstrate the superiority of the mixture model approach for prediction of traffic routes in real data.
Time-Varying Mixtures of Markov Chains: An Application to Road Traffic Modeling
Sean Lawlor
Time-varying mixture models are useful for representing complex, dynamic distributions. Components in the mixture model can appear and disap… (see more)pear, and persisting components can evolve. This allows great flexibility in streaming data applications where the model can be adjusted as new data arrives. Fitting a mixture model with computational guarantees which can meet real-time requirements is challenging with existing algorithms, especially when the model order can vary with time. Existing approximate inference methods may require multiple restarts to search for a good local solution. Monte-Carlo methods can be used to jointly estimate the model order and model parameters, but when the distribution of each mixand has a high-dimensional parameter space, they suffer from the curse of dimensionality and and from slow convergence. This paper proposes a generative model for time-varying mixture models, tailored for mixtures of discrete-time Markov chains. A novel, deterministic inference procedure is introduced and is shown to be suitable for applications requiring real-time estimation, and the method is guaranteed to converge at each time step. As a motivating application, we model and predict traffic patterns in a transportation network. Experiments illustrate the performance of the scheme and offer insights regarding tuning of the algorithm parameters. The experiments also investigate the predictive power of the proposed model compared to less complex models and demonstrate the superiority of the mixture model approach for prediction of traffic routes in real data.
A Multisensor Multi-Bernoulli Filter
Augustin-Alexandru Saucan
In this paper, we derive a multisensor multi-Bernoulli (MS-MeMBer) filter for multitarget tracking. Measurements from multiple sensors are e… (see more)mployed by the proposed filter to update a set of tracks modeled as a multi-Bernoulli random finite set. An exact implementation of the MS-MeMBer update procedure is computationally intractable. We propose an efficient approximate implementation by using a greedy measurement partitioning mechanism. The proposed filter allows for Gaussian mixture or particle filter implementations. Numerical simulations conducted for both linear-Gaussian and nonlinear models highlight the improved accuracy of the MS-MeMBer filter and its reduced computational load with respect to the multisensor cardinalized probability hypothesis density filter and the iterated-corrector cardinality-balanced multi-Bernoulli filter especially for low probabilities of detection.
Fault-Tolerant Associative Memories Based on $c$-Partite Graphs
François Leduc-Primeau
Associative memories allow the retrieval of previously stored messages given a part of their content. In this paper, we are interested in as… (see more)sociative memories based on c-partite graphs that were recently introduced. These memories are almost optimal in terms of the amount of storage they require (efficiency) and allow retrieving messages with low complexity. We propose a generic implementation model for the retrieval algorithm that can be readily mapped to an integrated circuit and study the retrieval performance when hardware components are affected by faults. We show using analytical and simulation results that these associative memories can be made resilient to circuit faults with a minor modification of the retrieval algorithm. In one example, the memory retains 88% of its efficiency when 1% of the storage cells are faulty, or 98% when 0.1% of the binary outputs of the retrieval algorithm are faulty. When considering storage faults, the fault tolerance exhibited by the proposed associative memory can be comparable to using a capacity-achieving error correction code for protecting the stored information.