Portrait de Michael Rabbat n'est pas disponible

Michael Rabbat

Membre industriel associé
Professeur associé, McGill University, Département de génie électrique et informatique
Chercheur scientifique, Facebook AI Research
Sujets de recherche
Apprentissage de représentations
Optimisation
Systèmes distribués

Biographie

Mike Rabbat est membre affilié de Mila – Institut québécois d’intelligence artificielle et directeur de la recherche scientifique au sein de l'équipe FAIR (Fundamental AI Research) de Meta. Ses recherches portent sur l'apprentissage efficace et robuste des représentations, en particulier l'apprentissage autosupervisé. Il s'intéresse également à l'optimisation pour un apprentissage efficace des modèles.

Publications

Optimization and Analysis of Distributed Averaging With Short Node Memory
Boris Oreshkin
Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each nod… (voir plus)e has a scalar data value, and the goal is to compute the average of these values at every node (the so-called average consensus problem). Nodes iteratively exchange information with their neighbors and perform local updates until the value at every node converges to the initial network average. Much previous work has focused on algorithms where each node maintains and updates a single value; every time an update is performed, the previous value is forgotten. Convergence to the average consensus is achieved asymptotically. The convergence rate is fundamentally limited by network connectivity, and it can be prohibitively slow on topologies such as grids and random geometric graphs, even if the update rules are optimized. In this paper, we provide the first theoretical demonstration that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's current and previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighboring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and conduct a theoretical analysis of the improvement in convergence rate that can be achieved using this acceleration methodology. For a chain topology on N nodes, this leads to a factor of N improvement over standard consensus, and for a two-dimensional grid, our approach achieves a factor of ¿N improvement.
Optimization and Analysis of Distributed Averaging With Short Node Memory
Boris Oreshkin
Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each nod… (voir plus)e has a scalar data value, and the goal is to compute the average of these values at every node (the so-called average consensus problem). Nodes iteratively exchange information with their neighbors and perform local updates until the value at every node converges to the initial network average. Much previous work has focused on algorithms where each node maintains and updates a single value; every time an update is performed, the previous value is forgotten. Convergence to the average consensus is achieved asymptotically. The convergence rate is fundamentally limited by network connectivity, and it can be prohibitively slow on topologies such as grids and random geometric graphs, even if the update rules are optimized. In this paper, we provide the first theoretical demonstration that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's current and previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighboring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and conduct a theoretical analysis of the improvement in convergence rate that can be achieved using this acceleration methodology. For a chain topology on N nodes, this leads to a factor of N improvement over standard consensus, and for a two-dimensional grid, our approach achieves a factor of ¿N improvement.
Distributed Average Consensus With Dithered Quantization
Tuncer Can Aysal
In this paper, we develop algorithms for distributed computation of averages of the node data over networks with bandwidth/power constraints… (voir plus) or large volumes of data. Distributed averaging algorithms fail to achieve consensus when deterministic uniform quantization is adopted. We propose a distributed algorithm in which the nodes utilize probabilistically quantized information, i.e., dithered quantization, to communicate with each other. The algorithm we develop is a dynamical system that generates sequences achieving a consensus at one of the quantization values almost surely. In addition, we show that the expected value of the consensus is equal to the average of the original sensor data. We derive an upper bound on the mean-square-error performance of the probabilistically quantized distributed averaging (PQDA). Moreover, we show that the convergence of the PQDA is monotonic by studying the evolution of the minimum-length interval containing the node values. We reveal that the length of this interval is a monotonically nonincreasing function with limit zero. We also demonstrate that all the node values, in the worst case, converge to the final two quantization bins at the same rate as standard unquantized consensus. Finally, we report the results of simulations conducted to evaluate the behavior and the effectiveness of the proposed algorithm in various scenarios.
Distributed Average Consensus With Dithered Quantization
In this paper, we develop algorithms for distributed computation of averages of the node data over networks with bandwidth/power constraints… (voir plus) or large volumes of data. Distributed averaging algorithms fail to achieve consensus when deterministic uniform quantization is adopted. We propose a distributed algorithm in which the nodes utilize probabilistically quantized information, i.e., dithered quantization, to communicate with each other. The algorithm we develop is a dynamical system that generates sequences achieving a consensus at one of the quantization values almost surely. In addition, we show that the expected value of the consensus is equal to the average of the original sensor data. We derive an upper bound on the mean-square-error performance of the probabilistically quantized distributed averaging (PQDA). Moreover, we show that the convergence of the PQDA is monotonic by studying the evolution of the minimum-length interval containing the node values. We reveal that the length of this interval is a monotonically nonincreasing function with limit zero. We also demonstrate that all the node values, in the worst case, converge to the final two quantization bins at the same rate as standard unquantized consensus. Finally, we report the results of simulations conducted to evaluate the behavior and the effectiveness of the proposed algorithm in various scenarios.
Greedy Gossip With Eavesdropping
Deniz Ustebay
Boris Oreshkin
This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average c… (voir plus)onsensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, gossip algorithms are robust to unreliable wireless conditions and time varying network topologies. In this paper, we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the node which has the value most different from its own. In order to make this selection, nodes need to know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and nodes keep track of their neighbors' values by eavesdropping on their communications. We show that the convergence of GGE is guaranteed for connected network topologies. We also study the rates of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random geometric graph topologies.