Portrait of Michael Rabbat is unavailable

Michael Rabbat

Associate Industry Member
Associate professor, McGill University, Department of Electrical and Computer Engineering
Research Scientist, Facebook AI Research
Research Topics
Distributed Systems
Optimization
Representation Learning

Biography

Mike Rabbat is an associate industry member of Mila – Quebec Artificial Intelligence Institute and director of research science in the Fundamental AI Research (FAIR) team at Meta.

Rabbat’s research interests include efficient and robust representation learning, in particular self-supervised learning. He is also interested in optimization for efficient model training.

Publications

Multiscale Gossip for Efficient Decentralized Averaging in Wireless Packet Networks
Konstantinos I. Tsianos
This paper describes and analyzes a hierarchical algorithm called Multiscale Gossip for solving the distributed average consensus problem in… (see more) wireless sensor networks. The algorithm proceeds by recursively partitioning a given network. Initially, nodes at the finest scale gossip to compute local averages. Then, using multi-hop communication and geographic routing to communicate between nodes that are not directly connected, these local averages are progressively fused up the hierarchy until the global average is computed. We show that the proposed hierarchical scheme with k=Θ(loglogn) levels of hierarchy is competitive with state-of-the-art randomized gossip algorithms in terms of message complexity, achieving ε-accuracy with high probability after O(n loglogn log[1/(ε)] ) single-hop messages. Key to our analysis is the way in which the network is recursively partitioned. We find that the above scaling law is achieved when subnetworks at scale j contain O(n(2/3)j) nodes; then the message complexity at any individual scale is O(n log[1/ε]). Another important consequence of the hierarchical construction is that the longest distance over which messages are exchanged is O(n1/3) hops (at the highest scale), and most messages (at lower scales) travel shorter distances. In networks that use link-level acknowledgements, this results in less congestion and resource usage by reducing message retransmissions. Simulations illustrate that the proposed scheme is more efficient than state-of-the-art randomized gossip algorithms based on averaging along paths.
Active learning of multiple source multiple destination topologies
Pegah Sattari
Maciej Kurant
Anima Anandkumar
Athina P. Markopoulou
We consider the problem of inferring the topology of an M-by-N network by sending probes between M sources and N receivers. Prior work has s… (see more)hown that this problem can be decomposed into two parts: first, infer smaller subnetwork components (i.e., 1-by-N's or 2-by-2's) and then merge these components to identify the M-by-N topology. In this paper, we focus on the second part. In particular, we assume that a 1by-N topology is given and that all 2-by-2 components can be queried and learned using end-to-end probes. The problem is which 2-by-2's to query and how to merge them with the 1-byN, so as to exactly identify the 2-by-N topology, and optimize a number of performance metrics including measurement traffic, time complexity, and memory usage. We provide a lower bound, ⌈N/2⌉, on the number of 2-by-2's required by any active learning algorithm and we also propose a greedy algorithm that is nearoptimal and efficient in practice. It follows a bottom-up approach: at every step, it selects two receivers, queries the corresponding 2-by-2, and merges it with the given 1-by-N. The algorithm requires exactly N - 1 steps, which is much less than all (N:2) possible 2-by-2's, and it correctly identifies the 2-by-N topology.
Active learning of multiple source multiple destination topologies
Pegah Sattari
Maciej Kurant
Animashree Anandkumar
Athina Markopoulou
We consider the problem of inferring the topology of an M-by-N network by sending probes between M sources and N receivers. Prior work has s… (see more)hown that this problem can be decomposed into two parts: first, infer smaller subnetwork components (i.e., 1-by-N's or 2-by-2's) and then merge these components to identify the M-by-N topology. In this paper, we focus on the second part. In particular, we assume that a 1by-N topology is given and that all 2-by-2 components can be queried and learned using end-to-end probes. The problem is which 2-by-2's to query and how to merge them with the 1-byN, so as to exactly identify the 2-by-N topology, and optimize a number of performance metrics including measurement traffic, time complexity, and memory usage. We provide a lower bound, ⌈N/2⌉, on the number of 2-by-2's required by any active learning algorithm and we also propose a greedy algorithm that is nearoptimal and efficient in practice. It follows a bottom-up approach: at every step, it selects two receivers, queries the corresponding 2-by-2, and merges it with the given 1-by-N. The algorithm requires exactly N - 1 steps, which is much less than all (N:2) possible 2-by-2's, and it correctly identifies the 2-by-N topology.
Multiscale Gossip for Efficient Decentralized Averaging in Wireless Packet Networks
Konstantinos I. Tsianos
This paper describes and analyzes a hierarchical algorithm called Multiscale Gossip for solving the distributed average consensus problem in… (see more) wireless sensor networks. The algorithm proceeds by recursively partitioning a given network. Initially, nodes at the finest scale gossip to compute local averages. Then, using multi-hop communication and geographic routing to communicate between nodes that are not directly connected, these local averages are progressively fused up the hierarchy until the global average is computed. We show that the proposed hierarchical scheme with
Greedy Gossip With Eavesdropping
Deniz Ustebay
Boris Oreshkin
This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average c… (see more)onsensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, gossip algorithms are robust to unreliable wireless conditions and time varying network topologies. In this paper, we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the node which has the value most different from its own. In order to make this selection, nodes need to know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and nodes keep track of their neighbors' values by eavesdropping on their communications. We show that the convergence of GGE is guaranteed for connected network topologies. We also study the rates of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random geometric graph topologies.
Optimization and Analysis of Distributed Averaging With Short Node Memory
Boris Oreshkin
Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each nod… (see more)e has a scalar data value, and the goal is to compute the average of these values at every node (the so-called average consensus problem). Nodes iteratively exchange information with their neighbors and perform local updates until the value at every node converges to the initial network average. Much previous work has focused on algorithms where each node maintains and updates a single value; every time an update is performed, the previous value is forgotten. Convergence to the average consensus is achieved asymptotically. The convergence rate is fundamentally limited by network connectivity, and it can be prohibitively slow on topologies such as grids and random geometric graphs, even if the update rules are optimized. In this paper, we provide the first theoretical demonstration that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's current and previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighboring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and conduct a theoretical analysis of the improvement in convergence rate that can be achieved using this acceleration methodology. For a chain topology on N nodes, this leads to a factor of N improvement over standard consensus, and for a two-dimensional grid, our approach achieves a factor of ¿N improvement.
Optimization and Analysis of Distributed Averaging With Short Node Memory
Boris Oreshkin
Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each nod… (see more)e has a scalar data value, and the goal is to compute the average of these values at every node (the so-called average consensus problem). Nodes iteratively exchange information with their neighbors and perform local updates until the value at every node converges to the initial network average. Much previous work has focused on algorithms where each node maintains and updates a single value; every time an update is performed, the previous value is forgotten. Convergence to the average consensus is achieved asymptotically. The convergence rate is fundamentally limited by network connectivity, and it can be prohibitively slow on topologies such as grids and random geometric graphs, even if the update rules are optimized. In this paper, we provide the first theoretical demonstration that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's current and previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighboring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and conduct a theoretical analysis of the improvement in convergence rate that can be achieved using this acceleration methodology. For a chain topology on N nodes, this leads to a factor of N improvement over standard consensus, and for a two-dimensional grid, our approach achieves a factor of ¿N improvement.
Distributed Average Consensus With Dithered Quantization
Tuncer Can Aysal
In this paper, we develop algorithms for distributed computation of averages of the node data over networks with bandwidth/power constraints… (see more) or large volumes of data. Distributed averaging algorithms fail to achieve consensus when deterministic uniform quantization is adopted. We propose a distributed algorithm in which the nodes utilize probabilistically quantized information, i.e., dithered quantization, to communicate with each other. The algorithm we develop is a dynamical system that generates sequences achieving a consensus at one of the quantization values almost surely. In addition, we show that the expected value of the consensus is equal to the average of the original sensor data. We derive an upper bound on the mean-square-error performance of the probabilistically quantized distributed averaging (PQDA). Moreover, we show that the convergence of the PQDA is monotonic by studying the evolution of the minimum-length interval containing the node values. We reveal that the length of this interval is a monotonically nonincreasing function with limit zero. We also demonstrate that all the node values, in the worst case, converge to the final two quantization bins at the same rate as standard unquantized consensus. Finally, we report the results of simulations conducted to evaluate the behavior and the effectiveness of the proposed algorithm in various scenarios.
Distributed Average Consensus With Dithered Quantization
In this paper, we develop algorithms for distributed computation of averages of the node data over networks with bandwidth/power constraints… (see more) or large volumes of data. Distributed averaging algorithms fail to achieve consensus when deterministic uniform quantization is adopted. We propose a distributed algorithm in which the nodes utilize probabilistically quantized information, i.e., dithered quantization, to communicate with each other. The algorithm we develop is a dynamical system that generates sequences achieving a consensus at one of the quantization values almost surely. In addition, we show that the expected value of the consensus is equal to the average of the original sensor data. We derive an upper bound on the mean-square-error performance of the probabilistically quantized distributed averaging (PQDA). Moreover, we show that the convergence of the PQDA is monotonic by studying the evolution of the minimum-length interval containing the node values. We reveal that the length of this interval is a monotonically nonincreasing function with limit zero. We also demonstrate that all the node values, in the worst case, converge to the final two quantization bins at the same rate as standard unquantized consensus. Finally, we report the results of simulations conducted to evaluate the behavior and the effectiveness of the proposed algorithm in various scenarios.
Greedy Gossip With Eavesdropping
Deniz Ustebay
Boris Oreshkin
This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average c… (see more)onsensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, gossip algorithms are robust to unreliable wireless conditions and time varying network topologies. In this paper, we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the node which has the value most different from its own. In order to make this selection, nodes need to know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and nodes keep track of their neighbors' values by eavesdropping on their communications. We show that the convergence of GGE is guaranteed for connected network topologies. We also study the rates of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random geometric graph topologies.