Publications

Fully Parallel Stochastic LDPC Decoders
Saeed Sharifi Tehrani
Shie Mannor
Stochastic decoding is a new approach to iterative decoding on graphs. This paper presents a hardware architecture for fully parallel stochastic low-density parity-check (LDPC) decoders. To obtain the characteristics of the proposed architecture, we apply this architecture to decode an irregular state-of-the-art (1056,528) LDPC code on a Xilinx Virtex-4 LX200 field-programmable gate-array (FPGA) device. The implemented decoder achieves a clock frequency of 222 MHz and a throughput of about 1.66 Gb/s at Eb/N0 = 4.25 dB (a bit error rate of 10^-8). It provides decoding performance within 0.5 and 0.25 dB of the floating-point sum-product algorithm with 32 and 16 iterations, respectively, and similar error-floor behavior. The decoder uses less than 40% of the lookup tables, flip-flops, and I/O ports available on the FPGA device. The results provided in this paper validate the potential of stochastic LDPC decoding as a practical and competitive fully parallel decoding approach.
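
A minimal Python sketch of the stochastic representation the decoder builds on; this is our illustration, not the paper's FPGA architecture, and it omits the edge memories used in hardware to prevent latching. A probability becomes a Bernoulli bit stream, a parity-check node XORs its inputs, and a degree-2 equality (variable) node emits the common bit when its inputs agree and otherwise holds its last output.

    import numpy as np

    rng = np.random.default_rng(0)

    def to_stream(p, n_bits):
        # Encode probability p as a Bernoulli(p) bit stream of length n_bits.
        return (rng.random(n_bits) < p).astype(np.uint8)

    def check_node(streams):
        # Parity-check node: each output bit is the XOR of the input bits.
        out = streams[0].copy()
        for s in streams[1:]:
            out = out ^ s
        return out

    def variable_node(a, b):
        # Degree-2 equality node: emit the common bit when the inputs agree,
        # otherwise hold the previously emitted bit (initialized to 0).
        out = np.empty_like(a)
        prev = 0
        for t in range(len(a)):
            if a[t] == b[t]:
                prev = a[t]
            out[t] = prev
        return out

    # The mean of a stream estimates the probability it encodes.
    print(to_stream(0.3, 100_000).mean())  # ~0.3

Averaging a long stream recovers the probability it encodes, which is why stream length trades throughput for precision in this family of decoders.
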
Stochastic Decoding of Linear Block Codes With High-Density Parity-Check Matrices
Saeed Sharifi Tehrani
Christophe Jego
Bo Zhu
This correspondence extends the application of the recently proposed stochastic decoding approach to decode linear block codes with high-density parity-check matrices and discusses its hardware complexity. Results demonstrate decoding performance close to floating-point iterative soft-input soft-output (SISO) decoding while offering nodes with considerably lower complexity compared to fixed-point SISO decoding.
Distributed Average Consensus With Dithered Quantization
Tuncer Can Aysal
In this paper, we develop algorithms for distributed computation of averages of the node data over networks with bandwidth/power constraints or large volumes of data. Distributed averaging algorithms fail to achieve consensus when deterministic uniform quantization is adopted. We propose a distributed algorithm in which the nodes utilize probabilistically quantized information, i.e., dithered quantization, to communicate with each other. The algorithm we develop is a dynamical system that generates sequences achieving a consensus at one of the quantization values almost surely. In addition, we show that the expected value of the consensus is equal to the average of the original sensor data. We derive an upper bound on the mean-square-error performance of the probabilistically quantized distributed averaging (PQDA). Moreover, we show that the convergence of the PQDA is monotonic by studying the evolution of the minimum-length interval containing the node values. We reveal that the length of this interval is a monotonically nonincreasing function with limit zero. We also demonstrate that all the node values, in the worst case, converge to the final two quantization bins at the same rate as standard unquantized consensus. Finally, we report the results of simulations conducted to evaluate the behavior and the effectiveness of the proposed algorithm in various scenarios.
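
A minimal Python sketch of the two ingredients the abstract describes, probabilistic (dithered) quantization and linear averaging; the helper names and the choice of weight matrix W are our illustrative assumptions, and the convergence analysis lives in the paper.

    import numpy as np

    rng = np.random.default_rng(1)

    def dithered_quantize(x, delta):
        # Round each entry to one of the two adjacent quantization levels,
        # picking the upper level with probability proportional to proximity,
        # so that E[q(x)] = x (unbiased).
        lower = np.floor(x / delta) * delta
        up = rng.random(x.shape) < (x - lower) / delta
        return lower + delta * up

    def pqda_step(x, W, delta):
        # One PQDA iteration: nodes broadcast quantized values and average
        # them with a doubly stochastic weight matrix W.
        return W @ dithered_quantize(x, delta)

Because the quantizer is unbiased and W is doubly stochastic, the expected average of the iterates stays at the initial average, consistent with the paper's claim that the expected consensus value equals the average of the original sensor data.
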
Greedy Gossip With Eavesdropping
Deniz Ustebay
Boris Oreshkin
This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average consensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, gossip algorithms are robust to unreliable wireless conditions and time-varying network topologies. In this paper, we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the node which has the value most different from its own. In order to make this selection, nodes need to know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and nodes keep track of their neighbors' values by eavesdropping on their communications. We show that the convergence of GGE is guaranteed for connected network topologies. We also study the rates of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random geometric graph topologies.
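
A minimal Python sketch of one GGE round as described above; the ring topology in the usage lines and the function names are our illustrative assumptions.

    import numpy as np

    def gge_round(x, neighbors, rng):
        # A randomly chosen node gossips with the neighbor whose value
        # differs most from its own (known by eavesdropping on wireless
        # broadcasts); the pair replaces both values with their average.
        i = int(rng.integers(len(x)))
        j = max(neighbors[i], key=lambda k: abs(x[k] - x[i]))
        x[i] = x[j] = 0.5 * (x[i] + x[j])
        return x

    rng = np.random.default_rng(2)
    x = rng.random(8)  # initial node values on an 8-node ring
    neighbors = [[(i - 1) % 8, (i + 1) % 8] for i in range(8)]
    for _ in range(200):
        x = gge_round(x, neighbors, rng)
    print(x.round(3))  # all entries near the initial mean

Each pairwise average preserves the network sum, so any consensus reached is the true average.
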
Recent Advances in Reinforcement Learning
Advances in Information Retrieval
Diane Kelly
Nicholas J. Belkin
James Allan
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
Stefano Massaroli
Michael Poli
Daniel Y. Fu
Hermann Kumbong
Rom N. Parnichkun
Aman Timalsina
David W. Romero
Quinn McIntyre
Beidi Chen
Atri Rudra
Ce Zhang
Christopher Ré
Stefano Ermon
Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads, naively requiring a full pass (or caching of activations) over the input sequence for each generated token, similarly to attention-based models. In this paper, we seek to enable O(1) compute and memory cost per token in any pre-trained long convolution architecture to reduce memory footprint and increase throughput during generation. Concretely, our method consists of extracting low-dimensional linear state-space models from each convolution layer, building upon rational interpolation and model-order reduction techniques. We further introduce architectural improvements to convolution-based layers such as Hyena: by weight-tying the filters across channels into heads, we achieve higher pre-training quality and reduce the number of filters to be distilled. The resulting model achieves 10× higher throughput than Transformers and 1.5× higher than Hyena at 1.3B parameters, without any loss in quality after distillation.
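
A minimal Python sketch of the recurrent view the distillation targets: a diagonal state-space model whose impulse response stands in for a long convolution filter, so each generated token costs O(state_dim) compute and memory. Extracting (A, B, C, D) via rational interpolation and model-order reduction is the paper's contribution and is not implemented here; the class and parameter shapes are our illustrative assumptions.

    import numpy as np

    class SSMFilter:
        # Diagonal linear state-space model standing in for a distilled
        # convolution filter. Impulse response: h[0] = D and
        # h[t] = C @ (A**(t-1) * B) for t >= 1, while each step is O(d).
        def __init__(self, A, B, C, D):
            self.A, self.B, self.C, self.D = A, B, C, D  # A: diagonal, stored as a vector
            self.state = np.zeros_like(B)

        def step(self, u):
            # Consume one input sample, emit one output sample.
            y = self.C @ self.state + self.D * u
            self.state = self.A * self.state + self.B * u
            return float(y)

    d = 4
    rng = np.random.default_rng(3)
    ssm = SSMFilter(A=rng.uniform(0.5, 0.9, d), B=rng.standard_normal(d),
                    C=rng.standard_normal(d), D=0.1)
    taps = [ssm.step(u) for u in [1.0] + [0.0] * 7]  # first 8 filter taps

Replaying the filter this way avoids the full pass over the input sequence (or activation cache) that a naive long-convolution decoder needs for every generated token.
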