Publications

Stochastic Decoding of Turbo Codes
Quang Trung Dong
Matthieu Arzel
Christophe Jego
Stochastic computation is a technique in which operations on probabilities are performed on random bit streams. Stochastic decoding of forwa… (voir plus)rd error-correction (FEC) codes is inspired by this technique. This paper extends the application of the stochastic decoding approach to the families of convolutional codes and turbo codes. It demonstrates that stochastic computation is a promising solution to improve the data throughput of turbo decoders with very simple implementations. Stochastic fully-parallel turbo decoders are shown to achieve the error correction performance of conventional a posteriori probability (APP) decoders. To our knowledge, this is the first stochastic turbo decoder which decodes a state-of-the-art turbo code. Additionally, an innovative systematic technique is proposed to cope with stochastic additions, responsible for the throughput bottleneck.
Stochastic Decoding of Turbo Codes
Q. Dong
Matthieu Arzel
Christophe Jego
Stochastic computation is a technique in which operations on probabilities are performed on random bit streams. Stochastic decoding of forwa… (voir plus)rd error-correction (FEC) codes is inspired by this technique. This paper extends the application of the stochastic decoding approach to the families of convolutional codes and turbo codes. It demonstrates that stochastic computation is a promising solution to improve the data throughput of turbo decoders with very simple implementations. Stochastic fully-parallel turbo decoders are shown to achieve the error correction performance of conventional a posteriori probability (APP) decoders. To our knowledge, this is the first stochastic turbo decoder which decodes a state-of-the-art turbo code. Additionally, an innovative systematic technique is proposed to cope with stochastic additions, responsible for the throughput bottleneck.
Multiscale Gossip for Efficient Decentralized Averaging in Wireless Packet Networks
Konstantinos I. Tsianos
This paper describes and analyzes a hierarchical algorithm called Multiscale Gossip for solving the distributed average consensus problem in… (voir plus) wireless sensor networks. The algorithm proceeds by recursively partitioning a given network. Initially, nodes at the finest scale gossip to compute local averages. Then, using multi-hop communication and geographic routing to communicate between nodes that are not directly connected, these local averages are progressively fused up the hierarchy until the global average is computed. We show that the proposed hierarchical scheme with
Relaxation Dynamics in Stochastic Iterative Decoders
Saeed Sharifi Tehrani
Chris Winstead
Shie Mannor
Sheryl L. Howard
Vincent C. Gaudet
Stochastic decoding is a recently proposed approach for graph-based iterative error control decoding. We present and investigate three hyste… (voir plus)resis methods for stochastic decoding on graphs with cycles and show their close relationship with the successive relaxation method. Implementation results demonstrate the tradeoff in bit error rate performance with circuit complexity.
Relaxation Dynamics in Stochastic Iterative Decoders
S. Tehrani
C. Winstead
Shie Mannor
S. Howard
Vincent C. Gaudet
Stochastic decoding is a recently proposed approach for graph-based iterative error control decoding. We present and investigate three hyste… (voir plus)resis methods for stochastic decoding on graphs with cycles and show their close relationship with the successive relaxation method. Implementation results demonstrate the tradeoff in bit error rate performance with circuit complexity.
Majority-Based Tracking Forecast Memories for Stochastic LDPC Decoding
S. Tehrani
Ali Naderi
Guy-Armand Kamendje
Saied Hemati
Shie Mannor
This paper proposes majority-based tracking forecast memories (MTFMs) for area efficient high throughput ASIC implementation of stochastic L… (voir plus)ow-Density Parity-Check (LDPC) decoders. The proposed method is applied for ASIC implementation of a fully parallel stochastic decoder that decodes the (2048, 1723) LDPC code from the IEEE 802.3an (10GBASE-T) standard. The decoder occupies a silicon core area of 6.38 mm2 in CMOS 90 nm technology, achieves a maximum clock frequency of 500 MHz, and provides a maximum core throughput of 61.3 Gb/s. The decoder also has good decoding performance and error-floor behavior and provides a bit error rate (BER) of about 4 × 10-13 at Eb/N0=5.15 dB. To the best of our knowledge, the implemented decoder is the most area efficient fully parallel soft -decision LDPC decoder reported in the literature.
Majority-Based Tracking Forecast Memories for Stochastic LDPC Decoding
Saeed Sharifi Tehrani
Ali Naderi
Guy-Armand Kamendje
Saied Hemati
Shie Mannor
This paper proposes majority-based tracking forecast memories (MTFMs) for area efficient high throughput ASIC implementation of stochastic L… (voir plus)ow-Density Parity-Check (LDPC) decoders. The proposed method is applied for ASIC implementation of a fully parallel stochastic decoder that decodes the (2048, 1723) LDPC code from the IEEE 802.3an (10GBASE-T) standard. The decoder occupies a silicon core area of 6.38 mm2 in CMOS 90 nm technology, achieves a maximum clock frequency of 500 MHz, and provides a maximum core throughput of 61.3 Gb/s. The decoder also has good decoding performance and error-floor behavior and provides a bit error rate (BER) of about 4 × 10-13 at Eb/N0=5.15 dB. To the best of our knowledge, the implemented decoder is the most area efficient fully parallel soft -decision LDPC decoder reported in the literature.
Greedy Gossip With Eavesdropping
Deniz Ustebay
Boris Oreshkin
This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average c… (voir plus)onsensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, gossip algorithms are robust to unreliable wireless conditions and time varying network topologies. In this paper, we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the node which has the value most different from its own. In order to make this selection, nodes need to know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and nodes keep track of their neighbors' values by eavesdropping on their communications. We show that the convergence of GGE is guaranteed for connected network topologies. We also study the rates of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random geometric graph topologies.
Optimization and Analysis of Distributed Averaging With Short Node Memory
Boris Oreshkin
Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each nod… (voir plus)e has a scalar data value, and the goal is to compute the average of these values at every node (the so-called average consensus problem). Nodes iteratively exchange information with their neighbors and perform local updates until the value at every node converges to the initial network average. Much previous work has focused on algorithms where each node maintains and updates a single value; every time an update is performed, the previous value is forgotten. Convergence to the average consensus is achieved asymptotically. The convergence rate is fundamentally limited by network connectivity, and it can be prohibitively slow on topologies such as grids and random geometric graphs, even if the update rules are optimized. In this paper, we provide the first theoretical demonstration that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's current and previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighboring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and conduct a theoretical analysis of the improvement in convergence rate that can be achieved using this acceleration methodology. For a chain topology on N nodes, this leads to a factor of N improvement over standard consensus, and for a two-dimensional grid, our approach achieves a factor of ¿N improvement.
Theano: A CPU and GPU Math Compiler in Python
James Bergstra
Olivier Breuleux
Frédéric Bastien
Pascal Lamblin
Razvan Pascanu
Guillaume Desjardins
Joseph Turian
David Warde-Farley
Theano: A CPU and GPU Math Compiler in Python
James Bergstra
Olivier Breuleux
Frédéric Bastien
Pascal Lamblin
Razvan Pascanu
Guillaume Desjardins
Joseph P. Turian
David Warde-Farley
Theano is a compiler for mathematical expressions in Python that combines the convenience of NumPy's syntax with the speed of optimized nati… (voir plus)ve machine language. The user composes mathematical expressions in a high-level description that mimics NumPy's syntax and semantics, while being statically typed and functional (as opposed to imperative). These expressions allow Theano to provide symbolic differentiation. Before performing computation, Theano optimizes the choice of expressions, translates them into C++ (or CUDA for GPU), compiles them into dynamically loaded Python modules, all automatically. Common machine learn- ing algorithms implemented with Theano are from 1:6 to 7:5 faster than competitive alternatives (including those implemented with C/C++, NumPy/SciPy and MATLAB) when compiled for the CPU and between 6:5 and 44 faster when compiled for the GPU. This paper illustrates how to use Theano, outlines the scope of the compiler, provides benchmarks on both CPU and GPU processors, and explains its overall design.
Optimization and Analysis of Distributed Averaging With Short Node Memory
Boris Oreshkin
Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each nod… (voir plus)e has a scalar data value, and the goal is to compute the average of these values at every node (the so-called average consensus problem). Nodes iteratively exchange information with their neighbors and perform local updates until the value at every node converges to the initial network average. Much previous work has focused on algorithms where each node maintains and updates a single value; every time an update is performed, the previous value is forgotten. Convergence to the average consensus is achieved asymptotically. The convergence rate is fundamentally limited by network connectivity, and it can be prohibitively slow on topologies such as grids and random geometric graphs, even if the update rules are optimized. In this paper, we provide the first theoretical demonstration that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's current and previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighboring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and conduct a theoretical analysis of the improvement in convergence rate that can be achieved using this acceleration methodology. For a chain topology on N nodes, this leads to a factor of N improvement over standard consensus, and for a two-dimensional grid, our approach achieves a factor of ¿N improvement.