Publications

Relaxation Dynamics in Stochastic Iterative Decoders

S. Tehrani

C. Winstead

Shie Mannor

S. Howard

Vincent C. Gaudet

Stochastic decoding is a recently proposed approach for graph-based iterative error control decoding. We present and investigate three hyste… (voir plus)resis methods for stochastic decoding on graphs with cycles and show their close relationship with the successive relaxation method. Implementation results demonstrate the tradeoff in bit error rate performance with circuit complexity.

2010-11-01

IEEE Transactions on Signal Processing (publié)

doi.org

Relaxation Dynamics in Stochastic Iterative Decoders

Saeed Sharifi Tehrani

Chris Winstead

Warren Gross

Shie Mannor

Sheryl L. Howard

Vincent C. Gaudet

Stochastic decoding is a recently proposed approach for graph-based iterative error control decoding. We present and investigate three hyste… (voir plus)resis methods for stochastic decoding on graphs with cycles and show their close relationship with the successive relaxation method. Implementation results demonstrate the tradeoff in bit error rate performance with circuit complexity.

2010-11-01

IEEE Transactions on Signal Processing (publié)

doi.org

Majority-Based Tracking Forecast Memories for Stochastic LDPC Decoding

S. Tehrani

Ali Naderi

Guy-Armand Kamendje

Saied Hemati

Shie Mannor

Warren Gross

This paper proposes majority-based tracking forecast memories (MTFMs) for area efficient high throughput ASIC implementation of stochastic L… (voir plus)ow-Density Parity-Check (LDPC) decoders. The proposed method is applied for ASIC implementation of a fully parallel stochastic decoder that decodes the (2048, 1723) LDPC code from the IEEE 802.3an (10GBASE-T) standard. The decoder occupies a silicon core area of 6.38 mm2 in CMOS 90 nm technology, achieves a maximum clock frequency of 500 MHz, and provides a maximum core throughput of 61.3 Gb/s. The decoder also has good decoding performance and error-floor behavior and provides a bit error rate (BER) of about 4 × 10-13 at Eb/N0=5.15 dB. To the best of our knowledge, the implemented decoder is the most area efficient fully parallel soft -decision LDPC decoder reported in the literature.

2010-09-01

IEEE Transactions on Signal Processing (publié)

doi.org

Majority-Based Tracking Forecast Memories for Stochastic LDPC Decoding

Saeed Sharifi Tehrani

Ali Naderi

Guy-Armand Kamendje

Saied Hemati

Shie Mannor

Warren Gross

This paper proposes majority-based tracking forecast memories (MTFMs) for area efficient high throughput ASIC implementation of stochastic L… (voir plus)ow-Density Parity-Check (LDPC) decoders. The proposed method is applied for ASIC implementation of a fully parallel stochastic decoder that decodes the (2048, 1723) LDPC code from the IEEE 802.3an (10GBASE-T) standard. The decoder occupies a silicon core area of 6.38 mm2 in CMOS 90 nm technology, achieves a maximum clock frequency of 500 MHz, and provides a maximum core throughput of 61.3 Gb/s. The decoder also has good decoding performance and error-floor behavior and provides a bit error rate (BER) of about 4 × 10-13 at Eb/N0=5.15 dB. To the best of our knowledge, the implemented decoder is the most area efficient fully parallel soft -decision LDPC decoder reported in the literature.

2010-09-01

IEEE Transactions on Signal Processing (publié)

doi.org

Greedy Gossip With Eavesdropping

Deniz Ustebay

Boris Oreshkin

Mark Coates

Michael Rabbat

This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm for distributed computation of the average c… (voir plus)onsensus problem. In gossip algorithms, nodes in the network randomly communicate with their neighbors and exchange information iteratively. The algorithms are simple and decentralized, making them attractive for wireless network applications. In general, gossip algorithms are robust to unreliable wireless conditions and time varying network topologies. In this paper, we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require nodes to have any location information. Instead, greedy updates are made possible by exploiting the broadcast nature of wireless communications. During the operation of GGE, when a node decides to gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the node which has the value most different from its own. In order to make this selection, nodes need to know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and nodes keep track of their neighbors' values by eavesdropping on their communications. We show that the convergence of GGE is guaranteed for connected network topologies. We also study the rates of convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random geometric graph topologies.

2010-07-01

IEEE Transactions on Signal Processing (publié)

doi.org

arxiv.org

Optimization and Analysis of Distributed Averaging With Short Node Memory

Boris Oreshkin

Mark Coates

Michael Rabbat

Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each nod… (voir plus)e has a scalar data value, and the goal is to compute the average of these values at every node (the so-called average consensus problem). Nodes iteratively exchange information with their neighbors and perform local updates until the value at every node converges to the initial network average. Much previous work has focused on algorithms where each node maintains and updates a single value; every time an update is performed, the previous value is forgotten. Convergence to the average consensus is achieved asymptotically. The convergence rate is fundamentally limited by network connectivity, and it can be prohibitively slow on topologies such as grids and random geometric graphs, even if the update rules are optimized. In this paper, we provide the first theoretical demonstration that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's current and previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighboring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and conduct a theoretical analysis of the improvement in convergence rate that can be achieved using this acceleration methodology. For a chain topology on N nodes, this leads to a factor of N improvement over standard consensus, and for a two-dimensional grid, our approach achieves a factor of ¿N improvement.

2010-05-01

IEEE Transactions on Signal Processing (publié)

doi.org

arxiv.org

Theano: A CPU and GPU Math Compiler in Python

James Bergstra

Olivier Breuleux

Frédéric Bastien

Pascal Lamblin

Razvan Pascanu

Guillaume Desjardins

Joseph Turian

David Warde-Farley

Yoshua Bengio

2010-01-01

Proceedings of the Python in Science Conference (publié)

doi.org

Theano: A CPU and GPU Math Compiler in Python

James Bergstra

Olivier Breuleux

Frédéric Bastien

Pascal Lamblin

Razvan Pascanu

Guillaume Desjardins

Joseph P. Turian

David Warde-Farley

Yoshua Bengio

Theano is a compiler for mathematical expressions in Python that combines the convenience of NumPy's syntax with the speed of optimized nati… (voir plus)ve machine language. The user composes mathematical expressions in a high-level description that mimics NumPy's syntax and semantics, while being statically typed and functional (as opposed to imperative). These expressions allow Theano to provide symbolic differentiation. Before performing computation, Theano optimizes the choice of expressions, translates them into C++ (or CUDA for GPU), compiles them into dynamically loaded Python modules, all automatically. Common machine learn- ing algorithms implemented with Theano are from 1:6 to 7:5 faster than competitive alternatives (including those implemented with C/C++, NumPy/SciPy and MATLAB) when compiled for the CPU and between 6:5 and 44 faster when compiled for the GPU. This paper illustrates how to use Theano, outlines the scope of the compiler, provides benchmarks on both CPU and GPU processors, and explains its overall design.

2010-01-01

SciPy (publié)

doi.org

Optimization and Analysis of Distributed Averaging With Short Node Memory

Boris Oreshkin

Mark Coates

Michael Rabbat

Distributed averaging describes a class of network algorithms for the decentralized computation of aggregate statistics. Initially, each nod… (voir plus)e has a scalar data value, and the goal is to compute the average of these values at every node (the so-called average consensus problem). Nodes iteratively exchange information with their neighbors and perform local updates until the value at every node converges to the initial network average. Much previous work has focused on algorithms where each node maintains and updates a single value; every time an update is performed, the previous value is forgotten. Convergence to the average consensus is achieved asymptotically. The convergence rate is fundamentally limited by network connectivity, and it can be prohibitively slow on topologies such as grids and random geometric graphs, even if the update rules are optimized. In this paper, we provide the first theoretical demonstration that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's current and previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighboring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and conduct a theoretical analysis of the improvement in convergence rate that can be achieved using this acceleration methodology. For a chain topology on N nodes, this leads to a factor of N improvement over standard consensus, and for a two-dimensional grid, our approach achieves a factor of ¿N improvement.

2009-03-20

ArXiv (prépublication)

doi.org

arxiv.org

Fully Parallel Stochastic LDPC Decoders

S. Tehrani

Shie Mannor

Warren Gross

Stochastic decoding is a new approach to iterative decoding on graphs. This paper presents a hardware architecture for fully parallel stocha… (voir plus)stic low-density parity-check (LDPC) decoders. To obtain the characteristics of the proposed architecture, we apply this architecture to decode an irregular state-of-the-art (1056,528) LDPC code on a Xilinx Virtex-4 LX200 field-programmable gate-array (FPGA) device. The implemented decoder achieves a clock frequency of 222 MHz and a throughput of about 1.66 Gb/s at Eb/N0=4.25 dB (a bit error rate of 10-8). It provides decoding performance within 0.5 and 0.25 dB of the floating-point sum-product algorithm with 32 and 16 iterations, respectively, and similar error-floor behavior. The decoder uses less than 40% of the lookup tables, flip-flops, and IO ports available on the FPGA device. The results provided in this paper validate the potential of stochastic LDPC decoding as a practical and competitive fully parallel decoding approach.

2008-11-01

IEEE Transactions on Signal Processing (publié)

doi.org

Fully Parallel Stochastic LDPC Decoders

Saeed Sharifi Tehrani

Shie Mannor

Warren Gross

Stochastic decoding is a new approach to iterative decoding on graphs. This paper presents a hardware architecture for fully parallel stocha… (voir plus)stic low-density parity-check (LDPC) decoders. To obtain the characteristics of the proposed architecture, we apply this architecture to decode an irregular state-of-the-art (1056,528) LDPC code on a Xilinx Virtex-4 LX200 field-programmable gate-array (FPGA) device. The implemented decoder achieves a clock frequency of 222 MHz and a throughput of about 1.66 Gb/s at

2008-11-01

IEEE Transactions on Signal Processing (publié)

doi.org

Stochastic Decoding of Linear Block Codes With High-Density Parity-Check Matrices

Saeed Sharifi Tehrani

Christophe Jego

Bo Zhu

Warren Gross

This correspondence extends the application of the recently proposed stochastic decoding approach to decode linear block codes with high-den… (voir plus)sity parity-check matrices and discusses its hardware complexity. Results demonstrate decoding performance close to floating-point iterative soft-input soft-output (SISO) decoding while offering nodes with considerably lower complexity compared to fixed-point SISO decoding.

2008-11-01

IEEE Transactions on Signal Processing (publié)

doi.org

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Publications

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Mots-clés populaires:

Publications