Warren Gross

Implementation of Sparse Superposition Codes

Carlo Condo

Sparse superposition codes (SSCs) are capacity achieving codes whose decoding process is a linear sensing problem. Decoding approaches thus … (see more)exploit the approximate message passing algorithm, which has been proven to be effective in compressing sensing. Previous work from the authors has evaluated the error correction performance of SSCs under finite precision and finite code length. This paper proposes the first SSC encoder and decoder architectures in the literature. The architectures are parametrized and applicable to all SSCs: A set of wide-ranging case studies is then considered, and code-specific approximations, along with implementation results in 65 nm CMOS technology, are then provided. The encoding process can be carried out with low power consumption (≤2.103 mW), while the semi-parallel decoder architecture can reach a throughput of 1.3 Gb/s with a 768 × 6-bit SSC codeword and an area occupation of 2.43 mm2.

2017-05-01

IEEE Transactions on Signal Processing (published)

doi.org

Implementation of Sparse Superposition Codes

Carlo Condo

Warren Gross

Sparse superposition codes (SSCs) are capacity achieving codes whose decoding process is a linear sensing problem. Decoding approaches thus … (see more)exploit the approximate message passing algorithm, which has been proven to be effective in compressing sensing. Previous work from the authors has evaluated the error correction performance of SSCs under finite precision and finite code length. This paper proposes the first SSC encoder and decoder architectures in the literature. The architectures are parametrized and applicable to all SSCs: A set of wide-ranging case studies is then considered, and code-specific approximations, along with implementation results in 65 nm CMOS technology, are then provided. The encoding process can be carried out with low power consumption (

2017-05-01

IEEE Transactions on Signal Processing (published)

doi.org

Fast and Flexible Successive-Cancellation List Decoders for Polar Codes

Seyyed Ali Hashemi

Carlo Condo

Warren Gross

Polar codes have gained significant amount of attention during the past few years and have been selected as a coding scheme for the next gen… (see more)eration of mobile broadband standard. Among decoding schemes, successive-cancellation list (SCL) decoding provides a reasonable tradeoff between the error-correction performance and hardware implementation complexity when used to decode polar codes, at the cost of limited throughput. The simplified SCL (SSCL) and its extension SSCL-SPC increase the speed of decoding by removing redundant calculations when encountering particular information and frozen bit patterns (rate one and single parity check codes), while keeping the error-correction performance unaltered. In this paper, we improve SSCL and SSCL-SPC by proving that the list size imposes a specific number of path splitting required to decode rate one and single parity check codes. Thus, the number of splitting can be limited while guaranteeing exactly the same error-correction performance as if the paths were forked at each bit estimation. We call the new decoding algorithms Fast-SSCL and Fast-SSCL-SPC. Moreover, we show that the number of path forks in a practical application can be tuned to achieve desirable speed, while keeping the error-correction performance almost unchanged. Hardware architectures implementing both algorithms are then described and implemented: It is shown that our design can achieve

2017-03-23

ArXiv (preprint)

doi.org

arxiv.org

Fault-Tolerant Associative Memories Based on $c$-Partite Graphs

François Leduc-Primeau

Vincent Gripon

Michael Rabbat

Warren Gross

Associative memories allow the retrieval of previously stored messages given a part of their content. In this paper, we are interested in as… (see more)sociative memories based on c-partite graphs that were recently introduced. These memories are almost optimal in terms of the amount of storage they require (efficiency) and allow retrieving messages with low complexity. We propose a generic implementation model for the retrieval algorithm that can be readily mapped to an integrated circuit and study the retrieval performance when hardware components are affected by faults. We show using analytical and simulation results that these associative memories can be made resilient to circuit faults with a minor modification of the retrieval algorithm. In one example, the memory retains 88% of its efficiency when 1% of the storage cells are faulty, or 98% when 0.1% of the binary outputs of the retrieval algorithm are faulty. When considering storage faults, the fault tolerance exhibited by the proposed associative memory can be comparable to using a capacity-achieving error correction code for protecting the stored information.

2016-02-15

IEEE Transactions on Signal Processing (published)

doi.org

Fault-Tolerant Associative Memories Based on $c$-Partite Graphs

François Leduc-Primeau

Vincent Gripon

Michael Rabbat

Warren Gross

Associative memories allow the retrieval of previously stored messages given a part of their content. In this paper, we are interested in as… (see more)sociative memories based on c-partite graphs that were recently introduced. These memories are almost optimal in terms of the amount of storage they require (efficiency) and allow retrieving messages with low complexity. We propose a generic implementation model for the retrieval algorithm that can be readily mapped to an integrated circuit and study the retrieval performance when hardware components are affected by faults. We show using analytical and simulation results that these associative memories can be made resilient to circuit faults with a minor modification of the retrieval algorithm. In one example, the memory retains 88% of its efficiency when 1% of the storage cells are faulty, or 98% when 0.1% of the binary outputs of the retrieval algorithm are faulty. When considering storage faults, the fault tolerance exhibited by the proposed associative memory can be comparable to using a capacity-achieving error correction code for protecting the stored information.

2016-02-01

IEEE Transactions on Signal Processing (published)

doi.org

A Scalable Successive-Cancellation Decoder for Polar Codes

Alexandre J. Raymond

Warren Gross

Polar codes are the first error-correcting codes to provably achieve channel capacity, asymptotically in code length, with an explicit const… (see more)ruction. However, under successive-cancellation decoding, polar codes require very long code lengths to compete with existing modern codes. Nonetheless, the successive cancellation algorithm enables very-low-complexity implementations in hardware, due to the regular structure exhibited by polar codes. In this paper, we present an improved architecture for successive-cancellation decoding of polar codes, making use of a novel semi-parallel, encoder-based partial-sum computation module. We also provide quantization results for realistic code length N=215, and explore various optimization techniques such as a chained processing element and a variable quantization scheme. This design is shown to scale to code lengths of up to N=221, enabled by its low logic use, low register use and simple datapaths, limited almost exclusively by the amount of available SRAM. It also supports an overlapped loading of frames, allowing full-throughput decoding with a single set of input buffers.

2014-10-15

IEEE Transactions on Signal Processing (published)

doi.org

arxiv.org

High-Throughput Energy-Efficient LDPC Decoders Using Differential Binary Message Passing

Kevin Cushon

Saied Hemati

Camille Leroux

Shie Mannor

Warren Gross

In this paper, we present energy-efficient architectures for decoders of low-density parity check (LDPC) codes using the differential decodi… (see more)ng with binary message passing (DD-BMP) algorithm and its modified variant (MDD-BMP). We also propose an improved differential binary (IDB) decoding algorithm. These algorithms offer significant intrinsic advantages in the energy domain: simple computations, low interconnect complexity, and very high throughput, while achieving error correction performance up to within 0.25 dB of the offset min-sum algorithm. We report on fully parallel decoder implementations of (273, 191), (1023, 781), and (4095, 3367) finite geometry-based LDPC codes in 65 nm CMOS. Using the MDD-BMP algorithm, these decoders achieve respective areas of 0.28 mm2, 1.38 mm2, and 15.37 mm2, average throughputs of 37 Gbps, 75 Gbps, and 141 Gbps, and energy efficiencies of 4.9 pJ/bit, 13.2 pJ/bit, and 37.9 pJ/bit with a 1.0 V supply voltage in post-layout simulations. At a reduced supply voltage of 0.8 V, these decoders achieve respective throughputs of 26 Gbps, 54 Gbps, and 94 Gbps, and energy efficiencies of 3.1 pJ/bit, 8.2 pJ/bit, and 23.5 pJ/bit. We also report on a fully parallel implementation of IDB for the (2048, 1723) LDPC code specified in the IEEE 802.3an (10GBASE-T) standard. This decoder achieves an area of 1.44 mm2, average throughput of 172 Gbps, and an energy efficiency of 2.8 pJ/bit with a 1.0 V supply voltage; at 0.8 V, it achieves throughput of 116 Gbps and energy efficiency of 1.7 pJ/bit.

2014-02-01

IEEE Transactions on Signal Processing (published)

doi.org

High-Throughput Energy-Efficient LDPC Decoders Using Differential Binary Message Passing

Kevin Cushon

Saied Hemati

Camille Leroux

Shie Mannor

Warren Gross

In this paper, we present energy-efficient architectures for decoders of low-density parity check (LDPC) codes using the differential decodi… (see more)ng with binary message passing (DD-BMP) algorithm and its modified variant (MDD-BMP). We also propose an improved differential binary (IDB) decoding algorithm. These algorithms offer significant intrinsic advantages in the energy domain: simple computations, low interconnect complexity, and very high throughput, while achieving error correction performance up to within 0.25 dB of the offset min-sum algorithm. We report on fully parallel decoder implementations of (273, 191), (1023, 781), and (4095, 3367) finite geometry-based LDPC codes in 65 nm CMOS. Using the MDD-BMP algorithm, these decoders achieve respective areas of 0.28 mm2, 1.38 mm2, and 15.37 mm2, average throughputs of 37 Gbps, 75 Gbps, and 141 Gbps, and energy efficiencies of 4.9 pJ/bit, 13.2 pJ/bit, and 37.9 pJ/bit with a 1.0 V supply voltage in post-layout simulations. At a reduced supply voltage of 0.8 V, these decoders achieve respective throughputs of 26 Gbps, 54 Gbps, and 94 Gbps, and energy efficiencies of 3.1 pJ/bit, 8.2 pJ/bit, and 23.5 pJ/bit. We also report on a fully parallel implementation of IDB for the (2048, 1723) LDPC code specified in the IEEE 802.3an (10GBASE-T) standard. This decoder achieves an area of 1.44 mm2, average throughput of 172 Gbps, and an energy efficiency of 2.8 pJ/bit with a 1.0 V supply voltage; at 0.8 V, it achieves throughput of 116 Gbps and energy efficiency of 1.7 pJ/bit.

2014-02-01

IEEE Transactions on Signal Processing (published)

doi.org

Adaptive Multiset Stochastic Decoding of Non-Binary LDPC Codes

Alexandru Ciobanu

Saied Hemati

Warren Gross

We propose a non-binary stochastic decoding algorithm for low-density parity-check (LDPC) codes over GF(q) with degree two variable nodes, c… (see more)alled Adaptive Multiset Stochastic Algorithm (AMSA). The algorithm uses multisets, an extension of sets that allows multiple occurrences of an element, to represent probability mass functions that simplifies the structure of the variable nodes. The run-time complexity of one decoding cycle using AMSA is O(q) for conventional memory architectures, and O(1) if a custom memory architecture is used. Two fully-parallel AMSA decoders are implemented on FPGA for two (192,96) (2,4)-regular codes over GF(64) and GF(256), both achieving a maximum clock frequency of 108 MHz. The GF(64) decoder has a coded throughput of 65 Mb/s at Eb/N0=2.4 dB when using conventional memory, while a decoder using the custom memory version can achieve 698 Mb/s at the same Eb/N0. At a frame error rate (FER) of 2×10-6 the GF(64) version of the algorithm is only 0.04 dB away from the floating-point SPA performance, and for the GF(256) code the difference is 0.2 dB. To the best of our knowledge, this is the first fully parallel non-binary LDPC decoder over GF(256) reported in the literature.

2013-08-15

IEEE Transactions on Signal Processing (published)

doi.org

Adaptive Multiset Stochastic Decoding of Non-Binary LDPC Codes

Alexandru Sorin Ciobanu

Saied Hemati

Warren Gross

We propose a non-binary stochastic decoding algorithm for low-density parity-check (LDPC) codes over GF(q) with degree two variable nodes, c… (see more)alled Adaptive Multiset Stochastic Algorithm (AMSA). The algorithm uses multisets, an extension of sets that allows multiple occurrences of an element, to represent probability mass functions that simplifies the structure of the variable nodes. The run-time complexity of one decoding cycle using AMSA is O(q) for conventional memory architectures, and O(1) if a custom memory architecture is used. Two fully-parallel AMSA decoders are implemented on FPGA for two (192,96) (2,4)-regular codes over GF(64) and GF(256), both achieving a maximum clock frequency of 108 MHz. The GF(64) decoder has a coded throughput of 65 Mb/s at Eb/N0=2.4 dB when using conventional memory, while a decoder using the custom memory version can achieve 698 Mb/s at the same Eb/N0. At a frame error rate (FER) of 2×10-6 the GF(64) version of the algorithm is only 0.04 dB away from the floating-point SPA performance, and for the GF(256) code the difference is 0.2 dB. To the best of our knowledge, this is the first fully parallel non-binary LDPC decoder over GF(256) reported in the literature.

2013-08-01

IEEE Transactions on Signal Processing (published)

doi.org

A Scalable Successive-Cancellation Decoder for Polar Codes

Alexandre J. Raymond

Warren Gross

Polar codes are the first error-correcting codes to provably achieve channel capacity, asymptotically in code length, with an explicit const… (see more)ruction. However, under successive-cancellation decoding, polar codes require very long code lengths to compete with existing modern codes. Nonetheless, the successive cancellation algorithm enables very-low-complexity implementations in hardware, due to the regular structure exhibited by polar codes. In this paper, we present an improved architecture for successive-cancellation decoding of polar codes, making use of a novel semi-parallel, encoder-based partial-sum computation module. We also provide quantization results for realistic code length N=215, and explore various optimization techniques such as a chained processing element and a variable quantization scheme. This design is shown to scale to code lengths of up to N=221, enabled by its low logic use, low register use and simple datapaths, limited almost exclusively by the amount of available SRAM. It also supports an overlapped loading of frames, allowing full-throughput decoding with a single set of input buffers.

2013-06-14

ArXiv (preprint)

doi.org

arxiv.org

A Semi-Parallel Successive-Cancellation Decoder for Polar Codes

Camille Leroux

Alexandre J. Raymond

Gabi Sarkis

Warren Gross

Polar codes are a recently discovered family of capacity-achieving codes that are seen as a major breakthrough in coding theory. Motivated b… (see more)y the recent rapid progress in the theory of polar codes, we propose a semi-parallel architecture for the implementation of successive cancellation decoding. We take advantage of the recursive structure of polar codes to make efficient use of processing resources. The derived architecture has a very low processing complexity while the memory complexity remains similar to that of previous architectures. This drastic reduction in processing complexity allows very large polar code decoders to be implemented in hardware. An N=217 polar code successive cancellation decoder is implemented in an FPGA. We also report synthesis results for ASIC.

2013-01-01

IEEE Transactions on Signal Processing (published)

doi.org

Hackathon | Building safer AI for youth mental health

Indigenous Pathfinders in AI

AI Advantage

Warren Gross

Biography

Current Students

Publications

Hackathon | Building safer AI for youth mental health

Indigenous Pathfinders in AI

AI Advantage

Popular keywords:

Warren Gross

Biography

Current Students

Publications