Publications

High-Throughput Energy-Efficient LDPC Decoders Using Differential Binary Message Passing
Kevin Cushon
Saied Hemati
Camille Leroux
Shie Mannor
In this paper, we present energy-efficient architectures for decoders of low-density parity-check (LDPC) codes using the differential decoding with binary message passing (DD-BMP) algorithm and its modified variant (MDD-BMP). We also propose an improved differential binary (IDB) decoding algorithm. These algorithms offer significant intrinsic advantages in the energy domain: simple computations, low interconnect complexity, and very high throughput, while achieving error correction performance up to within 0.25 dB of the offset min-sum algorithm. We report on fully parallel decoder implementations of (273, 191), (1023, 781), and (4095, 3367) finite geometry-based LDPC codes in 65 nm CMOS. Using the MDD-BMP algorithm, these decoders achieve respective areas of 0.28 mm², 1.38 mm², and 15.37 mm², average throughputs of 37 Gbps, 75 Gbps, and 141 Gbps, and energy efficiencies of 4.9 pJ/bit, 13.2 pJ/bit, and 37.9 pJ/bit with a 1.0 V supply voltage in post-layout simulations. At a reduced supply voltage of 0.8 V, these decoders achieve respective throughputs of 26 Gbps, 54 Gbps, and 94 Gbps, and energy efficiencies of 3.1 pJ/bit, 8.2 pJ/bit, and 23.5 pJ/bit. We also report on a fully parallel implementation of IDB for the (2048, 1723) LDPC code specified in the IEEE 802.3an (10GBASE-T) standard. This decoder achieves an area of 1.44 mm², an average throughput of 172 Gbps, and an energy efficiency of 2.8 pJ/bit with a 1.0 V supply voltage; at 0.8 V, it achieves a throughput of 116 Gbps and an energy efficiency of 1.7 pJ/bit.
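Energy per bit and throughput together fix the average power a decoder draws, since power = (energy per bit) × (bits per second), and pJ/bit × Gbit/s works out to mW. The sketch below applies that identity to the 1.0 V figures quoted above. It assumes both numbers describe the same operating point; the resulting milliwatt values are derived arithmetic, not figures reported by the authors, and `implied_power_mw` is a hypothetical helper name.

```python
# Derived arithmetic, not a reported result: average power implied by the
# energy-per-bit and average-throughput figures in the abstract.
# P [W] = E_b [J/bit] * T [bit/s]; 1 pJ/bit * 1 Gbit/s = 1e-12 * 1e9 W = 1 mW.


def implied_power_mw(energy_pj_per_bit: float, throughput_gbps: float) -> float:
    """Average power in milliwatts implied by energy per bit and throughput."""
    return energy_pj_per_bit * throughput_gbps  # pJ/bit * Gbit/s = mW


# (decoder, pJ/bit, Gbps) at 1.0 V, copied from the abstract
REPORTED_1V0 = [
    ("MDD-BMP (273, 191)", 4.9, 37),
    ("MDD-BMP (1023, 781)", 13.2, 75),
    ("MDD-BMP (4095, 3367)", 37.9, 141),
    ("IDB (2048, 1723)", 2.8, 172),
]

if __name__ == "__main__":
    for label, e_pj, t_gbps in REPORTED_1V0:
        print(f"{label}: ~{implied_power_mw(e_pj, t_gbps):.0f} mW")
```

For example, 4.9 pJ/bit × 37 Gbps ≈ 181 mW for the (273, 191) decoder at 1.0 V.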
Bayesian and grAphical Models for Biomedical Imaging
M. Jorge Cardoso
Ivor J. A. Simpson
Annemie Ribbens
Machine Learning and Interpretation in Neuroimaging
Georg Langs
Leila Wehbe
Guillermo Cecchi
Kai-min Kevin Chang
Brian G Murphy
Experimental Algorithms
Samuel Rosat
Issmail ElHallaoui
François Soumis
Adaptive Multiset Stochastic Decoding of Non-Binary LDPC Codes
Alexandru Ciobanu
Saied Hemati
We propose a non-binary stochastic decoding algorithm for low-density parity-check (LDPC) codes over GF(q) with degree-two variable nodes, called the Adaptive Multiset Stochastic Algorithm (AMSA). The algorithm uses multisets, an extension of sets that allows multiple occurrences of an element, to represent probability mass functions, which simplifies the structure of the variable nodes. The run-time complexity of one decoding cycle using AMSA is O(q) for conventional memory architectures and O(1) if a custom memory architecture is used. Two fully parallel AMSA decoders are implemented on FPGA for two (192,96) (2,4)-regular codes over GF(64) and GF(256), both achieving a maximum clock frequency of 108 MHz. The GF(64) decoder has a coded throughput of 65 Mb/s at Eb/N0 = 2.4 dB when using conventional memory, while the custom-memory version can achieve 698 Mb/s at the same Eb/N0. At a frame error rate (FER) of $2\times10^{-6}$, the GF(64) version of the algorithm is only 0.04 dB away from the floating-point SPA performance, and for the GF(256) code the difference is 0.2 dB. To the best of our knowledge, this is the first fully parallel non-binary LDPC decoder over GF(256) reported in the literature.
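To make the idea of a multiset standing in for a probability mass function concrete, here is a minimal sketch: each GF(q) symbol's probability is its multiplicity divided by the multiset size, and a stochastic draw is one uniform pick from the stored elements. This only illustrates the representation; the AMSA multiset update and node rules are not described in the abstract and are not modeled. Symbols are stored as plain integers, and `pmf_from_multiset` and `draw` are hypothetical names.

```python
import random
from collections import Counter
from fractions import Fraction

# Illustration only: a multiset of GF(q) symbols (stored as ints 0..q-1)
# represents a PMF through element multiplicities. AMSA's own update and
# decoding rules are NOT modeled here.


def pmf_from_multiset(multiset: list[int]) -> dict[int, Fraction]:
    """Empirical PMF: P(a) = multiplicity(a) / |multiset|."""
    counts = Counter(multiset)
    size = len(multiset)
    return {symbol: Fraction(c, size) for symbol, c in counts.items()}


def draw(multiset: list[int], rng: random.Random) -> int:
    """Draw a symbol with probability proportional to its multiplicity.
    With the multiset stored as a flat list, this is a single uniform
    index lookup, so its cost does not grow with q."""
    return multiset[rng.randrange(len(multiset))]


if __name__ == "__main__":
    m = [5, 5, 17, 42]  # hypothetical GF(64) symbols, multiplicities 2, 1, 1
    print(pmf_from_multiset(m))
    # {5: Fraction(1, 2), 17: Fraction(1, 4), 42: Fraction(1, 4)}
    rng = random.Random(0)
    print([draw(m, rng) for _ in range(8)])
```

Note that the PMF never has to be stored as a length-q array: symbols with zero probability simply do not appear in the multiset.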
A Scalable Successive-Cancellation Decoder for Polar Codes
Alexandre J. Raymond
Polar codes are the first error-correcting codes to provably achieve channel capacity, asymptotically in code length, with an explicit construction. However, under successive-cancellation decoding, polar codes require very long code lengths to compete with existing modern codes. Nonetheless, the successive-cancellation algorithm enables very-low-complexity hardware implementations, due to the regular structure exhibited by polar codes. In this paper, we present an improved architecture for successive-cancellation decoding of polar codes, making use of a novel semi-parallel, encoder-based partial-sum computation module. We also provide quantization results for the realistic code length $N=2^{15}$, and explore various optimization techniques such as a chained processing element and a variable quantization scheme. This design is shown to scale to code lengths of up to $N=2^{21}$, enabled by its low logic use, low register use, and simple datapaths, and is limited almost exclusively by the amount of available SRAM. It also supports overlapped loading of frames, allowing full-throughput decoding with a single set of input buffers.
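The role of partial sums in successive-cancellation decoding can be seen in a textbook recursive software decoder: after the first half of a block is decided, its bits are re-encoded, and that re-encoded vector (the partial sums) steers the g function for the second half. The sketch below assumes natural bit order, the generator $F^{\otimes n}$ with $F = \begin{pmatrix}1&0\\1&1\end{pmatrix}$, LLRs where positive means bit 0, and the min-sum approximation for f. It is a generic illustration, not the semi-parallel hardware architecture of the paper, and the function names and frozen set are hypothetical.

```python
# Textbook recursive SC decoder (software sketch). Partial sums are obtained
# by re-encoding decided bits; this is NOT the paper's hardware architecture.
# Conventions: natural bit order, x = u * F^{(x)n} over GF(2), LLR > 0 => bit 0.


def polar_encode(u):
    """x = u * F^{(x)n}; len(u) must be a power of two."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    left = [a ^ b for a, b in zip(u[:half], u[half:])]
    return polar_encode(left) + polar_encode(u[half:])


def f_minsum(a, b):
    """LLR of the XOR of two bits (min-sum approximation of box-plus)."""
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * min(abs(a), abs(b))


def g(a, b, c):
    """LLR of the second bit, given the decided partial sum c of the first."""
    return b + (a if c == 0 else -a)


def sc_decode(llr, frozen, offset=0):
    """Return (u_hat, x_hat): decided bits and their re-encoded codeword.
    frozen holds indices of u fixed to 0; offset is this block's first index."""
    n = len(llr)
    if n == 1:
        bit = 0 if offset in frozen or llr[0] >= 0 else 1
        return [bit], [bit]
    half = n // 2
    a, b = llr[:half], llr[half:]
    # First half: LLRs of c1 = x_a XOR x_b, combined with f.
    u1, c1 = sc_decode([f_minsum(x, y) for x, y in zip(a, b)], frozen, offset)
    # Second half: partial sums c1 select the sign applied to a in g.
    u2, c2 = sc_decode([g(x, y, c) for x, y, c in zip(a, b, c1)], frozen, offset + half)
    return u1 + u2, [p ^ q for p, q in zip(c1, c2)] + c2


if __name__ == "__main__":
    N, frozen = 8, {0, 1, 2, 4}          # illustrative N = 8, rate-1/2 frozen set
    u = [0, 0, 0, 1, 0, 0, 1, 1]          # information bits at indices 3, 5, 6, 7
    x = polar_encode(u)
    llr = [4.0 if bit == 0 else -4.0 for bit in x]  # noiseless BPSK LLRs
    u_hat, x_hat = sc_decode(llr, frozen)
    print(u_hat == u, x_hat == x)
```

Because `sc_decode` returns the re-encoded codeword alongside the decisions, each level gets its partial sums from its children for free, which is the same relationship an encoder-based partial-sum module exploits.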
Multiscale Gossip for Efficient Decentralized Averaging in Wireless Packet Networks
Konstantinos I. Tsianos
This paper describes and analyzes a hierarchical algorithm called Multiscale Gossip for solving the distributed average consensus problem in wireless sensor networks. The algorithm proceeds by recursively partitioning a given network. Initially, nodes at the finest scale gossip to compute local averages. Then, using multi-hop communication and geographic routing to communicate between nodes that are not directly connected, these local averages are progressively fused up the hierarchy until the global average is computed. We show that the proposed hierarchical scheme with $k=\Theta(\log\log n)$ levels of hierarchy is competitive with state-of-the-art randomized gossip algorithms in terms of message complexity, achieving $\varepsilon$-accuracy with high probability after $O(n \log\log n \log(1/\varepsilon))$ single-hop messages. Key to our analysis is the way in which the network is recursively partitioned. We find that the above scaling law is achieved when subnetworks at scale $j$ contain $O(n^{(2/3)^j})$ nodes; then the message complexity at any individual scale is $O(n \log(1/\varepsilon))$. Another important consequence of the hierarchical construction is that the longest distance over which messages are exchanged is $O(n^{1/3})$ hops (at the highest scale), and most messages (at lower scales) travel shorter distances. In networks that use link-level acknowledgements, this results in less congestion and resource usage by reducing message retransmissions. Simulations illustrate that the proposed scheme is more efficient than state-of-the-art randomized gossip algorithms based on averaging along paths.
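The core idea of computing local averages first and then fusing them up a hierarchy can be sketched with two scales. The simplified simulation below assumes equal-size groups (so an unweighted average of group averages equals the global average), complete graphs inside each group and among group representatives, and a fixed number of randomized pairwise gossip rounds per level. It ignores geometry, multi-hop routing, the $O(n^{(2/3)^j})$ partition sizes, and message counting, so it is not the paper's algorithm or analysis; `pairwise_gossip` and `two_level_gossip` are hypothetical names.

```python
import random


def pairwise_gossip(values, rounds, rng):
    """Randomized pairwise gossip on a complete graph: each round, two distinct
    nodes both take the mean of their values. The sum (hence mean) is preserved."""
    vals = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(vals)), 2)
        vals[i] = vals[j] = (vals[i] + vals[j]) / 2.0
    return vals


def two_level_gossip(values, group_size, rounds, seed=0):
    """Simplified two-scale hierarchical averaging (illustrative assumptions):
    1) nodes in each equal-size group gossip towards their local average;
    2) one representative per group gossips with the other representatives;
    3) every node in a group adopts its representative's final value."""
    rng = random.Random(seed)
    if group_size < 2 or len(values) % group_size or len(values) // group_size < 2:
        raise ValueError("sketch assumes >= 2 equal-size groups of >= 2 nodes")
    groups = [values[k:k + group_size] for k in range(0, len(values), group_size)]
    local = [pairwise_gossip(grp, rounds, rng) for grp in groups]
    reps = pairwise_gossip([grp[0] for grp in local], rounds, rng)
    return [r for r in reps for _ in range(group_size)]


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.uniform(0.0, 100.0) for _ in range(64)]
    out = two_level_gossip(data, group_size=8, rounds=500, seed=3)
    true_mean = sum(data) / len(data)
    print(max(abs(v - true_mean) for v in out))
```

Only the representatives exchange values at the upper level, which mirrors, in a very reduced form, why longer-range exchanges are confined to the higher scales of the hierarchy.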
Active learning of multiple source multiple destination topologies
Pegah Sattari
Maciej Kurant
Animashree Anandkumar
Athina Markopoulou
We consider the problem of inferring the topology of an M-by-N network by sending probes between M sources and N receivers. Prior work has shown that this problem can be decomposed into two parts: first, infer smaller subnetwork components (i.e., 1-by-N's or 2-by-2's) and then merge these components to identify the M-by-N topology. In this paper, we focus on the second part. In particular, we assume that a 1-by-N topology is given and that all 2-by-2 components can be queried and learned using end-to-end probes. The problem is which 2-by-2's to query and how to merge them with the 1-by-N, so as to exactly identify the 2-by-N topology and optimize a number of performance metrics, including measurement traffic, time complexity, and memory usage. We provide a lower bound, ⌈N/2⌉, on the number of 2-by-2's required by any active learning algorithm, and we also propose a greedy algorithm that is near-optimal and efficient in practice. It follows a bottom-up approach: at every step, it selects two receivers, queries the corresponding 2-by-2, and merges it with the given 1-by-N. The algorithm requires exactly N - 1 steps, which is much less than all $\binom{N}{2}$ possible 2-by-2's, and it correctly identifies the 2-by-N topology.
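The gap between the three query counts in the abstract is easy to quantify: the lower bound ⌈N/2⌉, the greedy algorithm's N - 1 steps (one 2-by-2 query per step), and the $\binom{N}{2}$ 2-by-2's available in total. The short sketch below just evaluates those expressions; `query_counts` is a hypothetical helper and does not implement the greedy merging algorithm itself.

```python
import math


def query_counts(n_receivers: int) -> dict[str, int]:
    """2-by-2 query counts for N receivers, using the expressions in the abstract:
    lower bound ceil(N/2), greedy N - 1 (one query per step), all C(N, 2) pairs."""
    n = n_receivers
    return {
        "lower_bound": math.ceil(n / 2),
        "greedy": n - 1,
        "all_pairs": math.comb(n, 2),
    }


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, query_counts(n))
```

The greedy count stays within a factor of two of the lower bound, while querying every pair grows quadratically in N.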