Warren Gross

Associate Academic Member
Professor, McGill University, Department of Electrical and Computer Engineering
Research Topics
Computer Systems
Deep Learning
Information Theory
Natural Language Processing
Optimization

Biography

Warren Gross is a James McGill Professor and chair of the Department of Electrical and Computer Engineering at McGill University.

His research interests lie in bridging algorithms and implementation in machine learning and digital communications. His work focuses on efficient deep learning models, hardware for machine learning, stochastic computing, hardware-aware design-space exploration for neural networks, machine learning for digital communications, and efficient decoding algorithms and hardware for error-correcting codes.

Publications

DsMLP: A Learning-Based Multi-Layer Perception for MIMO Detection Implemented by Dynamic Stochastic Computing
Qidie Wu
Jinsheng Kuang
Jiyun Tao
Jienan Chen
As the number of antennas increases in multi-input and multi-output (MIMO) systems, even linear detection methods suffer from sharply increasing complexity. This paper proposes a learning-based multi-layer perception (MLP), named dynamic stochastic multi-layer perception (DsMLP), which is implemented by dynamic stochastic computing (DSC). We first establish that the MLP structure and the minimum mean square error (MMSE) matrix operations share a similar form. Consequently, DsMLP transforms the complex computation problem into an optimization problem of MLP training. Due to the specific design of the MLP structure (e.g., identical input and output dimensions and a single layer without an activation function), the mathematical representation of DsMLP is identical to the MMSE matrix operations. Therefore, DsMLP guarantees sound model explainability in mathematics, fast convergence in training, and low complexity in computation. Furthermore, we transform the MLP training process to the DSC domain and propose a hardware-efficient scheme for DsMLP. Compared with other state-of-the-art MIMO detectors, DsMLP achieves 1.2× the energy efficiency and 1.74× the area efficiency.
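The abstract's central observation is that a single-layer MLP with no activation function, trained to minimize mean square error, ends up computing the same linear map as MMSE detection, W = (HᵀH + σ²I)⁻¹Hᵀ. A minimal floating-point sketch of that idea follows, assuming a real-valued 2×2 channel, BPSK symbols with unit energy, and full-batch gradient descent; the channel matrix, noise variance, and function names are illustrative, and the dynamic stochastic computing implementation from the paper is not modeled.

```python
import numpy as np

def mmse_matrix(H, noise_var):
    """Closed-form MMSE detection matrix (H^T H + sigma^2 I)^{-1} H^T for unit-energy symbols."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + noise_var * np.eye(n), H.T)

def train_linear_layer(Y, X, iters=500):
    """Full-batch gradient descent on the MSE of a single linear layer x_hat = W y
    (no bias, no activation), as a stand-in for MLP training."""
    N = Y.shape[1]
    C_yy = Y @ Y.T / N                              # sample covariance of received vectors
    C_xy = X @ Y.T / N                              # sample cross-covariance of symbols and receptions
    lr = 0.5 / np.linalg.eigvalsh(C_yy).max()       # step size small enough to converge
    W = np.zeros((X.shape[0], Y.shape[0]))
    for _ in range(iters):
        grad = 2.0 * (W @ C_yy - C_xy)              # gradient of mean ||x - W y||^2
        W -= lr * grad
    return W

rng = np.random.default_rng(0)
H = np.array([[1.0, 0.3], [0.2, 0.9]])              # hypothetical real-valued channel
noise_var = 0.1
N = 200_000
X = rng.choice([-1.0, 1.0], size=(2, N))            # BPSK training symbols
Y = H @ X + np.sqrt(noise_var) * rng.standard_normal((2, N))

W_trained = train_linear_layer(Y, X)
W_mmse = mmse_matrix(H, noise_var)
print(np.round(W_trained, 3))
print(np.round(W_mmse, 3))
```

With enough training samples the trained weights approach the closed-form MMSE matrix, which is why training can take the place of an explicit matrix inversion.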
Improved DC-Free Run-Length Limited 4B6B Codes for Concatenated Schemes
Elie Ngomseu Mambou
Thibaud Tonnellier
In this letter, we introduce a class of DC-free 4B6B codes with improved error-correction capability for serially concatenated architectures. There are billions of different codebooks that can be derived from the 16 codewords contained in the traditional 4B6B code as specified in the IEEE 802.15.7 standard for visible light communication (VLC). These codebooks can be classified based on their distance properties, which determine their error-correction performance. The traditional 4B6B code is suitable for hard-decision decoding; however, when a soft decoder is used, as in a serially concatenated architecture, that code becomes obsolete. Simulations show that the proposed 4B6B code concatenated with forward error correction (FEC) codes has better performance than state-of-the-art schemes such as the original 4B6B code, the enhanced Miller code, the Manchester code, the 5B10B code and the (0,4) 2/3 RLL code.
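Each 4B6B codeword carries three ones and three zeros, which is what makes the line code DC-free, and the abstract classifies candidate codebooks by their distance properties. The sketch below enumerates the balanced 6-bit words and computes the minimum Hamming distance and pairwise distance distribution of a 16-word codebook. The codebook chosen here (the first 16 balanced words in lexicographic order) is hypothetical and is not the IEEE 802.15.7 table.

```python
from collections import Counter
from itertools import combinations

def balanced_words(n=6):
    """All n-bit words with exactly n/2 ones (zero disparity, hence DC-free)."""
    words = []
    for ones in combinations(range(n), n // 2):
        words.append(tuple(1 if i in ones else 0 for i in range(n)))
    return words

def hamming(a, b):
    """Number of positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))

def distance_profile(codebook):
    """Minimum Hamming distance and the distribution of pairwise distances."""
    dists = Counter(hamming(a, b) for a, b in combinations(codebook, 2))
    return min(dists), dict(sorted(dists.items()))

words = balanced_words()
codebook = words[:16]          # hypothetical choice: one codeword per 4-bit input
dmin, spectrum = distance_profile(codebook)
print(len(words), dmin, spectrum)
```

Because all codewords have the same weight, every pairwise distance is even, so different codebooks differ mainly in how many pairs sit at the minimum distance; that is the kind of property a soft-decision concatenated decoder can exploit.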
A Synchro-Set-Aided Breadth-First Sphere Decoder for Polar-Coded MIMO Systems
Huayi Zhou
Xiangyun Deng
Yiqian Cai
Yifei Shen
Minhua Yang
Xiaohu You
Chuan Zhang
The joint optimization of multiple-input-multiple-output (MIMO) detection and polar decoding has become a research hotspot for future communication systems. The error-correction performance of separate detection and decoding (SDD) is far from the Shannon capacity and cannot meet the requirements of communication scenarios such as ultra-reliable and low-latency communications (URLLC). Existing joint detection and decoding (JDD) using breadth-first sphere decoding (BFSD) improves reliability over SDD but still suffers a large performance loss on low-rate codes. In this paper, JDD using synchro-set-aided BFSD (SA-BFSD) is proposed to greatly improve the error-correction performance of polar-coded MIMO systems. We first propose a method to generate the symbol synchro sets through the concept of frozen symbols, then refine the symbol synchro sets based on an analysis of the characteristics of the channel matrix. We optimize the enumeration order of the symbols and reduce the number of enumeration levels. The frame error rate (FER) and the bit error rate of the proposed algorithms are significantly improved, especially for low-rate codes. The proposed SA-BFSD JDD achieves a performance gain of up to 7.8 dB over BFSD at FER …
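Breadth-first sphere decoding keeps a fixed number of the best partial symbol vectors at each level of the detection tree instead of exploring the tree depth first. The sketch below shows a generic K-best search of this kind for a real-valued channel, assuming QR preprocessing and a squared Euclidean metric. It covers only the plain BFSD baseline; the synchro sets, frozen symbols, and joint polar decoding of SA-BFSD are not modeled, and the channel and function names are hypothetical.

```python
import numpy as np

def kbest_detect(H, y, symbols, K):
    """Breadth-first (K-best) search for argmin_s ||y - H s||^2 over s in symbols^n."""
    Q, R = np.linalg.qr(H)            # H = Q R, R upper triangular
    z = Q.T @ y                       # for square H, ||y - H s|| = ||z - R s||
    n = H.shape[1]
    paths = [((), 0.0)]               # (symbols for layers i+1..n-1, accumulated metric)
    for i in range(n - 1, -1, -1):    # detect the last layer first
        expanded = []
        for tail, metric in paths:
            # interference from the already-detected layers i+1..n-1
            interference = sum(R[i, i + 1 + j] * tail[j] for j in range(len(tail)))
            for s in symbols:
                increment = (z[i] - R[i, i] * s - interference) ** 2
                expanded.append(((s,) + tail, metric + increment))
        expanded.sort(key=lambda p: p[1])
        paths = expanded[:K]          # keep only the K best partial paths
    best, metric = paths[0]
    return np.array(best), metric

H = np.array([[1.0, 0.5, 0.2], [0.3, 1.2, 0.1], [0.4, 0.2, 0.9]])  # hypothetical channel
s_true = np.array([1.0, -1.0, 1.0])
y = H @ s_true                        # noiseless, so the true vector has metric 0
s_hat, metric = kbest_detect(H, y, symbols=(-1.0, 1.0), K=2)
print(s_hat, metric)
```

A small K bounds the work per level regardless of the channel, at the risk of pruning the correct path early; techniques such as SA-BFSD aim to recover that loss.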
High-Throughput and Energy-Efficient VLSI Architecture for Ordered Reliability Bits GRAND
Syed Mohsin Abbas
Thibaud Tonnellier
Furkan Ercan
Marwan Jalaleddine
Ultrareliable low-latency communication (URLLC), a major 5G new-radio (NR) use case, is the key enabler for applications with strict reliability and latency requirements. These applications necessitate the use of short-length and high-rate channel codes. Guessing random additive noise decoding (GRAND) is a recently proposed maximum likelihood (ML) decoding technique for these short-length and high-rate codes. Rather than decoding the received vector, GRAND tries to infer the noise that corrupted the transmitted codeword during transmission through the communication channel. As a result, GRAND can decode any code, structured or unstructured. GRAND has hard-input as well as soft-input variants. Among these variants, ordered reliability bits GRAND (ORBGRAND) is a soft-input variant that outperforms hard-input GRAND and is suitable for parallel hardware implementation. This work reports the first hardware architecture for ORBGRAND, which achieves an average throughput of up to 42.5 Gb/s for a code length of 128 at a target frame error rate (FER) of 10⁻⁷. Furthermore, the proposed hardware can be used to decode any code as long as the length and rate constraints are met. In comparison to GRAND with ABandonment (GRANDAB), a hard-input variant of GRAND, the proposed architecture improves decoding performance by at least 2 dB. When compared to the state-of-the-art fast dynamic successive cancellation flip decoder (Fast-DSCF) using a 5G polar code (PC) (128, 105), the proposed ORBGRAND VLSI implementation has …
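The noise-guessing idea fits in a few lines of software. The sketch below is a model of ORBGRAND as the abstract describes it, not the paper's VLSI architecture: positions are ranked by |LLR|, error patterns are tested in increasing logistic weight (the sum of the ranks of the flipped bits, i.e. partitions of the weight into distinct parts), and the first pattern whose corrected word satisfies the parity checks is accepted. The (7,4) Hamming code, the LLR sign convention (positive means bit 0), and the `max_weight` abandonment limit are assumptions for illustration.

```python
import numpy as np

def distinct_parts(w, max_part):
    """Yield lists of distinct positive integers <= max_part that sum to w."""
    if w == 0:
        yield []
        return
    for part in range(min(w, max_part), 0, -1):
        for rest in distinct_parts(w - part, part - 1):
            yield [part] + rest

def orbgrand(llr, H, max_weight):
    """Test error patterns in increasing logistic weight until H c = 0 (mod 2)."""
    llr = np.asarray(llr, dtype=float)
    n = len(llr)
    y = (llr < 0).astype(int)               # hard decision: negative LLR -> bit 1
    order = np.argsort(np.abs(llr))         # order[r - 1] = position with rank r (1 = least reliable)
    if not ((H @ y) % 2).any():
        return y, 0                         # hard decision is already a codeword
    queries = 0
    for w in range(1, max_weight + 1):
        for ranks in distinct_parts(w, n):
            e = np.zeros(n, dtype=int)
            e[order[np.array(ranks) - 1]] = 1   # flip the bits with these ranks
            queries += 1
            c = (y + e) % 2
            if not ((H @ c) % 2).any():
                return c, queries
    return None, queries                    # abandon: no codeword found within max_weight

# Hypothetical example: (7,4) Hamming code, all-zero codeword sent,
# bit 0 received with the wrong sign.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
llr = [-0.5, 1.5, 0.2, 3.0, 2.5, 1.8, 2.2]
c, q = orbgrand(llr, H, max_weight=10)
print(c, q)
```

Here the first guess flips the least reliable bit (position 2) and fails the check; the second guess flips position 0 and recovers the all-zero codeword. Because the guessing order depends only on the ranks, not on the code, the same procedure decodes any code with a parity-check matrix.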
Successive-Cancellation Decoding of Reed-Muller Codes With Fast Hadamard Transform
Nghia Doan
Seyyed Ali Hashemi
A novel permuted fast successive-cancellation list decoding algorithm with fast Hadamard transform (FHT-FSCL) is presented. The proposed decoder initializes …
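A common use of the fast Hadamard transform in Reed-Muller decoding is maximum-likelihood decoding of a first-order code RM(1, m) in O(n log n) operations: one transform correlates the LLR vector with every linear Boolean function at once, and the entry with the largest magnitude selects the codeword. The sketch below shows only that building block, assuming natural (Hadamard) index ordering and positive LLRs meaning bit 0; the permutations and list management of FHT-FSCL are not shown, and the function names are hypothetical.

```python
import numpy as np

def fht(values):
    """Unnormalized fast Hadamard transform: out[u] = sum_j v[j] * (-1)^popcount(u & j)."""
    a = np.array(values, dtype=float)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            x = a[start:start + h].copy()
            y = a[start + h:start + 2 * h].copy()
            a[start:start + h] = x + y          # butterfly: sum
            a[start + h:start + 2 * h] = x - y  # butterfly: difference
        h *= 2
    return a

def decode_rm1(llr):
    """ML decoding of RM(1, m): codewords are c_j = <u, j> + b (mod 2)."""
    spectrum = fht(llr)
    u = int(np.argmax(np.abs(spectrum)))        # best-matching linear function
    b = 0 if spectrum[u] > 0 else 1             # negative correlation -> complemented codeword
    return np.array([(bin(u & j).count("1") + b) % 2 for j in range(len(llr))])

m = 3
u_true, b_true = 5, 1                           # hypothetical transmitted codeword
c_true = np.array([(bin(u_true & j).count("1") + b_true) % 2 for j in range(2 ** m)])
llr = 2.0 * (1 - 2 * c_true)                    # noiseless BPSK LLRs
llr[0] = 0.5                                    # one unreliable position with the wrong sign
c_hat = decode_rm1(llr)
print(c_true, c_hat)
```

The single transform replaces 2^(m+1) separate correlations, which is what makes ML decoding of these components cheap enough to embed in a list decoder.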
Practical Dynamic SC-Flip Polar Decoders: Algorithm and Implementation
Furkan Ercan
Thibaud Tonnellier
Nghia Doan
SC-Flip (SCF) is a low-complexity polar code decoding algorithm with improved performance, and is an alternative to high-complexity CRC-aided SC-List (CA-SCL) decoding. However, the performance improvement of SCF is limited since it can correct only up to one channel error (…
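For reference, the basic SCF procedure that the abstract starts from can be sketched as follows: run SC decoding, and if the check fails, rerun SC with one decision flipped, trying the least reliable information bits (smallest decision |LLR|) first. This sketch uses a recursive min-sum SC decoder for the non-bit-reversed transform x = uF^{⊗n}, and a hypothetical one-bit parity check stands in for the CRC. It flips a single bit per trial, so it does not implement the paper's Dynamic SC-Flip.

```python
import numpy as np

def f(a, b):
    """Min-sum check-node update."""
    return np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))

def polar_encode(u):
    """x = u F^{(x)n} with F = [[1, 0], [1, 1]] (no bit reversal)."""
    if len(u) == 1:
        return u.copy()
    half = len(u) // 2
    left, right = polar_encode(u[:half]), polar_encode(u[half:])
    return np.concatenate([(left + right) % 2, right])

def sc(llr, frozen, offset, flip, u_hat, dec_llr):
    """Recursive SC decoding; inverts the decision at index `flip` (None = no flip)."""
    if len(llr) == 1:
        i = offset
        dec_llr[i] = llr[0]                          # decision LLR, used to rank flips
        bit = 0 if frozen[i] else int(llr[0] < 0)
        if i == flip:
            bit ^= 1
        u_hat[i] = bit
        return np.array([bit])
    half = len(llr) // 2
    a, b = llr[:half], llr[half:]
    left = sc(f(a, b), frozen, offset, flip, u_hat, dec_llr)
    right = sc(b + (1 - 2 * left) * a, frozen, offset + half, flip, u_hat, dec_llr)
    return np.concatenate([(left + right) % 2, right])   # re-encoded partial sums

def sc_flip(llr, frozen, check, max_trials):
    """SCF: on check failure, retry SC flipping one of the least reliable info bits.
    `frozen` must be a NumPy boolean array."""
    llr = np.asarray(llr, dtype=float)
    n = len(llr)
    u_hat, dec_llr = np.zeros(n, dtype=int), np.zeros(n)
    sc(llr, frozen, 0, None, u_hat, dec_llr)
    if check(u_hat):
        return u_hat, 0
    info = np.flatnonzero(~frozen)
    candidates = info[np.argsort(np.abs(dec_llr[info]))][:max_trials]
    for trial, pos in enumerate(candidates, start=1):
        u_try = np.zeros(n, dtype=int)
        sc(llr, frozen, 0, pos, u_try, np.zeros(n))
        if check(u_try):
            return u_try, trial
    return u_hat, len(candidates)

def parity_check(u):
    """Hypothetical stand-in for a CRC: u[3] must equal u[1] XOR u[2]."""
    return u[3] == (u[1] ^ u[2])

frozen = np.array([True, False, False, False])
u_true = np.array([0, 1, 1, 0])
x = polar_encode(u_true)
u_clean, flips_clean = sc_flip(2.0 * (1 - 2 * x), frozen, parity_check, max_trials=3)
# All-zero codeword with the first two channel LLRs corrupted: plain SC errs on u[1].
u_noisy, flips_noisy = sc_flip([-0.4, -0.4, 3.0, 3.0], frozen, parity_check, max_trials=3)
print(u_clean, flips_clean, u_noisy, flips_noisy)
```

In the noisy case SC decides u[1] = 1 with the smallest |LLR| among the information bits, the check fails, and a single flip of that bit recovers the all-zero message. A second channel error would need more than one flip, which is the limitation the abstract points to.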
Stochastic Bit-Wise Iterative Decoding of Polar Codes
Kaining Han
Junchao Wang
Jianhao Hu
Polar codes have received recent attention due to their potential to be applied in advanced wireless communication protocols such as the fifth-generation mobile communication system (5G). Among the existing decoding algorithms, belief propagation (BP) offers high throughput, low latency, and soft output, but at a high hardware cost. Stochastic computing, as a form of approximate computing, provides a potential low-cost implementation solution for the BP algorithm. However, existing stochastic BP decoders suffer from a relatively long decoding latency, resulting in low hardware efficiency. In this paper, a novel bit-wise iterative stochastic decoding architecture for the BP algorithm is proposed to improve throughput and hardware efficiency. By utilizing the frozen bits of polar codes and stochastic computing, multiple novel optimization methods are presented to further speed up convergence and increase hardware efficiency.
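Stochastic computing represents a probability p as a random bit stream in which each bit is 1 with probability p, so arithmetic reduces to single-gate logic. The sketch below, assuming independent unipolar streams, shows the two basic operations: an AND gate multiplies probabilities, and an XOR gate computes P(x₁ ⊕ x₂ = 1) = p₁(1 − p₂) + (1 − p₁)p₂, the probability-domain parity (check-node) update used by stochastic decoders. It is illustrative only; the bit-wise iterative BP architecture and frozen-bit optimizations from the paper are not modeled.

```python
import numpy as np

def to_stream(p, length, rng):
    """Unipolar stochastic stream: each bit is 1 with probability p."""
    return (rng.random(length) < p).astype(np.uint8)

rng = np.random.default_rng(1)
length = 100_000
p1, p2 = 0.8, 0.3
s1, s2 = to_stream(p1, length, rng), to_stream(p2, length, rng)

# Parity update in the probability domain: one XOR gate per bit pair.
p_xor_exact = p1 * (1 - p2) + (1 - p1) * p2
p_xor_stream = np.mean(s1 ^ s2)

# Multiplication of probabilities: one AND gate per bit pair.
p_and_stream = np.mean(s1 & s2)

print(p_xor_exact, p_xor_stream, p1 * p2, p_and_stream)
```

The accuracy grows only with the square root of the stream length, which is why long streams, and hence long decoding latency, are the main cost that stochastic decoder designs try to reduce.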
Fast and Flexible Successive-Cancellation List Decoders for Polar Codes
Seyyed Ali Hashemi
Carlo Condo
Polar codes have gained a significant amount of attention during the past few years and have been selected as a coding scheme for the next-generation mobile broadband standard. Among decoding schemes, successive-cancellation list (SCL) decoding provides a reasonable tradeoff between error-correction performance and hardware implementation complexity when used to decode polar codes, at the cost of limited throughput. The simplified SCL (SSCL) and its extension SSCL-SPC increase decoding speed by removing redundant calculations when encountering particular information and frozen bit patterns (rate-one and single-parity-check codes), while keeping the error-correction performance unaltered. In this paper, we improve SSCL and SSCL-SPC by proving that the list size imposes a specific number of path splittings required to decode rate-one and single-parity-check codes. Thus, the number of splittings can be limited while guaranteeing exactly the same error-correction performance as if the paths were forked at each bit estimation. We call the new decoding algorithms Fast-SSCL and Fast-SSCL-SPC. Moreover, we show that the number of path forks in a practical application can be tuned to achieve the desired speed while keeping the error-correction performance almost unchanged. Hardware architectures implementing both algorithms are then described and implemented: it is shown that our design can achieve …
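Why the list size bounds the number of path splits at a rate-one node can be checked by brute force in a simplified model: with a single incoming path, flipping bit i of the hard decision adds |α_i| to the path metric, so any flip set containing a bit ranked L or worse by reliability is beaten by the empty set and by the single flips of every less reliable bit. The sketch below enumerates every flip set for a hypothetical 8-bit rate-one node, keeps the L best, and confirms that only the L − 1 least reliable bits are ever flipped (assuming distinct LLR magnitudes). It illustrates the intuition only; it is not the paper's proof and does not cover SPC nodes.

```python
from itertools import combinations

def best_rate1_candidates(alpha, L):
    """Brute force: the L lowest-penalty flip sets for a rate-one node, where
    flipping bit i of the hard decision adds |alpha[i]| to the path metric."""
    cands = []
    for r in range(len(alpha) + 1):
        for flips in combinations(range(len(alpha)), r):
            cands.append((sum(abs(alpha[i]) for i in flips), flips))
    cands.sort(key=lambda c: c[0])
    return cands[:L]

alpha = [2.0, -0.3, 1.1, -4.0, 0.7, 3.5, -0.9, 2.6]   # hypothetical node LLRs
L = 4
best = best_rate1_candidates(alpha, L)
flipped = set().union(*(set(f) for _, f in best))
least_reliable = set(sorted(range(len(alpha)), key=lambda i: abs(alpha[i]))[:L - 1])
print(best, flipped <= least_reliable)
```

Since the surviving paths only ever differ from the hard decision in those L − 1 positions, a decoder needs to split on at most those bits and can set the rest directly, which is the source of the speedup.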