Warren Gross

GRAND for Rayleigh Fading Channels

Syed Mohsin Abbas

Marwan Jalaleddine

Guessing Random Additive Noise Decoding (GRAND) is a code-agnostic decoding technique for short-length and high-rate channel codes. GRAND at… (see more)tempts to guess the channel-induced noise by generating Test Error Patterns (TEPs), and the sequence of TEP generation is the primary distinction between GRAND variants. In this work, we extend the application of GRAND to multipath frequency non-selective Rayleigh fading communication channels, and we refer to this GRAND variant as Fading-GRAND. The proposed Fading-GRAND adapts its TEP generation to the fading conditions of the underlying communication channel, outperforming traditional channel code decoders in scenarios with L spatial diversity branches as well as scenarios with no diversity. Numerical simulation results show that the Fading-GRAND outperforms the traditional Berlekamp-Massey (B-M) decoder for decoding BCH code (127, 106) and BCH code (127, 113) by

2022-12-04

2022 IEEE Globecom Workshops (GC Wkshps) (published)

doi.org

arxiv.org

Successive-Cancellation Decoding of Reed-Muller Codes With Fast Hadamard Transform

Nghia Doan

Seyyed Ali Hashemi

Warren Gross

A novel permuted fast successive-cancellation list decoding algorithm with fast Hadamard transform (FHT-FSCL) is presented. The proposed dec… (see more)oder initializes

2022-11-01

IEEE Transactions on Vehicular Technology (published)

doi.org

arxiv.org

PipeBERT: High-throughput BERT Inference for ARM Big.LITTLE Multi-core Processors

Hung-Yang Chang

Seyyed Hasan Mozafari

Cheng Chen

James J. Clark

Brett Meyer

Warren Gross

2022-10-08

Journal of Signal Processing Systems (published)

doi.org

Conjugate Adder Net (CAddNet) - a Space-Efficient Approximate CNN

Lulan Shen

Maryam Ziaeefard

Brett Meyer

Warren Gross

James J. Clark

The AdderNet was recently developed as a way to implement deep neural networks without needing multiplication operations to combine weights … (see more)and inputs. Instead, absolute values of the difference between weights and inputs are used, greatly reducing the gate-level implementation complexity. Training of AdderNets is challenging, however, and the loss curves during training tend to fluctuate significantly. In this paper we propose the Conjugate Adder Network, or CAddNet, which uses the difference between the absolute values of conjugate pairs of inputs and the weights. We show that this can be implemented simply via a single minimum operation, resulting in a roughly 50% reduction in logic gate complexity as compared with AdderNets. The CAddNet method also stabilizes training as compared with AdderNets, yielding training curves similar to standard CNNs.

2022-06-01

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (published)

doi.org

Efficient Fine-Tuning of BERT Models on the Edge

Danilo Vucetic

Mohammadreza Tayaranian

Maryam Ziaeefard

James J. Clark

Brett Meyer

Warren Gross

Resource-constrained devices are increasingly the deployment targets of machine learning applications. Static models, however, do not always… (see more) suffice for dynamic environments. On-device training of models allows for quick adaptability to new scenarios. With the increasing size of deep neural networks, as noted with the likes of BERT and other natural language processing models, comes increased resource requirements, namely memory, computation, energy, and time. Furthermore, training is far more resource intensive than inference. Resource-constrained on-device learning is thus doubly difficult, especially with large BERT-like models. By reducing the memory usage of fine-tuning, pre-trained BERT models can become efficient enough to fine-tune on resource-constrained devices. We propose Freeze And ReconFigure (FAR), a memory-efficient training regime for BERT-like models that reduces the memory usage of activation maps during fine-tuning by avoiding unnecessary parameter updates. FAR reduces fine-tuning time on the DistilBERT model and CoLA dataset by 30 %, and time spent on memory operations by 47%. More broadly, reductions in metric performance on the GLUE and SQuAD datasets are around 1% on average.

2022-06-01

2022 IEEE International Symposium on Circuits and Systems (ISCAS) (published)

doi.org

arxiv.org

High-Throughput and Energy-Efficient VLSI Architecture for Ordered Reliability Bits GRAND

Syed Mohsin Abbas

Thibaud Tonnellier

Furkan Ercan

Marwan Jalaleddine

Warren Gross

Ultrareliable low-latency communication (URLLC), a major 5G new-radio (NR) use case, is the key enabler for applications with strict reliabi… (see more)lity and latency requirements. These applications necessitate the use of short-length and high-rate channel codes. Guessing random additive noise decoding (GRAND) is a recently proposed maximum likelihood (ML) decoding technique for these short-length and high-rate codes. Rather than decoding the received vector, GRAND tries to infer the noise that corrupted the transmitted codeword during transmission through the communication channel. As a result, GRAND can decode any code, structured or unstructured. GRAND has hard-input as well as soft-input variants. Among these variants, ordered reliability bits GRAND (ORBGRAND) is a soft-input variant that outperforms hard-input GRAND and is suitable for parallel hardware implementation. This work reports the first hardware architecture for ORBGRAND, which achieves an average throughput of up to 42.5 Gb/s for a code length of 128 at a target frame error rate (FER) of 10−7. Furthermore, the proposed hardware can be used to decode any code as long as the length and rate constraints are met. In comparison to the GRAND with ABandonment (GRANDAB), a hard-input variant of GRAND, the proposed architecture enhances decoding performance by at least 2 dB. When compared to the state-of-the-art fast dynamic successive cancellation flip decoder (Fast-DSCF) using a 5G polar code (PC) (128, 105), the proposed ORBGRAND VLSI implementation has

2022-06-01

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (published)

doi.org

arxiv.org

Optimization and Simplification of PCPA Decoder for Reed-Muller Codes

Jiajie Li

Warren Gross

The collapsed projection-aggregation (CPA) decoder reduces the computational complexity of the recursive projection-aggregation (RPA) decode… (see more)r by removing the recursive structure. From simulations, the CPA decoder has similar error-correction performance as the RPA decoder, when decoding Reed-Muller (RM) (7, 3) and (8, 2) codes. The computational complexity can be further reduced by only selecting a subset of sub-spaces, which is achieved by pruning CPA decoders. In this work, optimization methods are proposed to find the pruned CPA (PCPA) decoder with small performance loss. Furthermore, the min-sum approximation is used to replace non-linear projection and aggregation functions, and a simplified list decoder based on the syndrome check is proposed. Under the same complexity, the optimized PCPA decoder has less performance loss than randomly constructed PCPA decoders in most case. The min-sum approximation incurs less than 0.15 dB performance loss at a target frame error rate of 10−4, and the simplified list decoder does not have noticeable performance loss.

2022-06-01

IEEE Communications Letters (published)

doi.org

Hardware Architecture for Guessing Random Additive Noise Decoding Markov Order (GRAND-MO)

Syed Mohsin Abbas

Marwan Jalaleddine

Warren Gross

2022-05-20

Journal of Signal Processing Systems (published)

doi.org

GRAND for Rayleigh Fading Channels

Syed Mohsin Abbas

Marwan Jalaleddine

Warren Gross

Guessing Random Additive Noise Decoding (GRAND) is a code-agnostic decoding technique for short-length and high-rate channel codes. GRAND at… (see more)tempts to guess the channel-induced noise by generating Test Error Patterns (TEPs), and the sequence of TEP generation is the primary distinction between GRAND variants. In this work, we extend the application of GRAND to multipath frequency non-selective Rayleigh fading communication channels, and we refer to this GRAND variant as Fading-GRAND. The proposed Fading-GRAND adapts its TEP generation to the fading conditions of the underlying communication channel, outperforming traditional channel code decoders in scenarios with L spatial diversity branches as well as scenarios with no diversity. Numerical simulation results show that the Fading-GRAND outperforms the traditional Berlekamp-Massey (B-M) decoder for decoding BCH code (127, 106) and BCH code (127, 113) by

2022-04-29

ArXiv (preprint)

doi.org

arxiv.org

Fast-Converging Simulated Annealing for Ising Models Based on Integral Stochastic Computing

Naoya Onizawa

K. Katsuki

Duckgyu Shin

Warren Gross

Takahiro Hanyu

Probabilistic bits (p-bits) have recently been presented as a spin (basic computing element) for the simulated annealing (SA) of Ising model… (see more)s. In this brief, we introduce fast-converging SA based on p-bits designed using integral stochastic computing. The stochastic implementation approximates a p-bit function, which can search for a solution to a combinatorial optimization problem at lower energy than conventional p-bits. Searching around the global minimum energy can increase the probability of finding a solution. The proposed stochastic computing-based SA method is compared with conventional SA and quantum annealing (QA) with a D-Wave Two quantum annealer on the traveling salesman, maximum cut (MAX-CUT), and graph isomorphism (GI) problems. The proposed method achieves a convergence speed a few orders of magnitude faster while dealing with an order of magnitude larger number of spins than the other methods.

2022-03-28

IEEE Transactions on Neural Networks and Learning Systems (published)

doi.org

Fast-Converging Simulated Annealing for Ising Models Based on Integral Stochastic Computing

Naoya Onizawa

Kota Katsuki

Duckgyu Shin

Warren Gross

Takahiro Hanyu

Probabilistic bits (p-bits) have recently been presented as a spin (basic computing element) for the simulated annealing (SA) of Ising model… (see more)s. In this brief, we introduce fast-converging SA based on p-bits designed using integral stochastic computing. The stochastic implementation approximates a p-bit function, which can search for a solution to a combinatorial optimization problem at lower energy than conventional p-bits. Searching around the global minimum energy can increase the probability of finding a solution. The proposed stochastic computing-based SA method is compared with conventional SA and quantum annealing (QA) with a D-Wave Two quantum annealer on the traveling salesman, maximum cut (MAX-CUT), and graph isomorphism (GI) problems. The proposed method achieves a convergence speed a few orders of magnitude faster while dealing with an order of magnitude larger number of spins than the other methods.

2022-03-28

IEEE Transactions on Neural Networks and Learning Systems (published)

doi.org

DsMLP: A Learning-Based Multi-Layer Perception for MIMO Detection Implemented by Dynamic Stochastic Computing

Qidie Wu

Jinsheng Kuang

Jiyun Tao

Jienan Chen

Warren Gross

As the number of antennas increases in multi-input and multi-output (MIMO) systems, even linear detection methods suffer from sharply increa… (see more)sing complexity. This paper proposes a learning-based multi-layer perception (MLP), named dynamic stochastic multi-layer perception (DsMLP), which is implemented by dynamic stochastic computing (DSC). We first establish a similar form between the MLP structure and minimum mean square error (MMSE) matrix operations. Consequently, DsMLP transforms the complex computation problem into an optimization problem of MLP training. Due to the specific design of MLP structure, e.g., same input/output dimension and single layer without activation function, the mathematical representation of DsMLP is identical to the MMSE matrix operations. Therefore, DsMLP guarantees sound model explainability in mathematics, fast convergence in training, and low complexity in computation. Furthermore, we transform the MLP training process to the DSC domain and propose a hardware-efficient scheme for DsMLP. Compared with other state-of-the-art MIMO detectors, DsMLP achieves 1.2× energy efficiency and 1.74× area efficiency.

2022-01-01

IEEE Transactions on Signal Processing (published)

doi.org

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Warren Gross

Biography

Current Students

Publications

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Popular keywords:

Warren Gross

Biography

Current Students

Publications