Warren Gross

Associate Academic Member
Professor, McGill University, Department of Electrical and Computer Engineering

Biography

Warren Gross is a James McGill Professor and Chair of the Department of Electrical and Computer Engineering at McGill University. His research focuses on bridging the gap between algorithms and their implementation in machine learning and digital communications. His work covers efficient deep learning models, hardware for machine learning, stochastic computing, hardware-aware design space exploration for neural networks, machine learning for digital communications, and efficient decoding algorithms and hardware for error-correcting codes.

Publications

Stochastic Simulated Quantum Annealing for Fast Solution of Combinatorial Optimization Problems
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
In this paper, we introduce stochastic simulated quantum annealing (SSQA) for large-scale combinatorial optimization problems. SSQA is designed based on stochastic computing and quantum Monte Carlo, which can simulate quantum annealing (QA) by using multiple replicas of spins (probabilistic bits) in classical computing. The use of stochastic computing leads to an efficient parallel spin-state update algorithm, enabling a quick search for a solution around the global minimum energy. SSQA therefore realizes quantum-like annealing for large-scale problems and, unlike QA, can handle fully connected models in combinatorial optimization. The proposed method is evaluated in MATLAB on graph isomorphism problems, which are typical combinatorial optimization problems. It achieves a convergence speed an order of magnitude faster than a conventional stochastic simulated annealing method. Additionally, for similar convergence probabilities, it can handle a problem size 100 times larger than QA and 25 times larger than a traditional SA method.
Stochastic Simulated Quantum Annealing for Fast Solving Combinatorial Optimization Problems (2023)
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
method. Additionally, it can handle a 100-times larger problem size compared to QA and a 25-times larger problem size compared to a traditional SA method, respectively, for similar convergence probabilities.
Stochastic Quantum Monte Carlo Algorithm for Large-Scale Combinatorial Optimization Problems (2023)
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
computing. In addition, it solves problems using two orders-of-magnitude larger number of spins than the D-Wave Two QA machine.
Guessing Random Additive Noise Decoding
Syed Mohsin Abbas
Marwan Jalaleddine
List-GRAND: A Practical Way to Achieve Maximum Likelihood Decoding
Syed Mohsin Abbas
Marwan Jalaleddine
Guessing random additive noise decoding (GRAND) is a recently proposed universal maximum likelihood (ML) decoder for short-length and high-rate linear block codes. Soft-GRAND (SGRAND) is a prominent soft-input GRAND variant, outperforming the other GRAND variants in decoding performance; nevertheless, SGRAND is not suitable for parallel hardware implementation. Ordered Reliability Bits-GRAND (ORBGRAND) is another soft-input GRAND variant that is suitable for parallel hardware implementation; however, it has lower decoding performance than SGRAND. In this article, we propose List-GRAND (LGRAND), a technique for enhancing the decoding performance of ORBGRAND to match the ML decoding performance of SGRAND. Numerical simulation results show that LGRAND enhances ORBGRAND’s decoding performance by 0.5–0.75 dB for channel codes of various classes at a target frame error rate (FER) of 10⁻⁷. For linear block codes of length 127/128 and different code rates, LGRAND’s VLSI implementation can achieve an average information throughput of 47.27–51.36 Gb/s. In comparison to ORBGRAND’s VLSI implementation, the proposed LGRAND hardware has a 4.84% area overhead.
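As a rough illustration of the noise-guessing idea shared by the GRAND family, the following minimal sketch implements only the basic hard-input form: test error patterns are tried in order of increasing Hamming weight until one turns the received word into a codeword. It is not LGRAND, SGRAND, or ORBGRAND, which order their guesses using soft information. The (7,4) Hamming code, the names `H`, `is_codeword`, and `grand_hard`, and the `max_weight` cutoff are all assumptions made for this example.

```python
from itertools import combinations

import numpy as np

# Parity-check matrix of the (7,4) Hamming code (illustrative choice only).
H = np.array([
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
], dtype=np.uint8)


def is_codeword(word):
    """A word is a codeword when its syndrome H * word (mod 2) is all zero."""
    return not np.any(H.dot(word) % 2)


def grand_hard(received, max_weight=3):
    """Guess noise patterns from most to least likely on a binary symmetric
    channel (fewest bit flips first) and stop at the first codeword found.

    Returns (decoded_word, number_of_codebook_queries); decoded_word is None
    if no pattern of weight <= max_weight yields a codeword.
    """
    n = len(received)
    queries = 0
    for weight in range(max_weight + 1):
        for positions in combinations(range(n), weight):
            queries += 1
            candidate = received.copy()
            for p in positions:
                candidate[p] ^= 1  # remove the guessed noise bit
            if is_codeword(candidate):
                return candidate, queries
    return None, queries


# Example: the codeword 1110000 received with its last bit flipped.
received = np.array([1, 1, 1, 0, 0, 0, 1], dtype=np.uint8)
decoded, queries = grand_hard(received)
```

Because the decoder only needs a codebook membership test, swapping in a different parity-check matrix changes the code being decoded without touching the guessing loop.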
GRAND for Rayleigh Fading Channels
Syed Mohsin Abbas
Marwan Jalaleddine
Guessing Random Additive Noise Decoding (GRAND) is a code-agnostic decoding technique for short-length and high-rate channel codes. GRAND attempts to guess the channel-induced noise by generating Test Error Patterns (TEPs), and the sequence of TEP generation is the primary distinction between GRAND variants. In this work, we extend the application of GRAND to multipath frequency non-selective Rayleigh fading communication channels, and we refer to this GRAND variant as Fading-GRAND. The proposed Fading-GRAND adapts its TEP generation to the fading conditions of the underlying communication channel, outperforming traditional channel code decoders in scenarios with L spatial diversity branches as well as scenarios with no diversity. Numerical simulation results show that the Fading-GRAND outperforms the traditional Berlekamp-Massey (B-M) decoder for decoding BCH code (127, 106) and BCH code (127, 113) by
Successive-Cancellation Decoding of Reed-Muller Codes With Fast Hadamard Transform
Nghia Doan
Seyyed Ali Hashemi
A novel permuted fast successive-cancellation list decoding algorithm with fast Hadamard transform (FHT-FSCL) is presented. The proposed decoder initializes
PipeBERT: High-throughput BERT Inference for ARM Big.LITTLE Multi-core Processors
Hung-Yang Chang
Seyyed Hasan Mozafari
Cheng Chen
James J. Clark
Brett Meyer
Conjugate Adder Net (CAddNet) - a Space-Efficient Approximate CNN
Lulan Shen
Maryam Ziaeefard
Brett Meyer
James J. Clark
The AdderNet was recently developed as a way to implement deep neural networks without needing multiplication operations to combine weights and inputs. Instead, absolute values of the difference between weights and inputs are used, greatly reducing the gate-level implementation complexity. Training of AdderNets is challenging, however, and the loss curves during training tend to fluctuate significantly. In this paper we propose the Conjugate Adder Network, or CAddNet, which uses the difference between the absolute values of conjugate pairs of inputs and the weights. We show that this can be implemented simply via a single minimum operation, resulting in a roughly 50% reduction in logic gate complexity as compared with AdderNets. The CAddNet method also stabilizes training as compared with AdderNets, yielding training curves similar to standard CNNs.
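To make the multiplication-free idea concrete, here is a small sketch contrasting a conventional multiply-accumulate response with an AdderNet-style response based on absolute differences. It covers only the baseline AdderNet operation described in the abstract; the conjugate-pair formulation of CAddNet is not reproduced because the abstract does not give its details. The function names and the sample vectors are assumptions for illustration.

```python
import numpy as np


def mac_response(x, w):
    """Conventional filter response: sum of input-weight products."""
    return float(np.dot(x, w))


def adder_response(x, w):
    """AdderNet-style response: negative L1 distance between input and
    weights. It uses only subtraction, absolute value, and addition, and
    reaches its maximum (zero) when the input matches the weights exactly."""
    return float(-np.sum(np.abs(x - w)))


# Hypothetical 3-element input patch and two candidate filters.
x = np.array([1.0, -2.0, 3.0])
w_match = np.array([1.0, -2.0, 3.0])
w_zero = np.zeros(3)

match_score = adder_response(x, w_match)
zero_score = adder_response(x, w_zero)
```

Both responses rank the matching filter above the all-zero one, but the adder version does so without any multiplier, which is where the gate-level savings come from.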
High-Throughput and Energy-Efficient VLSI Architecture for Ordered Reliability Bits GRAND
Syed Mohsin Abbas
Thibaud Tonnellier
Furkan Ercan
Marwan Jalaleddine
Ultrareliable low-latency communication (URLLC), a major 5G new-radio (NR) use case, is the key enabler for applications with strict reliability and latency requirements. These applications necessitate the use of short-length and high-rate channel codes. Guessing random additive noise decoding (GRAND) is a recently proposed maximum likelihood (ML) decoding technique for these short-length and high-rate codes. Rather than decoding the received vector, GRAND tries to infer the noise that corrupted the transmitted codeword during transmission through the communication channel. As a result, GRAND can decode any code, structured or unstructured. GRAND has hard-input as well as soft-input variants. Among these variants, ordered reliability bits GRAND (ORBGRAND) is a soft-input variant that outperforms hard-input GRAND and is suitable for parallel hardware implementation. This work reports the first hardware architecture for ORBGRAND, which achieves an average throughput of up to 42.5 Gb/s for a code length of 128 at a target frame error rate (FER) of 10⁻⁷. Furthermore, the proposed hardware can be used to decode any code as long as the length and rate constraints are met. In comparison to the GRAND with ABandonment (GRANDAB), a hard-input variant of GRAND, the proposed architecture enhances decoding performance by at least 2 dB. When compared to the state-of-the-art fast dynamic successive cancellation flip decoder (Fast-DSCF) using a 5G polar code (PC) (128, 105), the proposed ORBGRAND VLSI implementation has
Optimization and Simplification of PCPA Decoder for Reed-Muller Codes
Jiajie Li
The collapsed projection-aggregation (CPA) decoder reduces the computational complexity of the recursive projection-aggregation (RPA) decoder by removing the recursive structure. In simulations, the CPA decoder has error-correction performance similar to that of the RPA decoder when decoding Reed-Muller (RM) (7, 3) and (8, 2) codes. The computational complexity can be further reduced by selecting only a subset of sub-spaces, which is achieved by pruning CPA decoders. In this work, optimization methods are proposed to find the pruned CPA (PCPA) decoder with small performance loss. Furthermore, the min-sum approximation is used to replace non-linear projection and aggregation functions, and a simplified list decoder based on the syndrome check is proposed. Under the same complexity, the optimized PCPA decoder has less performance loss than randomly constructed PCPA decoders in most cases. The min-sum approximation incurs less than 0.15 dB performance loss at a target frame error rate of 10⁻⁴, and the simplified list decoder does not have noticeable performance loss.
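The min-sum approximation mentioned above can be illustrated on the standard rule for combining two log-likelihood ratios (LLRs) under XOR. This sketch assumes that rule stands in for the kind of non-linear function being replaced; the exact projection and aggregation functions of the PCPA decoder are not given in the abstract, and the function names are hypothetical.

```python
import math


def combine_exact(a, b):
    """Exact LLR of the XOR of two bits with LLRs a and b.
    Intended for moderate |a|, |b|; very large inputs can push the tanh
    product to +/-1 in floating point, where atanh is undefined."""
    return 2.0 * math.atanh(math.tanh(a / 2.0) * math.tanh(b / 2.0))


def combine_min_sum(a, b):
    """Min-sum approximation: product of the signs times the smaller
    magnitude. Needs only comparisons and sign logic, no tanh/atanh."""
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return sign * min(abs(a), abs(b))


exact = combine_exact(2.0, -3.0)
approx = combine_min_sum(2.0, -3.0)
```

The approximation keeps the sign of the exact result and never underestimates its magnitude, which is why it tends to cost only a small fraction of a dB while removing the transcendental functions from the hardware.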