Portrait of Warren Gross

Warren Gross

Associate Academic Member
Professor, McGill University, Department of Electrical and Computer Engineering
Research Topics
Computer Systems
Deep Learning
Information Theory
Natural Language Processing
Optimization

Biography

Warren Gross is a James McGill Professor and chair of the Department of Electrical and Computer Engineering at McGill University.

His research interests lie in bridging algorithms and implementation in machine learning and digital communications. His work focuses on efficient deep learning models, hardware for machine learning, stochastic computing, hardware-aware design-space exploration for neural networks, machine learning for digital communications, and efficient decoding algorithms and hardware for error-correcting codes.

Current Students

Publications

Fast Fine-Tuning Using Curriculum Domain Adaptation
Lulan Shen
Ibtihel Amara
Ruofeng Li
Brett Meyer
James J. Clark
Current deep neural networks (DNNs) have achieved remarkable accuracy in various downstream tasks. However, their training and fine-tuning a… (see more)re challenging due to several factors, such as limited computational resources, extended training and fine-tuning times, and over-fitting due to small datasets. To address these challenges, we propose a three-stage fast fine-tuning method that efficiently trains DNNs for edge devices. Our method combines curriculum learning and domain adaptation techniques to accelerate training while achieving comparable performance. First, we develop a data curriculum approach, which ranks the dataset according to difficulty and split it into the source domain (containing easy data) and the target domain (containing difficult data). Second, we adapt the pretrained model from the source domain to the target domain using an unsupervised domain adaptation (UDA) method called Deep CORAL. Finally, we continue training the adapted model on the source domain with fewer epochs. Our method achieves high accuracy quickly on various modern neural network architectures and datasets such as CIFAR-10, CIFAR-100, and CINIC-10.
Hybrid GRAND Sphere Decoding: Accelerated GRAND for Low-Rate Codes
Huayi Zhou
Guessing random additive noise decoding (GRAND) and sphere decoding (SD) are two algorithms that can achieve maximum likelihood decoding. In… (see more) this paper, a hybrid GRAND-SD (HGRAND) scheme is proposed to extend GRAND to low-rate codes. An accelerated GRAND decoder, assisted by a sphere decoder running in parallel and giving hints to it to allow skipping of certain candidates allows HGRAND to achieve a latency below the minimum latency of the individual component decoders while guaranteeing error-correction performance.
Training Acceleration of Frequency Domain CNNs Using Activation Compression
Seyyed Hasan Mozafari
James J. Clark
Brett Meyer
Reducing the complexity of training convolutional neural networks results in lower energy consumption expended during training, or higher ac… (see more)curacy by admitting a greater number of training epochs within a training time budget. During backpropagation, a considerable amount of temporary data is offloaded from GPU memory to CPU memory, increasing training time. In this paper, we address this training time overhead by introducing an activation compression technique for frequency domain convolutional neural networks. Applying this compression technique on frequency domain AlexNet results in activation compression of 57.7%, and a reduction of training time by 23%, with a negligible effect on classification accuracy.
Low-Complexity Sphere Decoding for Polar-Coded MIMO Systems
Huayi Zhou
Jian Zheng
Minhua Yang
Xiaohu You
Chuan Zhang
For polar-coded MIMO systems, separate detection and decoding (SDD) is the traditional scheme. In SDD systems, sphere decoding (SD) is one o… (see more)f the competitive MIMO detection schemes. However, SD may not utilize the coding information sufficiently in SDD systems, causing an error-correction performance loss. The existed joint detection and decoding using breadth-first SD (BSD) improves the performance than SDD, whereas the limited search space still causes a performance loss. In this paper, we propose joint detection and decoding based on SD (SD JDD) for polar-coded MIMO systems to reach maximum likelihood (ML) bound. Subsequently, two approaches are further proposed to reduce the computational complexity. The first approach reduces the layers of the SD search tree by exploiting symbol synchro sets, which could accelerate the convergence of SD JDD. The second efficient approach performs multiple tree searches. A small initial radius of the sphere for the first search is assigned to reduce the search space. The ML optimality could be preserved by the following multiple tree searches with increasing radius. It is shown from the numerical results that the proposed JDD outperforms SDD by 3.1 dB at FER
SSS3D: Fast Neural Architecture Search For Efficient Three-Dimensional Semantic Segmentation
Olivier Therrien
Marihan Amein
Zhuoran Xiong
Brett Meyer
We present SSS3D, a fast multi-objective NAS framework designed to find computationally efficient 3D semantic scene segmentation networks. I… (see more)t uses RandLA-Net, an off-the-shelf point-based network, as a super-network to enable weight sharing and reduce search time by 99.67% for single-stage searches. SSS3D has a complex search space composed of sampling and architectural parameters that can form 2.88 * 10^17 possible networks. To further reduce search time, SSS3D splits the complete search space and introduces a two-stage search that finds optimal subnetworks in 54% of the time required by single-stage searches.
FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation
Zhuoran Xiong
Marihan Amein
Olivier Therrien
Brett Meyer
Memory-Efficient FPGA Implementation of Stochastic Simulated Annealing
Duckgyu Shin
Naoya Onizawa
Takahiro Hanyu
Simulated annealing (SA) is a well-known algorithm for solving combinatorial optimization problems. However, the computation time of SA incr… (see more)eases rapidly, as the size of the problem grows. Recently, a stochastic simulated annealing (SSA) algorithm that converges faster than conventional SA has been reported. In this paper, we present a hardware-aware SSA (HA-SSA) algorithm for memory-efficient FPGA implementations. HA-SSA can reduce the memory usage of storing intermediate results while maintaining the computing speed of SSA. For evaluation purposes, the proposed algorithm is compared with the conventional SSA and SA approaches on maximum cut combinatorial optimization problems. HA-SSA achieves a convergence speed that is up to 114-times faster than that of the conventional SA algorithm depending on the maximum cut problem selected from the G-set which is a dataset of the maximum cut problems. HA-SSA is implemented on a field-programmable gate array (FPGA) (Xilinx Kintex-7), and it achieves up to 6-times the memory efficiency of conventional SSA while maintaining high solution quality for optimization problems.
2023 S TOCHASTIC S IMULATED Q UANTUM A NNEALING FOR F AST S OLVING C OMBINATORIAL O PTIMIZATION P ROBLEMS
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
method. Additionally, it can handle a 100-times larger problem size compared to QA and a 25-times larger problem size compared to a traditio… (see more)nal SA method, respectively, for similar convergence probabilities.
2023 S TOCHASTIC Q UANTUM M ONTE C ARLO A LGORITHM FOR L ARGE -S CALE C OMBINATORIAL O PTIMIZATION P ROBLEMS
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
computing. In addition, it solves problems using two orders-of-magnitude larger number of spins than the D-Wave Two QA machine.
2023 S TOCHASTIC Q UANTUM M ONTE C ARLO A LGORITHM FOR L ARGE -S CALE C OMBINATORIAL O PTIMIZATION P ROBLEMS
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
computing. In addition, it solves problems using two orders-of-magnitude larger number of spins than the D-Wave Two QA machine.
Guessing Random Additive Noise Decoding
Syed Mohsin Abbas
Marwan Jalaleddine
List-GRAND: A Practical Way to Achieve Maximum Likelihood Decoding
Syed Mohsin Abbas
Marwan Jalaleddine
Guessing random additive noise decoding (GRAND) is a recently proposed universal maximum likelihood (ML) decoder for short-length and high-r… (see more)ate linear block codes. Soft-GRAND (SGRAND) is a prominent soft-input GRAND variant, outperforming the other GRAND variants in decoding performance; nevertheless, SGRAND is not suitable for parallel hardware implementation. Ordered Reliability Bits-GRAND (ORBGRAND) is another soft-input GRAND variant that is suitable for parallel hardware implementation; however, it has lower decoding performance than SGRAND. In this article, we propose List-GRAND (LGRAND), a technique for enhancing the decoding performance of ORBGRAND to match the ML decoding performance of SGRAND. Numerical simulation results show that LGRAND enhances ORBGRAND’s decoding performance by 0.5–0.75 dB for channel codes of various classes at a target frame error rate (FER) of 10−7. For linear block codes of length 127/128 and different code rates, LGRAND’s VLSI implementation can achieve an average information throughput of 47.27–51.36 Gb/s. In comparison to ORBGRAND’s VLSI implementation, the proposed LGRAND hardware has a 4.84% area overhead.