Warren Gross

Associate Academic Member

Professor, McGill University, Department of Electrical and Computer Engineering

Research Topics

Computer Systems

Deep Learning

Information Theory

Natural Language Processing

Optimization

Biography

Warren Gross is a James McGill Professor and chair of the Department of Electrical and Computer Engineering at McGill University.

His research interests lie in bridging algorithms and implementation in machine learning and digital communications. His work focuses on efficient deep learning models, hardware for machine learning, stochastic computing, hardware-aware design-space exploration for neural networks, machine learning for digital communications, and efficient decoding algorithms and hardware for error-correcting codes.

Current Students

Fahad Rahman Amik

PhD - McGill University

Github

Google Scholar

Mohammadreza Tayaranian Hosseini

PhD - McGill University

Website

Github

Google Scholar

Publications

Fast Fine-Tuning Using Curriculum Domain Adaptation

Lulan Shen

Ibtihel Amara

Ruofeng Li

Brett Meyer

Warren Gross

James J. Clark

Current deep neural networks (DNNs) have achieved remarkable accuracy in various downstream tasks. However, training and fine-tuning them are challenging due to several factors, such as limited computational resources, long training and fine-tuning times, and overfitting on small datasets. To address these challenges, we propose a three-stage fast fine-tuning method that efficiently trains DNNs for edge devices. Our method combines curriculum learning and domain adaptation techniques to accelerate training while achieving comparable performance. First, we develop a data curriculum approach, which ranks the dataset by difficulty and splits it into the source domain (containing easy data) and the target domain (containing difficult data). Second, we adapt the pretrained model from the source domain to the target domain using an unsupervised domain adaptation (UDA) method called Deep CORAL. Finally, we continue training the adapted model on the source domain with fewer epochs. Our method quickly achieves high accuracy on various modern neural network architectures and datasets such as CIFAR-10, CIFAR-100, and CINIC-10.

2023-06-01

Canadian Conference on Computer and Robot Vision (published)

doi.org
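The second stage relies on Deep CORAL, whose loss (as defined in the original Deep CORAL work) is the squared Frobenius distance between the feature covariances of the source and target domains, scaled by 1/(4d²) for d-dimensional features. The following is a minimal pure-Python sketch of that loss on small feature matrices; it is not the authors' training code, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def covariance(features: Matrix) -> Matrix:
    """Unbiased d x d covariance of an n x d feature matrix (rows are samples)."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source: Matrix, target: Matrix) -> float:
    """CORAL loss: ||C_S - C_T||_F^2 / (4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    frob_sq = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return frob_sq / (4 * d * d)
```

For example, with src = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]], coral_loss(src, src) is 0.0, while doubling every target feature gives 1.40625, since the target covariance becomes four times the source covariance. In a training loop this term would be added to the task loss on intermediate features; that wiring is omitted here.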

Hybrid GRAND Sphere Decoding: Accelerated GRAND for Low-Rate Codes

Huayi Zhou

Warren Gross

Guessing random additive noise decoding (GRAND) and sphere decoding (SD) are two algorithms that can achieve maximum likelihood decoding. In this paper, a hybrid GRAND-SD (HGRAND) scheme is proposed to extend GRAND to low-rate codes. An accelerated GRAND decoder is assisted by a sphere decoder running in parallel, which gives it hints that allow certain candidates to be skipped. This allows HGRAND to achieve a latency lower than that of either component decoder alone while guaranteeing error-correction performance.

2023-05-21

International Symposium on Circuits and Systems (published)

doi.org
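GRAND guesses noise patterns from most to least likely and stops at the first one whose removal yields a codeword. A minimal hard-decision sketch follows, assuming a binary symmetric channel (so "most likely" means lowest Hamming weight) and using the (7,4) Hamming code purely as an example; it does not include HGRAND's sphere-decoder hints, and the names and query budget are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# Parity-check matrix of the (7,4) Hamming code (column j is the binary form of j+1).
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def is_codeword(word: Sequence[int], h: List[List[int]] = H) -> bool:
    """A word is a codeword iff every parity check (row of H) sums to 0 mod 2."""
    return all(sum(a * b for a, b in zip(row, word)) % 2 == 0 for row in h)


def grand_decode(
    received: Sequence[int], max_weight: int = 3
) -> Optional[Tuple[List[int], int]]:
    """Hard-decision GRAND: query noise patterns in order of increasing weight.

    Returns (codeword, number_of_guesses), or None if the budget is exhausted.
    """
    n = len(received)
    guesses = 0
    for weight in range(max_weight + 1):
        for flips in combinations(range(n), weight):
            guesses += 1
            candidate = list(received)
            for i in flips:
                candidate[i] ^= 1  # remove the guessed noise
            if is_codeword(candidate):
                return candidate, guesses
    return None  # abandon: declare a decoding failure
```

For instance, grand_decode([0, 0, 0, 0, 1, 0, 0]) returns the all-zero codeword after 6 guesses (one weight-0 query, then five weight-1 queries). Because the first pattern that passes the check is also the most likely one, the result is a maximum likelihood decision, which is the property the abstract attributes to GRAND.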

Training Acceleration of Frequency Domain CNNs Using Activation Compression

Seyyed Hasan Mozafari

James J. Clark

Warren Gross

Brett Meyer

Reducing the complexity of training convolutional neural networks results in lower energy consumption expended during training, or higher accuracy by admitting a greater number of training epochs within a training time budget. During backpropagation, a considerable amount of temporary data is offloaded from GPU memory to CPU memory, increasing training time. In this paper, we address this training time overhead by introducing an activation compression technique for frequency domain convolutional neural networks. Applying this compression technique on frequency domain AlexNet results in activation compression of 57.7%, and a reduction of training time by 23%, with a negligible effect on classification accuracy.

2023-05-21

International Symposium on Circuits and Systems (published)

doi.org

Low-Complexity Sphere Decoding for Polar-Coded MIMO Systems

Huayi Zhou

Jian Zheng

Minhua Yang

Warren Gross

Xiaohu You

Chuan Zhang

For polar-coded MIMO systems, separate detection and decoding (SDD) is the traditional scheme. In SDD systems, sphere decoding (SD) is one of the competitive MIMO detection schemes. However, SD may not sufficiently exploit the coding information in SDD systems, causing a loss in error-correction performance. Existing joint detection and decoding based on breadth-first SD (BSD) improves performance over SDD, but its limited search space still causes a performance loss. In this paper, we propose joint detection and decoding based on SD (SD JDD) for polar-coded MIMO systems that reaches the maximum likelihood (ML) bound. Two approaches are then proposed to reduce the computational complexity. The first reduces the number of layers in the SD search tree by exploiting symbol synchro sets, which can accelerate the convergence of SD JDD. The second performs multiple tree searches: a small initial sphere radius is assigned to the first search to reduce the search space, and ML optimality is preserved by the subsequent searches with increasing radius. Numerical results show that the proposed JDD outperforms SDD by 3.1 dB at FER …

2023-05-01

IEEE Transactions on Vehicular Technology (published)

doi.org
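The multiple-tree-search idea can be illustrated on a plain sphere decoder. The sketch below assumes a real-valued model y = R x + n with R upper triangular (as obtained, e.g., from a QR decomposition of the channel matrix) and a BPSK alphabet; it is not the paper's SD JDD, which additionally uses the polar code structure and symbol synchro sets, and all names, the initial radius and the growth factor are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

SYMBOLS = (-1.0, 1.0)  # assumed BPSK alphabet (real-valued model)


def sphere_search(
    R: Sequence[Sequence[float]], y: Sequence[float], radius_sq: float
) -> Optional[Tuple[List[float], float]]:
    """Depth-first SD for y = R x + n with R upper triangular.

    Returns the closest candidate inside the sphere ||y - R x||^2 <= radius_sq
    together with its squared distance, or None if the sphere is empty.
    """
    m = len(y)
    x = [0.0] * m
    best: Optional[Tuple[List[float], float]] = None
    bound = radius_sq

    def descend(layer: int, partial: float) -> None:
        nonlocal best, bound
        for s in SYMBOLS:
            x[layer] = s
            # Row `layer` of R x only involves x[layer:], which is already fixed.
            r = y[layer] - sum(R[layer][j] * x[j] for j in range(layer, m))
            dist = partial + r * r
            if dist > bound:
                continue  # prune: this branch leaves the sphere
            if layer == 0:
                best, bound = (list(x), dist), dist  # shrink sphere to best leaf
            else:
                descend(layer - 1, dist)

    descend(m - 1, 0.0)
    return best


def sphere_decode(
    R: Sequence[Sequence[float]],
    y: Sequence[float],
    initial_radius_sq: float = 0.5,
    growth: float = 4.0,
) -> Tuple[List[float], float]:
    """Multiple tree searches: start small, enlarge the radius until non-empty."""
    assert initial_radius_sq > 0 and growth > 1
    radius_sq = initial_radius_sq
    while True:
        found = sphere_search(R, y, radius_sq)
        if found is not None:
            return found  # closest point in a non-empty sphere is the ML point
        radius_sq *= growth
```

With R = [[1.0, 0.5], [0.0, 1.0]] and y = [0.6, -0.9], the decoder returns x = [1.0, -1.0] at squared distance of about 0.02. A search with radius² = 0.01 finds nothing, so starting from 0.001 forces several retries before the same answer is found. Optimality survives the small first radius because a search only ever returns the closest point inside a non-empty sphere, and under Gaussian noise the closest point is the ML decision.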

SSS3D: Fast Neural Architecture Search For Efficient Three-Dimensional Semantic Segmentation

Olivier Therrien

Marihan Amein

Zhuoran Xiong

Warren Gross

Brett Meyer

We present SSS3D, a fast multi-objective NAS framework designed to find computationally efficient 3D semantic scene segmentation networks. It uses RandLA-Net, an off-the-shelf point-based network, as a super-network to enable weight sharing and reduce search time by 99.67% for single-stage searches. SSS3D has a complex search space composed of sampling and architectural parameters that can form 2.88 × 10^17 possible networks. To further reduce search time, SSS3D splits the complete search space and introduces a two-stage search that finds optimal subnetworks in 54% of the time required by single-stage searches.

2023-04-21

ArXiv (preprint)

doi.org

arxiv.org

FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation

Zhuoran Xiong

Marihan Amein

Olivier Therrien

Warren Gross

Brett Meyer

2023-03-29

ArXiv (preprint)

doi.org

arxiv.org

Memory-Efficient FPGA Implementation of Stochastic Simulated Annealing

Duckgyu Shin

Naoya Onizawa

Warren Gross

Takahiro Hanyu

Simulated annealing (SA) is a well-known algorithm for solving combinatorial optimization problems. However, the computation time of SA increases rapidly as the problem size grows. Recently, a stochastic simulated annealing (SSA) algorithm that converges faster than conventional SA has been reported. In this paper, we present a hardware-aware SSA (HA-SSA) algorithm for memory-efficient FPGA implementations. HA-SSA reduces the memory needed to store intermediate results while maintaining the computing speed of SSA. For evaluation, the proposed algorithm is compared with conventional SSA and SA on maximum cut (max-cut) combinatorial optimization problems. Depending on the max-cut problem selected from the G-set, a dataset of max-cut problems, HA-SSA converges up to 114 times faster than conventional SA. HA-SSA is implemented on a field-programmable gate array (FPGA) (Xilinx Kintex-7), where it achieves up to six times the memory efficiency of conventional SSA while maintaining high solution quality.

2023-03-01

IEEE Journal on Emerging and Selected Topics in Circuits and Systems (published)

doi.org
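For reference, the conventional SA baseline that the abstract compares against can be sketched on max-cut as single-spin flips accepted by the Metropolis rule under a cooling schedule. This is not SSA or HA-SSA from the paper; the geometric schedule, temperatures, step count and function names are arbitrary assumptions chosen for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (node_i, node_j, weight)


def cut_value(spins: Sequence[int], edges: Sequence[Edge]) -> float:
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for i, j, w in edges if spins[i] != spins[j])


def anneal_max_cut(
    n: int,
    edges: Sequence[Edge],
    steps: int = 20000,
    t_start: float = 2.0,
    t_end: float = 0.01,
    seed: int = 0,
) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    neighbors: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for i, j, w in edges:
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    current = cut_value(spins, edges)
    best, best_spins = current, list(spins)
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        k = rng.randrange(n)
        # Flipping spin k cuts edges to same-side neighbours and uncuts the others.
        delta = sum(w if spins[k] == spins[j] else -w for j, w in neighbors[k])
        if delta >= 0 or rng.random() < math.exp(delta / t):
            spins[k] = -spins[k]
            current += delta
            if current > best:
                best, best_spins = current, list(spins)
    return best_spins, best
```

On a unit-weight 4-cycle this finds the optimal cut of 4, and on a triangle the optimum of 2. The per-flip cost grows with node degree, and the number of sweeps needed grows with problem size, which is the scaling issue that motivates faster variants such as SSA.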

Stochastic Simulated Quantum Annealing for Fast Solving Combinatorial Optimization Problems

Naoya Onizawa

Ryoma Sasaki

Duckgyu Shin

Warren Gross

Takahiro Hanyu

… method. Additionally, for similar convergence probabilities, it can handle a problem size 100 times larger than QA and 25 times larger than a traditional SA method.

2023-01-01

(published)

www.semanticscholar.org

Stochastic Quantum Monte Carlo Algorithm for Large-Scale Combinatorial Optimization Problems

Naoya Onizawa

Ryoma Sasaki

Duckgyu Shin

Warren Gross

Takahiro Hanyu

… computing. In addition, it solves problems with two orders of magnitude more spins than the D-Wave Two QA machine.

2023-01-01

(published)

www.semanticscholar.org

Guessing Random Additive Noise Decoding

Syed Mohsin Abbas

Marwan Jalaleddine

Warren Gross

2023-01-01

(published)

doi.org

List-GRAND: A Practical Way to Achieve Maximum Likelihood Decoding

Syed Mohsin Abbas

Marwan Jalaleddine

Warren Gross

Guessing random additive noise decoding (GRAND) is a recently proposed universal maximum likelihood (ML) decoder for short-length and high-rate linear block codes. Soft-GRAND (SGRAND) is a prominent soft-input GRAND variant, outperforming the other GRAND variants in decoding performance; nevertheless, SGRAND is not suitable for parallel hardware implementation. Ordered Reliability Bits-GRAND (ORBGRAND) is another soft-input GRAND variant that is suitable for parallel hardware implementation; however, it has lower decoding performance than SGRAND. In this article, we propose List-GRAND (LGRAND), a technique for enhancing the decoding performance of ORBGRAND to match the ML decoding performance of SGRAND. Numerical simulation results show that LGRAND enhances ORBGRAND's decoding performance by 0.5–0.75 dB for channel codes of various classes at a target frame error rate (FER) of 10^-7. For linear block codes of length 127/128 and different code rates, LGRAND's VLSI implementation can achieve an average information throughput of 47.27–51.36 Gb/s. In comparison to ORBGRAND's VLSI implementation, the proposed LGRAND hardware has a 4.84% area overhead.

2023-01-01

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (published)

doi.org

arxiv.org
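LGRAND builds on ORBGRAND's query order. In ORBGRAND as commonly described, bits are ranked by reliability |LLR| (rank 1 is the least reliable bit), and error patterns are queried in increasing "logistic weight", the sum of the ranks of the flipped bits. The patterns of weight w therefore correspond to partitions of w into distinct parts no larger than the code length n. The sketch below generates only this query order; the codebook check and LGRAND's list mechanism are omitted, and the function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def distinct_partitions(w: int, max_part: int) -> Iterator[Tuple[int, ...]]:
    """Partitions of w into distinct parts, each at most max_part, largest first."""
    if w == 0:
        yield ()
        return
    for first in range(min(w, max_part), 0, -1):
        # Remaining parts must be strictly smaller, which keeps them distinct.
        for rest in distinct_partitions(w - first, first - 1):
            yield (first,) + rest


def orbgrand_patterns(llrs: Sequence[float], max_weight: int) -> Iterator[List[int]]:
    """Yield sets of bit positions to flip, in increasing logistic weight."""
    n = len(llrs)
    # order[r - 1] is the bit position holding reliability rank r.
    order = sorted(range(n), key=lambda i: abs(llrs[i]))
    for w in range(max_weight + 1):
        for parts in distinct_partitions(w, n):
            yield sorted(order[rank - 1] for rank in parts)
```

For LLRs [0.9, -0.1, 2.0, -0.5], the first five queries up to logistic weight 3 are [], [1], [3], [0] and [1, 3]: the two-bit pattern combining ranks 1 and 2 ties in weight with flipping rank 3 alone. Because the order depends only on ranks, not on LLR magnitudes, the patterns can be generated by fixed logic, which is what makes ORBGRAND suited to parallel hardware.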
