
Warren Gross

Associate Academic Member
Professor, McGill University, Department of Electrical and Computer Engineering
Research Topics
Deep Learning
Optimization
Information Theory
Natural Language Processing

Biography

Warren Gross is a James McGill Professor and Chair of the Department of Electrical and Computer Engineering at McGill University. His research focuses on bridging algorithms and their implementations in the areas of machine learning and digital communications. His work covers efficient deep learning models, hardware for machine learning, stochastic computing, hardware design-space exploration for neural networks, machine learning for digital communications, and efficient decoding algorithms and hardware for error-correcting codes.

Current Students

Publications

FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation
Zhuoran Xiong
Marihan Amein
Olivier Therrien
Brett Meyer
Memory-Efficient FPGA Implementation of Stochastic Simulated Annealing
Duckgyu Shin
Naoya Onizawa
Takahiro Hanyu
Simulated annealing (SA) is a well-known algorithm for solving combinatorial optimization problems. However, the computation time of SA increases rapidly as the size of the problem grows. Recently, a stochastic simulated annealing (SSA) algorithm that converges faster than conventional SA has been reported. In this paper, we present a hardware-aware SSA (HA-SSA) algorithm for memory-efficient FPGA implementations. HA-SSA can reduce the memory usage of storing intermediate results while maintaining the computing speed of SSA. For evaluation purposes, the proposed algorithm is compared with the conventional SSA and SA approaches on maximum cut combinatorial optimization problems. HA-SSA achieves a convergence speed up to 114 times faster than that of the conventional SA algorithm, depending on the maximum cut problem selected from the G-set, a standard benchmark set of maximum cut problems. HA-SSA is implemented on a field-programmable gate array (FPGA) (Xilinx Kintex-7), and it achieves up to 6 times the memory efficiency of conventional SSA while maintaining high solution quality for optimization problems.
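For readers unfamiliar with the baseline being accelerated, the following is a minimal sketch of conventional simulated annealing on a max-cut instance; the graph, cooling schedule, and parameter values are illustrative and not taken from the paper.

```python
import math
import random

def simulated_annealing_maxcut(edges, n, t0=2.0, t_min=0.01, alpha=0.95, steps_per_t=200):
    """Baseline SA for max-cut: spins s[i] in {-1,+1}; the cut counts
    edges whose endpoints fall on opposite sides of the partition."""
    s = [random.choice((-1, 1)) for _ in range(n)]
    # Adjacency list for fast evaluation of single-spin flips.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def flip_gain(i):
        # Change in cut size if spin i is flipped.
        return sum(1 if s[i] == s[j] else -1 for j in adj[i])

    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            i = random.randrange(n)
            gain = flip_gain(i)
            # Accept improving flips always, worsening flips with Boltzmann probability.
            if gain > 0 or random.random() < math.exp(gain / t):
                s[i] = -s[i]
        t *= alpha  # geometric cooling schedule

    cut = sum(1 for u, v in edges if s[u] != s[v])
    return s, cut

# Toy instance: a 5-cycle, whose maximum cut is 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(simulated_annealing_maxcut(edges, n=5)[1])
```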
2023 Stochastic Simulated Quantum Annealing for Fast Solving Combinatorial Optimization Problems
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
…method. Additionally, it can handle a 100 times larger problem size compared to QA and a 25 times larger problem size compared to a traditional SA method, respectively, for similar convergence probabilities.
2023 Stochastic Quantum Monte Carlo Algorithm for Large-Scale Combinatorial Optimization Problems
Naoya Onizawa
Ryoma Sasaki
Duckgyu Shin
Takahiro Hanyu
…computing. In addition, it solves problems with two orders of magnitude more spins than the D-Wave Two QA machine.
Guessing Random Additive Noise Decoding
Syed Mohsin Abbas
Marwan Jalaleddine
List-GRAND: A Practical Way to Achieve Maximum Likelihood Decoding
Syed Mohsin Abbas
Marwan Jalaleddine
Guessing random additive noise decoding (GRAND) is a recently proposed universal maximum likelihood (ML) decoder for short-length and high-rate linear block codes. Soft-GRAND (SGRAND) is a prominent soft-input GRAND variant, outperforming the other GRAND variants in decoding performance; nevertheless, SGRAND is not suitable for parallel hardware implementation. Ordered Reliability Bits-GRAND (ORBGRAND) is another soft-input GRAND variant that is suitable for parallel hardware implementation; however, it has lower decoding performance than SGRAND. In this article, we propose List-GRAND (LGRAND), a technique for enhancing the decoding performance of ORBGRAND to match the ML decoding performance of SGRAND. Numerical simulation results show that LGRAND enhances ORBGRAND's decoding performance by 0.5–0.75 dB for channel codes of various classes at a target frame error rate (FER) of 10⁻⁷. For linear block codes of length 127/128 and different code rates, LGRAND's VLSI implementation can achieve an average information throughput of 47.27–51.36 Gb/s. In comparison to ORBGRAND's VLSI implementation, the proposed LGRAND hardware has a 4.84% area overhead.
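The list mechanism behind LGRAND can be sketched in a few lines: rather than stopping at the first test error pattern (TEP) that yields a codeword, the decoder collects several candidates and keeps the most plausible one. The sketch below is a hedged illustration assuming a simple Hamming-weight-ordered TEP schedule and a syndrome check against a parity-check matrix H; the function and variable names are hypothetical, and a real ORBGRAND/LGRAND schedule orders patterns by logistic weight instead.

```python
import numpy as np
from itertools import combinations

def grand_list_decode(H, y_hard, reliabilities, list_size=4, max_weight=3):
    """Collect up to `list_size` codeword candidates found by TEP queries,
    then return the one whose flipped bits are least reliable (i.e., most
    plausible as channel noise). H is a parity-check matrix over GF(2)."""
    n = H.shape[1]
    candidates = []
    for w in range(max_weight + 1):
        for flips in combinations(range(n), w):
            e = np.zeros(n, dtype=np.uint8)
            e[list(flips)] = 1
            c = y_hard ^ e
            if not (H @ c % 2).any():                    # zero syndrome: c is a codeword
                cost = reliabilities[list(flips)].sum()  # penalty for flipping reliable bits
                candidates.append((cost, c))
                if len(candidates) == list_size:
                    return min(candidates, key=lambda t: t[0])[1]
    return min(candidates, key=lambda t: t[0])[1] if candidates else None

# (3,1) repetition code: parity checks x0^x1 = 0 and x1^x2 = 0.
H = np.array([[1, 1, 0], [0, 1, 1]], dtype=np.uint8)
y = np.array([1, 0, 1], dtype=np.uint8)   # hard decisions after the channel
rel = np.array([0.9, 0.2, 0.8])           # per-bit reliabilities |LLR|, illustrative
print(grand_list_decode(H, y, rel))       # expected: [1 1 1]
```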
GRAND for Rayleigh Fading Channels
Syed Mohsin Abbas
Marwan Jalaleddine
Guessing Random Additive Noise Decoding (GRAND) is a code-agnostic decoding technique for short-length and high-rate channel codes. GRAND attempts to guess the channel-induced noise by generating Test Error Patterns (TEPs), and the sequence of TEP generation is the primary distinction between GRAND variants. In this work, we extend the application of GRAND to multipath frequency non-selective Rayleigh fading communication channels, and we refer to this GRAND variant as Fading-GRAND. The proposed Fading-GRAND adapts its TEP generation to the fading conditions of the underlying communication channel, outperforming traditional channel code decoders in scenarios with L spatial diversity branches as well as scenarios with no diversity. Numerical simulation results show that the Fading-GRAND outperforms the traditional Berlekamp-Massey (B-M) decoder for decoding BCH code (127, 106) and BCH code (127, 113) by …
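The following hedged sketch shows where the per-bit reliabilities that would drive a fading-aware TEP schedule come from, assuming BPSK with coherent detection over a flat Rayleigh fading channel; the SNR, seed, and function names are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rayleigh_bpsk_reliabilities(bits, snr_db, rng=rng):
    """BPSK over a frequency non-selective Rayleigh fading channel.
    Returns hard decisions and per-bit reliabilities |LLR|; a Fading-GRAND-style
    decoder would flip the least reliable positions first when building TEPs."""
    n = len(bits)
    snr = 10 ** (snr_db / 10)
    x = 1 - 2 * bits.astype(float)                  # 0 -> +1, 1 -> -1
    h = rng.rayleigh(scale=1 / np.sqrt(2), size=n)  # fading amplitude per bit, E[h^2] = 1
    noise = rng.normal(scale=np.sqrt(1 / (2 * snr)), size=n)
    y = h * x + noise                               # coherent detection assumed
    llr = 4 * snr * h * y                           # standard BPSK LLR for known h
    return (llr < 0).astype(np.uint8), np.abs(llr)

bits = np.array([0, 1, 0, 0, 1, 1, 0, 1], dtype=np.uint8)
hard, rel = rayleigh_bpsk_reliabilities(bits, snr_db=6)
# Flip candidates ordered from least to most reliable (deep fades first).
print(np.argsort(rel))
```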
Successive-Cancellation Decoding of Reed-Muller Codes With Fast Hadamard Transform
Nghia Doan
Seyyed Ali Hashemi
A novel permuted fast successive-cancellation list decoding algorithm with fast Hadamard transform (FHT-FSCL) is presented. The proposed decoder initializes …
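A fast Hadamard transform of the kind the decoder relies on can be sketched as the standard O(n log n) butterfly recursion; how the paper integrates it into the successive-cancellation list recursion is not reproduced here.

```python
import numpy as np

def fht(x):
    """Fast (Walsh-)Hadamard transform, O(n log n) for n a power of two.
    In FHT-aided SC decoding of Reed-Muller codes, a transform like this
    replaces explicit codeword enumeration when decoding first-order
    constituent codes."""
    x = np.asarray(x, dtype=float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b   # butterfly
        h *= 2
    return x

# ML decoding of a first-order code reduces to finding the largest |component|
# of the transformed LLR vector (illustrative use, not the paper's exact recursion).
llrs = np.array([1.2, -0.4, 0.9, 1.1, -0.2, 0.3, -1.0, 0.8])
print(fht(llrs))
```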
PipeBERT: High-throughput BERT Inference for ARM Big.LITTLE Multi-core Processors
Hung-Yang Chang
Seyyed Hasan Mozafari
Cheng Chen
James J. Clark
Brett Meyer
Conjugate Adder Net (CAddNet) - a Space-Efficient Approximate CNN
Lulan Shen
Maryam Ziaeefard
Brett Meyer
James J. Clark
The AdderNet was recently developed as a way to implement deep neural networks without needing multiplication operations to combine weights and inputs. Instead, absolute values of the difference between weights and inputs are used, greatly reducing the gate-level implementation complexity. Training of AdderNets is challenging, however, and the loss curves during training tend to fluctuate significantly. In this paper we propose the Conjugate Adder Network, or CAddNet, which uses the difference between the absolute values of conjugate pairs of inputs and the weights. We show that this can be implemented simply via a single minimum operation, resulting in a roughly 50% reduction in logic gate complexity as compared with AdderNets. The CAddNet method also stabilizes training as compared with AdderNets, yielding training curves similar to standard CNNs.
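A NumPy sketch of the contrast described in the abstract follows. The AdderNet response is the negated L1 distance between inputs and weights; the CAddNet form shown relies on the identity |x+w| − |x−w| = 2·sign(xw)·min(|x|, |w|), which is consistent with the abstract's "single minimum operation" claim, though whether this is exactly the paper's layer definition is an assumption.

```python
import numpy as np

def addernet_response(x, w):
    """AdderNet-style similarity: negated L1 distance between input and weights."""
    return -np.abs(x - w).sum()

def caddnet_response(x, w):
    """Hedged sketch of the CAddNet idea: the difference of absolute values of
    the conjugate pair (x + w, x - w) reduces, elementwise, to a single minimum
    operation plus sign logic via |x+w| - |x-w| = 2*sign(x*w)*min(|x|, |w|)."""
    return (np.sign(x * w) * np.minimum(np.abs(x), np.abs(w))).sum()

x = np.array([0.5, -1.0, 0.25])
w = np.array([0.4, -0.8, -0.3])
# Sanity check: the min-based form matches the conjugate-pair difference.
assert np.isclose(2 * caddnet_response(x, w), (np.abs(x + w) - np.abs(x - w)).sum())
print(addernet_response(x, w), caddnet_response(x, w))
```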
High-Throughput and Energy-Efficient VLSI Architecture for Ordered Reliability Bits GRAND
Syed Mohsin Abbas
Thibaud Tonnellier
Furkan Ercan
Marwan Jalaleddine
Ultrareliable low-latency communication (URLLC), a major 5G new-radio (NR) use case, is the key enabler for applications with strict reliability and latency requirements. These applications necessitate the use of short-length and high-rate channel codes. Guessing random additive noise decoding (GRAND) is a recently proposed maximum likelihood (ML) decoding technique for these short-length and high-rate codes. Rather than decoding the received vector, GRAND tries to infer the noise that corrupted the transmitted codeword during transmission through the communication channel. As a result, GRAND can decode any code, structured or unstructured. GRAND has hard-input as well as soft-input variants. Among these variants, ordered reliability bits GRAND (ORBGRAND) is a soft-input variant that outperforms hard-input GRAND and is suitable for parallel hardware implementation. This work reports the first hardware architecture for ORBGRAND, which achieves an average throughput of up to 42.5 Gb/s for a code length of 128 at a target frame error rate (FER) of 10⁻⁷. Furthermore, the proposed hardware can be used to decode any code as long as the length and rate constraints are met. In comparison to the GRAND with ABandonment (GRANDAB), a hard-input variant of GRAND, the proposed architecture enhances decoding performance by at least 2 dB. When compared to the state-of-the-art fast dynamic successive cancellation flip decoder (Fast-DSCF) using a 5G polar code (PC) (128, 105), the proposed ORBGRAND VLSI implementation has …
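ORBGRAND's pattern schedule can be sketched as follows: sort bit positions from least to most reliable, assign them ranks 1..n, and emit TEPs in increasing logistic weight, i.e. the sum of the ranks of the flipped bits, which amounts to enumerating integer partitions into distinct parts. The ordering within a given weight and the generator interface below are illustrative choices, not the paper's hardware schedule.

```python
def orbgrand_teps(n, max_lw):
    """Yield test error patterns as sets of reliability ranks (1-indexed,
    least reliable first) in order of increasing logistic weight. A decoder
    maps each rank set back through the sorting permutation to bit positions."""
    def partitions(target, max_part):
        # Partitions of `target` into strictly decreasing parts <= max_part,
        # so each rank is flipped at most once per pattern.
        if target == 0:
            yield []
            return
        for first in range(min(target, max_part), 0, -1):
            for rest in partitions(target - first, first - 1):
                yield [first] + rest

    for lw in range(1, max_lw + 1):
        for parts in partitions(lw, n):
            yield parts

# First few TEPs for n = 6: [1], [2], [3], [2, 1], [4], [3, 1]
for tep in orbgrand_teps(6, 4):
    print(tep)
```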