Warren Gross

Associate Academic Member

Professor, McGill University, Department of Electrical and Computer Engineering

Research Topics

Computer Systems

Deep Learning

Information Theory

Natural Language Processing

Optimization

Biography

Warren Gross is a James McGill Professor and chair of the Department of Electrical and Computer Engineering at McGill University.

His research interests lie in bridging algorithms and implementation in machine learning and digital communications. His work focuses on efficient deep learning models, hardware for machine learning, stochastic computing, hardware-aware design-space exploration for neural networks, machine learning for digital communications, and efficient decoding algorithms and hardware for error-correcting codes.

Current Students

Fahad Rahman Amik

PhD - McGill University

Website

Github

Google Scholar

Mohammadreza Tayaranian Hosseini

PhD - McGill University

Website

Github

Google Scholar

Hang Zhang

PhD - McGill University

Github

Google Scholar

Publications

Parameter Efficient Fine-tuning of Transformer-Based Language Models Using Dataset Pruning

Sayed Mohammadreza Tayaranian Hosseini

Seyyed Hasan Mozafari

James Clark

Brett Meyer

Warren Gross

The widespread use of transformer-based language models is in part owed to their ease of adaptation to various tasks. Fine-tuning is a method of adapting pre-trained language models to a downstream task. The resource requirements for fine-tuning, although still lower than those of pre-training, have been increasing due to the significant growth in the number of parameters of language models. Parameter efficient fine-tuning methods limit the set of model parameters that are updated during fine-tuning, leading to reductions in both memory usage and fine-tuning time. Dataset pruning is another method of efficient fine-tuning which removes training data points, thus reducing training time, while maintaining the evaluation performance of the fine-tuned model. In this work, we apply dataset pruning on top of parameter efficient fine-tuning to further reduce the hardware requirements of fine-tuning. Our approach benefits from the lower memory usage of parameter efficient methods while addressing their long fine-tuning time with dataset pruning. On average, our proposed method uses 22% of the fine-tuning dataset while updating only 0.5% of model parameters. As a result, while achieving an evaluation performance similar to full fine-tuning, our method reduces the peak memory usage of fine-tuning by 40% and its wall clock time by 83%.

2023-12-31

Asilomar Conference on Signals, Systems, and Computers (published)

doi.org

Fast Fine-Tuning Using Curriculum Domain Adaptation

Lulan Shen

Ibtihel Amara

Ruofeng Li

Brett Meyer

Warren Gross

James J. Clark

Current deep neural networks (DNNs) have achieved remarkable accuracy in various downstream tasks. However, their training and fine-tuning are challenging due to several factors, such as limited computational resources, extended training and fine-tuning times, and over-fitting due to small datasets. To address these challenges, we propose a three-stage fast fine-tuning method that efficiently trains DNNs for edge devices. Our method combines curriculum learning and domain adaptation techniques to accelerate training while achieving comparable performance. First, we develop a data curriculum approach, which ranks the dataset according to difficulty and splits it into the source domain (containing easy data) and the target domain (containing difficult data). Second, we adapt the pretrained model from the source domain to the target domain using an unsupervised domain adaptation (UDA) method called Deep CORAL. Finally, we continue training the adapted model on the source domain with fewer epochs. Our method achieves high accuracy quickly on various modern neural network architectures and datasets such as CIFAR-10, CIFAR-100, and CINIC-10.

2023-05-31

Canadian Conference on Computer and Robot Vision (published)

doi.org
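
The second stage of the abstract above relies on Deep CORAL, which aligns the feature covariances of the source and target domains. The following is a minimal sketch of the CORAL loss as it is commonly defined (squared Frobenius distance between the two covariance matrices, scaled by 1/(4d^2)); it is not taken from the paper. The function names and the toy feature lists are assumptions made for illustration, and a real implementation would compute this on batches of network activations inside a deep learning framework.

```python
def covariance(features):
    """Sample covariance matrix (d x d) of a list of n feature vectors."""
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1/(4 d^2)."""
    d = len(source[0])
    cs = covariance(source)
    ct = covariance(target)
    sq_dist = sum(
        (cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return sq_dist / (4 * d * d)


# Hypothetical toy features: "easy" (source) and "difficult" (target) samples.
source = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
target = [[2.0 * v for v in row] for row in source]  # same shape, larger spread

print(coral_loss(source, source))  # identical statistics give zero loss
print(coral_loss(source, target))  # mismatched covariances give a positive loss
```

Because the loss depends only on second-order statistics, shifting every target feature by a constant leaves it unchanged; only differences in spread and correlation are penalized.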

Conjugate Adder Net (CAddNet) - a Space-Efficient Approximate CNN

Lulan Shen

Maryam Ziaeefard

Brett Meyer

Warren Gross

James J. Clark

The AdderNet was recently developed as a way to implement deep neural networks without needing multiplication operations to combine weights and inputs. Instead, absolute values of the difference between weights and inputs are used, greatly reducing the gate-level implementation complexity. Training of AdderNets is challenging, however, and the loss curves during training tend to fluctuate significantly. In this paper we propose the Conjugate Adder Network, or CAddNet, which uses the difference between the absolute values of conjugate pairs of inputs and the weights. We show that this can be implemented simply via a single minimum operation, resulting in a roughly 50% reduction in logic gate complexity as compared with AdderNets. The CAddNet method also stabilizes training as compared with AdderNets, yielding training curves similar to standard CNNs.

2022-05-31

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (published)

doi.org
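
To illustrate the AdderNet idea that the abstract above builds on, the sketch below contrasts a standard multiply-accumulate filter response with an AdderNet-style response computed as the negative sum of absolute differences between inputs and weights. It covers only the baseline AdderNet operation, not the CAddNet conjugate-pair formulation, whose exact form is not given here. The function names and toy values are assumptions made for illustration.

```python
def mult_response(patch, weights):
    """Conventional filter response: sum of element-wise products."""
    return sum(x * w for x, w in zip(patch, weights))


def adder_response(patch, weights):
    """AdderNet-style response: negative L1 distance between patch and weights.

    Uses only subtraction, absolute value, and addition (no multiplications).
    """
    return -sum(abs(x - w) for x, w in zip(patch, weights))


# Hypothetical flattened input patch and filter weights.
patch = [1.0, -2.0, 0.5]
weights = [1.0, -1.0, 0.0]

print(mult_response(patch, weights))   # 3.0
print(adder_response(patch, weights))  # -1.5
```

The adder response is largest (zero) when the patch matches the weights exactly and becomes more negative as they diverge, so it acts as a similarity measure in place of the dot product.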
