
Warren Gross

Associate Academic Member
Professor, McGill University, Department of Electrical and Computer Engineering
Research Topics
Deep Learning
Optimization
Computer Systems
Information Theory
Natural Language Processing

Biography

Warren Gross is a James McGill Professor and Chair of the Department of Electrical and Computer Engineering at McGill University. His research focuses on bridging the gap between algorithms and their implementation in machine learning and digital communications. His work covers efficient deep learning models, hardware for machine learning, stochastic computing, hardware design space exploration for neural networks, machine learning for digital communications, and efficient decoding algorithms and hardware for error-correcting codes.

Current Students

Publications

Parameter Efficient Fine-tuning of Transformer-Based Language Models Using Dataset Pruning
Sayed Mohammadreza Tayaranian Hosseini
Seyyed Hasan Mozafari
Brett Meyer
The widespread use of transformer-based language models is in part owed to their ease of adaptation to various tasks. Fine-tuning is a method of adapting pre-trained language models to a downstream task. The resource requirements for fine-tuning, although still lower than those of pre-training, have been increasing due to the significant growth in the number of parameters of language models. Parameter efficient fine-tuning methods limit the set of model parameters that are updated during fine-tuning, leading to reductions in both memory usage and fine-tuning time. Dataset pruning is another method of efficient fine-tuning which removes training data points, thus reducing training time, while maintaining the evaluation performance of the fine-tuned model. In this work, we apply dataset pruning on top of parameter efficient fine-tuning to further reduce the hardware requirements of fine-tuning. Our approach benefits from the lower memory usage of parameter efficient methods while addressing their long fine-tuning time with dataset pruning. On average, our proposed method uses 22% of the fine-tuning dataset while updating only 0.5% of model parameters. As a result, while achieving an evaluation performance similar to full fine-tuning, our method reduces the peak memory usage of fine-tuning by 40% and its wall clock time by 83%.
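The two ingredients described in this abstract, keeping only a subset of training examples and updating only a small group of parameters, can be illustrated with a minimal sketch. The abstract does not say how examples are ranked or which parameters are trained, so the per-example `scores`, the helper names, and the "backbone"/"adapter" parameter groups below are all hypothetical and chosen only to mirror the reported 22% and 0.5% figures.

```python
def prune_dataset(scores, keep_fraction):
    """Return sorted indices of the highest-scoring examples.

    `scores` holds one hypothetical importance value per training example;
    the abstract does not specify how examples are ranked.
    """
    n_keep = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def trainable_fraction(param_counts, trainable_names):
    """Fraction of all parameters that are updated during fine-tuning."""
    total = sum(param_counts.values())
    trainable = sum(param_counts[name] for name in trainable_names)
    return trainable / total


# Toy data: 100 examples with made-up, distinct scores; keep 22% of them.
scores = [(37 * i) % 101 for i in range(100)]
kept = prune_dataset(scores, keep_fraction=0.22)

# Made-up parameter groups in which only the small "adapter" group is trained.
param_counts = {"backbone": 99_500, "adapter": 500}
frac = trainable_fraction(param_counts, ["adapter"])
```

In this toy setup, `kept` would index the examples passed to the training loop and `frac` is the share of parameters receiving gradient updates; a real pipeline would compute both from an actual model and dataset.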
Fast Fine-Tuning Using Curriculum Domain Adaptation
Lulan Shen
Ruofeng Li
Brett Meyer
James J. Clark
Current deep neural networks (DNNs) have achieved remarkable accuracy in various downstream tasks. However, their training and fine-tuning are challenging due to several factors, such as limited computational resources, extended training and fine-tuning times, and over-fitting due to small datasets. To address these challenges, we propose a three-stage fast fine-tuning method that efficiently trains DNNs for edge devices. Our method combines curriculum learning and domain adaptation techniques to accelerate training while achieving comparable performance. First, we develop a data curriculum approach, which ranks the dataset according to difficulty and splits it into the source domain (containing easy data) and the target domain (containing difficult data). Second, we adapt the pretrained model from the source domain to the target domain using an unsupervised domain adaptation (UDA) method called Deep CORAL. Finally, we continue training the adapted model on the source domain with fewer epochs. Our method achieves high accuracy quickly on various modern neural network architectures and datasets such as CIFAR-10, CIFAR-100, and CINIC-10.
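The first two stages described in this abstract can be sketched in plain Python. The split helper, the `easy_fraction` parameter, and the difficulty values are hypothetical, since the abstract does not say how difficulty is measured or where the cut is placed. The loss follows the commonly cited CORAL formulation (squared Frobenius distance between source and target feature covariances, scaled by 1/(4d²)); whether the paper uses exactly this form is an assumption.

```python
def curriculum_split(difficulty, easy_fraction=0.5):
    """Rank examples by difficulty and split into (easy, hard) index lists."""
    order = sorted(range(len(difficulty)), key=lambda i: difficulty[i])
    cut = int(easy_fraction * len(order))
    return order[:cut], order[cut:]


def covariance(features):
    """Unbiased covariance matrix of row-wise feature vectors."""
    n, d = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in features]
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1/(4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    sq = sum((cs[a][b] - ct[a][b]) ** 2 for a in range(d) for b in range(d))
    return sq / (4 * d * d)


# Toy data: made-up difficulty scores and 2-D features for four examples.
difficulty = [0.9, 0.1, 0.5, 0.3]
features = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]

easy, hard = curriculum_split(difficulty)
source = [features[i] for i in easy]   # "easy" examples form the source domain
target = [features[i] for i in hard]   # "hard" examples form the target domain
loss = coral_loss(source, target)
```

In a training loop, a loss of this kind would typically be added to the task loss so that the features of the two domains are pulled toward matching second-order statistics.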
Conjugate Adder Net (CAddNet) - a Space-Efficient Approximate CNN
Lulan Shen
Maryam Ziaeefard
Brett Meyer
James J. Clark
The AdderNet was recently developed as a way to implement deep neural networks without needing multiplication operations to combine weights and inputs. Instead, absolute values of the difference between weights and inputs are used, greatly reducing the gate-level implementation complexity. Training of AdderNets is challenging, however, and the loss curves during training tend to fluctuate significantly. In this paper we propose the Conjugate Adder Network, or CAddNet, which uses the difference between the absolute values of conjugate pairs of inputs and the weights. We show that this can be implemented simply via a single minimum operation, resulting in a roughly 50% reduction in logic gate complexity as compared with AdderNets. The CAddNet method also stabilizes training as compared with AdderNets, yielding training curves similar to standard CNNs.
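As a rough illustration of the operations this abstract mentions, the sketch below contrasts a multiply-accumulate response with an adder-style response built from absolute differences (negated here so that closer matches score higher, which is an assumption about the sign convention). The abstract does not spell out the exact CAddNet operation. The last two functions only demonstrate one elementary identity, |x + w| - |x - w| = 2·sign(xw)·min(|x|, |w|), which is consistent with the "single minimum operation" remark; treating it as the CAddNet kernel is an interpretation, not something the abstract confirms. All function names are hypothetical.

```python
def multiply_response(x, w):
    """Standard convolution-style response: sum of products."""
    return sum(xi * wi for xi, wi in zip(x, w))


def adder_response(x, w):
    """Adder-style response: negated L1 distance, no multiplications."""
    return -sum(abs(xi - wi) for xi, wi in zip(x, w))


def conjugate_pair_term(x, w):
    """|x + w| - |x - w|, computed directly with absolute values."""
    return abs(x + w) - abs(x - w)


def min_form_term(x, w):
    """The same quantity written with a single minimum and a sign."""
    # When either value is zero the minimum is zero, so the sign is irrelevant.
    sign = 1 if (x >= 0) == (w >= 0) else -1
    return 2 * sign * min(abs(x), abs(w))


# Toy check of the identity on a small grid of integer values.
grid = range(-3, 4)
identity_holds = all(
    conjugate_pair_term(x, w) == min_form_term(x, w) for x in grid for w in grid
)
```

The min-based form behaves like a signed, saturating stand-in for the product x·w, which gives some intuition for why such a unit could train more like a standard CNN, though that connection is offered here only as intuition.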