Portrait de Hugo Larochelle

Hugo Larochelle

Membre industriel principal
Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Chercheur scientifique
Directeur scientifique, Équipe de direction
Sujets de recherche
Apprentissage profond

Biographie

Hugo Larochelle est un chercheur pionnier en apprentissage profond, leader industriel et philanthrope.

Il a commencé son parcours académique auprès de deux des « Pères fondateurs » de l'intelligence artificielle : Yoshua Bengio, son directeur de thèse à l'Université de Montréal, et Geoffrey Hinton, son superviseur postdoctoral à l'Université de Toronto.

Au fil des ans, ses recherches ont mené à plusieurs découvertes majeures présentes dans les systèmes d'IA modernes. Ses travaux sur les auto-encodeurs débruiteurs (denoising autoencoders) ont identifié la reconstruction de données brutes à partir de versions corrompues comme un paradigme clé pour l'apprentissage de représentations abstraites utiles à partir de grandes quantités de données non étiquetées. Avec des modèles tels que l'estimateur de distribution autorégressif neuronal (neural autoregressive distribution estimator) et l'auto-encodeur masqué pour l'estimation de distribution (masked autoencoder distribution estimator), il a contribué à populariser la modélisation autorégressive avec des réseaux de neurones, un paradigme aujourd'hui omniprésent dans l'IA générative. Ses travaux sur l'apprentissage de nouvelles tâches sans données (Zero-Data Learning of New Tasks) ont introduit pour la première fois le concept aujourd'hui courant d'apprentissage zero-shot.

Il a ensuite transposé son expertise académique à l'industrie en cofondant la startup Whetlab, qui a été rachetée par Twitter en 2015. Après avoir travaillé chez Twitter Cortex, il a été recruté pour diriger le laboratoire de recherche en IA de Google à Montréal (Google Brain), maintenant intégré à Google DeepMind. Il est professeur associé à l'Université de Montréal où il mentore la prochaine génération de chercheuses et chercheurs en IA. Il a également développé une série de cours en ligne gratuits sur l’apprentissage automatique.

Père de quatre enfants, Hugo Larochelle et sa conjointe, Angèle St-Pierre, ont également fait de multiples dons à l'Université de Montréal, à l'Université de Sherbrooke (où il a été professeur) et l’Université Laval pour soutenir les étudiantes et étudiants et faire avancer la recherche, particulièrement dans le domaine de l'IA pour l’environnement. Il a également initié la conférence TechAide, qui mobilise la communauté technologique de Montréal pour amasser des fonds pour Centraide, soutenant ainsi la mission de l'organisme de bienfaisance de lutter contre la pauvreté et l'exclusion sociale.

Étudiants actuels

Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :

Publications

Matching Feature Sets for Few-Shot Image Classification
Arman Afrasiyabi
Jean‐François Lalonde
In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classific… (voir plus)ation methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of images from the base classes, which can subsequently better transfer to the few-shot classes. To do so, we propose to adapt existing feature extractors to instead produce sets of feature vectors from images. Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures. The attention modules are lightweight, and as such our method results in encoders that have approximately the same number of parameters as their original versions. During training and inference, a set-to-set matching metric is used to perform image classification. The effectiveness of our proposed architecture and metrics is demonstrated via thorough experiments on standard few-shot datasets-namely miniImageNet, tieredImageNet, and CUB-in both the 1- and 5-shot scenarios. In all cases but one, our method outperforms the state-of-the-art.
Matching Feature Sets for Few-Shot Image Classification
Arman Afrasiyabi
Jean‐François Lalonde
In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classific… (voir plus)ation methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of images from the base classes, which can subsequently better transfer to the few-shot classes. To do so, we propose to adapt existing feature extractors to instead produce sets of feature vectors from images. Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures. The attention modules are lightweight, and as such our method results in encoders that have approximately the same number of parameters as their original versions. During training and inference, a set-to-set matching metric is used to perform image classification. The effectiveness of our proposed architecture and metrics is demonstrated via thorough experiments on standard few-shot datasets-namely miniImageNet, tieredImageNet, and CUB-in both the 1- and 5-shot scenarios. In all cases but one, our method outperforms the state-of-the-art.
Fortuitous Forgetting in Connectionist Networks
Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact b… (voir plus)e favorable to learning. We introduce"forget-and-relearn"as a powerful paradigm for shaping the learning trajectories of artificial neural networks. In this process, the forgetting step selectively removes undesirable information from the model, and the relearning step reinforces features that are consistently useful under different conditions. The forget-and-relearn framework unifies many existing iterative training algorithms in the image classification and language emergence literature, and allows us to understand the success of these algorithms in terms of the disproportionate forgetting of undesirable information. We leverage this understanding to improve upon existing algorithms by designing more targeted forgetting operations. Insights from our analysis provide a coherent view on the dynamics of iterative training in neural networks and offer a clear path towards performance improvements.
Learning to Combine Per-Example Solutions for Neural Program Synthesis
Disha Shrivastava
Daniel Tarlow
The goal of program synthesis from examples is to find a computer program that is consistent with a given set of input-output examples. Most… (voir plus) learning-based approaches try to find a program that satisfies all examples at once. Our work, by contrast, considers an approach that breaks the problem into two stages: (a) find programs that satisfy only one example, and (b) leverage these per-example solutions to yield a program that satisfies all examples. We introduce the Cross Aggregator neural network module based on a multi-head attention mechanism that learns to combine the cues present in these per-example solutions to synthesize a global solution. Evaluation across programs of different lengths and under two different experimental settings reveal that when given the same time budget, our technique significantly improves the success rate over PCCoder [Zohar et. al 2018] and other ablation baselines.
Impact of Aliasing on Generalization in Deep Convolutional Networks
Cristina Vasconcelos
Vincent Dumoulin
Rob Romijnders
We investigate the impact of aliasing on generalization in Deep Convolutional Networks and show that data augmentation schemes alone are una… (voir plus)ble to prevent it due to structural limitations in widely used architectures. Drawing insights from frequency analysis theory, we take a closer look at ResNet and EfficientNet architectures and review the trade-off between aliasing and information loss in each of their major components. We show how to mitigate aliasing by inserting non-trainable low-pass filters at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in generalization on i.i.d. and even more on out-of-distribution conditions, such as image classification under natural corruptions on ImageNet-C [11] and few-shot learning on Meta-Dataset [26]. State-of-the art results are achieved on both datasets without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.
DIBS: Diversity inducing Information Bottleneck in Model Ensembles
Samarth Sinha
Homanga Bharadhwaj
Anirudh Goyal
Animesh Garg
Florian Shkurti
Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark
Vincent Dumoulin
Neil Houlsby
Utku Evci
Xiaohua Zhai
Sylvain Gelly
Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art ad… (voir plus)vances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we perform a cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. In performing this study, we reveal a number of discrepancies in evaluation norms and study some of these in light of the performance gap. We hope that this work facilitates sharing of insights from each community, and accelerates progress on few-shot learning.
Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)
Philippe Vincent‐lamarre
Koustuv Sinha
Vincent Larivière
Alina Beygelzimer
Florence D'alche-buc
E. Fox
Learning a Universal Template for Few-shot Dataset Generalization
Eleni Triantafillou
Richard Zemel
Vincent Dumoulin
A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches
Vincent Dumoulin
Neil Houlsby
Utku Evci
Xiaohua Zhai
Sylvain Gelly
Meta and transfer learning are two successful families of approaches to few-shot 1 learning. Despite highly related goals, state-of-the-art … (voir plus)advances in each family are 2 measured largely in isolation of each other. As a result of diverging evaluation 3 norms, a direct or thorough comparison of different approaches is challenging. 4 To bridge this gap, we introduce a few-shot classification evaluation protocol 5 named VTAB+MD with the explicit goal of facilitating sharing of insights from 6 each community. We demonstrate its accessibility in practice by performing a 7 cross-family study of the best transfer and meta learners which report on both a 8 large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning 9 benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, 10 large-scale transfer methods (Big Transfer, BiT) outperform competing approaches 11 on MD, even when trained only on ImageNet. In contrast, meta-learning approaches 12 struggle to compete on VTAB when trained and validated on MD. However, BiT 13 is not without limitations, and pushing for scale does not improve performance 14 on highly out-of-distribution MD tasks. We hope that this work contributes to 15 accelerating progress on few-shot learning research. 16
A Universal Representation Transformer Layer for Few-Shot Image Classification
Li Li
William L. Hamilton
Guodong Long
Jing Jiang
Few-shot classification aims to recognize unseen classes when presented with only a small number of samples. We consider the problem of mult… (voir plus)i-domain few-shot image classification, where unseen classes and examples come from diverse data sources. This problem has seen growing interest and has inspired the development of benchmarks such as Meta-Dataset. A key challenge in this multi-domain setting is to effectively integrate the feature representations from the diverse set of training domains. Here, we propose a Universal Representation Transformer (URT) layer, that meta-learns to leverage universal features for few-shot classification by dynamically re-weighting and composing the most appropriate domain-specific representations. In experiments, we show that URT sets a new state-of-the-art result on Meta-Dataset. Specifically, it achieves top-performance on the highest number of data sources compared to competing methods. We analyze variants of URT and present a visualization of the attention score heatmaps that sheds light on how the model performs cross-domain generalization.
Revisiting Fundamentals of Experience Replay
William Fedus
Prajit Ramachandran
Mark Rowland
Will Dabney
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understa… (voir plus)nding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay -- greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. Counterintuitively we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio we contextualize previous observations in the literature and empirically measure its importance across a variety of deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.