Portrait of Hugo Larochelle

Hugo Larochelle

Core Industry Member
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Adjunct Professor, McGill University, School of Computer Science
Scientific Director, Leadership Team
Research Topics
Deep Learning

Biography

Hugo Larochelle is a pioneering deep learning researcher, industry leader and philanthropist.

He began his academic journey with two of the "founding fathers" of artificial intelligence: Yoshua Bengio, his PhD advisor at Université de Montréal, and Geoffrey Hinton, his postdoctoral supervisor at the University of Toronto.

Over the years, his research has led to several major discoveries that are found in modern AI systems. His work on denoising autoencoders identified the reconstruction of raw data from corrupted versions as a key paradigm for learning useful abstract representations from large amounts of unlabelled data. With models such as the neural autoregressive distribution estimator and the masked autoencoder distribution estimator, he helped popularize autoregressive modelling with neural networks, a paradigm that is now ubiquitous in generative AI. His work on Zero-Data Learning of New Tasks first introduced the now-common concept of zero-shot learning.
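To make the denoising-autoencoder idea concrete, the sketch below corrupts inputs with Gaussian noise and trains a small network to reconstruct the clean data. It is a minimal PyTorch illustration, not the original implementation; the layer sizes, noise level and optimizer settings are arbitrary.

```python
# Minimal sketch of the denoising-autoencoder idea: corrupt the input,
# then train the network to reconstruct the original, clean data.
# Layer sizes, noise level and optimizer settings are illustrative only.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, input_dim)

    def forward(self, x, noise_std=0.3):
        corrupted = x + noise_std * torch.randn_like(x)  # corrupt the raw input
        code = self.encoder(corrupted)                   # abstract representation
        return self.decoder(code)                        # reconstruct the clean input

model = DenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)                      # a batch of unlabelled data
loss = nn.functional.mse_loss(model(x), x)   # target is the uncorrupted input
loss.backward()
optimizer.step()
```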

He then brought his academic expertise to industry by co-founding the startup Whetlab, which was acquired by Twitter in 2015. After working at Twitter Cortex, he was recruited to lead Google's AI research lab in Montréal (Google Brain), now part of Google DeepMind. He is currently an adjunct professor at Université de Montréal and McGill University. He has also developed a series of free online courses on machine learning.

A father of four, Hugo Larochelle and his partner, Angèle St-Pierre, have also made multiple donations to Université de Montréal, Université de Sherbrooke (where he was a professor) and Université Laval to support students and advance research, particularly in AI for the environment. He also initiated the TechAide conference, which mobilizes Montréal's technology community to raise funds for Centraide, supporting the charity's mission of fighting poverty and social exclusion.

Current Students

PhD - UdeM
Principal supervisor:
PhD - UdeM
Principal supervisor:
Postdoctorate - Polytechnique
Principal supervisor:
PhD - UdeM
Co-supervisor:

Publications

Teaching Algorithmic Reasoning via In-context Learning
Azade Nova
Behnam Neyshabur
Hanie Sedghi
Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning
Utku Evci
Michael Curtis Mozer
Uniform Priors for Data-Efficient Learning
Samarth Sinha
Marzyeh Ghassemi
Zeynep Akata
Animesh Garg
Few or zero-shot adaptation to novel tasks is important for the scalability and deployment of machine learning models. It is therefore crucial to find properties that encourage more transferable features in deep networks for generalization. In this paper, we show that models that learn uniformly distributed features from the training data are able to perform better transfer learning at test-time. Motivated by this, we evaluate our method, uniformity regularization (UR), on its ability to facilitate adaptation to unseen tasks and data on six distinct domains: Few-shot Learning with Images, Few-shot Learning with Language, Deep Metric Learning, 0-Shot Domain Adaptation, Out-of-Distribution classification, and Neural Radiance Fields. Across all experiments, we show that using UR, we are able to learn robust vision systems which consistently offer benefits over baselines trained without uniformity regularization, and achieve state-of-the-art performance in Deep Metric Learning and few-shot learning with images and language.
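The abstract does not spell out the exact form of the regularizer, but one common way to encourage uniformly distributed features is a pairwise penalty on L2-normalized embeddings; the sketch below illustrates that idea and may differ from the UR objective actually used in the paper.

```python
# Illustrative uniformity penalty on L2-normalized features, added to the task
# loss; the exact regularizer used in the paper may differ from this form.
import torch
import torch.nn.functional as F

def uniformity_loss(features, t=2.0):
    """Penalize features that cluster together on the unit hypersphere."""
    z = F.normalize(features, dim=1)                      # project onto the unit sphere
    sq_dists = torch.cdist(z, z).pow(2)                   # pairwise squared distances
    n = z.size(0)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=z.device)
    return sq_dists[off_diag].mul(-t).exp().mean().log()

features = torch.randn(64, 128, requires_grad=True)       # e.g. penultimate-layer activations
task_loss = features.pow(2).mean()                        # placeholder for the actual task loss
total_loss = task_loss + 0.1 * uniformity_loss(features)  # the weight is illustrative
total_loss.backward()
```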
Matching Feature Sets for Few-Shot Image Classification
In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classification methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of images from the base classes, which can subsequently better transfer to the few-shot classes. To do so, we propose to adapt existing feature extractors to instead produce sets of feature vectors from images. Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures. The attention modules are lightweight, and as such our method results in encoders that have approximately the same number of parameters as their original versions. During training and inference, a set-to-set matching metric is used to perform image classification. The effectiveness of our proposed architecture and metrics is demonstrated via thorough experiments on standard few-shot datasets (namely miniImageNet, tieredImageNet, and CUB) in both the 1- and 5-shot scenarios. In all cases but one, our method outperforms the state-of-the-art.
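As a rough illustration of set-to-set matching, the sketch below scores a query image (represented by a set of feature vectors) against candidate support sets using best-match cosine similarities. The actual SetFeat metric and encoder adaptation may differ; names and sizes here are placeholders.

```python
# Illustrative set-to-set matching for few-shot classification: each image is
# represented by a *set* of feature vectors, and a query is assigned to the
# class whose support set it matches best. The exact SetFeat metric may differ.
import torch
import torch.nn.functional as F

def set_similarity(query_set, support_set):
    """Average, over query vectors, of the best cosine similarity in the support set."""
    q = F.normalize(query_set, dim=1)        # (n_query_vectors, dim)
    s = F.normalize(support_set, dim=1)      # (n_support_vectors, dim)
    return (q @ s.t()).max(dim=1).values.mean()

# One query image described by 5 feature vectors, two candidate classes
query = torch.randn(5, 64)
class_support_sets = [torch.randn(5, 64), torch.randn(5, 64)]
scores = torch.stack([set_similarity(query, s) for s in class_support_sets])
predicted_class = scores.argmax().item()
```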
Fortuitous Forgetting in Connectionist Networks
Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact be favorable to learning. We introduce "forget-and-relearn" as a powerful paradigm for shaping the learning trajectories of artificial neural networks. In this process, the forgetting step selectively removes undesirable information from the model, and the relearning step reinforces features that are consistently useful under different conditions. The forget-and-relearn framework unifies many existing iterative training algorithms in the image classification and language emergence literature, and allows us to understand the success of these algorithms in terms of the disproportionate forgetting of undesirable information. We leverage this understanding to improve upon existing algorithms by designing more targeted forgetting operations. Insights from our analysis provide a coherent view on the dynamics of iterative training in neural networks and offer a clear path towards performance improvements.
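The sketch below shows one possible instantiation of a forget-and-relearn loop, with the forgetting step implemented as re-initializing a random fraction of weights; the paper's targeted forgetting operations are more specific than this illustration, and the model, data and hyper-parameters here are placeholders.

```python
# Schematic forget-and-relearn loop. The forgetting step re-initializes a random
# fraction of weights (one possible instantiation); the relearning step simply
# resumes training. Model, data loader and hyper-parameters are placeholders.
import torch
import torch.nn as nn

def forget(model, fraction=0.3):
    """Re-initialize a random subset of weights in every linear layer."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            mask = torch.rand_like(module.weight) < fraction
            fresh = torch.empty_like(module.weight)
            nn.init.kaiming_uniform_(fresh)
            with torch.no_grad():
                module.weight[mask] = fresh[mask]

def relearn(model, data_loader, epochs=1):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

# model = ...; loader = ...
# for cycle in range(5):      # alternate forgetting and relearning
#     forget(model)
#     relearn(model, loader)
```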
Learning to Combine Per-Example Solutions for Neural Program Synthesis
Disha Shrivastava
Daniel Tarlow
The goal of program synthesis from examples is to find a computer program that is consistent with a given set of input-output examples. Most learning-based approaches try to find a program that satisfies all examples at once. Our work, by contrast, considers an approach that breaks the problem into two stages: (a) find programs that satisfy only one example, and (b) leverage these per-example solutions to yield a program that satisfies all examples. We introduce the Cross Aggregator neural network module based on a multi-head attention mechanism that learns to combine the cues present in these per-example solutions to synthesize a global solution. Evaluation across programs of different lengths and under two different experimental settings reveals that, when given the same time budget, our technique significantly improves the success rate over PCCoder [Zohar et al. 2018] and other ablation baselines.
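The sketch below illustrates, in simplified form, how encodings of per-example solutions might be combined with a multi-head attention module; the actual Cross Aggregator architecture is not reproduced here, and the module name, dimensions and wiring are assumptions.

```python
# Hypothetical sketch of combining per-example program encodings with
# multi-head attention, in the spirit of the Cross Aggregator; the module
# in the paper may be wired differently.
import torch
import torch.nn as nn

class SolutionAggregator(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learned aggregation query

    def forward(self, per_example_encodings):
        # per_example_encodings: (batch, n_examples, dim), one encoding per
        # program that solves a single input-output example
        batch = per_example_encodings.size(0)
        q = self.query.expand(batch, -1, -1)
        pooled, _ = self.attn(q, per_example_encodings, per_example_encodings)
        return pooled.squeeze(1)   # a single vector used to predict the global program

agg = SolutionAggregator()
encodings = torch.randn(8, 5, 128)    # 8 tasks, 5 per-example solutions each
global_repr = agg(encodings)          # shape (8, 128)
```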
Impact of Aliasing on Generalization in Deep Convolutional Networks
Cristina Vasconcelos
Rob Romijnders
We investigate the impact of aliasing on generalization in Deep Convolutional Networks and show that data augmentation schemes alone are unable to prevent it due to structural limitations in widely used architectures. Drawing insights from frequency analysis theory, we take a closer look at ResNet and EfficientNet architectures and review the trade-off between aliasing and information loss in each of their major components. We show how to mitigate aliasing by inserting non-trainable low-pass filters at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in generalization on i.i.d. and even more on out-of-distribution conditions, such as image classification under natural corruptions on ImageNet-C [11] and few-shot learning on Meta-Dataset [26]. State-of-the-art results are achieved on both datasets without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.
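One standard way to realize a non-trainable low-pass filter is a fixed blur kernel applied before downsampling. The sketch below shows such a layer as an illustration; the kernel choice and the exact insertion points in ResNet and EfficientNet follow the paper's analysis rather than this example.

```python
# Sketch of a non-trainable low-pass (blur) filter applied before downsampling,
# one way to reduce aliasing; kernel choice and placement are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurDownsample(nn.Module):
    """Fixed 3x3 binomial blur followed by stride-2 subsampling (no new trainable parameters)."""
    def __init__(self, channels, stride=2):
        super().__init__()
        k = torch.tensor([1., 2., 1.])
        kernel = torch.outer(k, k)
        kernel = kernel / kernel.sum()
        # one identical filter per channel, applied depthwise
        self.register_buffer("kernel", kernel.expand(channels, 1, 3, 3).clone())
        self.stride = stride
        self.channels = channels

    def forward(self, x):
        return F.conv2d(x, self.kernel, stride=self.stride, padding=1, groups=self.channels)

layer = BlurDownsample(channels=64)
feature_map = torch.randn(2, 64, 56, 56)
out = layer(feature_map)   # shape (2, 64, 28, 28), high frequencies attenuated
```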
DIBS: Diversity inducing Information Bottleneck in Model Ensembles
Samarth Sinha
Animesh Garg
Florian Shkurti
Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark
Neil Houlsby
Utku Evci
Xiaohua Zhai
Sylvain Gelly
Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we perform a cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. In performing this study, we reveal a number of discrepancies in evaluation norms and study some of these in light of the performance gap. We hope that this work facilitates sharing of insights from each community, and accelerates progress on few-shot learning.
Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)
Philippe Vincent-Lamarre
Vincent Larivière
Alina Beygelzimer
Florence d'Alché-Buc
E. Fox
Learning a Universal Template for Few-shot Dataset Generalization
Eleni Triantafillou
Richard Zemel