
Hugo Larochelle

Scientific Director, Leadership Team
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Adjunct Professor, McGill University, School of Computer Science
Research Topics
Deep Learning

Biography

Hugo Larochelle is a pioneering deep learning researcher, industry leader, and philanthropist.

He began his academic career under two of the "founding fathers" of artificial intelligence: Yoshua Bengio, his doctoral advisor at Université de Montréal, and Geoffrey Hinton, his postdoctoral supervisor at the University of Toronto.

Over the years, his research has led to several major breakthroughs found in modern AI systems. His work on denoising autoencoders identified the reconstruction of raw data from corrupted versions as a key paradigm for learning useful abstract representations from large amounts of unlabeled data. With models such as the neural autoregressive distribution estimator (NADE) and the masked autoencoder for distribution estimation (MADE), he helped popularize autoregressive modeling with neural networks, a paradigm now ubiquitous in generative AI. His work on zero-data learning of new tasks (Zero-Data Learning of New Tasks) was the first to introduce the now-common concept of zero-shot learning.
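The corruption-and-reconstruction idea behind denoising autoencoders can be sketched as a training objective. This is a minimal illustration, assuming masking noise (each input component zeroed independently with some probability) and a squared-error loss; `masking_noise`, `denoising_loss` and the `reconstruct` callable are hypothetical names standing in for a real encoder-decoder network, and other corruption and loss choices are possible.

```python
import random
from typing import Callable, List

Vector = List[float]


def masking_noise(x: Vector, drop_prob: float, rng: random.Random) -> Vector:
    """Corrupt x by zeroing each component independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]


def denoising_loss(
    x: Vector,
    reconstruct: Callable[[Vector], Vector],
    drop_prob: float = 0.25,
    seed: int = 0,
) -> float:
    """Squared error between the reconstruction of a corrupted copy and the CLEAN input.

    `reconstruct` is a hypothetical stand-in for an encoder followed by a decoder.
    """
    rng = random.Random(seed)
    x_tilde = masking_noise(x, drop_prob, rng)  # corrupted version fed to the model
    x_hat = reconstruct(x_tilde)                # model's attempt to recover the original
    # The target is the clean x, not the corrupted x_tilde.
    return sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))
```

Because the target is the clean input, an identity mapping no longer achieves zero loss once components are dropped; this is what pushes a real encoder-decoder to capture structure in the data rather than simply copy its input.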

He then brought his academic expertise to industry by co-founding the startup Whetlab, which was acquired by Twitter in 2015. After working at Twitter Cortex, he was recruited to lead Google's AI research lab in Montréal (Google Brain), now part of Google DeepMind. He is also an adjunct professor at Université de Montréal and McGill University, and has developed a series of free online courses on machine learning.

Hugo Larochelle, a father of four, and his partner, Angèle St-Pierre, have made multiple donations to Université de Montréal, Université de Sherbrooke (where he was a professor), and Université Laval to support students and advance research, particularly in AI for the environment. He also initiated the TechAide conference, which mobilizes Montréal's tech community to raise funds for Centraide, supporting the charity's mission to fight poverty and social exclusion.

Current Students

PhD - UdeM
Principal supervisor:
Professional Master's - McGill
Alumni collaborator - UdeM
Principal supervisor:
Postdoctorate - Polytechnique
Principal supervisor:

Publications

Many-Shot In-Context Learning
Avi Singh
Lei M Zhang
Bernd Bohnet
Stephanie C.Y. Chan
Luis Rosias
Biao Zhang
Zaheer Abbas
Azade Nova
John D Co-Reyes
Eric Chu
Feryal Behbahani
Aleksandra Faust
Large language models (LLMs) excel at few-shot in-context learning (ICL): learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples, the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated outputs. To mitigate this limitation, we explore two new settings: (1) "Reinforced ICL", which uses model-generated chain-of-thought rationales in place of human rationales, and (2) "Unsupervised ICL", where we remove rationales from the prompt altogether and prompt the model only with domain-specific inputs. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. We demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases, can learn high-dimensional functions with numerical inputs, and performs comparably to supervised fine-tuning. Finally, we reveal the limitations of next-token prediction loss as an indicator of downstream ICL performance.
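The three prompting settings described in this abstract differ only in what each in-context example contains. The sketch below is a hypothetical illustration: the `Problem:/Solution:/Answer:` format, the `Example` type and both helper functions are assumptions, and the Reinforced ICL helper assumes model-generated rationales are kept only when their final answer matches a reference answer, a filtering rule the abstract itself does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Example:
    question: str
    rationale: Optional[str] = None
    answer: Optional[str] = None


def build_prompt(shots: List[Example], query: str, mode: str = "standard") -> str:
    """Concatenate many shots plus the query (hypothetical prompt format).

    mode="standard": each shot has question, rationale and answer
                     (Reinforced ICL uses the same layout, with model-written rationales).
    mode="unsupervised": each shot keeps only its question.
    """
    if mode not in ("standard", "unsupervised"):
        raise ValueError(f"unknown mode: {mode}")
    blocks = []
    for ex in shots:
        if mode == "unsupervised":
            blocks.append(f"Problem: {ex.question}")
        else:
            blocks.append(
                f"Problem: {ex.question}\nSolution: {ex.rationale}\nAnswer: {ex.answer}"
            )
    blocks.append(f"Problem: {query}\nSolution:")
    return "\n\n".join(blocks)


def reinforced_shots(
    problems: List[Tuple[str, str]],
    sample: Callable[[str], Tuple[str, str]],
    max_attempts: int = 4,
) -> List[Example]:
    """Collect model-generated rationales, keeping one whose answer matches the reference.

    `sample` is a hypothetical stand-in for querying an LLM: question -> (rationale, answer).
    """
    shots = []
    for question, reference in problems:
        for _ in range(max_attempts):
            rationale, answer = sample(question)
            if answer == reference:
                shots.append(Example(question, rationale, answer))
                break  # one correct rationale per problem is enough here
    return shots
```

In the many-shot regime, `shots` would hold hundreds or thousands of entries; the same builder covers the few-shot case by passing only a handful.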
Optimisation of quantitative brain diffusion-relaxation MRI acquisition protocols with physics-informed machine learning.
Álvaro Planchuelo-Gómez
Maxime Descoteaux
Jana Hutter
Derek K. Jones
C. Tax
A density estimation perspective on learning from pairwise human preferences
Learning from human feedback (LHF), and in particular learning from pairwise preferences, has recently become a crucial ingredient in training large language models (LLMs) and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy that is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation that centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that, for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference distribution. Finally, we discuss and present findings on "annotator misspecification" (failure cases where wrong modeling assumptions are made about annotator behavior, resulting in poorly adapted models), suggesting that approaches that learn from pairwise human preferences could have trouble learning from a population of annotators with diverse viewpoints.
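Reward learning from pairwise preferences is commonly done with the Bradley-Terry likelihood P(a preferred over b) = sigmoid(r(a) - r(b)). The sketch below assumes that standard formulation (not necessarily the exact family of preference behavior distribution equations studied in the paper) and a hypothetical tabular reward over a few discrete items. It illustrates the density-estimation reading: the fitted reward differences recover the log-odds of the annotator's empirical preference rates.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_bradley_terry(
    comparisons: List[Tuple[str, str]],
    items: List[str],
    lr: float = 0.1,
    steps: int = 2000,
) -> Dict[str, float]:
    """Fit tabular rewards by gradient ascent on the Bradley-Terry log-likelihood.

    comparisons: (winner, loser) pairs. Returns rewards centered to mean zero,
    since Bradley-Terry rewards are only identified up to an additive constant.
    """
    r = {item: 0.0 for item in items}
    for _ in range(steps):
        grad = {item: 0.0 for item in items}
        for winner, loser in comparisons:
            # d/dr_w log sigmoid(r_w - r_l) = 1 - sigmoid(r_w - r_l); opposite sign for r_l
            g = 1.0 - sigmoid(r[winner] - r[loser])
            grad[winner] += g
            grad[loser] -= g
        for item in items:
            r[item] += lr * grad[item] / len(comparisons)
    mean = sum(r.values()) / len(r)
    return {item: v - mean for item, v in r.items()}
```

For example, if a (hypothetical) annotator prefers item "a" over item "b" in three of four comparisons, the fitted difference r["a"] - r["b"] converges to ln 3, so that sigmoid of the difference equals the observed preference rate of 0.75.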
Consolidating Separate Degradations Model via Weights Fusion and Distillation
Dinesh Daultani
Real-world images prevalently contain different varieties of degradation, such as motion blur and luminance noise. Computer vision recognition models trained on clean images perform poorly on degraded images. Previously, several works have explored how to perform image classification of degraded images while training a single model for each degradation. Nevertheless, it becomes challenging to host a separate model for each degradation on hardware-limited applications and to estimate degradation parameters correctly at run time. This work proposes a method for effectively combining several models trained separately on different degradations into a single model to classify images with different types of degradations. Our proposed method has four steps: (1) train a base model on clean images, (2) fine-tune the base model individually for all given image degradations, (3) perform a fusion of weights given the fine-tuned models for individual degradations, (4) perform fine-tuning on the given task using distillation and cross-entropy loss. Our proposed method can outperform previous state-of-the-art pretraining methods in out-of-distribution generalization based on degradations such as JPEG compression, salt-and-pepper noise, Gaussian blur, and additive white Gaussian noise by 2.5% on the CIFAR-100 dataset and by 1.3% on the CIFAR-10 dataset. Moreover, our proposed method can handle the degradations used for training without any explicit information about the degradation at inference time. Code will be available at https://github.com/dineshdaultani/FusionDistill.
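Steps (3) and (4) of this method can be sketched in a few lines. The abstract does not specify the fusion rule or the loss weighting, so this sketch assumes uniform parameter averaging across the fine-tuned models and a standard knowledge-distillation loss (temperature-softened KL divergence scaled by the squared temperature) mixed with cross-entropy by a hypothetical weight `alpha`. Parameters are plain Python lists here for self-containment; `average_weights` and `distill_ce_loss` are illustrative names, not the authors' code.

```python
import math
from typing import Dict, List

Params = Dict[str, List[float]]


def average_weights(models: List[Params]) -> Params:
    """Fuse models by uniformly averaging each parameter (assumed fusion rule).

    All models must share the same keys and parameter lengths, as they would
    when fine-tuned from one base model.
    """
    n = len(models)
    return {
        key: [sum(m[key][i] for m in models) / n for i in range(len(models[0][key]))]
        for key in models[0]
    }


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distill_ce_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    alpha: float = 0.5,
    temperature: float = 2.0,
) -> float:
    """alpha * cross-entropy(label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    ce = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * (math.log(p) - math.log(q)) for p, q in zip(p_teacher, p_student))
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl
```

Averaging is the simplest fusion choice and only makes sense because every fine-tuned model starts from the same base weights (step 2); the distillation term then lets the fused model recover what averaging may have blurred.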
Low Compute Unlearning via Sparse Representations
Ashish Malik
Michael Curtis Mozer
Sanjeev Arora
Machine unlearning, which involves erasing knowledge about a "forget set" from a trained model, can prove to be costly and infeasible using existing techniques. We propose a low-compute unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the dataset. We evaluate the proposed technique on the problem of class unlearning using four datasets: CIFAR-10, CIFAR-100, LACUNA-100 and ImageNet-1k. We compare the proposed technique to SCRUB, a state-of-the-art approach that uses knowledge distillation for unlearning. Across all four datasets, the proposed technique performs as well as, if not better than, SCRUB while incurring almost no computational cost.
SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data
Mélisande Teng
Amna Elmustafa
Benjamin Akera
Hager Radi Abdelwahed
Neural Causal Structure Discovery from Interventions
Nan Rosemary Ke
Bernhard Schölkopf
Michael Curtis Mozer
Christopher Pal
Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened-upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.
Repository-Level Prompt Generation for Large Language Models of Code
Disha Shrivastava
Daniel Tarlow
With the success of large language models (LLMs) of code and their use as code assistants (e.g. Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge in the prompt design process become important. In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g. imports, parent class files). Our technique doesn't require any access to the weights of the LLM, making it applicable in cases where we only have black-box access to the LLM. We conduct experiments on the task of single-line code autocompletion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a remarkably high relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we can achieve significant performance gains over Codex and other baselines. We release our code, data, and trained checkpoints at: https://github.com/shrivastavadisha/repo_level_prompt_generation.
Bird Distribution Modelling using Remote Sensing and Citizen Science data
Mélisande Teng
Amna Elmustafa
Benjamin Akera
Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions
Rishab Goel
Dan Zheng
Daniel Tarlow
The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in isolation. Nevertheless, software developers benefit from fast iteration loops where automated tools identify errors as early as possible, even before programs can be compiled and run. This presents an interesting machine learning challenge: can we predict runtime errors in a "static" setting, where program execution is not possible? Here, we introduce a real-world dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. We approach this task by developing an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and "learns to execute" descriptions of the contents of external resources. Surprisingly, we show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error. In total, we present a practical and difficult-yet-approachable challenge problem related to learning program execution, and we demonstrate promising new capabilities of interpreter-inspired machine learning models for code.
Survey of Scientific Rigor Studied in Machine Learning
D. Sculley
Gary R. Holt
Daniel R. Golovin
Eugene V. Davydov
Todd Phillips
Dietmar Ebner
Michael Young
Jean-francois Crespo
Dan Dennison
Emily Fox
The concern that Artificial Intelligence (AI) and Machine Learning (ML) are entering a "reproducibility crisis" has spurred significant research in the past few years. Yet with each paper, it is often unclear what someone means by "reproducibility" and where it fits in the larger scope of what we will call the "scientific rigor" literature. Ultimately, the lack of clear rigor standards can affect the manner in which businesses seeking to adopt AI/ML implement such capabilities. In this survey, we use 66 papers published since 2017 to construct a proposed set of 8 high-level categories of scientific rigor, describing what they are and the history of work conducted in each. We propose that these eight rigor types are not mutually exclusive, and we present a model for how they influence each other. To encourage more researchers to study these questions, we map these rigor types to the adoption process in real-world business use cases. In doing so, we can quantify gaps in the literature that suggest insufficient focus on the issues necessary for scientific rigor research to transition to practice.
Teaching Algorithmic Reasoning via In-context Learning
Azade Nova
Behnam Neyshabur
Hanie Sedghi