Portrait of Hugo Larochelle

Hugo Larochelle

Core Industry Member
Canada CIFAR AI Chair
Adjunct professor, Université de Montréal, Depatment of Computer Science and Operations Research
Research Scientist, Google DeepMind
Research Topics
Deep Learning

Biography

I am a researcher in the Google DeepMind (previously Google Brain) team in Montréal, an adjunct professor at Université de Montréal, and a Canada CIFAR AI Chair. My research focuses on the study and development of deep learning algorithms.

Current Students

PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :

Publications

SatBird: a Dataset for Bird Species Distribution Modeling using Remote Sensing and Citizen Science Data
Mélisande Teng
Amna Elmustafa
Benjamin Akera
Hager Radi
Neural Causal Structure Discovery from Interventions
Nan Rosemary Ke
Olexa Bilaniuk
Anirudh Goyal
Stefan Bauer
Bernhard Schölkopf
Michael Curtis Mozer
Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data.… (see more) However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially-known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.
Bird Distribution Modelling using Remote Sensing and Citizen Science data
Mélisande Teng
Amna Elmustafa
Benjamin Akera
Repository-Level Prompt Generation for Large Language Models of Code
Disha Shrivastava
Danny Tarlow
With the success of large language models (LLMs) of code and their use as code assistants (e.g. Codex used in GitHub Copilot), techniques fo… (see more)r introducing domain-specific knowledge in the prompt design process become important. In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g. imports, parent class files). Our technique doesn't require any access to the weights of the LLM, making it applicable in cases where we only have black-box access to the LLM. We conduct experiments on the task of single-line code auto-completion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we can achieve significant performance gains over Codex and other baselines. We release our code, data, and trained checkpoints at: https://github.com/shrivastavadisha/repo_level_prompt_generation.
Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions
David Bieber
Rishab Goel
Dan Zheng
Daniel Tarlow
The execution behavior of a program often depends on external resources, such as program inputs or file contents, and so cannot be run in is… (see more)olation. Nevertheless, software developers benefit from fast iteration loops where automated tools identify errors as early as possible, even before programs can be compiled and run. This presents an interesting machine learning challenge: can we predict runtime errors in a"static"setting, where program execution is not possible? Here, we introduce a real-world dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. We approach this task by developing an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and"learns to execute"descriptions of the contents of external resources. Surprisingly, we show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error. In total, we present a practical and difficult-yet-approachable challenge problem related to learning program execution and we demonstrate promising new capabilities of interpreter-inspired machine learning models for code.
Neural Causal Structure Discovery from Interventions
Nan Rosemary Ke
Olexa Bilaniuk
Anirudh Goyal
Stefan Bauer
Bernhard Schölkopf
Michael Curtis Mozer
Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data.… (see more) However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially-known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository
Survey of Scientific Rigor Studied in Machine Learning
D. Sculley
Gary R. Holt
Daniel R. Golovin
Eugene V. Davydov
Todd Phillips
Dietmar Ebner
Michael Young
Jean-francois Crespo
Dan Dennison
Emily Fox
The concern that Artificial Intelligence (AI) and Machine Learning (ML) are entering a “reproducibility crisis” has spurred significant … (see more)research in the past few years. Yet with each paper, it is often unclear what someone means by “reproducibility” and where it fits in the larger scope of what we will call the “scientific rigor” literature. Ultimately, the lack of clear rigor standards can affect the manner in which businesses seeking to adopt AI/ML implement such capabilities. In this survey, we will use 66 papers published since 2017 to construct a proposed set of 8 high-level categories of scientific rigor, what they are, and the history of work conducted in each. Our proposal is that these eight rigor types are not mutually exclusive and present a model for how they influence each other. To encourage more to study these questions, we map these rigors to the adoption process in real-world business use cases. In doing so, we can quantify gaps in the literature that suggest an under focus on the issues necessary for scientific rigor research to transition to practice
Teaching Algorithmic Reasoning via In-context Learning
Hattie Zhou
Azade Nova
Behnam Neyshabur
Hanie Sedghi
Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning
Utku Evci
Vincent Dumoulin
Michael Curtis Mozer
Uniform Priors for Data-Efficient Learning
Samarth Sinha
Karsten Roth
Anirudh Goyal
Marzyeh Ghassemi
Zeynep Akata
Animesh Garg
Few or zero-shot adaptation to novel tasks is important for the scalability and deployment of machine learning models. It is therefore cruci… (see more)al to find properties that encourage more transferable features in deep networks for generalization. In this paper, we show that models that learn uniformly distributed features from the training data, are able to perform better transfer learning at test-time. Motivated by this, we evaluate our method: uniformity regularization (UR) on its ability to facilitate adaptation to unseen tasks and data on six distinct domains: Few-Learning with Images, Few-shot Learning with Language, Deep Metric Learning, 0-Shot Domain Adaptation, Out-of-Distribution classification, and Neural Radiance Fields. Across all experiments, we show that using UR, we are able to learn robust vision systems which consistently offer benefits over baselines trained without uniformity regularization and are able to achieve state-of-the-art performance in Deep Metric Learning, Few-shot learning with images and language.
Matching Feature Sets for Few-Shot Image Classification
Arman Afrasiyabi
Jean‐François Lalonde
In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classific… (see more)ation methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of images from the base classes, which can subsequently better transfer to the few-shot classes. To do so, we propose to adapt existing feature extractors to instead produce sets of feature vectors from images. Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures. The attention modules are lightweight, and as such our method results in encoders that have approximately the same number of parameters as their original versions. During training and inference, a set-to-set matching metric is used to perform image classification. The effectiveness of our proposed architecture and metrics is demonstrated via thorough experiments on standard few-shot datasets-namely miniImageNet, tieredImageNet, and CUB-in both the 1- and 5-shot scenarios. In all cases but one, our method outperforms the state-of-the-art.
Matching Feature Sets for Few-Shot Image Classification
Arman Afrasiyabi
Jean‐François Lalonde
In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classific… (see more)ation methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of images from the base classes, which can subsequently better transfer to the few-shot classes. To do so, we propose to adapt existing feature extractors to instead produce sets of feature vectors from images. Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures. The attention modules are lightweight, and as such our method results in encoders that have approximately the same number of parameters as their original versions. During training and inference, a set-to-set matching metric is used to perform image classification. The effectiveness of our proposed architecture and metrics is demonstrated via thorough experiments on standard few-shot datasets-namely miniImageNet, tieredImageNet, and CUB-in both the 1- and 5-shot scenarios. In all cases but one, our method outperforms the state-of-the-art.