
Hugo Larochelle

Scientific Director, Leadership Team
Adjunct professor, Université de Montréal, Department of Computer Science and Operations Research
Adjunct professor, McGill University, School of Computer Science
Research Topics
Deep Learning

Biography

Hugo Larochelle is a pioneering deep learning researcher, industry leader and philanthropist.

He started his academic journey with two of the "Godfathers" of artificial intelligence: Yoshua Bengio, his Ph.D. supervisor at the Université de Montréal, and Geoffrey Hinton, his postdoctoral supervisor at the University of Toronto.

Over the years, his research has contributed several conceptual breakthroughs found in modern AI systems. His work on Denoising Autoencoders (DAE) identified the reconstruction of clean data from corrupted versions as a scalable paradigm for learning meaningful representations from large quantities of unlabeled data. With models such as the Neural Autoregressive Distribution Estimator (NADE) and the Masked Autoencoder Distribution Estimator (MADE), he helped popularize autoregressive modeling with neural networks, a paradigm now omnipresent in generative AI. And his work on Zero-Data Learning of New Tasks introduced for the first time the now common concept of zero-shot learning.

He then brought his academic expertise to industry by co-founding the startup Whetlab, which was acquired by Twitter in 2015. After a role at Twitter Cortex, he was recruited to lead Google's AI research lab in Montreal (Google Brain), now part of Google DeepMind. He is now an Adjunct Professor at the Université de Montréal and McGill University, and has also developed a series of free online courses on machine learning.

A father of four, Hugo Larochelle, together with his wife, Angèle St-Pierre, has made multiple donations to the Université de Montréal, the Université de Sherbrooke (where he was previously a professor) and Université Laval to support students and advance research, particularly in AI for environmental sustainability. He also initiated the TechAide conference, mobilizing Montreal's tech community to raise funds for the charity Centraide and its mission to fight poverty and social exclusion.

Publications

Detoxifying LLMs via Representation Erasure-Based Preference Optimization
Large language models (LLMs) trained on web-scale data can produce toxic outputs, raising concerns for safe deployment. Prior defenses, based on applications of DPO, NPO, and similar algorithms, reduce the likelihood of harmful continuations, but not robustly so: they are vulnerable to adversarial prompting and easily undone by fine-tuning-based relearning attacks. Indeed, research has shown that these edits to the model are superficial: linear probing reveals that harmful "directions" remain present in representations. To address this, we propose Representation Erasure-based Preference Optimization (REPO), reformulating detoxification as a token-level preference problem. Using a novel objective with preference data, we force the representations of toxic continuations to converge toward their benign counterparts. Our mechanistic analysis reveals that this granular approach is critical: unlike baselines, REPO induces deep, localized edits to toxicity-encoding neurons while preserving general model utility. Exhaustive evaluations show that REPO achieves state-of-the-art robustness, stopping sophisticated threats, including relearning attacks and enhanced GCG jailbreaks, where existing representation- and output-based methods fail.
BRIDGE: Predicting Human Task Completion Time From Model Performance
Mila - Québec AI Institute, McGill University, Polytechnique Montréal, Periodic Labs, ServiceNow Research, Canada CIFAR AI Chair
Evaluating the real-world capabilities of AI systems requires grounding benchmark performance in human-interpretable measures of task difficulty. Existing approaches that rely on direct human task completion time annotations are costly, noisy, and difficult to scale across benchmarks. In this work, we propose BRIDGE, a unified psychometric framework that learns the latent difficulty scale from model responses and anchors it to human task completion time. Using a two-parameter logistic Item Response Theory model, we jointly estimate latent task difficulty and model capability from model performance data across multiple benchmarks. We demonstrate that latent task difficulty varies linearly with the logarithm of human completion time, allowing human task completion time to be inferred for new benchmarks from model performance alone. Leveraging this alignment, we forecast frontier model capabilities in terms of human task length and independently reproduce METR's exponential scaling results, with the 50% solvable task horizon doubling approximately every 6 months.
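The abstract's core machinery can be illustrated in a few lines: a two-parameter logistic (2PL) item-response curve relates model capability to latent task difficulty, and a log-linear map recovers human completion time from that difficulty. The fit coefficients below are illustrative placeholders, not values from the paper.

```python
import math

def p_correct(theta, a, b):
    """2PL IRT: probability that a model with capability theta solves a
    task with latent difficulty b and discrimination a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def human_minutes_from_difficulty(b, alpha, beta):
    """Invert the reported log-linear relation b ~ alpha + beta * log(t)
    to recover human completion time t from latent difficulty b.
    alpha and beta are hypothetical fitted coefficients."""
    return math.exp((b - alpha) / beta)

# A more capable model (larger theta) solves a fixed task more often,
# and harder tasks map to longer human completion times.
assert p_correct(2.0, a=1.0, b=0.5) > p_correct(0.0, a=1.0, b=0.5)
```

Joint estimation of all thetas and (a, b) pairs from performance data is what the actual framework does; this sketch only shows the two functional forms it rests on.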
DeLLMphi: A Multi-Turn Method for Multi-Agent Forecasting
Andrew Robert Williams
Victoria Feere
Nasim Rahaman
The Delphi method is a structured forecasting process that engages experts in iterative prediction and reflection. Each round, experts submit forecasts to a mediator, receive an aggregated and synthesized response highlighting key arguments, and update their forecasts based on collective insight. However, Delphi panels are labour-intensive, slow and hard to reproduce, requiring diverse knowledgeable participants to engage periodically across weeks or months. To address these constraints, we propose **DeLLMphi**, a forecasting method that replaces human experts and mediators with LLMs. We show (i) that providing example superforecaster reasoning traces and predictions helps to elicit more accurate forecasts from LLM experts, (ii) that the mediator plays the crucial role of surfacing different lines of reasoning and points of disagreement, and (iii) that multiple rounds and experts lead to better forecasts, showing that multi-turn interaction is key to DeLLMphi.
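The expert-mediator loop described above can be sketched with plain callables standing in for the LLM experts and the mediator; all names and the mean aggregation are illustrative, not the paper's exact procedure.

```python
def dellmphi(question, experts, mediator, rounds=3):
    """Minimal sketch of a Delphi-style loop: each round, every expert
    forecasts given the mediator's previous summary, then the mediator
    synthesizes the round's forecasts (surfacing disagreements)."""
    summary = None
    forecasts = []
    for _ in range(rounds):
        forecasts = [expert(question, summary) for expert in experts]
        summary = mediator(question, forecasts)
    # Aggregate the final round; the mean is a stand-in for any aggregator.
    return sum(forecasts) / len(forecasts)
```

In the real method, `expert` and `mediator` would be LLM calls (with superforecaster reasoning traces in the expert prompts); here they are just functions so the control flow is visible.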
Bringing SAM to new heights: Leveraging elevation data for tree crown segmentation from drone imagery
Information on trees at the individual level is crucial for monitoring forest ecosystems and planning forest management. Current monitoring methods involve ground measurements, requiring extensive cost, time and labor. Advances in drone remote sensing and computer vision offer great potential for mapping individual trees from aerial imagery at broad scale. Large pre-trained vision models, such as the Segment Anything Model (SAM), represent a particularly compelling choice given limited labeled data. In this work, we compare methods leveraging SAM for the task of automatic tree crown instance segmentation in high-resolution drone imagery in three use cases: 1) boreal plantations, 2) temperate forests and 3) tropical forests. We also study the integration of elevation data into models, in the form of Digital Surface Model (DSM) information, which can readily be obtained at no additional cost from RGB drone imagery. We present BalSAM, a model leveraging SAM and DSM information, which shows potential over other methods, particularly in the context of plantations. We find that methods using SAM out-of-the-box do not outperform a custom Mask R-CNN, even with well-designed prompts. However, efficiently tuning SAM end-to-end and integrating DSM information are both promising avenues for tree crown instance segmentation models.
Capturing Individual Human Preferences with Reward Features
Andre Barreto
Yiran Mao
Nicolas Perez-Nieves
Mark Rowland
Bobak Shahriari
Reinforcement learning from human feedback usually models preferences using a reward model that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for disagreement, like in the training of large language models. We propose a method to specialise a reward model to a person or group of people. Our approach builds on the observation that individual preferences can be captured as a linear combination of a set of general reward features. We show how to learn such features and subsequently use them to quickly adapt the reward model to a specific individual, even if their preferences are not reflected in the training data. We present experiments with large language models comparing the proposed architecture with a non-adaptive reward model and also adaptive counterparts, including models that do in-context personalisation. Depending on how much disagreement there is in the training data, our model either significantly outperforms the baselines or matches their performance with a simpler architecture and more stable training.
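The linear-combination idea can be sketched as follows: each person gets a weight vector over shared reward features, fitted from a few pairwise preferences with a Bradley-Terry logistic loss. This is a minimal stand-in for the paper's learned features and adaptation procedure, not its actual training setup.

```python
import math

def reward(w, features):
    """Individual reward as a linear combination of shared reward features."""
    return sum(wi * fi for wi, fi in zip(w, features))

def adapt(w, prefs, lr=0.5, steps=200):
    """Fit per-user weights from pairwise preferences (phi_win, phi_lose)
    by gradient descent on the Bradley-Terry loss -log sigmoid(margin).
    The shared feature map itself stays fixed."""
    w = list(w)
    for _ in range(steps):
        for phi_win, phi_lose in prefs:
            margin = reward(w, phi_win) - reward(w, phi_lose)
            g = 1.0 / (1.0 + math.exp(margin))  # = 1 - sigmoid(margin)
            for i in range(len(w)):
                w[i] += lr * g * (phi_win[i] - phi_lose[i])
    return w
```

Adapting only a small weight vector, rather than the whole reward model, is what makes per-person specialisation cheap in this framing.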
Identifying birdsong syllables without labelled data
Identifying sequences of syllables within birdsongs is key to tackling a wide array of challenges, including bird individual identification and better understanding of animal communication and sensory-motor learning. Recently, machine learning approaches have demonstrated great potential to alleviate the need for experts to label long audio recordings by hand. However, they still typically rely on the availability of labelled data for model training, restricting applicability to a few species and datasets. In this work, we build the first fully unsupervised algorithm to decompose birdsong recordings into sequences of syllables. We first detect syllable events, then cluster them to extract templates -- syllable representations -- before performing matching pursuit to decompose the recording as a sequence of syllables. We evaluate our automatic annotations against human labels on a dataset of Bengalese finch songs and find that our unsupervised method achieves high performance. We also demonstrate that our approach can distinguish individual birds within a species through their unique vocal signatures, for both Bengalese finches and another species, the great tit.
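The final stage of the detect-cluster-decompose pipeline can be illustrated with a toy matcher: each detected syllable event, represented as a feature vector, is assigned to its nearest template, turning the recording into a sequence of syllable ids. This is a simplistic stand-in for the paper's clustering and matching-pursuit stages.

```python
def label_events(events, templates):
    """Assign each detected syllable event (a feature vector) to the
    nearest template by squared distance, yielding the song as a
    sequence of syllable indices."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(templates)), key=lambda k: sq_dist(e, templates[k]))
            for e in events]
```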
CISO: Species Distribution Modeling Conditioned on Incomplete Species Observations
Hager Radi Abdelwahed
Mélisande Teng
Robin Zbinden
Laura Pollock
Devis Tuia
Species distribution models (SDMs) are widely used to predict species' geographic distributions, serving as critical tools for ecological research and conservation planning. Typically, SDMs relate species occurrences to environmental variables representing abiotic factors, such as temperature, precipitation, and soil properties. However, species distributions are also strongly influenced by biotic interactions with other species, which are often overlooked. While some methods partially address this limitation by incorporating biotic interactions, they often assume symmetrical pairwise relationships between species and require consistent co-occurrence data. In practice, species observations are sparse, and the availability of information about the presence or absence of other species varies significantly across locations. To address these challenges, we propose CISO, a deep learning-based method for species distribution modeling Conditioned on Incomplete Species Observations. CISO enables predictions to be conditioned on a flexible number of species observations alongside environmental variables, accommodating the variability and incompleteness of available biotic data. We demonstrate our approach using three datasets representing different species groups: sPlotOpen for plants, SatBird for birds, and a new dataset, SatButterfly, for butterflies. Our results show that including partial biotic information improves predictive performance on spatially separate test sets. When conditioned on a subset of species within the same dataset, CISO outperforms alternative methods in predicting the distribution of the remaining species. Furthermore, we show that combining observations from multiple datasets can improve performance. CISO is a promising ecological tool, capable of incorporating incomplete biotic information and identifying potential interactions between species from disparate taxa.
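One common way to condition a model on an arbitrary subset of species observations is a value-plus-mask encoding, where unobserved species contribute a zeroed value and a mask flag of 0. The encoding below is an illustrative sketch of that general idea, not CISO's actual input format.

```python
def build_input(env, species_obs):
    """Encode one location: environmental variables followed by a
    (value, observed?) pair per species, so any subset of species
    observations can be supplied.
    species_obs entries: 1 = present, 0 = absent, None = unobserved."""
    x = list(env)
    for s in species_obs:
        observed = s is not None
        x.append(float(s) if observed else 0.0)  # observation value
        x.append(1.0 if observed else 0.0)       # mask flag
    return x
```

A downstream network consuming this vector can then learn to ignore masked-out species while exploiting whatever partial biotic information is available.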
Towards Sustainable Investment Policies Informed by Opponent Shaping
Addressing climate change requires global coordination, yet rational economic actors often prioritize immediate gains over collective welfare, resulting in social dilemmas. InvestESG is a recently proposed multi-agent simulation that captures the dynamic interplay between investors and companies under climate risk. We provide a formal characterization of the conditions under which InvestESG exhibits an intertemporal social dilemma, deriving theoretical thresholds at which individual incentives diverge from collective welfare. Building on this, we apply Advantage Alignment, a scalable opponent shaping algorithm shown to be effective in general-sum games, to influence agent learning in InvestESG. We offer theoretical insights into why Advantage Alignment systematically favors socially beneficial equilibria by biasing learning dynamics toward cooperative outcomes. Our results demonstrate that strategically shaping the learning processes of economic agents can produce better collective outcomes, suggesting policy mechanisms that better align market incentives with long-term sustainability goals.
The Search for Squawk: Agile Modeling in Bioacoustics
Otilia Stretcu
Jenny Hamer
Lauren Harrell
Rob Laber
Amanda K. Navine
Patrick Hart
Ben Williams
Timothy A. C. Lamont
Tries B. Rasak
Mars Coral Restoration Team
Sheryn Brodie
Brendan Doohan
Philip Eichinski
Paul Roe
Lin Schwarzkopf
Tom Denton
Assessing SAM for Tree Crown Instance Segmentation from Drone Imagery
Don't Flatten, Tokenize! Unlocking the Key to SoftMoE's Efficacy in Deep RL
Ghada Sokar
Johan Obando-Ceron
The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While soft mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons behind their effectiveness remain largely unknown. In this work we provide an in-depth analysis identifying the key factors driving this performance gain. We discover the surprising result that tokenizing the encoder output, rather than the use of multiple experts, is what lies behind the efficacy of SoftMoEs. Indeed, we demonstrate that even with an appropriately scaled single expert, we are able to maintain the performance gains, largely thanks to tokenization.
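The contrast the abstract draws can be shown on a toy C x H x W encoder output: the common baseline flattens it into one long vector, while tokenizing keeps one token per spatial location. This is an illustration of the two input layouts, not the paper's network code.

```python
def flatten(feature_map):
    """Flatten a C x H x W encoder output into one long vector
    (the standard baseline layout)."""
    return [v for channel in feature_map for row in channel for v in row]

def tokenize(feature_map):
    """Reshape the same output into H*W tokens of dimension C, one per
    spatial location -- the layout identified as the key ingredient."""
    C = len(feature_map)
    H, W = len(feature_map[0]), len(feature_map[0][0])
    return [[feature_map[c][h][w] for c in range(C)]
            for h in range(H) for w in range(W)]
```

Both layouts contain the same numbers; only the structure handed to the rest of the network differs, which is why the effect can be reproduced even with a single scaled expert.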
Selective Unlearning via Representation Erasure Using Domain Adversarial Training
Eleni Triantafillou
James J. Clark
Daniel M. Roy