Hugo Larochelle

Core Industry Member
Adjunct professor, Université de Montréal, Department of Computer Science and Operations Research
Research Scientist
Research Topics
Deep Learning

Biography

Hugo Larochelle is a pioneering deep learning researcher, industry leader and philanthropist.

He started his academic journey with two of the "Godfathers" of artificial intelligence: Yoshua Bengio, his Ph.D. supervisor at the Université de Montréal, and Geoffrey Hinton, his postdoctoral supervisor at the University of Toronto.

Over the years, his research has contributed several conceptual breakthroughs found in modern AI systems. His work on Denoising Autoencoders (DAE) identified the reconstruction of clean data from corrupted versions as a scalable paradigm for learning meaningful representations from large quantities of unlabeled data. With models such as the Neural Autoregressive Distribution Estimator (NADE) and the Masked Autoencoder Distribution Estimator (MADE), he helped popularize autoregressive modeling with neural networks, a paradigm now omnipresent in generative AI. Finally, his work on Zero-Data Learning of New Tasks introduced the now-common concept of zero-shot learning.

He then brought his academic expertise to industry by co-founding the startup Whetlab, which was acquired by Twitter in 2015. After a role at Twitter Cortex, he was recruited to lead Google's AI research lab in Montreal (Google Brain), now part of Google DeepMind. He is an Adjunct Professor at the Université de Montréal, mentoring the next generation of AI researchers. He has also developed a series of free online courses on machine learning.

Hugo Larochelle, a father of four, and his wife, Angèle St-Pierre, have also made multiple donations to the Université de Montréal, Université de Sherbrooke (where he used to be a Professor) and Université Laval to support students and advance research, particularly in AI for environmental sustainability. He also initiated the TechAide conference, mobilizing Montreal's tech community to raise funds for the charity Centraide to support its mission to fight poverty and social exclusion.

Current Students

PhD - Université de Montréal
Principal supervisor:
PhD - Université de Montréal
Principal supervisor:
PhD - Université de Montréal
Co-supervisor:

Publications

Modulating early visual processing by language
Harm de Vries
Florian Strub
Jérémie Mary
Olivier Pietquin
It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (MODERN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of the visual processing is beneficial.
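The idea of conditioning batch normalization parameters on a language embedding can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear predictor, and the choice to add predicted offsets to fixed pretrained scale and shift values are assumptions made for illustration, and the features are reduced to a batch-by-channels layout with no spatial dimensions.

```python
import math


def linear(weights, bias, x):
    """Affine map with one output per row of `weights` (hypothetical predictor)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def conditional_batch_norm(features, gamma, beta, delta_gamma, delta_beta, eps=1e-5):
    """Batch-normalize `features` (batch x channels) with per-sample scale and shift.

    gamma, beta: pretrained per-channel parameters, kept fixed here.
    delta_gamma, delta_beta: per-sample, per-channel offsets predicted from
    the language embedding (an assumed parameterization).
    """
    n = len(features)
    channels = range(len(gamma))
    # Per-channel batch statistics (spatial dimensions omitted for brevity).
    mean = [sum(sample[c] for sample in features) / n for c in channels]
    var = [sum((sample[c] - mean[c]) ** 2 for sample in features) / n
           for c in channels]
    out = []
    for sample, dg, db in zip(features, delta_gamma, delta_beta):
        out.append([
            (gamma[c] + dg[c]) * (sample[c] - mean[c]) / math.sqrt(var[c] + eps)
            + beta[c] + db[c]
            for c in channels
        ])
    return out


# Toy usage: two samples, one channel, a two-dimensional language embedding.
features = [[1.0], [3.0]]
embeddings = [[1.0, 0.0], [0.0, 1.0]]
w_gamma, b_gamma = [[0.5, -0.5]], [0.0]   # hypothetical predictor weights
w_beta, b_beta = [[0.0, 0.0]], [0.0]
delta_gamma = [linear(w_gamma, b_gamma, e) for e in embeddings]
delta_beta = [linear(w_beta, b_beta, e) for e in embeddings]
out = conditional_batch_norm(features, [2.0], [0.5], delta_gamma, delta_beta)
```

With all offsets set to zero the function reduces to ordinary batch normalization, which is one reason an offset-based parameterization is convenient when starting from a pretrained network.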
Brain tumor segmentation with Deep Neural Networks
Mohammad Havaei
Axel Davy
David Warde-Farley
Antoine Biard
Pierre-Marc Jodoin
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
János Kramár
Nicolas Ballas
Nan Rosemary Ke
Anirudh Goyal
We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks. We perform an empirical investigation of various RNN regularizers, and find that zoneout gives significant performance improvements across tasks. We achieve competitive results with relatively simple models in character- and word-level language modelling on the Penn Treebank and Text8 datasets, and combining with recurrent batch normalization yields state-of-the-art results on permuted sequential MNIST.
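The per-timestep rule described in this abstract can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: the function name, its arguments, and the list-based hidden state are assumptions, and the test-time branch (averaging with the expected mask, in the style of dropout) is a convention that the abstract itself does not spell out.

```python
import random


def zoneout_step(h_prev, h_candidate, zoneout_prob, training=True, rng=random):
    """Combine the previous and candidate hidden states unit by unit.

    h_prev: hidden state from the previous timestep.
    h_candidate: hidden state the RNN cell proposes for this timestep.
    zoneout_prob: probability that a unit keeps its previous value.
    """
    if not 0.0 <= zoneout_prob <= 1.0:
        raise ValueError("zoneout_prob must be in [0, 1]")
    if len(h_prev) != len(h_candidate):
        raise ValueError("hidden states must have the same length")
    if training:
        # Each unit independently keeps its previous value with
        # probability zoneout_prob; otherwise it takes the new value.
        return [prev if rng.random() < zoneout_prob else cand
                for prev, cand in zip(h_prev, h_candidate)]
    # Assumed test-time rule: use the expectation of the random mask.
    return [zoneout_prob * prev + (1.0 - zoneout_prob) * cand
            for prev, cand in zip(h_prev, h_candidate)]


# Toy usage with a seeded generator so the run is repeatable.
h_prev = [1.0, 2.0, 3.0]
h_candidate = [10.0, 20.0, 30.0]
h_next = zoneout_step(h_prev, h_candidate, 0.5, rng=random.Random(0))
```

Unlike dropout, a zoned-out unit is copied forward rather than set to zero, which is what lets state and gradient information pass through the timestep unchanged.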
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
Bernt Schiele
Audio description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full-length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. We introduce the Large Scale Movie Description Challenge (LSMDC) which contains a parallel corpus of 128,118 sentences aligned to video clips from 200 movies (around 150 h of video in total). The goal of the challenge is to automatically generate descriptions for the movie clips. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in the challenges organized in the context of two workshops at ICCV 2015 and ECCV 2016.