Hugo Larochelle

hugo.larochelle@mila.quebec

Scientific Director, Leadership Team

Adjunct professor, Université de Montréal, Department of Computer Science and Operations Research

Adjunct professor, McGill University, School of Computer Science

Research Topics

Deep Learning

Google Scholar

Biography

Hugo Larochelle is the Scientific Director of Mila, the world’s largest academic AI and deep learning research center. With a community of over 1,500 researchers, Mila has established itself as a pillar of the Canadian AI ecosystem with a reach that extends far beyond national borders.

As a pioneering researcher and industry leader, he has a unique perspective on both large-scale corporate research laboratories and Canada’s world-class academic AI community. He built his academic foundation alongside two "Godfathers" of artificial intelligence: Yoshua Bengio and Geoffrey Hinton.

Over the years, his research has contributed several conceptual breakthroughs found in modern AI systems. His work on Denoising Autoencoders (DAE) identified the reconstruction of clean data from corrupted versions as a scalable paradigm for learning meaningful representations from large quantities of unlabeled data. Through models such as the Neural Autoregressive Distribution Estimator (NADE) and the Masked Autoencoder for Distribution Estimation (MADE), he helped popularize the neural autoregressive modeling paradigm now omnipresent in generative AI. Furthermore, his work on Zero-Data Learning of New Tasks introduced the now-standard concept of zero-shot learning.

He successfully bridged the gap between academia and industry by co-founding the startup Whetlab, which was acquired by Twitter in 2015. After a role at Twitter Cortex, he was recruited to lead Google's AI research lab in Montreal (Google Brain), now integrated into Google DeepMind. He remains an Adjunct Professor at the Université de Montréal and McGill University, and is a Canada CIFAR AI Chair, mentoring the next generation of AI researchers.

Alongside his role as Scientific Director at Mila, he also serves as Scientific Lead at Adaption Labs and advises the startups Tiptree Systems and Prizmal.

Hugo Larochelle is a father of four. He and his wife, Angèle St-Pierre, have made multiple donations to the Université de Montréal and Université de Sherbrooke, particularly in support of AI for environmental sustainability. He also founded the Techaide conference, mobilizing Montreal's tech community to raise funds for the charity Centraide in its mission to fight poverty and social exclusion.

Current Students

Sangnie Bhardwaj

PhD - Université de Montréal

Principal supervisor:

Guillaume Lajoie

Jiang Evan (Duoduo)

Professional Master's - McGill University

Github

Mélisande Astrid Crystal Teng

Collaborating Alumni - Université de Montréal

Principal supervisor:

Postdoctorate - Polytechnique Montréal

Principal supervisor:

Publications

Recall Traces: Backtracking Models for Efficient Reinforcement Learning

Anirudh Goyal

Philemon Brakel

William Fedus

Soumye Singhal

Timothy Lillicrap

Sergey Levine

Hugo Larochelle

Yoshua Bengio

In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high value state (or one that is estimated to have high value), predicts and samples the (state, action) tuples that may have led to that high value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.

2018-12-31

International Conference on Learning Representations (poster)

doi.org

openreview.net

Blindfold Baselines for Embodied QA

We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate solution, yet we show through our experiments on the EQAv1 dataset that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.

2018-11-11

arXiv (preprint)

doi.org

arxiv.org

Disentangling the independently controllable factors of variation by interacting with the world

Valentin Thomas

Philippe Beaudoin

William Fedus

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors, and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.

2017-12-31

arXiv (preprint)

doi.org

arxiv.org

HoME: A Household Multimodal Environment

Simon Brodeur

Luca Celotti

Jean Rouat

We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.

2017-12-31

International Conference on Learning Representations (published)

doi.org

openreview.net

GuessWhat?! Visual Object Discovery through Multi-modal Dialogue

Harm de Vries

Florian Strub

A. Chandar

Olivier Pietquin

Hugo Larochelle

Aaron Courville

We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks.

2017-07-20

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (published)

doi.org

arxiv.org

Multiscale sequence modeling with a learned dictionary

We propose a generalization of neural network sequence models. Instead of predicting one symbol at a time, our multi-scale model makes predictions over multiple, potentially overlapping multi-symbol tokens. A variation of the byte-pair encoding (BPE) compression algorithm is used to learn the dictionary of tokens that the model is trained with. When applied to language modelling, our model has the flexibility of character-level models while maintaining many of the performance benefits of word-level models. Our experiments show that this model performs better than a regular LSTM on language modeling tasks, especially for smaller models.

2017-07-02

arXiv (preprint)

arxiv.org
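
The abstract above mentions that the token dictionary is learned with a variation of byte-pair encoding. The paper's variation is not described here, so the following is only a minimal sketch of the standard greedy BPE merge loop, not the authors' method. The function name `learn_bpe_merges`, its parameters, and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter


def learn_bpe_merges(text, num_merges):
    """Standard greedy BPE (illustrative, not the paper's variation).

    Starts from single characters and repeatedly merges the most
    frequent adjacent pair of symbols into one new multi-character token.
    Returns the learned tokens and the final segmentation of `text`.
    """
    symbols = list(text)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter(zip(symbols, symbols[1:]))
        if not pair_counts:
            break
        # Most frequent pair; ties are broken by the pair itself so the
        # result is deterministic (an arbitrary choice for this sketch).
        (left, right), count = max(
            pair_counts.items(), key=lambda kv: (kv[1], kv[0])
        )
        if count < 2:
            break  # no pair repeats, so merging no longer compresses
        merges.append(left + right)
        # Rewrite the sequence, replacing each occurrence left to right.
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return merges, symbols
```

Calling `learn_bpe_merges("aaabdaaabac", 3)` yields a small dictionary of multi-character tokens and a segmentation that concatenates back to the original string.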

Modulating early visual processing by language

Jérémie Mary

Olivier Pietquin

It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (MODERN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of the visual processing is beneficial.

2016-12-31

Advances in Neural Information Processing Systems 30 (NIPS 2017) (published)

arxiv.org

Dynamic Capacity Networks

Amjad Almahairi

Yin Zheng

We introduce the Dynamic Capacity Network (DCN), a neural network that can adaptively assign its capacity across different portions of the input data. This is achieved by combining modules of two types: low-capacity sub-networks and high-capacity sub-networks. The low-capacity sub-networks are applied across most of the input, but also provide a guide to select a few portions of the input on which to apply the high-capacity sub-networks. The selection is made using a novel gradient-based attention mechanism that efficiently identifies input regions for which the DCN's output is most sensitive and to which we should devote more capacity. We focus our empirical evaluation on the Cluttered MNIST and SVHN image datasets. Our findings indicate that DCNs are able to drastically reduce the number of computations, compared to traditional convolutional neural networks, while maintaining similar or even better performance.

2016-06-10

Proceedings of The 33rd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Brain Tumor Segmentation with Deep Neural Networks

Pierre-Marc Jodoin

2016-05-20

Medical Image Analysis (unknown)

doi.org

arxiv.org

Movie Description

Anna Rohrbach

Atousa Torabi

Marcus Rohrbach

Niket Tandon

Christopher Pal

Hugo Larochelle

Aaron Courville

Bernt Schiele

Audio description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. We introduce the Large Scale Movie Description Challenge (LSMDC) which contains a parallel corpus of 128,118 sentences aligned to video clips from 200 movies (around 150 h of video in total). The goal of the challenge is to automatically generate descriptions for the movie clips. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in the challenges organized in the context of two workshops at ICCV 2015 and ECCV 2016.

2016-05-11

arXiv (preprint)

doi.org

arxiv.org

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

David Krueger

Nan Rosemary Ke

Christopher Pal

We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks. We perform an empirical investigation of various RNN regularizers, and find that zoneout gives significant performance improvements across tasks. We achieve competitive results with relatively simple models in character- and word-level language modelling on the Penn Treebank and Text8 datasets, and combining with recurrent batch normalization yields state-of-the-art results on permuted sequential MNIST.

2015-12-31

arXiv (preprint)

doi.org

arxiv.org
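
The zoneout update described in the abstract above can be sketched in a few lines. This is a minimal illustration on plain Python lists, not the authors' implementation: the function name `zoneout_step`, its parameters, and the use of the expected value at evaluation time (by analogy with dropout) are assumptions made for this sketch.

```python
import random


def zoneout_step(h_prev, h_new, zoneout_prob, rng, training=True):
    """Combine the previous and candidate hidden states unit by unit.

    h_prev: hidden state from the previous timestep.
    h_new: candidate hidden state computed by the RNN cell.
    zoneout_prob: probability that a unit keeps its previous value.
    rng: a random.Random instance (hypothetical parameter for reproducibility).
    """
    if not 0.0 <= zoneout_prob <= 1.0:
        raise ValueError("zoneout_prob must lie in [0, 1]")
    if len(h_prev) != len(h_new):
        raise ValueError("hidden states must have the same length")
    if training:
        # Each unit independently keeps its previous value with
        # probability zoneout_prob, otherwise takes the new value.
        return [
            prev if rng.random() < zoneout_prob else new
            for prev, new in zip(h_prev, h_new)
        ]
    # Assumption: at evaluation time, use the expectation of the random mask.
    return [
        zoneout_prob * prev + (1.0 - zoneout_prob) * new
        for prev, new in zip(h_prev, h_new)
    ]
```

Unlike dropout, a zoned-out unit is not set to zero: it carries its previous value forward, which is what lets state and gradient information pass through the timestep unchanged.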

