Hugo Larochelle

Core Industry Member
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Scientist
Scientific Director, Leadership Team
Research Topics
Deep Learning

Biography

Hugo Larochelle is a pioneering deep learning researcher, industry leader, and philanthropist.

He began his academic career under two of the "founding fathers" of artificial intelligence: Yoshua Bengio, his PhD supervisor at Université de Montréal, and Geoffrey Hinton, his postdoctoral supervisor at the University of Toronto.

Over the years, his research has led to several major discoveries that are now part of modern AI systems. His work on denoising autoencoders identified the reconstruction of raw data from corrupted versions as a key paradigm for learning useful abstract representations from large amounts of unlabeled data. With models such as the neural autoregressive distribution estimator and the masked autoencoder for distribution estimation, he helped popularize autoregressive modeling with neural networks, a paradigm now ubiquitous in generative AI. His work on Zero-Data Learning of New Tasks was the first to introduce the now-common concept of zero-shot learning.

He then brought his academic expertise to industry by co-founding the startup Whetlab, which was acquired by Twitter in 2015. After working at Twitter Cortex, he was recruited to lead Google's AI research lab in Montreal (Google Brain), now part of Google DeepMind. He is an adjunct professor at Université de Montréal, where he mentors the next generation of AI researchers. He has also developed a series of free online courses on machine learning.

A father of four, Hugo Larochelle, together with his partner, Angèle St-Pierre, has also made multiple donations to Université de Montréal, Université de Sherbrooke (where he was a professor), and Université Laval to support students and advance research, particularly in AI for the environment. He also initiated the TechAide conference, which mobilizes Montreal's technology community to raise funds for Centraide, supporting the charity's mission of fighting poverty and social exclusion.

Current Students

PhD - UdeM
Principal supervisor:
PhD - UdeM
Principal supervisor:
PhD - UdeM
Co-supervisor:

Publications

Modulating early visual processing by language
Harm de Vries
Florian Strub
Jérémie Mary
Olivier Pietquin
It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic inputs are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the entire visual processing by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (MODERN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of the visual processing is beneficial.
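The idea of conditioning batch normalization parameters on a language embedding can be illustrated with a short NumPy sketch. The abstract does not spell out how the conditioning is computed, so everything here is an assumption made for illustration: the function name `conditional_batch_norm`, the use of additive offsets to the pretrained scale and shift, and the single linear maps `W_gamma` and `W_beta` that predict those offsets.

```python
import numpy as np


def conditional_batch_norm(features, lang_emb, gamma, beta, W_gamma, W_beta, eps=1e-5):
    """Batch norm whose scale and shift depend on a language embedding.

    features: (batch, channels, height, width) visual feature maps
    lang_emb: (batch, emb_dim) language embedding, one per example
    gamma, beta: (channels,) pretrained batch-norm scale and shift
    W_gamma, W_beta: (emb_dim, channels) hypothetical linear predictors
    """
    # Ordinary batch-norm statistics, computed per channel.
    mean = features.mean(axis=(0, 2, 3), keepdims=True)
    var = features.var(axis=(0, 2, 3), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)

    # Assumption: the language embedding predicts per-channel offsets
    # that are added to the pretrained parameters.
    delta_gamma = lang_emb @ W_gamma  # (batch, channels)
    delta_beta = lang_emb @ W_beta    # (batch, channels)

    # Broadcast the per-example, per-channel parameters over space.
    scale = (gamma + delta_gamma)[:, :, None, None]
    shift = (beta + delta_beta)[:, :, None, None]
    return scale * normalized + shift
```

With `W_gamma` and `W_beta` set to zero, the function reduces to ordinary batch normalization, so in this sketch the language input can only move the visual features away from the pretrained behavior through those two maps.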
Brain tumor segmentation with Deep Neural Networks
Mohammad Havaei
Axel Davy
David Warde-Farley
Antoine Biard
Pierre-Marc Jodoin
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
János Kramár
Nicolas Ballas
Nan Rosemary Ke
Anirudh Goyal
We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks. We perform an empirical investigation of various RNN regularizers, and find that zoneout gives significant performance improvements across tasks. We achieve competitive results with relatively simple models in character- and word-level language modelling on the Penn Treebank and Text8 datasets, and combining with recurrent batch normalization yields state-of-the-art results on permuted sequential MNIST.
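The update described in the abstract, where some hidden units keep their previous values at each timestep, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the names `zoneout` and `run_rnn`, the plain tanh recurrence, and the weight matrices `W_x` and `W_h` are hypothetical, and using the expected mask at evaluation time is an assumption that mirrors the usual dropout convention.

```python
import numpy as np


def zoneout(h_prev, h_candidate, zoneout_prob, rng, training=True):
    """Combine the previous and the newly computed hidden state."""
    if training:
        # True means the unit is "zoned out": it keeps its previous value.
        keep_prev = rng.random(h_prev.shape) < zoneout_prob
        return np.where(keep_prev, h_prev, h_candidate)
    # Assumption: at evaluation time, use the expectation of the random mask.
    return zoneout_prob * h_prev + (1.0 - zoneout_prob) * h_candidate


def run_rnn(inputs, W_x, W_h, zoneout_prob, rng, training=True):
    """Run a plain tanh RNN over a sequence, applying zoneout at each step."""
    h = np.zeros(W_h.shape[0])
    for x_t in inputs:
        h_candidate = np.tanh(W_x @ x_t + W_h @ h)
        h = zoneout(h, h_candidate, zoneout_prob, rng, training)
    return h
```

Setting `zoneout_prob` to 0 recovers the unregularized RNN, while a value of 1 freezes the hidden state at its initial value. Unlike dropout, a zoned-out unit is copied forward unchanged instead of being set to zero, which is what lets state and gradient information pass through that timestep.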
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
Bernt Schiele
Audio description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full-length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. We introduce the Large Scale Movie Description Challenge (LSMDC) which contains a parallel corpus of 128,118 sentences aligned to video clips from 200 movies (around 150 h of video in total). The goal of the challenge is to automatically generate descriptions for the movie clips. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in the challenges organized in the context of two workshops at ICCV 2015 and ECCV 2016.