Aaron Courville

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Representation Learning
Reinforcement Learning
Deep Learning
Efficient Communication in General-Sum Games
Generative Models
Multi-Agent Systems
Game Theory
Natural Language Processing
Computer Vision

Biography

Aaron Courville is a professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and Scientific Director of IVADO. He received his PhD from the Robotics Institute at Carnegie Mellon University.

He is one of the early contributors to deep learning and a founding member of Mila, the Quebec Artificial Intelligence Institute. With Ian Goodfellow and Yoshua Bengio, he co-authored the reference textbook on deep learning.

His current research focuses on the development of deep learning models and methods. He is particularly interested in reinforcement learning, multi-agent reinforcement learning, deep generative models, and reasoning.

Aaron Courville holds a Canada CIFAR AI Chair and a Canada Research Chair (CRC) in Systematic Generalization. His research has been supported in part by Microsoft Research, Samsung, Hitachi, Meta, Sony (Research Award), and Google (Focused Research Award).

Current Students

PhD - UdeM
Co-supervisor:
PhD - UdeM
Principal supervisor:
Research Master's - Université de Montréal
Research Master's - UdeM
Professional Master's - UdeM
PhD - UdeM
PhD - UdeM
PhD - UdeM
Principal supervisor:
PhD - UdeM
Co-supervisor:
PhD - UdeM
Principal supervisor:
PhD - UdeM
Co-supervisor:
Research Master's - UdeM
PhD - UdeM
Principal supervisor:
Research Master's - UdeM
Principal supervisor:
PhD - UdeM
Principal supervisor:
PhD - UdeM
Principal supervisor:
PhD - UdeM
PhD - UdeM
Co-supervisor:
PhD - UdeM
Principal supervisor:

Publications

PixelVAE: A Latent Variable Model for Natural Images
Ishaan Gulrajani
Kundan Kumar
Faruk Ahmed
Adrien Ali Taiga
Francesco Visin
David Vazquez
Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64 × 64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.
Recurrent Batch Normalization
Tim Cooijmans
Nicolas Ballas
César Laurent
Caglar Gulcehre
We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
Soroush Mehri
Kundan Kumar
Ishaan Gulrajani
Rithesh Kumar
Shubham Jain
Jose Sotelo
In this paper we propose a novel model for the unconditional audio generation task that generates one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture the underlying sources of variation in the temporal domain over very long time spans, on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.
Sequentialized Sampling Importance Resampling and Scalable IWAE
Chin-Wei Huang
We propose a new sequential algorithm for Sampling Importance Resampling. The algorithm serves as a solution to expensive evaluation of importance weights, and can be interpreted as stochastically and iteratively refining the particles by correcting them towards the target distribution as the pool size increases. We apply this algorithm to variational inference with the Importance Weighted Lower Bound and propose a memory-scalable training procedure that implicitly improves the variational proposal. See https://github.com/CW-Huang/SeqIWAE for implementation. Second workshop on Bayesian Deep Learning (NIPS 2017), Long Beach, CA, USA.

1 Sequentializing Sampling Importance Resampling

1.1 Sampling Importance Resampling

Given an unnormalized target distribution $\tilde{p}(x)$ and proposal distribution $q(x)$, Sampling Importance Resampling (SIR) proceeds as follows:
1. draw $x_i$ for $1 \le i \le n$ from $q(x)$
2. calculate the importance weight $w_i = \tilde{p}(x_i) / q(x_i)$
3. calculate the normalized importance weight $\bar{w}_i = w_i / \sum_{k} w_k$
4. draw index variable $y_j \sim \mathrm{mul}(\bar{w}_1, \dots, \bar{w}_n)$ for $1 \le j \le m$

The density of the set of resampled particles $x_{y_1}, \dots, x_{y_m}$ should resemble the pdf of the target distribution, and the new samples will be approximately distributed according to $p(x)$ (Bishop, 2007). On average, the samples can be improved by increasing the pool size $n$, and become correct as $n \to \infty$. The procedure is visualized in Figure 1a.

1.2 SeqSIR

The above procedure can be combined with the idea of reservoir sampling, so that we need not evaluate all $n$ samples at the same time, which is an issue when $n$ is large or when evaluation of a sample (i.e. computation of $w_i$) is expensive. The intuition is to keep a running sum of the importance weights while we evaluate the pool samples sequentially, and then decide to keep the old sample or replace it with the new one based on the ratio of the new sample's importance weight to the running sum. This is what we call Sequentialized Sampling Importance Resampling (SEQSIR), which is summarized in Algorithm 1. See Figure 1b for illustration.

Note that density and importance weight are computed on log scale to deal with numerical instability, and the log-sum-exp operation (LSE) is used in place of addition to calculate the running sum of importance weights.

Algorithm 1: Sequentialized Sampling Importance Resampling and Stochastic Iterative Refinement

procedure SEQSIR(logp, logq, ss)
    ▷ logp, logq: unnormalized target density function and proposal density function
    ▷ ss: n samples to be evaluated
    A ← −∞    ▷ initialize accumulated sum of importance weight on log scale
    s_old ← 0    ▷ initialize sample
    n ← len([s1,...,sn])
    for i = 1,...,n do
        s_new = ss[i]
        A, s_old ← STOCHREFINE(logp, logq, A, s_old, s_new)
    return s_old

procedure STOCHREFINE(logp, logq, A, s_old, s_new)
    ▷ logp, logq: unnormalized target density function and proposal density function
    ▷ A: accumulated sum of importance weight on log scale
    ▷ s_old, s_new: old and new samples
    w_new ← logp(s_new) − logq(s_new)
    A ← LSE(A, w_new)
    u ← unif(0,1)
    if w_new − A >= log u then
        return A, s_new
    else
        return A, s_old
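The SeqSIR procedure of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (see the repository linked in the abstract for that). The names `lse`, `stoch_refine`, and `seq_sir`, the `rng` parameter, and the use of `None` as the initial sample are choices made for this example only. The sketch also assumes that the log importance weight is the difference `logp(s) - logq(s)` and that the acceptance test compares `w_new - A` against `log u`, which is what the log-scale description implies.

```python
import math
import random


def lse(a, b):
    """Numerically stable log(exp(a) + exp(b)) for two log-scale values."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def stoch_refine(logp, logq, A, s_old, s_new, rng):
    """One refinement step: keep s_old or replace it with s_new.

    A is the running log-sum of importance weights seen so far.
    """
    w_new = logp(s_new) - logq(s_new)  # log importance weight (assumed form)
    A = lse(A, w_new)                  # update running sum on log scale
    u = 1.0 - rng.random()             # uniform on (0, 1], so log(u) is finite
    # Accept s_new with probability exp(w_new - A) = w_new / running sum.
    if w_new - A >= math.log(u):
        return A, s_new
    return A, s_old


def seq_sir(logp, logq, samples, rng=None):
    """Process proposal samples one at a time and return one resampled particle."""
    rng = rng or random.Random()
    A = -math.inf   # log of an empty sum
    s_old = None    # hypothetical placeholder until the first sample is accepted
    for s_new in samples:
        A, s_old = stoch_refine(logp, logq, A, s_old, s_new, rng)
    return s_old
```

Because only the current particle and the running sum `A` are stored, memory use does not grow with the pool size. The first sample with a finite weight is always accepted, since `w_new - A` is then zero and `log u` is never positive.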
Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
Ying Zhang
Mohammad Pezeshki
Philemon Brakel
Saizheng Zhang
César Laurent
Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved the state-of-the-art in various benchmarks. Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which is proposed for labeling unsegmented sequences, makes it feasible to train an end-to-end speech recognition system instead of hybrid settings. However, RNNs are computationally expensive and sometimes difficult to train. In this paper, inspired by the advantages of both CNNs and the CTC approach, we propose an end-to-end speech framework for sequence labeling, by combining hierarchical CNNs with CTC directly without recurrent connections. By evaluating the approach on the TIMIT phoneme recognition task, we show that the proposed model is not only computationally efficient, but also competitive with the existing baseline systems. Moreover, we argue that CNNs have the capability to model temporal correlations with appropriate context information.
Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
Iulian V. Serban
Alberto García-Durán
Caglar Gulcehre
Sungjin Ahn
Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. However, to this date, there are no large-scale question-answer corpora available. In this paper we present the 30M Factoid Question-Answer Corpus, an enormous question answer pair corpus produced by applying a novel neural network architecture on the knowledge base Freebase to transduce facts into natural language questions. The produced question answer pairs are evaluated both by human evaluators and using automatic evaluation metrics, including well-established machine translation and sentence similarity metrics. Across all evaluation criteria the question-generation model outperforms the competing template-based baseline. Furthermore, when presented to human evaluators, the generated questions appear comparable in quality to real human-generated questions.
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
Bernt Schiele
Theano: A Python framework for fast computation of mathematical expressions
Rami Al-rfou'
Guillaume Alain
Amjad Almahairi
Christof Angermüller
Nicolas Ballas
Frédéric Bastien
Justin S. Bayer
A. Belikov
A. Belopolsky
Arnaud Bergeron
J. Bergstra
Valentin Bisson
Josh Bleecher Snyder
Nicolas Bouchard
Nicolas Boulanger-Lewandowski
Xavier Bouthillier
Alexandre De Brébisson
Olivier Breuleux
Pierre Luc Carrier
Kyunghyun Cho
Jan Chorowski
Paul F. Christiano
Tim Cooijmans
Marc-Alexandre Côté
Myriam Côté
Yann Dauphin
Olivier Delalleau
Julien Demouth
Guillaume Desjardins
Sander Dieleman
Laurent Dinh
Mélanie Ducoffe
Vincent Dumoulin
Dumitru Erhan
Ziye Fan
Orhan Firat
Mathieu Germain
Xavier Glorot
Ian J. Goodfellow
Matthew Graham
Caglar Gulcehre
Philippe Hamel
Iban Harlouchet
Jean-Philippe Heng
Balázs Hidasi
Sina Honari
Arjun Jain
Sébastien Jean
Kai Jia
Mikhail V. Korobov
Vivek Kulkarni
Alex Lamb
Pascal Lamblin
Eric P. Larsen
César Laurent
S. Lee
Simon-Mark Lefrancois
Simon Lemieux
Nicholas Léonard
Zhouhan Lin
J. Livezey
Cory R. Lorenz
Jeremiah L. Lowin
Qianli M. Ma
Pierre-Antoine Manzagol
Olivier Mastropietro
R. McGibbon
Roland Memisevic
Bart van Merriënboer
Vincent Michalski
Mehdi Mirza
Alberto Orlandi
Mohammad Pezeshki
Colin Raffel
Daniel Renshaw
Matthew David Rocklin
Markus Roth
Peter Sadowski
John Salvatier
François Savard
Jan Schlüter
John D. Schulman
Gabriel Schwartz
Iulian V. Serban
Dmitriy Serdyuk
Samira Shabanian
Etienne Simon
Sigurd Spieckermann
S. Subramanyam
Jakub Sygnowski
Jérémie Tanguay
Gijs van Tulder
Joseph P. Turian
Sebastian Urban
Francesco Visin
Harm de Vries
David Warde-Farley
Dustin J. Webb
M. Willson
Kelvin Xu
Lijun Xue
Li Yao
Saizheng Zhang
Ying Zhang
Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
Task Loss Estimation for Structured Prediction
D. Serdyuk
Philemon Brakel
Nan Rosemary Ke
Jan Chorowski
Former NASA chief unveils $100 million neural chip maker KnuEdge
C. Strasser
Dean Takahashi
Tim Klinger
Gerald Tesauro
Kartik Talamadupula
Bowen Zhou
Medium, Moore Data, Carly Strasser from June 07, 2016. Open access to research articles has been in the news quite a bit lately (see the SciHub controversy, the preprints in biology discussion, and the European Union's recent announcement). The Data-Driven Discovery team at the Moore Foundation has also been discussing open access, particularly as it relates to the publications generated by our #MooreData researchers. Our grantee population is fairly progressive when it comes to open science, and many of the outputs that they generate are already publicly available (including proposals, software, workflows, and publications). It is therefore easy for us to imagine that they would embrace a policy that mandates open access for research articles that they produce. That said, we are always open to discussions!