Aaron Courville

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Computer Vision
Deep Learning
Efficient Communication in General Sum Game
Game Theory
Generative Models
Multi-Agent Systems
Natural Language Processing
Reinforcement Learning
Representation Learning

Biography

Aaron Courville is a professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and Scientific Director of IVADO. He has a PhD from the Robotics Institute, Carnegie Mellon University.

Courville was an early contributor to deep learning: he is a founding member of Mila – Quebec Artificial Intelligence Institute. Together with Ian Goodfellow and Yoshua Bengio, he co-wrote the seminal textbook on deep learning.

His current research focuses on the development of deep learning models and methods. He is particularly interested in reinforcement learning, multi-agent reinforcement learning, deep generative models and reasoning.

Courville holds a Canada CIFAR AI Chair and a Canada Research Chair in Systematic Generalization. His research has been supported by Microsoft Research, Samsung, Hitachi, Meta, Sony (Research Award) and Google (Focused Research Award).

Current Students

PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Master's Research - Université de Montréal
Master's Research - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Professional Master's - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Master's Research - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :

Publications

Note on the bias and variance of variational inference
Chin-Wei Huang
In this note, we study the relationship between the variational gap and the variance of the (log) likelihood ratio. We show that the gap can be upper bounded by some form of dispersion measure of the likelihood ratio, which suggests the bias of variational inference can be reduced by making the distribution of the likelihood ratio more concentrated, such as via averaging and variance reduction.
Representation Mixing for TTS Synthesis
Kyle Kastner
Joao Felipe Santos
Recent character and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation. However, the choice between character or phoneme input can create serious limitations for practical deployment, as direct control of pronunciation is crucial in certain cases. We demonstrate a simple method for combining multiple types of linguistic information in a single encoder, named representation mixing, enabling flexible choice between character, phoneme, or mixed representations during inference. Experiments and user studies on a public audiobook corpus show the efficacy of our approach.
Brief Report: Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Yikang Shen
Shawn Tan
Maximum Entropy Generators for Energy-Based Models
Rithesh Kumar
Anirudh Goyal
Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient. In this work, we propose learning both the energy function and an amortized approximate sampling mechanism using a neural generator network, which provides an efficient approximation of the log-likelihood gradient. The resulting objective requires maximizing entropy of the generated samples, which we perform using recently proposed nonparametric mutual information estimators. Finally, to stabilize the resulting adversarial game, we use a zero-centered gradient penalty derived as a necessary condition from the score matching literature. The proposed technique can generate sharp images with Inception and FID scores competitive with recent GAN techniques, does not suffer from mode collapse, and is competitive with state-of-the-art anomaly detection techniques.
Hierarchical Importance Weighted Autoencoders
Chin-Wei Huang
Kris Sankaran
Eeshan Dhekane
Alexandre Lacoste
Importance weighted variational inference (Burda et al., 2015) uses multiple i.i.d. samples to have a tighter variational lower bound. We believe a joint proposal has the potential of reducing the number of redundant samples, and introduce a hierarchical structure to induce correlation. The hope is that the proposals would coordinate to make up for the error made by one another to reduce the variance of the importance estimator. Theoretically, we analyze the condition under which convergence of the estimator variance can be connected to convergence of the lower bound. Empirically, we confirm that maximization of the lower bound does implicitly minimize variance. Further analysis shows that this is a result of negative correlation induced by the proposed hierarchical meta sampling scheme, and performance of inference also improves when the number of samples increases.
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Kundan Kumar
Rithesh Kumar
Thibault De Boissière
Lucas Gestin
Wei Zhen Teoh
Jose Sotelo
Alexandre De Brébisson
Ordered Memory
Yikang Shen
Shawn Tan
Arian Hosseini
Zhouhan Lin
Stack-augmented recurrent neural networks (RNNs) have been of interest to the deep learning community for some time. However, the difficulty of training memory models remains a problem obstructing the widespread use of such models. In this paper, we propose the Ordered Memory architecture. Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operations of the memory. We also introduce a new Gated Recursive Cell to compose lower-level representations into higher-level representations. We demonstrate that our model achieves strong performance on the logical inference task (Bowman et al., 2015) and the ListOps (Nangia and Bowman, 2018) task. We can also interpret the model to retrieve the induced tree structure, and find that these induced structures align with the ground truth. Finally, we evaluate our model on the Stanford Sentiment Treebank tasks (Socher et al., 2013), and find that it performs comparably to the state-of-the-art methods in the literature.
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Yikang Shen
Shawn Tan
Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.
No Press Diplomacy: Modeling Multi-Agent Gameplay
Philip Paquette
Yuchen Lu
Steven Bocco
Max Olan Smith
Satya Ortiz-Gagné
Jonathan K. Kummerfeld
Satinder Singh
Probability Distillation: A Caveat and Alternatives
Chin-Wei Huang
Faruk Ahmed
Kundan Kumar
Alexandre Lacoste
Due to Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners, where, as a practical workaround for deploying autoregressive models in real-time applications, a student network is used to obtain quality samples in parallel. We identify a pathological optimization issue with the adopted stochastic minimization of the reverse-KL divergence: the curse of dimensionality results in a skewed gradient distribution that renders training inefficient. This means that KL-based “evaluative” training can be susceptible to poor exploration if the target distribution is highly structured. We then explore alternative principles for distillation, including one with an “instructive” signal, and show that it is possible to achieve qualitatively better results than with KL minimization.
Systematic Generalization: What Is Required and Can It Be Learned?
Dzmitry Bahdanau*
Shikhar Murty*
Michael Noukhovitch
Thien Huu Nguyen
Harm de Vries
Towards Jumpy Planning
Akilesh
Suriya Singh
Anirudh Goyal
Alexander Neitz
Model-free reinforcement learning (RL) is a powerful paradigm for learning complex tasks but suffers from high sample inefficiency as well as ignorance of the environment dynamics. On the other hand, a model-based RL agent learns dynamical causal models of the environment and uses them to plan. However, using a model at the scale of time-steps (usually tens of milliseconds) is mostly infeasible in practice due to compounding prediction errors and the computational requirements of making vast numbers of model queries during the planning process. We propose to use a model-based planner together with a goal-conditioned policy trained with model-free learning. The model-based planner operates at a higher level of abstraction (i.e., decision states), and model-free RL is used between the decision states. We validate our approach in terms of transfer and generalization performance and show that it leads to improvement over a model-based planner that jumps to states a fixed number of timesteps ahead.