Portrait of Aaron Courville

Aaron Courville

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Computer Vision
Deep Learning
Efficient Communication in General Sum Game
Game Theory
Generative Models
Multi-Agent Systems
Natural Language Processing
Reinforcement Learning
Representation Learning

Biography

Aaron Courville is a professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and Scientific Director of IVADO. He has a PhD from the Robotics Institute, Carnegie Mellon University.

Courville was an early contributor to deep learning: he is a founding member of Mila – Quebec Artificial Intelligence Institute. Together with Ian Goodfellow and Yoshua Bengio, he co-wrote the seminal textbook on deep learning.

His current research focuses on the development of deep learning models and methods. He is particularly interested in reinforcement learning, multi-agent reinforcement learning, deep generative models and reasoning.

Courville holds a Canada CIFAR AI Chair and a Canada Research Chair in Systematic Generalization. His research has been supported by Microsoft Research, Samsung, Hitachi, Meta, Sony (Research Award) and Google (Focused Research Award).

Current Students

PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - University of Waterloo
Master's Research - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Collaborating researcher - N/A
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Collaborating Alumni - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
Master's Research - Université de Montréal
Master's Research - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :

Publications

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Thibault De Boissière
Lucas Gestin
Wei Zhen Teoh
Jose Sotelo
Ordered Memory
Stack-augmented recurrent neural networks (RNNs) have been of interest to the deep learning community for some time. However, the difficult… (see more)y of training memory models remains a problem obstructing the widespread use of such models. In this paper, we propose the Ordered Memory architecture. Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operation of the memory. We also introduce a new Gated Recursive Cell to compose lower-level representations into higher-level representation. We demonstrate that our model achieves strong performance on the logical inference task (Bowman et al., 2015) and the ListOps (Nangia and Bowman, 2018) task. We can also interpret the model to retrieve the induced tree structure, and find that these induced structures align with the ground truth. Finally, we evaluate our model on the Stanford Sentiment Treebank tasks (Socher et al., 2013), and find that it performs comparatively with the state-of-the-art methods in the literature.
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger c… (see more)onstituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.
No Press Diplomacy: Modeling Multi-Agent Gameplay
Yuchen Lu
Steven Bocco
Max Olan Smith
satya ortiz gagne
Jonathan K. Kummerfeld
Satinder Singh
Probability Distillation: A Caveat and Alternatives
Due to Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners, where, as a practi… (see more)cal workaround for deploying autoregressive models in real-time applications, a student network is used to obtain quality samples in parallel. We identify a pathological optimization issue with the adopted stochastic minimization of the reverse-KL divergence: the curse of dimensionality results in a skewed gradient distribution that renders training inefficient. This means that KL-based “evaluative” training can be susceptible to poor exploration if the target distribution is highly structured. We then explore alternative principles for distillation, including one with an “instructive” signal, and show that it is possible to achieve qualitatively better results than with KL minimization.
Systematic Generalization: What Is Required and Can It Be Learned?
Towards Jumpy Planning
Model-free reinforcement learning (RL) is a powerful paradigm for learning complex tasks but suffers from high sample inefficiency as well a… (see more)s ignorance of the environment dynamics. On the other hand, a model-based RL agent learns dynamical causal models of the environment and uses them to plan. However, using a model at the scale of time-steps (usually tens of milliseconds) is mostly unfeasible in practice due to compounding prediction errors and computational requirements for making vast numbers of model queries during the planning process. We propose to use a modelbased planner together with a goal-conditioned policy trained with model-free learning. We use a model-based planner that operates at higher levels of abstraction i.e., decision states and use modelfree RL between the decision states. We validate our approach in terms of transfer and generalization performance and show that it leads to improvement over model-based planner that jumps to states that are fixed timesteps ahead.
Towards Jumpy Planning
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering
Cătălina Cangea
Pietro Lio
Embodied Question Answering (EQA) is a recently proposed task, where an agent is placed in a rich 3D environment and must act based solely o… (see more)n its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation and language understanding in order to perform complex reasoning in the visual world. However, initial advancements combining standard vision and language methods with imitation and reinforcement learning algorithms have shown EQA might be too complex and challenging for these techniques. In order to investigate the feasibility of EQA-type tasks, we build the VideoNavQA dataset that contains pairs of questions and videos generated in the House3D environment. The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA task. We investigate several models, adapted from popular VQA methods, on this new benchmark. This establishes an initial understanding of how well VQA-style methods can perform within this novel EQA paradigm.
Planning in Dynamic Environments with Conditional Autoregressive Models
We demonstrate the use of conditional autoregressive generative models (van den Oord et al., 2016a) over a discrete latent space (van den Oo… (see more)rd et al., 2017b) for forward planning with MCTS. In order to test this method, we introduce a new environment featuring varying difficulty levels, along with moving goals and obstacles. The combination of high-quality frame generation and classical planning approaches nearly matches true environment performance for our task, demonstrating the usefulness of this method for model-based planning in dynamic environments.
Harmonic Recomposition using Conditional Autoregressive Modeling
We demonstrate a conditional autoregressive pipeline for efficient music recomposition, based on methods presented in van den Oord et al.(20… (see more)17). Recomposition (Casal & Casey, 2010) focuses on reworking existing musical pieces, adhering to structure at a high level while also re-imagining other aspects of the work. This can involve reuse of pre-existing themes or parts of the original piece, while also requiring the flexibility to generate new content at different levels of granularity. Applying the aforementioned modeling pipeline to recomposition, we show diverse and structured generation conditioned on chord sequence annotations.
Blindfold Baselines for Embodied QA
We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question b… (see more)y intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate solution, yet we show through our experiments on the EQAv1 dataset that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.