Aaron Courville

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Computer Vision
Deep Learning
Efficient Communication in General Sum Game
Game Theory
Generative Models
Multi-Agent Systems
Natural Language Processing
Reinforcement Learning
Representation Learning

Biography

Aaron Courville is a professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and Scientific Director of IVADO. He has a PhD from the Robotics Institute, Carnegie Mellon University.

Courville was an early contributor to deep learning: he is a founding member of Mila – Quebec Artificial Intelligence Institute. Together with Ian Goodfellow and Yoshua Bengio, he co-wrote the seminal textbook on deep learning.

His current research focuses on the development of deep learning models and methods. He is particularly interested in reinforcement learning, multi-agent reinforcement learning, deep generative models and reasoning.

Courville holds a Canada CIFAR AI Chair and a Canada Research Chair in Systematic Generalization. His research has been supported by Microsoft Research, Samsung, Hitachi, Meta, Sony (Research Award) and Google (Focused Research Award).

Publications

R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS
Building Robust Ensembles via Margin Boosting
Hongyang R. Zhang
Pradeep Ravikumar
Arun Sai Suggala
In the context of adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks, and as a result, has sub-optimal robustness. Consequently, an emerging line of work has focused on learning an ensemble of neural networks to defend against adversarial attacks. In this work, we take a principled approach towards building robust ensembles. We view this problem from the perspective of margin-boosting and develop an algorithm for learning an ensemble with maximum margin. Through extensive empirical evaluation on benchmark datasets, we show that our algorithm not only outperforms existing ensembling techniques, but also large models trained in an end-to-end fashion. An important byproduct of our work is a margin-maximizing cross-entropy (MCE) loss, which is a better alternative to the standard cross-entropy (CE) loss. Empirically, we show that replacing the CE loss in state-of-the-art adversarial training techniques with our MCE loss leads to significant performance improvement.
Generative Flow Networks for Discrete Probabilistic Modeling
We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data. Building upon the theory of generative flow networks (GFlowNets), we model the generation process by a stochastic data construction policy and thus amortize expensive MCMC exploration into a fixed number of actions sampled from a GFlowNet. We show how GFlowNets can approximately perform large-block Gibbs sampling to mix between modes. We propose a framework to jointly train a GFlowNet with an energy function, so that the GFlowNet learns to sample from the energy distribution, while the energy learns with an approximate MLE objective with negative samples from the GFlowNet. We demonstrate EB-GFN's effectiveness on various probabilistic modeling tasks. Code is publicly available at https://github.com/zdhNarsil/EB_GFN.
The Primacy Bias in Deep Reinforcement Learning
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.
VIM: Variational Independent Modules for Video Prediction
Multi-label Iterated Learning for Image Classification with Label Ambiguity
Sai Rajeswar
Pau Rodríguez
Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that datasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data. Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambiguity than the standard training procedure, even when fine-tuning from self-supervised weights. We also show that MILe is effective at reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision. Furthermore, MILe improves performance in class incremental settings such as IIRC and it is robust to distribution shifts. Code: https://github.com/rajeswar18/MILe
Unsupervised Model-based Pre-training for Data-efficient Reinforcement Learning from Pixels
Sai Rajeswar
Tim Verbelen
Bart Dhoedt
Alexandre Lacoste
Reinforcement learning (RL) aims at autonomously performing complex tasks. To this end, a reward signal is used to steer the learning process. While successful in many circumstances, the approach is typically data hungry, requiring large amounts of task-specific interaction between agent and environment to learn efficient behaviors. To alleviate this, unsupervised RL proposes to collect data through self-supervised interaction to accelerate task-specific adaptation. However, whether current unsupervised strategies lead to improved generalization capabilities is still unclear, more so when the input observations are high-dimensional. In this work, we advance the field by closing the performance gap in the Unsupervised RL Benchmark, a collection of tasks to be solved in a data-efficient manner, after interacting with the environment in a self-supervised way. Our approach uses unsupervised exploration for collecting experience to pre-train a world model. Then, when fine-tuning for downstream tasks, the agent leverages the learned model and a hybrid planner to efficiently adapt for the given tasks, achieving comparable results to task-specific baselines, while using 20x less data. We extensively evaluate our work, comparing several exploration methods and improving the fine-tuning process by studying the interactions between the learned components. Furthermore, we investigate the limitations of the pre-trained agent, gaining insights into how these influence the decision process and shedding light on new research directions.
Expressiveness and Learnability: A Unifying View for Evaluating Self-Supervised Learning
Unsupervised Dependency Graph Network
Chunked Autoregressive GAN for Conditional Waveform Synthesis
Conditional waveform synthesis models learn a distribution of audio waveforms given conditioning such as text, mel-spectrograms, or MIDI. These systems employ deep generative models that model the waveform via either sequential (autoregressive) or parallel (non-autoregressive) sampling. Generative adversarial networks (GANs) have become a common choice for non-autoregressive waveform synthesis. However, state-of-the-art GAN-based models produce artifacts when performing mel-spectrogram inversion. In this paper, we demonstrate that these artifacts correspond with an inability for the generator to learn accurate pitch and periodicity. We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression. We discuss the inductive bias that autoregression provides for learning the relationship between instantaneous frequency and phase, and show that this inductive bias holds even when autoregressively sampling large chunks of the waveform during each forward pass. Relative to prior state-of-the-art GAN-based models, our proposed model, Chunked Autoregressive GAN (CARGAN) reduces pitch error by 40-60%, reduces training time by 58%, maintains a fast generation speed suitable for real-time or interactive applications, and maintains or improves subjective quality.
Introducing Coordination in Concurrent Reinforcement Learning
Bellemare Marc-Emmanuel
Google Brain
Research on exploration in reinforcement learning has mostly focused on problems with a single agent interacting with an environment. However, many problems are better addressed by the concurrent reinforcement learning paradigm, where multiple agents operate in a common environment. Recent work has tackled the challenge of exploration in this particular setting (Dimakopoulou & Van Roy, 2018; Dimakopoulou et al., 2018). Nonetheless, they do not completely leverage the characteristics of this framework and agents end up behaving independently from each other. In this work we argue that coordination among concurrent agents is crucial for efficient exploration. We introduce coordination in Thompson Sampling based methods by drawing correlated samples from an agent’s posterior. We apply this idea to extend existing exploration schemes such as randomized least squares value iteration (RLSVI). Empirical results on simple toy tasks emphasize the merits of our approach and call attention to coordination as a key objective for efficient exploration in concurrent reinforcement learning.
INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision
We propose INFERNO, a method to infer object-centric representations of visual scenes without annotations. Our method decomposes a scene into multiple objects, with each object having a structured representation that disentangles its shape, appearance and pose. Each object representation defines a localized neural radiance field used to generate 2D views of the scene through differentiable rendering. Our model is subsequently trained by minimizing a reconstruction loss between inputs and corresponding rendered scenes. We empirically show that INFERNO discovers objects in a scene without supervision. We also validate the interpretability of the learned representations by manipulating inferred scenes and showing the corresponding effect in the rendered output. Finally, we demonstrate the usefulness of our 3D object representations in a visual reasoning task using the CATER dataset.