Portrait of Aaron Courville

Aaron Courville

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Computer Vision
Deep Learning
Efficient Communication in General Sum Game
Game Theory
Generative Models
Multi-Agent Systems
Natural Language Processing
Reinforcement Learning
Representation Learning

Biography

Aaron Courville is a professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and Scientific Director of IVADO. He has a PhD from the Robotics Institute, Carnegie Mellon University.

Courville was an early contributor to deep learning: he is a founding member of Mila – Quebec Artificial Intelligence Institute. Together with Ian Goodfellow and Yoshua Bengio, he co-wrote the seminal textbook on deep learning.

His current research focuses on the development of deep learning models and methods. He is particularly interested in reinforcement learning, multi-agent reinforcement learning, deep generative models and reasoning.

Courville holds a Canada CIFAR AI Chair and a Canada Research Chair in Systematic Generalization. His research has been supported by Microsoft Research, Samsung, Hitachi, Meta, Sony (Research Award) and Google (Focused Research Award).

Current Students

PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Master's Research - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
Master's Research - Université de Montréal
Master's Research - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :

Publications

Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data
Amjad Almahairi
Sai Rajeswar
Philip Bachman
Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by red… (see more)ucing the need for paired data. CycleGAN was recently proposed for this problem, but critically assumes the underlying inter-domain mapping is approximately deterministic and one-to-one. This assumption renders the model ineffective for tasks requiring flexible, many-to-many mappings. We propose a new model, called Augmented CycleGAN, which learns many-to-many mappings between domains. We examine Augmented CycleGAN qualitatively and quantitatively on several image datasets.
Mutual Information Neural Estimation
We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent … (see more)over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.
Neural Autoregressive Flows
David Krueger
Alexandre Lacoste
Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via M… (see more)asked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.
Straight to the Tree: Constituency Parsing with Neural Syntactic Distance
In this work, we propose a novel constituency parsing scheme. The model predicts a vector of real-valued scalars, named syntactic distances,… (see more) for each split position in the input sentence. The syntactic distances specify the order in which the split points will be selected, recursively partitioning the input, in a top-down fashion. Compared to traditional shift-reduce parsing schemes, our approach is free from the potential problem of compounding errors, while being faster and easier to parallelize. Our model achieves competitive performance amongst single model, discriminative parsers in the PTB dataset and outperforms previous models in the CTB dataset.
On the Spectral Bias of Deep Neural Networks
Nasim Rahaman
Felix Draxler
Fred Hamprecht
It is well known that over-parametrized deep neural networks (DNNs) are an overly expressive class of functions that can memorize even rando… (see more)m data with
Manifold Mixup: Better Representations by Interpolating Hidden States
Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly d… (see more)ifferent test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. Manifold Mixup leverages semantic interpolations as additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with Manifold Mixup learn class-representations with fewer directions of variance. We prove theory on why this flattening happens under ideal conditions, validate it on practical situations, and connect it to previous works on information theory and generalization. In spite of incurring no significant computation and being implemented in a few lines of code, Manifold Mixup improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood.
MINE: Mutual Information Neural Estimation
Ishmael Belghazi
Sai Rajeswar
R Devon Hjelm
This paper presents a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size. MINE… (see more) is back-propable and we prove that it is strongly consistent. We illustrate a handful of applications in which MINE is succesfully applied to enhance the property of generative models in both unsupervised and supervised settings. We apply our framework to estimate the information bottleneck, and apply it in tasks related to supervised classification problems. Our results demonstrate substantial added flexibility and improvement in these settings.
Learning Visual Reasoning Without Strong Priors
We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence ne… (see more)ural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
Generating Contradictory, Neutral, and Entailing Sentences
Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to l… (see more)earn a model that approximates the conditional latent space over the representations of a logical antecedent of the given statement. In our paper, we propose an approach to generating sentences, conditioned on an input sentence and a logical inference label. We do this by modeling the different possibilities for the output sentence as a distribution over the latent representation, which we train using an adversarial objective. We evaluate the model using two state-of-the-art models for the Recognizing Textual Entailment (RTE) task, and measure the BLEU scores against the actual sentences as a probe for the diversity of sentences produced by our model. The experiment results show that, given our framework, we have clear ways to improve the quality and diversity of generated sentences.
Learning Generative Models with Locally Disentangled Latent Factors
Inferring Identity Factors for Grouped Examples
Christopher Pal
We propose a method for modelling groups of face images from the same identity. The model is trained to infer a distribution over the latent… (see more) space for identity given a small set of “training data”. One can then sample images using that latent representation to produce images of the same identity. We demonstrate that the model extracts disentangled factors for identity factors and image-specific vectors. We also perform generative classification over identities to assess its feasibility for few-shot face recognition.
Hierarchical Adversarially Learned Inference
We propose a novel hierarchical generative model with a simple Markovian structure and a corresponding inference model. Both the generative … (see more)and inference model are trained using the adversarial learning paradigm. We demonstrate that the hierarchical structure supports the learning of progressively more abstract representations as well as providing semantically meaningful reconstructions with different levels of fidelity. Furthermore, we show that minimizing the Jensen-Shanon divergence between the generative and inference network is enough to minimize the reconstruction error. The resulting semantically meaningful hierarchical latent structure discovery is exemplified on the CelebA dataset. There, we show that the features learned by our model in an unsupervised way outperform the best handcrafted features. Furthermore, the extracted features remain competitive when compared to several recent deep supervised approaches on an attribute prediction task on CelebA. Finally, we leverage the model's inference network to achieve state-of-the-art performance on a semi-supervised variant of the MNIST digit classification task.