Portrait of Aaron Courville

Aaron Courville

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Computer Science and Operations Research

Biography

Aaron Courville is a professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He has a PhD from the Robotics Institute, Carnegie Mellon University.

Courville was an early contributor to deep learning: he is a founding member of Mila – Quebec Artificial Intelligence Institute, a fellow in CIFAR’s Learning in Machines & Brains program and, with Ian Goodfellow and Yoshua Bengio, co-wrote the seminal textbook on deep learning.

His current research focuses on the development of deep learning models and methods. He is particularly interested in reinforcement learning, deep generative models and multimodal ML, as well as their applications, such as computer vision and natural language processing.

Courville holds a Canada CIFAR AI Chair and a Canada Research Chair in Learning Representations that Generalize Systematically. His research has been supported by Microsoft Research, Samsung, Hitachi, Sony and Google (Focused Research Award).

Current Students

PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
PhD - Université de Montréal
Undergraduate - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher
Collaborating researcher - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
Master's Research - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :

Publications

Invariant representation driven neural classifier for anti-QCD jet tagging
Taoli Cheng
VIM: Variational Independent Modules for Video Prediction
Rim Assouel
Lluis Castrejon
Nicolas Ballas
We introduce a variational inference model called VIM, for Variational Independent Modules, for sequential data that learns and infers laten… (see more)t representations as a set of objects and discovers modular causal mechanisms over these objects. These mechanisms - which we call modules - are independently parametrized, define the stochastic transitions of entities and are shared across entities. At each time step, our model infers from a low-level input sequence a high-level sequence of categorical latent variables to select which transition modules to apply to which high-level object. We evaluate this model in video prediction tasks where the goal is to predict multi-modal future events given previous observations. We demonstrate empirically that VIM can model 2D visual sequences in an interpretable way and is able to identify the underlying dynamically instantiated mechanisms of the generation process. We additionally show that the learnt modules can be composed at test time to generalize to out-of-distribution observations.
Multi-label Iterated Learning for Image Classification with Label Ambiguity
Sai Rajeswar
Pau Rodriguez
Soumye Singhal
David Vazquez
Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that da… (see more)tasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data. Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambiguity than the standard training procedure, even when fine-tuning from self-supervised weights. We also show that MILe is effective reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision. Furthermore, MILe improves performance in class incremental settings such as IIRC and it is robust to distribution shifts. Code: https://github.com/rajeswar18/MILe
Unsupervised Dependency Graph Network
Yikang Shen
Shawn Tan
Peng Li
Jie Zhou
Fortuitous Forgetting in Connectionist Networks
Hattie Zhou
Ankit Vani
Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact b… (see more)e favorable to learning. We introduce"forget-and-relearn"as a powerful paradigm for shaping the learning trajectories of artificial neural networks. In this process, the forgetting step selectively removes undesirable information from the model, and the relearning step reinforces features that are consistently useful under different conditions. The forget-and-relearn framework unifies many existing iterative training algorithms in the image classification and language emergence literature, and allows us to understand the success of these algorithms in terms of the disproportionate forgetting of undesirable information. We leverage this understanding to improve upon existing algorithms by designing more targeted forgetting operations. Insights from our analysis provide a coherent view on the dynamics of iterative training in neural networks and offer a clear path towards performance improvements.
Invariant representation driven neural classifier for anti-QCD jet tagging
Taoli Cheng
Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress
Unsupervised Dependency Graph Network
Yikang Shen
Shawn Tan
Peng Li
Jie Zhou
Recent work has identified properties of pretrained self-attention models that mirror those of dependency parse structures. In particular, s… (see more)ome self-attention heads correspond well to individual dependency types. Inspired by these developments, we propose a new competitive mechanism that encourages these attention heads to model different dependency relations. We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures from raw corpora and the masked language modeling task. Experiment results show that UDGN achieves very strong unsupervised dependency parsing performance without gold POS tags and any other external information. The competitive gated heads show a strong correlation with human-annotated dependency types. Furthermore, the UDGN can also achieve competitive performance on masked language modeling and sentence textual similarity tasks.
Multi-label Iterated Learning for Image Classification with Label Ambiguity
Sai Rajeswar
Pau Rodriguez
Soumye Singhal
David Vazquez
Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that da… (see more)tasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data. Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambiguity than the standard training procedure, even when fine-tuning from self-supervised weights. We also show that MILe is effective reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision. Furthermore, MILe improves performance in class incremental settings such as IIRC and it is robust to distribution shifts. Code: https://github.com/rajeswar18/MILe
Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle
Syntax is fundamental to our thinking about language. Failing to capture the structure of input language could lead to generalization proble… (see more)ms and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model (left-to-right). To train the incremental parser and avoid exposure bias, we also propose a novel dynamic oracle, so that SOM is more robust to wrong parsing decisions. Experiments show that SOM can achieve strong results in language modeling, incremental parsing, and syntactic generalization tests while using fewer parameters than other models.
Understanding by Understanding Not: Modeling Negation in Language Models
Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language mode… (see more)ls often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the resulting combined objective we reduce the mean top 1 error rate to 4% on the negated LAMA dataset. We also see some improvements on the negated NLI benchmarks.
DATA-EFFICIENT REINFORCEMENT LEARNING
Nitarshan Rajkumar
Michael Noukhovitch
Ankesh Anand
Philip Bachman
Data efficiency poses a major challenge for deep reinforcement learning. We approach this issue from the perspective of self-supervised repr… (see more)esentation learning, leveraging reward-free exploratory data to pretrain encoder networks. We employ a novel combination of latent dynamics modelling and goal-reaching objectives, which exploit the inherent structure of data in reinforcement learning. We demonstrate that our method scales well with network capacity and pretraining data. When evaluated on the Atari 100k data-efficiency benchmark, our approach significantly outperforms previous methods combining unsupervised pretraining with task-specific finetuning, and approaches human-level performance.