Portrait de Aaron Courville

Aaron Courville

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage de représentations
Apprentissage par renforcement
Apprentissage profond
Communication efficace dans un jeu de somme générale
Modèles génératifs
Systèmes multi-agents
Théorie des jeux
Traitement du langage naturel
Vision par ordinateur

Biographie

Aaron Courville est professeur au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal et Directeur scientifique à IVADO. Il a obtenu son doctorat au Robotics Institute de l'Université Carnegie Mellon.

Il est l'un des premiers contributeurs à l'apprentissage profond, membre fondateur de Mila – Institut québécois d’intelligence artificielle. Avec Ian Goodfellow et Yoshua Bengio, il a coécrit le manuel de référence sur l'apprentissage profond.

Ses recherches actuelles portent sur le développement de modèles et de méthodes d'apprentissage profond. Il s'intéresse particulièrement à l'apprentissage par renforcement, à l'apprentissage par renforcement multi-agents, aux modèles génératifs profonds et au raisonnement.

Aaron Courville est titulaire d'une chaire en IA Canada-CIFAR et d'une Chaire de recherche du Canada (CRC) en généralisation systématique. Ses recherches ont été soutenues en partie par Microsoft Research, Samsung, Hitachi, Meta, Sony (bourse de recherche) et Google (bourse de recherche ciblée).

Étudiants actuels

Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Maîtrise recherche - Université de Montréal
Doctorat - UdeM
Doctorat - UdeM
Collaborateur·rice de recherche - N/A
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Collaborateur·rice alumni - UdeM
Superviseur⋅e principal⋅e :
Stagiaire de recherche - UdeM
Maîtrise recherche - UdeM
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Maîtrise recherche - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :

Publications

Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods
Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods
Unsupervised Dependency Graph Network
Yikang Shen
Shawn Tan
Peng Li
Jie Zhou
I NTRODUCING C OORDINATION IN C ONCURRENT R EIN - FORCEMENT L EARNING
Adrien Ali Taiga
Google Brain
Research on exploration in reinforcement learning has mostly focused on problems with a single agent interacting with an environment. Howeve… (voir plus)r many problems are better addressed by the concurrent reinforcement learning paradigm, where multiple agents operate in a common environment. Recent work has tackled the challenge of exploration in this particular setting (Dimakopoulou & Van Roy, 2018; Dimakopoulou et al., 2018). Nonetheless, they do not completely leverage the characteristics of this framework and agents end up behaving independently from each other. In this work we argue that coordination among concurrent agents is crucial for efficient exploration. We introduce coordination in Thompson Sampling based methods by drawing correlated samples from an agent’s posterior. We apply this idea to extend existing exploration schemes such as randomized least squares value iteration (RLSVI). Empirical results on simple toy tasks emphasize the merits of our approach and call attention to coordination as a key objective for efficient exploration in concurrent reinforcement learning.
INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision
Lluis Castrejon
Nicolas Ballas
We propose INFERNO, a method to infer object-centric representations of visual scenes without annotations. Our method decomposes a scene int… (voir plus)o multiple objects, with each object having a structured representation that disentangles its shape, appearance and pose. Each object representation defines a localized neural radiance field used to generate 2D views of the scene through differentiable rendering. Our model is subsequently trained by minimizing a reconstruction loss between inputs and corresponding rendered scenes. We empirically show that INFERNO discovers objects in a scene without supervision. We also validate the interpretability of the learned representations by manipulating inferred scenes and showing the corresponding effect in the rendered output. Finally, we demonstrate the usefulness of our 3D object representations in a visual reasoning task using the CATER dataset.
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
Aviral Kumar
Tengyu Ma
George Tucker
Sergey Levine
Despite overparameterization, deep networks trained via supervised learning are surprisingly easy to optimize and exhibit excellent generali… (voir plus)zation. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit regularization induced by stochastic gradient descent, which favors parsimonious solutions that generalize well on test inputs. It is reasonable to surmise that deep reinforcement learning (RL) methods could also benefit from this effect. In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations. Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions with excessive aliasing, in stark contrast to the supervised learning case. We back up these findings empirically, showing that feature representations learned by a deep network value function trained via bootstrapping can indeed become degenerate, aliasing the representations for state-action pairs that appear on either side of the Bellman backup. To address this issue, we derive the form of this implicit regularizer and, inspired by this derivation, propose a simple and effective explicit regularizer, called DR3, that counteracts the undesirable effects of this implicit regularizer. When combined with existing offline RL methods, DR3 substantially improves performance and stability, alleviating unlearning in Atari 2600 games, D4RL domains, and robotic manipulation from images.
Fortuitous Forgetting in Connectionist Networks
Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact b… (voir plus)e favorable to learning. We introduce"forget-and-relearn"as a powerful paradigm for shaping the learning trajectories of artificial neural networks. In this process, the forgetting step selectively removes undesirable information from the model, and the relearning step reinforces features that are consistently useful under different conditions. The forget-and-relearn framework unifies many existing iterative training algorithms in the image classification and language emergence literature, and allows us to understand the success of these algorithms in terms of the disproportionate forgetting of undesirable information. We leverage this understanding to improve upon existing algorithms by designing more targeted forgetting operations. Insights from our analysis provide a coherent view on the dynamics of iterative training in neural networks and offer a clear path towards performance improvements.
MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling
Ethan Manilow
Yi Deng
Rigel Swavely
Kyle Kastner
Tim Cooijmans
Jesse Engel
Musical expression requires control of both what notes are played, and how they are performed. Conventional audio synthesizers provide detai… (voir plus)led expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control. Starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, we infer musical notes and high-level properties of their expressive performance (such as timbre, vibrato, dynamics, and articulation). This creates a 3-level hierarchy (notes, performance, synthesis) that affords individuals the option to intervene at each level, or utilize trained priors (performance given notes, synthesis given performance) for creative assistance. Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence. By utilizing an interpretable hierarchy, with multiple levels of granularity, MIDI-DDSP opens the door to assistive tools to empower individuals across a diverse range of musical experience.
Invariant representation driven neural classifier for anti-QCD jet tagging
Taoli Cheng
Chunked Autoregressive GAN for Conditional Waveform Synthesis
Max Morrison
Rithesh Kumar
Kundan Kumar
Prem Seetharaman
Consistency-CAM: Towards Improved Weakly Supervised Semantic Segmentation.
Sai Rajeswar
Issam Hadj Laradji
Pau Rodriguez
David Vazquez
Learning to Dequantise with Truncated Flows
Shawn Tan
Chin-Wei Huang
Dequantisation is a general technique used for transforming data described by a discrete random variable x into a continuous (latent) random… (voir plus) variable z, for the purpose of it being modeled by likelihood-based density models. Dequantisation was first introduced in the context of ordinal data, such as image pixel values. However, when the data is categorical, the dequantisation scheme is not obvious. We learn such a dequantisation scheme q(z|x), using variational inference with TRUncated FLows (TRUFL) — a novel flow-based model that allows the dequantiser to have a learnable truncated support. Unlike previous work, the TRUFL dequantiser is (i) capable of embedding the data losslessly in certain cases, since the truncation allows the conditional distributions q(z|x) to have non-overlapping bounded supports, while being (ii) trainable with back-propagation. Addtionally, since the support of the marginal q(z) is bounded and the support of prior p(z) is not, we propose to renormalise the prior distribution over the support of q(z). We derive a lower bound for training, and propose a rejection sampling scheme to account for the invalid samples. Experimentally, we benchmark TRUFL on constrained generation tasks, and find that it outperforms prior approaches. In addition, we find that rejection sampling results in higher validity for the constrained problems.