Portrait de Aaron Courville

Aaron Courville

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage de représentations
Apprentissage par renforcement
Apprentissage profond
Communication efficace dans un jeu de somme générale
Modèles génératifs
Systèmes multi-agents
Théorie des jeux
Traitement du langage naturel
Vision par ordinateur

Biographie

Aaron Courville est professeur au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal et Directeur scientifique à IVADO. Il a obtenu son doctorat au Robotics Institute de l'Université Carnegie Mellon.

Il est l'un des premiers contributeurs à l'apprentissage profond, membre fondateur de Mila – Institut québécois d’intelligence artificielle. Avec Ian Goodfellow et Yoshua Bengio, il a coécrit le manuel de référence sur l'apprentissage profond.

Ses recherches actuelles portent sur le développement de modèles et de méthodes d'apprentissage profond. Il s'intéresse particulièrement à l'apprentissage par renforcement, à l'apprentissage par renforcement multi-agents, aux modèles génératifs profonds et au raisonnement.

Aaron Courville est titulaire d'une chaire en IA Canada-CIFAR et d'une Chaire de recherche du Canada (CRC) en généralisation systématique. Ses recherches ont été soutenues en partie par Microsoft Research, Samsung, Hitachi, Meta, Sony (bourse de recherche) et Google (bourse de recherche ciblée).

Étudiants actuels

Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Maîtrise recherche - Université de Montréal
Doctorat - UdeM
Doctorat - UdeM
Collaborateur·rice de recherche - N/A
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Collaborateur·rice alumni - UdeM
Superviseur⋅e principal⋅e :
Stagiaire de recherche - UdeM
Maîtrise recherche - UdeM
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Maîtrise recherche - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :

Publications

NU-GAN: High resolution neural upsampling with GAN
Rithesh Kumar
Kundan Kumar
Vicki Anand
In this paper, we propose NU-GAN, a new method for resampling audio from lower to higher sampling rates (upsampling). Audio upsampling is an… (voir plus) important problem since productionizing generative speech technology requires operating at high sampling rates. Such applications use audio at a resolution of 44.1 kHz or 48 kHz, whereas current speech synthesis methods are equipped to handle a maximum of 24 kHz resolution. NU-GAN takes a leap towards solving audio upsampling as a separate component in the text-to-speech (TTS) pipeline by leveraging techniques for audio generation using GANs. ABX preference tests indicate that our NU-GAN resampler is capable of resampling 22 kHz to 44.1 kHz audio that is distinguishable from original audio only 7.4% higher than random chance for single speaker dataset, and 10.8% higher than chance for multi-speaker dataset.
Explicitly Modeling Syntax in Language Model improves Generalization
Syntax is fundamental to our thinking about language. Although neural networks are very successful in many tasks, they do not explicitly mod… (voir plus)el syntactic structure. Failing to capture the structure of inputs could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with a one-step look-ahead parser and maintains the conditional probability setting of the standard language model. Experiments show that SOM can achieve strong results in language modeling and syntactic generalization tests, while using fewer parameters then other models.
Data-Efficient Reinforcement Learning with Momentum Predictive Representations
Max Schwarzer
Ankesh Anand
Rishab Goel
Philip Bachman
While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interacti… (voir plus)on with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Momentum Predictive Representations (MPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters, and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.444 on Atari in a setting limited to 100K steps of environment interaction, which is a 66% relative improvement over the previous state-of-the-art. Moreover, even in this limited data regime, MPR exceeds expert human scores on 6 out of 26 games.
Data-Efficient Reinforcement Learning with Self-Predictive Representations
Max Schwarzer
Ankesh Anand
Rishab Goel
Philip Bachman
While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interacti… (voir plus)on with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Self-Predictive Representations(SPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.415 on Atari in a setting limited to 100k steps of environment interaction, which represents a 55% relative improvement over the previous state-of-the-art. Notably, even in this limited data regime, SPR exceeds expert human scores on 7 out of 26 games. The code associated with this work is available at https://github.com/mila-iqia/spr
A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM
Iulian V. Serban
Varun Gupta
Ekaterina Kochmar
Dung D. Vu
Robert Belfer
Stochastic Neural Network with Kronecker Flow
Chin-Wei Huang
Ahmed Touati
Alexandre Lacoste
Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to… (voir plus) scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.
Detecting semantic anomalies
Faruk Ahmed
We critically appraise the recent interest in out-of-distribution (OOD) detection and question the practical relevance of existing benchmark… (voir plus)s. While the currently prevalent trend is to consider different datasets as OOD, we argue that out-distributions of practical interest are ones where the distinction is semantic in nature for a specified context, and that evaluative tasks should reflect this more closely. Assuming a context of object recognition, we recommend a set of benchmarks, motivated by practical applications. We make progress on these benchmarks by exploring a multi-task learning based approach, showing that auxiliary objectives for improved semantic awareness result in improved semantic anomaly detection, with accompanying generalization benefits.
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images Using a View-Based Representation
Sai Rajeswar
Fahim Mannan
Jérôme Parent-Lévesque
David Vazquez
Out-of-Distribution Generalization via Risk Extrapolation (REx)
Joern-Henrik Jacobsen
Amy Zhang
Jonathan Binas
Rémi LE PRIOL
Generalizing outside of the training distribution is an open challenge for current machine learning systems. A weak form of out-of-distribut… (voir plus)ion (OoD) generalization is the ability to successfully interpolate between multiple observed distributions. One way to achieve this is through robust optimization, which seeks to minimize the worst-case risk over convex combinations of the training distributions. However, a much stronger form of OoD generalization is the ability of models to extrapolate beyond the distributions observed during training. In pursuit of strong OoD generalization, we introduce the principle of Risk Extrapolation (REx). REx can be viewed as encouraging robustness over affine combinations of training risks, by encouraging strict equality between training risks. We show conceptually how this principle enables extrapolation, and demonstrate the effectiveness and scalability of instantiations of REx on various OoD generalization tasks. Our code can be found at this https URL.
Solving ODE with Universal Flows: Approximation Theory for Flow-Based Models
Chin-Wei Huang
Laurent Dinh
Normalizing flows are powerful invertible probabilistic models that can be used to translate two probability distributions, in a way that al… (voir plus)lows us to efficiently track the change of probability density. However, to trade for computational efficiency in sampling and in evaluating the log-density, special parameterization designs have been proposed at the cost of representational expressiveness. In this work, we propose to use ODEs as a framework to establish universal approximation theory for certain families of flow-based models.
Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models
Chin-Wei Huang
Laurent Dinh
In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drasticall… (voir plus)y increasing the computational cost of sampling and evaluation of a lower bound on the likelihood. Theoretically, we prove the proposed flow can approximate a Hamiltonian ODE as a universal transport map. Empirically, we demonstrate state-of-the-art performance on standard benchmarks of flow-based generative modeling.
On Bonus Based Exploration Methods In The Arcade Learning Environment
Adrien Ali Taiga
William Fedus
Marlos C. Machado
Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration prob… (voir plus)lems such as Montezuma's Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on Montezuma's Revenge, Bellemare et al.'s set of hard of exploration games with sparse rewards, and the whole Atari 2600 suite. We find that while exploration bonuses lead to higher score on Montezuma's Revenge they do not provide meaningful gains over the simpler epsilon-greedy scheme. In fact, we find that methods that perform best on that game often underperform epsilon-greedy on easy exploration Atari 2600 games. We find that our conclusions remain valid even when hyperparameters are tuned for these easy-exploration games. Finally, we find that none of the methods surveyed benefit from additional training samples (1 billion frames, versus Rainbow's 200 million) on Bellemare et al.'s hard exploration games. Our results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.