Portrait de Aaron Courville

Aaron Courville

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département d'informatique et de recherche opérationnelle

Biographie

Aaron Courville est professeur au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal. Il a obtenu son doctorat au Robotics Institute de l'Université Carnegie Mellon. Il est l'un des premiers contributeurs à l'apprentissage profond, membre fondateur de Mila – Institut québécois d’intelligence artificielle et membre du programme Apprentissage automatique, apprentissage biologique de l'Institut canadien de recherches avancées (CIFAR). Avec Ian Goodfellow et Yoshua Bengio, il a coécrit le manuel de référence sur l'apprentissage profond. Ses recherches actuelles portent sur le développement de modèles et de méthodes d'apprentissage profond. Il s'intéresse particulièrement à l'apprentissage par renforcement, aux modèles génératifs profonds et à l'apprentissage multimodal avec des applications telles que la vision par ordinateur et le traitement du langage naturel. Aaron Courville est titulaire d'une chaire en IA Canada-CIFAR et d'une Chaire de recherche du Canada (CRC) en généralisation systématique. Ses recherches ont été soutenues en partie par Microsoft Research, Samsung, Hitachi, Sony (bourse de recherche) et Google (bourse de recherche ciblée).

Étudiants actuels

Doctorat - Université de Montréal
Co-superviseur⋅e :
Maîtrise recherche - Université de Montréal
Doctorat - Université de Montréal
Doctorat - Université de Montréal
Baccalauréat - Université de Montréal
Doctorat - Université de Montréal
Superviseur⋅e principal⋅e :
Doctorat - Université de Montréal
Co-superviseur⋅e :
Doctorat - Université de Montréal
Superviseur⋅e principal⋅e :
Doctorat - Université de Montréal
Superviseur⋅e principal⋅e :
Doctorat - Université de Montréal
Doctorat - Université de Montréal
Co-superviseur⋅e :
Doctorat - Université de Montréal
Doctorat - Université de Montréal
Co-superviseur⋅e :
Doctorat - Université de Montréal
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche
Collaborateur·rice de recherche - Université de Montréal
Stagiaire de recherche - Ghent University
Stagiaire de recherche - Université de Montréal
Doctorat - Université de Montréal
Co-superviseur⋅e :
Doctorat - Université de Montréal
Maîtrise recherche - Université de Montréal
Superviseur⋅e principal⋅e :
Doctorat - Université de Montréal
Doctorat - Université de Montréal
Doctorat - Université de Montréal
Doctorat - Université de Montréal
Superviseur⋅e principal⋅e :
Doctorat - Université de Montréal

Publications

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
Sai Rajeswar
Pietro Mazzaglia
Tim Verbelen
Alexandre Piché
Bart Dhoedt
Alexandre Lacoste
Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed but require la… (voir plus)rge amounts of interactions between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning, for adapting faster to future tasks. Yet, as shown in the Unsupervised RL Benchmark (URLB; Laskin et al. 2021), whether current unsupervised strategies can improve generalization capabilities is still unclear, especially in visual control settings. In this work, we study the URLB and propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent, and a task-aware fine-tuning strategy combined with a new proposed hybrid planner, Dyna-MPC, to adapt the agent for downstream tasks. On URLB, our method obtains 93.59% overall normalized performance, surpassing previous baselines by a staggering margin. The approach is empirically evaluated through a large-scale empirical study, which we use to validate our design choices and analyze our models. We also show robust performance on the Real-Word RL benchmark, hinting at resiliency to environment perturbations during adaptation. Project website: https://masteringurlb.github.io/
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
Sai Rajeswar
Pietro Mazzaglia
Tim Verbelen
Alexandre Piché
Bart Dhoedt
Alexandre Lacoste
Generative Augmented Flow Networks
Ling Pan
Dinghuai Zhang
Longbo Huang
The Generative Flow Network is a probabilistic framework where an agent learns a stochastic policy for object generation, such that the prob… (voir plus)ability of generating an object is proportional to a given reward function. Its effectiveness has been shown in discovering high-quality and diverse solutions, compared to reward-maximizing reinforcement learning-based methods. Nonetheless, GFlowNets only learn from rewards of the terminal states, which can limit its applicability. Indeed, intermediate rewards play a critical role in learning, for example from intrinsic motivation to provide intermediate feedback even in particularly challenging sparse reward tasks. Inspired by this, we propose Generative Augmented Flow Networks (GAFlowNets), a novel learning framework to incorporate intermediate rewards into GFlowNets. We specify intermediate rewards by intrinsic motivation to tackle the exploration problem in sparse reward environments. GAFlowNets can leverage edge-based and state-based intrinsic rewards in a joint way to improve exploration. Based on extensive experiments on the GridWorld task, we demonstrate the effectiveness and efficiency of GAFlowNet in terms of convergence, performance, and diversity of solutions. We further show that GAFlowNet is scalable to a more complex and large-scale molecule generation domain, where it achieves consistent and significant performance improvement.
Investigating Multi-task Pretraining and Generalization in Reinforcement Learning
Adrien Ali Taiga
Rishabh Agarwal
Jesse Farebrother
Google Brain
Latent State Marginalization as a Low-cost Approach for Improving Exploration
Dinghuai Zhang
Qinqing Zheng
Amy Zhang
Ricky T. Q. Chen
Online black-box adaptation to label-shift in the presence of conditional-shift
Faruk Ahmed
We consider an out-of-distribution setting where trained predictive models are deployed online in new locations (inducing conditional-shift)… (voir plus), such that these locations are also associated with differently skewed target distributions (label-shift). While approaches for online adaptation to label-shift have recently been discussed by Wu et al. (2021), the potential presence of concurrent conditional-shift has not been considered in the literature, although one might anticipate such distributional shifts in realistic deployments. In this paper, we empirically explore the effectiveness of online adaptation methods in such situations on three synthetic and two realistic datasets, comprising both classification and regression problems. We show that it is possible to improve performance in these settings by learning additional hyper-parameters to account for the presence of conditional-shift by using appropriate validation sets.
Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Pierluca D'Oro
Max Schwarzer
Evgenii Nikishin
Increasing the replay ratio, the number of updates of an agent's parameters per environment interaction, is an appealing strategy for improv… (voir plus)ing the sample efficiency of deep reinforcement learning algorithms. In this work, we show that fully or partially resetting the parameters of deep reinforcement learning agents causes better replay ratio scaling capabilities to emerge. We push the limits of the sample efficiency of carefully-modified algorithms by training them using an order of magnitude more updates than usual, significantly improving their performance in the Atari 100k and DeepMind Control Suite benchmarks. We then provide an analysis of the design choices required for favorable replay ratio scaling to be possible and discuss inherent limits and tradeoffs.
Simplicial Embeddings in Self-Supervised Learning and Downstream Classification
Samuel Lavoie
Christos Tsirigotis
Max Schwarzer
Kenji Kawaguchi
Ankit Vani
Michael Noukhovitch
Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into …
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Max Schwarzer
Johan Samir Obando Ceron
Rishabh Agarwal
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on sca… (voir plus)ling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Noisy Pairing and Partial Supervision for Stylized Opinion Summarization
Reinald Kim
Mirella Lapata. 2020
Un-611
Emmanuel Bengio
Maxinder S. Kan-620
Asja Fischer
Somnath Basu
Roy Chowdhury
Chao Zhao
Tanya Goyal
Junyi Jiacheng Xu
Jessy Li
Ivor Wai-hung Tsang
James T. Kwok
Neil Houlsby
Andrei Giurgiu
Stanisław Jastrzębski … (voir 22 de plus)
Bruna Morrone
Quentin de Laroussilhe
Mona Gesmundo
Attariyan Sylvain
Gelly
Thomas Wolf
Lysandre Debut
Julien Victor Sanh
Clement Chaumond
Anthony Delangue
Pier-339 Moi
Tim ric Cistac
R´emi Rault
Morgan Louf
Funtow-900 Joe
Sam Davison
Patrick Shleifer
Von Platen
Clara Ma
Yacine Jernite
Julien Plu
Canwen Xu
Opinion summarization research has primar-001 ily focused on generating summaries reflect-002 ing important opinions from customer reviews 0… (voir plus)03 without paying much attention to the writing 004 style. In this paper, we propose the stylized 005 opinion summarization task, which aims to 006 generate a summary of customer reviews in 007 the desired (e.g., professional) writing style. 008 To tackle the difficulty in collecting customer 009 and professional review pairs, we develop a 010 non-parallel training framework, Noisy Pair-011 ing and Partial Supervision ( NAPA ), which 012 trains a stylized opinion summarization sys-013 tem from non-parallel customer and profes-014 sional review sets. We create a benchmark P RO - 015 S UM by collecting customer and professional 016 reviews from Yelp and Michelin. Experimental 017 results on P RO S UM and FewSum demonstrate 018 that our non-parallel training framework con-019 sistently improves both automatic and human 020 evaluations, successfully building a stylized 021 opinion summarization model that can gener-022 ate professionally-written summaries from cus-023 tomer reviews. 024
Versatile Energy-Based Models for High Energy Physics
Taoli Cheng
Teaching Algorithmic Reasoning via In-context Learning
Hattie Zhou
Azade Nova
Behnam Neyshabur
Hanie Sedghi