Aaron Courville

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Computer Science and Operations Research


Aaron Courville is a professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He received his PhD from the Robotics Institute at Carnegie Mellon University. He is one of the early contributors to deep learning, a founding member of Mila (Quebec Artificial Intelligence Institute), and a member of the Learning in Machines and Brains program of the Canadian Institute for Advanced Research (CIFAR). With Ian Goodfellow and Yoshua Bengio, he co-authored the reference textbook on deep learning. His current research focuses on developing deep learning models and methods. He is particularly interested in reinforcement learning, deep generative models, and multimodal learning, with applications such as computer vision and natural language processing. Aaron Courville holds a Canada CIFAR AI Chair and a Canada Research Chair (CRC) in Systematic Generalization. His research has been supported in part by Microsoft Research, Samsung, Hitachi, Sony (research award), and Google (focused research award).

Current Students

Sparse Universal Transformer
Shawn Tan
Yikang Shen
Zhenfang Chen
Chuang Gan
The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers and is Turing-complete under certain assumptions. Empirical evidence also shows that UTs have better compositional generalization than Vanilla Transformers (VTs) in formal language tasks. The parameter-sharing also affords it better parameter efficiency than VTs. Despite its many advantages, most state-of-the-art NLP systems use VTs as their backbone model instead of UTs. This is mainly because scaling UT parameters is more compute and memory intensive than scaling up a VT. This paper proposes the Sparse Universal Transformer (SUT), which leverages Sparse Mixture of Experts (SMoE) to reduce UT's computation complexity while retaining its parameter efficiency and generalization ability. Experiments show that SUT combines the best of both worlds, achieving strong generalization results on formal language tasks (Logical inference and CFQ) and impressive parameter and computation efficiency on standard natural language benchmarks like WMT'14.
Double Gumbel Q-Learning
David Yu-Tung Hui
Group Robust Classification Without Any Group Information
Christos Tsirigotis
Joao Monteiro
Pau Rodriguez
David Vazquez
Improving Compositional Generalization using Iterated Learning and Simplicial Embeddings
Yi Ren
Samuel Lavoie
Mikhail Galkin
Danica J. Sutherland
Language Model Alignment with Elastic Reset
Michael Noukhovitch
Samuel Lavoie
Florian Strub
Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve on reward while degrading performance in other areas, a phenomenon known as reward hacking, alignment tax, or language drift. First, we argue that commonly-used test metrics are insufficient and instead measure how different algorithms trade off between reward and drift. The standard method modifies the reward with a Kullback-Leibler (KL) penalty between the online and initial model. We propose Elastic Reset, a new algorithm that achieves higher reward with less drift without explicitly modifying the training objective. We periodically reset the online model to an exponential moving average (EMA) of itself, then reset the EMA model to the initial model. Through the use of an EMA, our model recovers quickly after resets and achieves higher reward with less drift in the same number of steps. We demonstrate that fine-tuning language models with Elastic Reset leads to state-of-the-art performance on a small-scale pivot-translation benchmark, outperforms all baselines in a medium-scale RLHF-like IMDB mock sentiment task and leads to a more performant and more aligned technical QA chatbot with LLaMA-7B. Code available at github.com/mnoukhov/elastic-reset.
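The reset schedule described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the repository linked above); it assumes model parameters are a plain dictionary of floats, and the names `update_step`, `reset_every`, and `decay` are hypothetical placeholders chosen for illustration.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]  # assumption: parameters as a flat name -> value map


def ema_update(ema: Params, online: Params, decay: float) -> Params:
    """Move the EMA parameters one step toward the online parameters."""
    return {k: decay * ema[k] + (1.0 - decay) * online[k] for k in ema}


def elastic_reset_train(
    initial: Params,
    update_step: Callable[[Params], Params],  # hypothetical: one RL update
    num_steps: int,
    reset_every: int,
    decay: float = 0.9,
) -> Tuple[Params, Params]:
    """Train with periodic resets, following the schedule in the abstract."""
    online = dict(initial)
    ema = dict(initial)
    for step in range(1, num_steps + 1):
        online = update_step(online)           # ordinary training update
        ema = ema_update(ema, online, decay)   # track an EMA of the online model
        if step % reset_every == 0:
            online = dict(ema)                 # reset online model to its EMA
            ema = dict(initial)                # then reset EMA to the initial model
    return online, ema


# Toy usage: the "update" simply adds 1.0 to every parameter.
online, ema = elastic_reset_train(
    initial={"w": 0.0},
    update_step=lambda p: {k: v + 1.0 for k, v in p.items()},
    num_steps=2,
    reset_every=2,
    decay=0.5,
)
print(online, ema)
```

Note that the training objective itself is untouched in this sketch: drift is limited only by the resets, which matches the abstract's claim that the method does not explicitly modify the objective.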
Versatile Energy-Based Probabilistic Models for High Energy Physics
Taoli Cheng
Discovering the Electron Beam Induced Transition Rates for Silicon Dopants in Graphene with Deep Neural Networks in the STEM
Kevin M Roccapriore
Max Schwarzer
Joshua Greaves
Jesse Farebrother
Rishabh Agarwal
Colton Bishop
Maxim Ziatdinov
Igor Mordatch
Ekin Dogus Cubuk
Sergei V Kalinin
Meta-Value Learning: a General Framework for Learning with Learning Awareness
Tim Cooijmans
Milad Aghajohari
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Max Schwarzer
Johan Samir Obando Ceron
Rishabh Agarwal
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Learning with Learning Awareness using Meta-Values
Tim Cooijmans
Milad Aghajohari
