Portrait de Aaron Courville

Aaron Courville

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur titulaire, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage de représentations
Apprentissage par renforcement
Apprentissage profond
Communication efficace dans un jeu de somme générale
Modèles génératifs
Systèmes multi-agents
Théorie des jeux
Traitement du langage naturel
Vision par ordinateur

Biographie

Aaron Courville est professeur au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal et Directeur scientifique à IVADO. Il a obtenu son doctorat au Robotics Institute de l'Université Carnegie Mellon.

Il est l'un des premiers contributeurs à l'apprentissage profond, membre fondateur de Mila – Institut québécois d’intelligence artificielle. Avec Ian Goodfellow et Yoshua Bengio, il a coécrit le manuel de référence sur l'apprentissage profond.

Ses recherches actuelles portent sur le développement de modèles et de méthodes d'apprentissage profond. Il s'intéresse particulièrement à l'apprentissage par renforcement, à l'apprentissage par renforcement multi-agents, aux modèles génératifs profonds et au raisonnement.

Aaron Courville est titulaire d'une chaire en IA Canada-CIFAR et d'une Chaire de recherche du Canada (CRC) en généralisation systématique. Ses recherches ont été soutenues en partie par Microsoft Research, Samsung, Hitachi, Meta, Sony (bourse de recherche) et Google (bourse de recherche ciblée).

Étudiants actuels

Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Doctorat - UdeM
Doctorat - UdeM
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Collaborateur·rice de recherche - UdeM
Doctorat - UdeM
Maîtrise recherche - UdeM
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM

Publications

Representation Mixing for TTS Synthesis
Recent character and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation. Ho… (voir plus)wever, the choice between character or phoneme input can create serious limitations for practical deployment, as direct control of pronunciation is crucial in certain cases. We demonstrate a simple method for combining multiple types of linguistic information in a single encoder, named representation mixing, enabling flexible choice between character, phoneme, or mixed representations during inference. Experiments and user studies on a public audiobook corpus show the efficacy of our approach.
Brief Report: Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Counterpoint by Convolution
Cheng-Zhi Anna Huang
Adam Roberts
Machine learning models of music typically break down the task of composition into a chronological process, composing a piece of music in a … (voir plus)single pass from beginning to end. On the contrary, human composers write music in a nonlinear fashion, scribbling motifs here and there, often revisiting choices previously made. We explore the use of blocked Gibbs sampling as an analogue to the human approach, and introduce Coconet, a convolutional neural network in the NADE family of generative models. Despite ostensibly sampling from the same distribution as the NADE ancestral sampling procedure, we find that a blocked Gibbs approach significantly improves sample quality. We provide evidence that this is due to some conditional distributions being poorly modeled. Moreover, we show that even the cheap approximate blocked Gibbs procedure from Yao et al. (2014) yields better samples than ancestral sampling. We demonstrate the versatility of our method on unconditioned polyphonic music generation.
Maximum Entropy Generators for Energy-Based Models
Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient. In t… (voir plus)his work, we propose learning both the energy function and an amortized approximate sampling mechanism using a neural generator network, which provides an efficient approximation of the log-likelihood gradient. The resulting objective requires maximizing entropy of the generated samples, which we perform using recently proposed nonparametric mutual information estimators. Finally, to stabilize the resulting adversarial game, we use a zero-centered gradient penalty derived as a necessary condition from the score matching literature. The proposed technique can generate sharp images with Inception and FID scores competitive with recent GAN techniques, does not suffer from mode collapse, and is competitive with state-of-the-art anomaly detection techniques.
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Thibault de Boissiere
Lucas Gestin
Wei Zhen Teoh
Jose Sotelo
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger c… (voir plus)onstituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.
No Press Diplomacy: Modeling Multi-Agent Gameplay
Yuchen Lu
Steven Bocco
Max O. Smith
Jonathan K. Kummerfeld
Satinder Singh
Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal. Rel… (voir plus)iance on trust and coordination makes Diplomacy the first non-cooperative multi-agent benchmark for complex sequential social dilemmas in a rich environment. In this work, we focus on training an agent that learns to play the No Press version of Diplomacy where there is no dedicated communication channel between players. We present DipNet, a neural-network-based policy model for No Press Diplomacy. The model was trained on a new dataset of more than 150,000 human games. Our model is trained by supervised learning (SL) from expert trajectories, which is then used to initialize a reinforcement learning (RL) agent trained through self-play. Both the SL and RL agents demonstrate state-of-the-art No Press performance by beating popular rule-based bots.
Probability Distillation: A Caveat and Alternatives
Due to Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners, where, as a practi… (voir plus)cal workaround for deploying autoregressive models in real-time applications, a student network is used to obtain quality samples in parallel. We identify a pathological optimization issue with the adopted stochastic minimization of the reverse-KL divergence: the curse of dimensionality results in a skewed gradient distribution that renders training inefficient. This means that KL-based “evaluative” training can be susceptible to poor exploration if the target distribution is highly structured. We then explore alternative principles for distillation, including one with an “instructive” signal, and show that it is possible to achieve qualitatively better results than with KL minimization.
Systematic Generalization: What Is Required and Can It Be Learned?
On the Spectral Bias of Neural Networks
Nasim Rahaman
Felix Draxler
Fred A. Hamprecht
Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with …
Towards Jumpy Planning
Model-free reinforcement learning (RL) is a powerful paradigm for learning complex tasks but suffers from high sample inefficiency as well a… (voir plus)s ignorance of the environment dynamics. On the other hand, a model-based RL agent learns dynamical causal models of the environment and uses them to plan. However, using a model at the scale of time-steps (usually tens of milliseconds) is mostly unfeasible in practice due to compounding prediction errors and computational requirements for making vast numbers of model queries during the planning process. We propose to use a modelbased planner together with a goal-conditioned policy trained with model-free learning. We use a model-based planner that operates at higher levels of abstraction i.e., decision states and use modelfree RL between the decision states. We validate our approach in terms of transfer and generalization performance and show that it leads to improvement over model-based planner that jumps to states that are fixed timesteps ahead.
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering
Cătălina Cangea
Pietro Lio
Embodied Question Answering (EQA) is a recently proposed task, where an agent is placed in a rich 3D environment and must act based solely o… (voir plus)n its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation and language understanding in order to perform complex reasoning in the visual world. However, initial advancements combining standard vision and language methods with imitation and reinforcement learning algorithms have shown EQA might be too complex and challenging for these techniques. In order to investigate the feasibility of EQA-type tasks, we build the VideoNavQA dataset that contains pairs of questions and videos generated in the House3D environment. The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA task. We investigate several models, adapted from popular VQA methods, on this new benchmark. This establishes an initial understanding of how well VQA-style methods can perform within this novel EQA paradigm.