Portrait de Joelle Pineau

Joelle Pineau

Membre académique principal
Chaire en IA Canada-CIFAR
Professeure agrégée, McGill University, École d'informatique
Co-directrice générale, Meta AI (FAIR - Facebook AI Research)
Sujets de recherche
Apprentissage automatique médical
Apprentissage par renforcement
Traitement du langage naturel

Biographie

Joelle Pineau est professeure agrégée et titulaire d’une bourse William Dawson à l'Université McGill, où elle codirige le Laboratoire de raisonnement et d'apprentissage. Elle est membre du corps professoral de Mila – Institut québécois d’intelligence artificielle et titulaire d'une chaire en IA Canada-CIFAR. Elle est également vice-présidente de la recherche en IA chez Meta (anciennement Facebook), où elle dirige l'équipe FAIR (Fundamental AI Research). Elle détient un baccalauréat ès sciences en génie de l'Université de Waterloo et une maîtrise et un doctorat en robotique de l'Université Carnegie Mellon.

Ses recherches sont axées sur le développement de nouveaux modèles et algorithmes pour la planification et l'apprentissage dans des domaines complexes partiellement observables. Elle travaille également sur l'application de ces algorithmes à des problèmes complexes en robotique, dans les soins de santé, dans les jeux et dans les agents conversationnels. Elle est membre du comité de rédaction du Journal of Artificial Intelligence Research et du Journal of Machine Learning Research, et est actuellement présidente de l'International Machine Learning Society. Elle a été lauréate de la bourse commémorative E. W. R. Steacie du Conseil de recherches en sciences naturelles et en génie (CRSNG) 2018 et du Prix du Gouverneur général pour l'innovation 2019. Elle est membre de l'Association pour l'avancement de l'intelligence artificielle (AAAI), membre principal de l'Institut canadien de recherches avancées (CIFAR) et membre de la Société royale du Canada.

Étudiants actuels

Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill

Publications

Correcting Momentum in Temporal Difference Learning
A common optimization tool used in deep reinforcement learning is momentum, which consists in accumulating and discounting past gradients, r… (voir plus)eapplying them at each iteration. We argue that, unlike in supervised learning, momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale: not only does the gradient of the loss change due to parameter updates, the loss itself changes due to bootstrapping. We first show that this phenomenon exists, and then propose a first-order correction term to momentum. We show that this correction term improves sample efficiency in policy evaluation by correcting target value drift. An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.
Model-Invariant State Abstractions for Model-Based Reinforcement Learning
Manan Tomar
Amy Zhang
Roberto Calandra
Matthew E. Taylor
Accuracy and generalization of dynamics models is key to the success of model-based reinforcement learning (MBRL). As the complexity of task… (voir plus)s increases, so does the sample inefficiency of learning accurate dynamics models. However, many complex tasks also exhibit sparsity in the dynamics, i.e., actions have only a local effect on the system dynamics. In this paper, we exploit this property with a causal invariance perspective in the single-task setting, introducing a new type of state abstraction called \textit{model-invariance}. Unlike previous forms of state abstractions, a model-invariance state abstraction leverages causal sparsity over state variables. This allows for compositional generalization to unseen states, something that non-factored forms of state abstractions cannot do. We prove that an optimal policy can be learned over this model-invariance state abstraction and show improved generalization in a simple toy domain. Next, we propose a practical method to approximately learn a model-invariant representation for complex domains and validate our approach by showing improved modelling performance over standard maximum likelihood approaches on challenging tasks, such as the MuJoCo-based Humanoid. Finally, within the MBRL setting we show strong performance gains with respect to sample efficiency across a host of other continuous control tasks.
Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)
Philippe Vincent‐lamarre
Koustuv Sinha
Vincent Larivière
Alina Beygelzimer
Florence D'alche-buc
E. Fox
Learning Robust State Abstractions for Hidden-Parameter Block MDPs
Amy Zhang
Shagun Sodhani
A Simple and Effective Model for Multi-Hop Question Generation
Jimmy Lei Ba
Jamie Ryan Kiros
Geoffrey E Hin-602
Peter W. Battaglia
Jessica Blake
Chandler Hamrick
Vic-613 tor Bapst
Alvaro Sanchez
Vinicius Zambaldi
M. Malinowski
Andrea Tacchetti
David Raposo
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam … (voir 72 de plus)
Girish Sastry
Koustuv Sinha
Shagun Sodhani
Jin Dong
William L. Hamilton
Clutrr
Nitish Srivastava
Geoffrey Hinton
Alex Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov. 2014
Gabriel Stanovsky
Julian Michael
Luke Zettlemoyer
Dan Su
Yan Xu
Wenliang Dai
Ziwei Ji
Tiezheng Yu
Minghao Tu
Kevin Huang
Guangtao Wang
Jing Huang
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin. 2017
Attention
Petar Veliˇckovi´c
Guillem Cucurull
Arantxa Casanova
Pietro Lio’
Johannes Welbl
Pontus Stenetorp
Yonghui Wu
Mike Schuster
Quoc Zhifeng Chen
Mohammad Le
Wolfgang Norouzi
Macherey
M. Krikun
Yuan Cao
Qin Gao
William W. Cohen
Jianxing Yu
Xiaojun Quan
Qinliang Su
Jian Yin
Yuyu Zhang
Hanjun Dai
Zornitsa Kozareva
Cheng Zhao
Chenyan Xiong
Corby Rosset
Xia
Paul Song
Bennett Saurabh
Tiwary
Yao Zhao
Xiaochuan Ni
Yuanyuan Ding
Qingyu Zhou
Nan Yang
Furu Wei
Chuanqi Tan
Previous research on automated question gen-001 eration has almost exclusively focused on gen-002 erating factoid questions whose answers ca… (voir plus)n 003 be extracted from a single document. How-004 ever, there is an increasing interest in develop-005 ing systems that are capable of more complex 006 multi-hop question generation (QG), where an-007 swering the question requires reasoning over 008 multiple documents. In this work, we pro-009 pose a simple and effective approach based on 010 the transformer model for multi-hop QG. Our 011 approach consists of specialized input repre-012 sentations, a supporting sentence classification 013 objective, and training data weighting. Prior 014 work on multi-hop QG considers the simpli-015 fied setting of shorter documents and also ad-016 vocates the use of entity-based graph struc-017 tures as essential ingredients in model design. 018 On the contrary, we showcase that our model 019 can scale to the challenging setting of longer 020 documents as input, does not rely on graph 021 structures, and substantially outperforms the 022 state-of-the-art approaches as measured by au-023 tomated metrics and human evaluation. 024
Intervention Design for Effective Sim2Real Transfer
Melissa Mozian
Amy Zhang
The goal of this work is to address the recent success of domain randomization and data augmentation for the sim2real setting. We explain th… (voir plus)is success through the lens of causal inference, positioning domain randomization and data augmentation as interventions on the environment which encourage invariance to irrelevant features. Such interventions include visual perturbations that have no effect on reward and dynamics. This encourages the learning algorithm to be robust to these types of variations and learn to attend to the true causal mechanisms for solving the task. This connection leads to two key findings: (1) perturbations to the environment do not have to be realistic, but merely show variation along dimensions that also vary in the real world, and (2) use of an explicit invariance-inducing objective improves generalization in sim2sim and sim2real transfer settings over just data augmentation or domain randomization alone. We demonstrate the capability of our method by performing zero-shot transfer of a robot arm reach task on a 7DoF Jaco arm learning from pixel observations.
Interference and Generalization in Temporal Difference Learning
We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product… (voir plus) of two different gradients, representing their alignment. This quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD(
Invariant Causal Prediction for Block MDPs
Amy Zhang
Clare Lyle
Shagun Sodhani
Angelos Filos
Marta Z. Kwiatkowska
Yarin Gal
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. … (voir plus)In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.
The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach
Iulian V. Serban
Chinnadhurai Sankar
Michael Pieper
Deep reinforcement learning has recently shown many impressive successes. However, one major obstacle towards applying such methods to real-… (voir plus)world problems is their lack of data-efficiency. To this end, we propose the Bottleneck Simulator: a model-based reinforcement learning method which combines a learned, factorized transition model of the environment with rollout simulations to learn an effective policy from few examples. The learned transition model employs an abstract, discrete (bottleneck) state, which increases sample efficiency by reducing the number of model parameters and by exploiting structural properties of the environment. We provide a mathematical analysis of the Bottleneck Simulator in terms of fixed points of the learned policy, which reveals how performance is affected by four distinct sources of error: an error related to the abstract space structure, an error related to the transition model estimation variance, an error related to the transition model estimation bias, and an error related to the transition model class bias. Finally, we evaluate the Bottleneck Simulator on two natural language processing tasks: a text adventure game and a real-world, complex dialogue response selection task. On both tasks, the Bottleneck Simulator yields excellent performance beating competing approaches.
Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP
Amy Zhang
Shagun Sodhani
Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better perform… (voir plus)ance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.
Deep interpretability for GWAS
Deepak Sharma
Louis-philippe Lemieux Perreault
Audrey Lemaccon
Marie-Pierre Dub'e
Genome-Wide Association Studies are typically conducted using linear models to find genetic variants associated with common diseases. In the… (voir plus)se studies, association testing is done on a variant-by-variant basis, possibly missing out on non-linear interaction effects between variants. Deep networks can be used to model these interactions, but they are difficult to train and interpret on large genetic datasets. We propose a method that uses the gradient based deep interpretability technique named DeepLIFT to show that known diabetes genetic risk factors can be identified using deep models along with possibly novel associations.
Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks
By virtue of their expressive power, neural networks (NNs) are well suited to fitting large, complex datasets, yet they are also known to … (voir plus)produce similar predictions for points outside the training distribution. As such, they are, like humans, under the influence of the Black Swan theory: models tend to be extremely "surprised" by rare events, leading to potentially disastrous consequences, while justifying these same events in hindsight. To avoid this pitfall, we introduce DENN, an ensemble approach building a set of Diversely Extrapolated Neural Networks that fits the training data and is able to generalize more diversely when extrapolating to novel data points. This leads DENN to output highly uncertain predictions for unexpected inputs. We achieve this by adding a diversity term in the loss function used to train the model, computed at specific inputs. We first illustrate the usefulness of the method on a low-dimensional regression problem. Then, we show how the loss can be adapted to tackle anomaly detection during classification, as well as safe imitation learning problems.