Portrait de Joelle Pineau

Joelle Pineau

Membre académique principal
Chaire en IA Canada-CIFAR
Professeure agrégée, McGill University, École d'informatique
Co-directrice générale, Meta AI (FAIR - Facebook AI Research)
Sujets de recherche
Apprentissage automatique médical
Apprentissage par renforcement
Traitement du langage naturel

Biographie

Joelle Pineau est professeure agrégée et titulaire d’une bourse William Dawson à l'Université McGill, où elle codirige le Laboratoire de raisonnement et d'apprentissage. Elle est membre du corps professoral de Mila – Institut québécois d’intelligence artificielle et titulaire d'une chaire en IA Canada-CIFAR. Elle est également vice-présidente de la recherche en IA chez Meta (anciennement Facebook), où elle dirige l'équipe FAIR (Fundamental AI Research). Elle détient un baccalauréat ès sciences en génie de l'Université de Waterloo et une maîtrise et un doctorat en robotique de l'Université Carnegie Mellon.

Ses recherches sont axées sur le développement de nouveaux modèles et algorithmes pour la planification et l'apprentissage dans des domaines complexes partiellement observables. Elle travaille également sur l'application de ces algorithmes à des problèmes complexes en robotique, dans les soins de santé, dans les jeux et dans les agents conversationnels. Elle est membre du comité de rédaction du Journal of Artificial Intelligence Research et du Journal of Machine Learning Research, et est actuellement présidente de l'International Machine Learning Society. Elle a été lauréate de la bourse commémorative E. W. R. Steacie du Conseil de recherches en sciences naturelles et en génie (CRSNG) 2018 et du Prix du Gouverneur général pour l'innovation 2019. Elle est membre de l'Association pour l'avancement de l'intelligence artificielle (AAAI), membre principal de l'Institut canadien de recherches avancées (CIFAR) et membre de la Société royale du Canada.

Étudiants actuels

Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Stagiaire de recherche - UdeM
Co-superviseur⋅e :

Publications

Learning Robust State Abstractions for Hidden-Parameter Block MDPs
Amy Zhang
Shagun Sodhani
A Simple and Effective Model for Multi-Hop Question Generation
Jimmy Lei Ba
Jamie Ryan Kiros
Geoffrey E Hin-602
Peter W. Battaglia
Jessica Blake
Chandler Hamrick
Vic-613 tor Bapst
Alvaro Sanchez
Vinicius Zambaldi
M. Malinowski
Andrea Tacchetti
David Raposo
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam … (voir 72 de plus)
Girish Sastry
Koustuv Sinha
Shagun Sodhani
Jin Dong
William L. Hamilton
Clutrr
Nitish Srivastava
Geoffrey Hinton
Alex Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov. 2014
Gabriel Stanovsky
Julian Michael
Luke Zettlemoyer
Dan Su
Yan Xu
Wenliang Dai
Ziwei Ji
Tiezheng Yu
Minghao Tu
Kevin Huang
Guangtao Wang
Jing Huang
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin. 2017
Attention
Petar Veliˇckovi´c
Guillem Cucurull
Arantxa Casanova
Pietro Lio’
Johannes Welbl
Pontus Stenetorp
Yonghui Wu
Mike Schuster
Quoc Zhifeng Chen
Mohammad Le
Wolfgang Norouzi
Macherey
M. Krikun
Yuan Cao
Qin Gao
William W. Cohen
Jianxing Yu
Xiaojun Quan
Qinliang Su
Jian Yin
Yuyu Zhang
Hanjun Dai
Zornitsa Kozareva
Chen Zhao
Chenyan Xiong
Corby Rosset
Xia
Paul Song
Bennett Saurabh
Tiwary
Yao Zhao
Xiaochuan Ni
Yuanyuan Ding
Qingyu Zhou
Nan Yang
Furu Wei
Chuanqi Tan
Previous research on automated question gen-001 eration has almost exclusively focused on gen-002 erating factoid questions whose answers ca… (voir plus)n 003 be extracted from a single document. How-004 ever, there is an increasing interest in develop-005 ing systems that are capable of more complex 006 multi-hop question generation (QG), where an-007 swering the question requires reasoning over 008 multiple documents. In this work, we pro-009 pose a simple and effective approach based on 010 the transformer model for multi-hop QG. Our 011 approach consists of specialized input repre-012 sentations, a supporting sentence classification 013 objective, and training data weighting. Prior 014 work on multi-hop QG considers the simpli-015 fied setting of shorter documents and also ad-016 vocates the use of entity-based graph struc-017 tures as essential ingredients in model design. 018 On the contrary, we showcase that our model 019 can scale to the challenging setting of longer 020 documents as input, does not rely on graph 021 structures, and substantially outperforms the 022 state-of-the-art approaches as measured by au-023 tomated metrics and human evaluation. 024
Interference and Generalization in Temporal Difference Learning
We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product… (voir plus) of two different gradients, representing their alignment. This quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD(
Invariant Causal Prediction for Block MDPs
Amy Zhang
Clare Lyle
Shagun Sodhani
Angelos Filos
Marta Z. Kwiatkowska
Yarin Gal
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. … (voir plus)In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.
The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach
Iulian V. Serban
Chinnadhurai Sankar
Michael Pieper
Deep reinforcement learning has recently shown many impressive successes. However, one major obstacle towards applying such methods to real-… (voir plus)world problems is their lack of data-efficiency. To this end, we propose the Bottleneck Simulator: a model-based reinforcement learning method which combines a learned, factorized transition model of the environment with rollout simulations to learn an effective policy from few examples. The learned transition model employs an abstract, discrete (bottleneck) state, which increases sample efficiency by reducing the number of model parameters and by exploiting structural properties of the environment. We provide a mathematical analysis of the Bottleneck Simulator in terms of fixed points of the learned policy, which reveals how performance is affected by four distinct sources of error: an error related to the abstract space structure, an error related to the transition model estimation variance, an error related to the transition model estimation bias, and an error related to the transition model class bias. Finally, we evaluate the Bottleneck Simulator on two natural language processing tasks: a text adventure game and a real-world, complex dialogue response selection task. On both tasks, the Bottleneck Simulator yields excellent performance beating competing approaches.
Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP
Amy Zhang
Shagun Sodhani
Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better perform… (voir plus)ance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.
Deep interpretability for GWAS
Deepak Sharma
Louis-philippe Lemieux Perreault
Audrey Lemaccon
Marie-Pierre Dub'e
Genome-Wide Association Studies are typically conducted using linear models to find genetic variants associated with common diseases. In the… (voir plus)se studies, association testing is done on a variant-by-variant basis, possibly missing out on non-linear interaction effects between variants. Deep networks can be used to model these interactions, but they are difficult to train and interpret on large genetic datasets. We propose a method that uses the gradient based deep interpretability technique named DeepLIFT to show that known diabetes genetic risk factors can be identified using deep models along with possibly novel associations.
Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks
Maxime Wabartha
Vincent Francois-Lavet
By virtue of their expressive power, neural networks (NNs) are well suited to fitting large, complex datasets, yet they are also known to … (voir plus)produce similar predictions for points outside the training distribution. As such, they are, like humans, under the influence of the Black Swan theory: models tend to be extremely "surprised" by rare events, leading to potentially disastrous consequences, while justifying these same events in hindsight. To avoid this pitfall, we introduce DENN, an ensemble approach building a set of Diversely Extrapolated Neural Networks that fits the training data and is able to generalize more diversely when extrapolating to novel data points. This leads DENN to output highly uncertain predictions for unexpected inputs. We achieve this by adding a diversity term in the loss function used to train the model, computed at specific inputs. We first illustrate the usefulness of the method on a low-dimensional regression problem. Then, we show how the loss can be adapted to tackle anomaly detection during classification, as well as safe imitation learning problems.
On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability (Extended Abstract)
Vincent Francois-Lavet
Damien Ernst
Raphael Fonteneau
When an agent has limited information on its environment, the suboptimality of an RL algorithm can be decomposed into the sum of two terms: … (voir plus)a term related to an asymptotic bias (suboptimality with unlimited data) and a term due to overfitting (additional suboptimality due to limited data). In the context of reinforcement learning with partial observability, this paper provides an analysis of the tradeoff between these two error sources. In particular, our theoretical analysis formally characterizes how a smaller state representation increases the asymptotic bias while decreasing the risk of overfitting.
A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM
Iulian V. Serban
Varun Gupta
Ekaterina Kochmar
Dung D. Vu
Robert Belfer
Leveraging exploration in off-policy algorithms via normalizing flows
Exploration is a crucial component for discovering approximately optimal policies in most high-dimensional reinforcement learning (RL) setti… (voir plus)ngs with sparse rewards. Approaches such as neural density models and continuous exploration (e.g., Go-Explore) have been instrumental in recent advances. Soft actor-critic (SAC) is a method for improving exploration that aims to combine off-policy updates while maximizing the policy entropy. We extend SAC to a richer class of probability distributions through normalizing flows, which we show improves performance in exploration, sample complexity, and convergence. Finally, we show that not only the normalizing flow policy outperforms SAC on MuJoCo domains, it is also significantly lighter, using as low as 5.6% of the original network's parameters for similar performance.
Literature Mining for Incorporating Inductive Bias in Biomedical Prediction Tasks (Student Abstract)