Joelle Pineau

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Co-Managing Director, Meta AI (FAIR - Facebook AI Research)
Research Topics
Medical Machine Learning
Natural Language Processing
Reinforcement Learning

Biography

Joelle Pineau is a professor and William Dawson Scholar at the School of Computer Science, McGill University, where she co-directs the Reasoning and Learning Lab. She is a core academic member of Mila – Quebec Artificial Intelligence Institute, a Canada CIFAR AI Chair, and VP of AI research at Meta (previously Facebook), where she leads the Fundamental AI Research (FAIR) team. Pineau holds a BSc in systems design engineering from the University of Waterloo, and an MSc and PhD in robotics from Carnegie Mellon University.

Her research focuses on developing new models and algorithms for planning and learning in complex partially observable domains. She also works on applying these algorithms to complex problems in robotics, health care, games and conversational agents. In addition to being on the editorial board of the Journal of Machine Learning Research and past president of the International Machine Learning Society, Pineau is the recipient of numerous awards and honours: NSERC’s E.W.R. Steacie Memorial Fellowship (2018), Governor General Innovation Award (2019), Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), Senior Fellow of the Canadian Institute for Advanced Research (CIFAR), and Fellow of the Royal Society of Canada.

Current Students

Master's Research - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University

Publications

A Message from AI Research Leaders: Join Us in Supporting OpenReview
Andrew Y. Ng
Ruslan Salakhutdinov
Fernando Pereira
Advancing science- and evidence-based AI policy.
Rishi Bommasani
Sanjeev Arora
Jennifer Chayes
Yejin Choi
Mariano-Florentino Cuéllar
Li Fei-Fei
Daniel E. Ho
Dan Jurafsky
Sanmi Koyejo
Hima Lakkaraju
Arvind Narayanan
Alondra Nelson
Emma Pierson
Scott Singer
Suresh Venkatasubramanian
Ion Stoica
Percy Liang
Dawn Song
Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation
Deploying reinforcement learning (RL) policies in the real world involves significant challenges, including distribution shifts, safety concerns, and the impracticality of direct interactions during policy refinement. Existing methods, such as domain randomization (DR) and off-dynamics RL, enhance policy robustness by direct interaction with the target domain, an inherently unsafe practice. We propose Uncertainty-Aware RL (UARL), a novel framework that prioritizes safety during training by addressing Out-Of-Distribution (OOD) detection and policy adaptation without requiring direct interactions in the target domain. UARL employs an ensemble of critics to quantify policy uncertainty and incorporates progressive environmental randomization to prepare the policy for diverse real-world conditions. By iteratively refining over high-uncertainty regions of the state space in simulated environments, UARL enhances robust generalization to the target domain without explicitly training on it. We evaluate UARL on MuJoCo benchmarks and a quadrupedal robot, demonstrating its effectiveness in reliable OOD detection, improved performance, and enhanced sample efficiency compared to baselines.
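The idea of using an ensemble of critics to quantify uncertainty can be illustrated with a minimal sketch. This is not the UARL implementation: the function names, the toy linear critics, and the fixed threshold are all assumptions made for illustration. The sketch treats the spread of the critics' value estimates as an uncertainty score and flags a state-action pair as out-of-distribution when that spread exceeds the threshold.

```python
import statistics
from typing import Callable, List, Sequence

# A critic maps (state, action) to a scalar value estimate.
Critic = Callable[[Sequence[float], Sequence[float]], float]


def ensemble_uncertainty(
    critics: List[Critic], state: Sequence[float], action: Sequence[float]
) -> float:
    """Population standard deviation of the critics' estimates for one pair."""
    values = [critic(state, action) for critic in critics]
    return statistics.pstdev(values)


def is_out_of_distribution(
    critics: List[Critic],
    state: Sequence[float],
    action: Sequence[float],
    threshold: float,
) -> bool:
    """Flag the pair as OOD when the critics disagree by more than `threshold`."""
    return ensemble_uncertainty(critics, state, action) > threshold


def make_linear_critic(slope: float) -> Critic:
    """Hypothetical toy critic: linear in the state, so critics diverge far from the origin."""
    def critic(state: Sequence[float], action: Sequence[float]) -> float:
        return slope * sum(state) + sum(action)
    return critic


# Three toy critics that agree near the origin and disagree far from it.
critics = [make_linear_critic(s) for s in (0.9, 1.0, 1.1)]

near = is_out_of_distribution(critics, [0.0, 0.0], [1.0], threshold=0.5)
far = is_out_of_distribution(critics, [10.0, 10.0], [1.0], threshold=0.5)
print(near, far)
```

In a real system the critics would be learned value networks and the threshold would need calibration; the sketch only shows how ensemble disagreement can serve as an OOD signal.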
A flexible machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition
Jason Hartford
Benoit J. Arsenault
Archer Y. Yang
Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning
Rethinking Machine Learning Benchmarks in the Context of Professional Codes of Conduct
Jieru Hu
Mona Diab
Group Fairness in Reinforcement Learning
Alessandro Lazaric
Matteo Pirotta
We pose and study the problem of satisfying fairness in the online Reinforcement Learning (RL) setting. We focus on the group notions of fairness, according to which agents belonging to different groups should have similar performance based on some given measure. We consider the setting of maximizing return in an unknown environment (unknown transition and reward function) and show that it is possible to have RL algorithms that learn the best fair policies without violating the fairness requirements at any point in time during the learning process. In the tabular finite-horizon episodic setting, we provide an algorithm that combines the principle of optimism and pessimism under uncertainty to achieve zero fairness violation with arbitrarily high probability while also maintaining sub-linear regret guarantees. For the high-dimensional Deep-RL setting, we present algorithms based on the performance-difference style approximate policy improvement update step and we report encouraging empirical results on various traditional RL-inspired benchmarks showing that our algorithms display the desired behavior of learning the optimal policy while performing a fair learning process.
Estimating causal effects with optimization-based methods: A review and empirical comparison
Martin Cousineau
Vedat Verter
Susan A. Murphy
Publisher Correction: Advancing ethics review practices in AI research
Madhulika Srikumar
Rebecca Finlay
Grace M. Abuhamad
Carolyn Ashurst
Rosie Campbell
Emily Campbell-Ratcliffe
Hudson Hongo
Sara Rene Jordan
Joseph Lindley
Aviv Ovadya
Questions Are All You Need to Train a Dense Passage Retriever
Devendra Singh Sachan
Mike Lewis
Dani Yogatama
Luke Zettlemoyer
Manzil Zaheer
We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g. questions and potential answer documents). It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both document and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses.
Advancing ethics review practices in AI research
Madhulika Srikumar
Rebecca Finlay
Grace M. Abuhamad
Carolyn Ashurst
Rosie Campbell
Emily Campbell-Ratcliffe
Hudson Hongo
Sara Rene Jordan
Joseph Lindley
Aviv Ovadya
Improving Passage Retrieval with Zero-Shot Question Generation
Devendra Singh Sachan
Mike Lewis
Mandar Joshi
Armen Aghajanyan
Wen-tau Yih
Luke Zettlemoyer
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or keyword-based), does not require any domain- or task-specific training (and therefore is expected to generalize better to data distribution shifts), and provides rich cross-attention between query and passage (i.e. it must explain every token in the question). When evaluated on a number of open-domain retrieval datasets, our re-ranker improves strong unsupervised retrieval models by 6%-18% absolute and strong supervised models by up to 12% in terms of top-20 passage retrieval accuracy. We also obtain new state-of-the-art results on full open-domain question answering by simply adding the new re-ranker to existing models with no further changes.
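The re-ranking step described above, scoring each retrieved passage by the probability of the question given that passage, can be sketched as follows. This is an illustrative toy, not the authors' method: an add-one-smoothed unigram model of the passage stands in for the pre-trained language model, and the function names and the `vocab_size` parameter are assumptions.

```python
import math
from collections import Counter
from typing import List


def question_log_likelihood(
    question: str, passage: str, vocab_size: int = 10_000
) -> float:
    """log p(question | passage) under a smoothed unigram model of the passage.

    Stand-in for a pre-trained language model; every question token
    contributes to the score, so the passage must "explain" all of them.
    """
    counts = Counter(passage.lower().split())
    total = sum(counts.values())
    return sum(
        math.log((counts[token] + 1) / (total + vocab_size))
        for token in question.lower().split()
    )


def rerank(question: str, passages: List[str]) -> List[str]:
    """Order passages from most to least likely to generate the question."""
    return sorted(
        passages,
        key=lambda passage: question_log_likelihood(question, passage),
        reverse=True,
    )


# Passages as they might come back from any first-stage retriever.
retrieved = [
    "paris is the capital of france",
    "shakespeare wrote hamlet around 1600",
]
reranked = rerank("who wrote hamlet", retrieved)
print(reranked[0])
```

Because the scorer only needs the question and the passage text, it can sit on top of any first-stage retriever, which is the property the abstract emphasizes.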