Joelle Pineau

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Co-Managing Director, Meta AI (FAIR - Facebook AI Research)
Research Topics
Medical Machine Learning
Natural Language Processing
Reinforcement Learning

Biography

Joelle Pineau is a professor and William Dawson Scholar at the School of Computer Science, McGill University, where she co-directs the Reasoning and Learning Lab. She is a core academic member of Mila – Quebec Artificial Intelligence Institute, a Canada CIFAR AI Chair, and VP of AI research at Meta (previously Facebook), where she leads the Fundamental AI Research (FAIR) team. Pineau holds a BSc in systems design engineering from the University of Waterloo, and an MSc and PhD in robotics from Carnegie Mellon University.

Her research focuses on developing new models and algorithms for planning and learning in complex partially observable domains. She also works on applying these algorithms to complex problems in robotics, health care, games and conversational agents. In addition to being on the editorial board of the Journal of Machine Learning Research and past president of the International Machine Learning Society, Pineau is the recipient of numerous awards and honours: NSERC’s E.W.R. Steacie Memorial Fellowship (2018), Governor General Innovation Award (2019), Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), Senior Fellow of the Canadian Institute for Advanced Research (CIFAR), and Fellow of the Royal Society of Canada.

Current Students

Research Intern - Université de Montréal
PhD - Université de Montréal
Principal supervisor:
PhD - McGill University
Co-supervisor:
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
Research Intern - McGill University
Research Intern - Université de Montréal

Publications

Rethinking Machine Learning Benchmarks in the Context of Professional Codes of Conduct
Peter Henderson
Jieru Hu
Mona Diab
A novel and efficient machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition
Marc-André Legault
Jason Hartford
Benoît J. Arsenault
Y. Archer Yang
Mendelian Randomization (MR) enables estimation of causal effects while controlling for unmeasured confounding factors. However, traditional MR's reliance on strong parametric assumptions can introduce bias if these are violated. We introduce a new machine learning MR estimator named Quantile Instrumental Variable (IV) that achieves low estimation error in a wide range of plausible MR scenarios. Quantile IV is distinctive in its ability to estimate nonlinear and heterogeneous causal effects and offers a flexible approach for subgroup analysis. Applying Quantile IV, we investigate the impact of circulating sclerostin levels on heel bone mineral density, osteoporosis, and cardiovascular outcomes in the UK Biobank. Employing various MR estimators and colocalization techniques that allow multiple causal variants, our analysis reveals that a genetically predicted reduction in sclerostin levels significantly increases heel bone mineral density and reduces the risk of osteoporosis, while showing no discernible effect on ischemic cardiovascular diseases. Quantile IV contributes to the advancement of MR methodology, and the case study on the impact of circulating sclerostin modulation contributes to our understanding of the on-target effects of sclerostin inhibition.
Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning
Maxime Wabartha
Learning inherently interpretable policies is a central challenge in the path to developing autonomous agents that humans can trust. Linear policies can justify their decisions while interacting in a dynamic environment, but their reduced expressivity prevents them from solving hard tasks. Instead, we argue for the use of piecewise-linear policies. We carefully study to what extent they can retain the interpretable properties of linear policies while reaching competitive performance with neural baselines. In particular, we propose the HyperCombinator (HC), a piecewise-linear neural architecture expressing a policy with a controllably small number of sub-policies. Each sub-policy is linear with respect to interpretable features, shedding light on the decision process of the agent without requiring an additional explanation model. We evaluate HC policies in control and navigation experiments, visualize the improved interpretability of the agent and highlight its trade-off with performance. Moreover, we validate that the restricted model class that the HyperCombinator belongs to is compatible with the algorithmic constraints of various reinforcement learning algorithms.
Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning
Maxime Wabartha
Learning inherently interpretable policies is a central challenge in the path to developing autonomous agents that humans can trust. We argue for the use of policies that are piecewise-linear. We carefully study to what extent they can retain the interpretable properties of linear policies while performing competitively with neural baselines. In particular, we propose the HyperCombinator (HC), a piecewise-linear neural architecture expressing a policy with a controllably small number of sub-policies. Each sub-policy is linear with respect to interpretable features, shedding light on the agent’s decision process without needing an additional explanation model. We evaluate HC policies in control and navigation experiments, visualize the improved interpretability of the agent and highlight its trade-off with performance.
On the Societal Impact of Open Foundation Models
Sayash Kapoor
Rishi Bommasani
Kevin Klyman
Shayne Longpre
Ashwin Ramaswami
Peter Cihon
Aspen Hopkins
Kevin Bankston
Stella Biderman
Miranda Bogen
Rumman Chowdhury
Alex Engler
Peter Henderson
Yacine Jernite
Seth Lazar
Stefano Maffulli
Alondra Nelson
Aviya Skowron
Dawn Song
Victor Storchan
Daniel Zhang
Daniel E. Ho
Percy Liang
Arvind Narayanan
Questions Are All You Need to Train a Dense Passage Retriever
Devendra Singh Sachan
Mike Lewis
Dani Yogatama
Luke Zettlemoyer
Manzil Zaheer
We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g., questions and potential answer passages). It uses a new passage-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence passages, and (2) the passages are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both passage and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses. Our code and model checkpoints are available at: https://github.com/DevSinghSachan/art.
Group Fairness in Reinforcement Learning
Harsh Satija
Alessandro Lazaric
Matteo Pirotta
We pose and study the problem of satisfying fairness in the online Reinforcement Learning (RL) setting. We focus on the group notions of fairness, according to which agents belonging to different groups should have similar performance based on some given measure. We consider the setting of maximizing return in an unknown environment (unknown transition and reward function) and show that it is possible to have RL algorithms that learn the best fair policies without violating the fairness requirements at any point in time during the learning process. In the tabular finite-horizon episodic setting, we provide an algorithm that combines the principle of optimism and pessimism under uncertainty to achieve zero fairness violation with arbitrarily high probability while also maintaining sub-linear regret guarantees. For the high-dimensional Deep-RL setting, we present algorithms based on the performance-difference style approximate policy improvement update step and we report encouraging empirical results on various traditional RL-inspired benchmarks showing that our algorithms display the desired behavior of learning the optimal policy while performing a fair learning process.
Estimating causal effects with optimization-based methods: A review and empirical comparison
Martin Cousineau
Vedat Verter
Susan A. Murphy
Publisher Correction: Advancing ethics review practices in AI research
Madhulika Srikumar
Rebecca Finlay
Grace M. Abuhamad
Carolyn Ashurst
Rosie Campbell
Emily Campbell-Ratcliffe
Hudson Hongo
Sara Rene Jordan
Joseph Lindley
Aviv Ovadya
Improving Passage Retrieval with Zero-Shot Question Generation
Devendra Singh Sachan
Mike Lewis
Mandar Joshi
Armen Aghajanyan
Wen-tau Yih
Luke Zettlemoyer
Low-Rank Representation of Reinforcement Learning Policies
We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.