Joelle Pineau

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Co-Managing Director, Meta AI (FAIR - Facebook AI Research)
Research Topics
Medical Machine Learning
Reinforcement Learning
Natural Language Processing

Biography

Joelle Pineau is an associate professor and William Dawson Scholar at McGill University, where she co-directs the Reasoning and Learning Lab. She is a faculty member of Mila (Quebec Artificial Intelligence Institute) and holds a Canada CIFAR AI Chair. She is also Vice-President of AI Research at Meta (formerly Facebook), where she leads the FAIR (Fundamental AI Research) team. She holds a Bachelor of Science in engineering from the University of Waterloo and a master's degree and a PhD in robotics from Carnegie Mellon University.

Her research focuses on developing new models and algorithms for planning and learning in complex, partially observable domains. She also works on applying these algorithms to complex problems in robotics, health care, games, and conversational agents. She serves on the editorial boards of the Journal of Artificial Intelligence Research and the Journal of Machine Learning Research, and is currently president of the International Machine Learning Society. She received the 2018 E. W. R. Steacie Memorial Fellowship from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the 2019 Governor General's Innovation Award. She is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Senior Fellow of the Canadian Institute for Advanced Research (CIFAR), and a Fellow of the Royal Society of Canada.

Current Students

PhD - UdeM
Principal supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Research Intern - UdeM

Publications

Rethinking Machine Learning Benchmarks in the Context of Professional Codes of Conduct
Peter Henderson
Jieru Hu
Mona Diab
A novel and efficient machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition
Marc-André Legault
Jason Hartford
Benoît J. Arsenault
Y. Archer Yang
Mendelian Randomization (MR) enables estimation of causal effects while controlling for unmeasured confounding factors. However, traditional MR's reliance on strong parametric assumptions can introduce bias if these are violated. We introduce a new machine learning MR estimator named Quantile Instrumental Variable (IV) that achieves low estimation error in a wide range of plausible MR scenarios. Quantile IV is distinctive in its ability to estimate nonlinear and heterogeneous causal effects and offers a flexible approach for subgroup analysis. Applying Quantile IV, we investigate the impact of circulating sclerostin levels on heel bone mineral density, osteoporosis, and cardiovascular outcomes in the UK Biobank. Employing various MR estimators and colocalization techniques that allow multiple causal variants, our analysis reveals that a genetically predicted reduction in sclerostin levels significantly increases heel bone mineral density and reduces the risk of osteoporosis, while showing no discernible effect on ischemic cardiovascular diseases. Quantile IV contributes to the advancement of MR methodology, and the case study on the impact of circulating sclerostin modulation contributes to our understanding of the on-target effects of sclerostin inhibition.
Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning
Maxime Wabartha
Learning inherently interpretable policies is a central challenge in the path to developing autonomous agents that humans can trust. Linear policies can justify their decisions while interacting in a dynamic environment, but their reduced expressivity prevents them from solving hard tasks. Instead, we argue for the use of piecewise-linear policies. We carefully study to what extent they can retain the interpretable properties of linear policies while reaching competitive performance with neural baselines. In particular, we propose the HyperCombinator (HC), a piecewise-linear neural architecture expressing a policy with a controllably small number of sub-policies. Each sub-policy is linear with respect to interpretable features, shedding light on the decision process of the agent without requiring an additional explanation model. We evaluate HC policies in control and navigation experiments, visualize the improved interpretability of the agent and highlight its trade-off with performance. Moreover, we validate that the restricted model class that the HyperCombinator belongs to is compatible with the algorithmic constraints of various reinforcement learning algorithms.
Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning
Maxime Wabartha
Learning inherently interpretable policies is a central challenge in the path to developing autonomous agents that humans can trust. We argue for the use of policies that are piecewise-linear. We carefully study to what extent they can retain the interpretable properties of linear policies while performing competitively with neural baselines. In particular, we propose the HyperCombinator (HC), a piecewise-linear neural architecture expressing a policy with a controllably small number of sub-policies. Each sub-policy is linear with respect to interpretable features, shedding light on the agent’s decision process without needing an additional explanation model. We evaluate HC policies in control and navigation experiments, visualize the improved interpretability of the agent and highlight its trade-off with performance.
On the Societal Impact of Open Foundation Models
Sayash Kapoor
Rishi Bommasani
Kevin Klyman
Shayne Longpre
Ashwin Ramaswami
Peter Cihon
Aspen Hopkins
Kevin Bankston
Stella Biderman
Miranda Bogen
Rumman Chowdhury
Alex Engler
Peter Henderson
Yacine Jernite
Seth Lazar
Stefano Maffulli
Alondra Nelson
Aviya Skowron
Dawn Song … (5 more)
Victor Storchan
Daniel Zhang
Daniel E. Ho
Percy Liang
Arvind Narayanan
Questions Are All You Need to Train a Dense Passage Retriever
Devendra Singh Sachan
Mike Lewis
Dani Yogatama
Luke Zettlemoyer
Manzil Zaheer
We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g., questions and potential answer passages). It uses a new passage-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence passages, and (2) the passages are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both passage and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses. Our code and model checkpoints are available at: https://github.com/DevSinghSachan/art.
Group Fairness in Reinforcement Learning
Harsh Satija
Alessandro Lazaric
Matteo Pirotta
We pose and study the problem of satisfying fairness in the online Reinforcement Learning (RL) setting. We focus on the group notions of fairness, according to which agents belonging to different groups should have similar performance based on some given measure. We consider the setting of maximizing return in an unknown environment (unknown transition and reward function) and show that it is possible to have RL algorithms that learn the best fair policies without violating the fairness requirements at any point in time during the learning process. In the tabular finite-horizon episodic setting, we provide an algorithm that combines the principle of optimism and pessimism under uncertainty to achieve zero fairness violation with arbitrarily high probability while also maintaining sub-linear regret guarantees. For the high-dimensional Deep-RL setting, we present algorithms based on the performance-difference style approximate policy improvement update step and we report encouraging empirical results on various traditional RL-inspired benchmarks showing that our algorithms display the desired behavior of learning the optimal policy while performing a fair learning process.
Estimating causal effects with optimization-based methods: A review and empirical comparison
Martin Cousineau
Vedat Verter
Susan A. Murphy
Publisher Correction: Advancing ethics review practices in AI research
Madhulika Srikumar
Rebecca Finlay
Grace M. Abuhamad
Carolyn Ashurst
Rosie Campbell
Emily Campbell-Ratcliffe
Hudson Hongo
Sara Rene Jordan
Joseph Lindley
Aviv Ovadya
Improving Passage Retrieval with Zero-Shot Question Generation
Devendra Singh Sachan
Mike Lewis
Mandar Joshi
Armen Aghajanyan
Wen-tau Yih
Luke Zettlemoyer
Low-Rank Representation of Reinforcement Learning Policies
We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.