
Joelle Pineau

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Co-Managing Director, Meta AI (FAIR - Facebook AI Research)
Research Topics
Medical Machine Learning
Reinforcement Learning
Natural Language Processing

Biography

Joelle Pineau is an associate professor and William Dawson Scholar at McGill University, where she co-directs the Reasoning and Learning Lab. She is a faculty member of Mila, the Quebec Artificial Intelligence Institute, and holds a Canada CIFAR AI Chair. She is also Vice President of AI Research at Meta (formerly Facebook), where she leads the FAIR (Fundamental AI Research) team. She holds a Bachelor of Science in engineering from the University of Waterloo, and a master's degree and a PhD in robotics from Carnegie Mellon University.

Her research focuses on developing new models and algorithms for planning and learning in complex partially observable domains. She also works on applying these algorithms to complex problems in robotics, health care, games, and conversational agents. She serves on the editorial boards of the Journal of Artificial Intelligence Research and the Journal of Machine Learning Research, and is currently President of the International Machine Learning Society. She received the 2018 E. W. R. Steacie Memorial Fellowship from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the 2019 Governor General's Innovation Award. She is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Senior Fellow of the Canadian Institute for Advanced Research (CIFAR), and a member of the Royal Society of Canada.

Current Students

Research Master's - UdeM
Principal supervisor:
PhD - UdeM
Principal supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill

Publications

SPeCiaL: Self-Supervised Pretraining for Continual Learning
Lucas Caccia
A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions
Anthony GX-Chen
Blake A. Richards
Estimating value functions is a core component of reinforcement learning algorithms. Temporal difference (TD) learning algorithms use bootstrapping, i.e. they update the value function toward a learning target using value estimates at subsequent time-steps. Alternatively, the value function can be updated toward a learning target constructed by separately predicting successor features (SF)--a policy-dependent model--and linearly combining them with instantaneous rewards. We focus on bootstrapping targets used when estimating value functions, and propose a new backup target, the
Block Contextual MDPs for Continual Learning
In reinforcement learning (RL), when defining a Markov Decision Process (MDP), the environment dynamics is implicitly assumed to be stationary. This assumption of stationarity, while simplifying, can be unrealistic in many scenarios. In the continual reinforcement learning scenario, the sequence of tasks is another source of nonstationarity. In this work, we propose to examine this continual reinforcement learning setting through the Block Contextual MDP (BC-MDP) framework, which enables us to relax the assumption of stationarity. This framework challenges RL algorithms to handle both nonstationarity and rich observation settings and, by additionally leveraging smoothness properties, enables us to study generalization bounds for this setting. Finally, we take inspiration from adaptive control to propose a novel algorithm that addresses the challenges introduced by this more realistic BC-MDP setting, allows for zero-shot adaptation at evaluation time, and achieves strong performance on several nonstationary environments.
New Insights on Reducing Abrupt Representation Change in Online Continual Learning
In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Experience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones. We shed new light on this question by showing that applying ER causes the newly added classes' representations to overlap significantly with the previous classes, leading to highly disruptive parameter updates. Based on this empirical analysis, we propose a new method which mitigates this issue by shielding the learned representations from drastic adaptation to accommodate new classes. We show that using an asymmetric update rule pushes new classes to adapt to the older ones (rather than the reverse), which is more effective especially at task boundaries, where much of the forgetting typically occurs. Empirical results show significant gains over strong baselines on standard continual learning benchmarks.
Biomedical Research & Informatics Living Laboratory for Innovative Advances of New Technologies in Community Mobility Rehabilitation: Protocol for a longitudinal evaluation of mobility outcomes (Preprint)
Sara Ahmed
Philippe Archambault
Claudine Auger
Joyce Fung
Eva Kehayia
Anouk Lamontagne
Annette Majnemer
Sylvie Nadeau
Alain Ptito
Bonnie Swaine
Efficient Continual Learning Ensembles in Neural Network Subspaces
Thang Doan
Seyed Iman Mirzadeh
Mehrdad Farajtabar
A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. However, the training and inference cost of ensembles can increase linearly with the number of models. Motivated by this limitation, we leverage the recent advances in the deep learning optimization literature, such as mode connectivity and neural network subspaces, to derive a new method that is both computationally advantageous and can outperform the state-of-the-art continual learning algorithms.
Low-Rank Representation of Reinforcement Learning Policies
We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.
Robust Policy Learning over Multiple Uncertainty Sets
Annie Xie
Chelsea Finn
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at train time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and on robust RL alone.
The Curious Case of Absolute Position Embeddings
Transformer language models encode the notion of word order using positional information. Most commonly, this positional information is represented by absolute position embeddings (APEs), that are learned from the pretraining data. However, in natural language, it is not absolute position that matters, but relative position, and the extent to which APEs can capture this type of information has not been investigated. In this work, we observe that models trained with APE over-rely on positional information to the point that they break-down when subjected to sentences with shifted position information. Specifically, when models are subjected to sentences starting from a non-zero position (excluding the effect of priming), they exhibit noticeably degraded performance on zero to full-shot tasks, across a range of model families and model sizes. Our findings raise questions about the efficacy of APEs to model the relativity of position information, and invite further introspection on the sentence and word order processing strategies employed by these models.
Towards Policy-Guided Conversational Recommendation with Dialogue Acts
Paul Crook
Y-Lan Boureau
J. Weston
Akbar Karimi
Leonardo Rossi
Andrea Prati
Wenqiang Lei
Xiangnan He
Qingyun Yisong Miao
Richang Wu
Min-Yen Hong
Kan Tat-Seng
Raymond Li
Hannes Schulz
Zujie Liang
Huang Hu
Can Xu
Jian Miao
Lizi Liao
Ryuichi Takanobu
Yunshan Ma
Xun Yang
Wenchang Ma
Minlie Huang
Minghao Tu
Iulian Serban
Aaron C. Courville
David Silver
Julian Schrittwieser
K. Simonyan
Ioannis Antonoglou
Aja Huang
A. Guez
Hanlin Zhu
O. Vinyals
Igor Babuschkin
M. Mathieu
Max Jaderberg
Wojciech M. Czarnecki
A. Dudzik
Petko Georgiev
Richard Powell
T. Ewalds
Dan Horgan
M. Kroiss
Ivo Danihelka
J. Agapiou
Junhyuk Oh
Valentin Dalibard
David Choi
L. Sifre
Yury Sulsky
Sasha Vezhnevets
James Molloy
Trevor Cai
D. Budden
T. Paine
Ziyu Wang
Tobias Pfaff
Tobias Pohlen
Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs
Philip S. Thomas
Romain Laroche
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting. We consider the scenario where: (i) we have a dataset collected under a known baseline policy, (ii) multiple reward signals are received from the environment inducing as many objectives to optimize. We present an SPI formulation for this RL setting that takes into account the preferences of the algorithm’s user for handling the trade-offs for different reward signals while ensuring that the new policy performs at least as well as the baseline policy along each individual objective. We build on traditional SPI algorithms and propose a novel method based on Safe Policy Iteration with Baseline Bootstrapping (SPIBB, Laroche et al., 2019) that provides high probability guarantees on the performance of the agent in the true environment. We show the effectiveness of our method on a synthetic grid-world safety task as well as in a real-world critical care context to learn a policy for the administration of IV fluids and vasopressors to treat sepsis.
Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little
Robin Jia
Dieuwke Hupkes
Adina Williams
Douwe Kiela
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in classical NLP pipelines. In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics. To demonstrate this, we pre-train MLMs on sentences with randomly shuffled word order, and show that these models still achieve high accuracy after fine-tuning on many downstream tasks, including tasks specifically designed to be challenging for models that ignore word order. Our models perform surprisingly well according to some parametric syntactic probes, indicating possible deficiencies in how we test representations for syntactic information. Overall, our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.