Portrait of Doina Precup

Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Leader, Google DeepMind
Research Topics
Medical Machine Learning
Molecular Modeling
Probabilistic Models
Reasoning
Reinforcement Learning

Biography

Doina Precup combines teaching at McGill University with fundamental research on reinforcement learning, in particular AI applications in areas of significant social impact, such as health care. She is interested in machine decision-making in situations where uncertainty is high.

In addition to heading the Montreal office of Google DeepMind, Precup is a Senior Fellow of the Canadian Institute for Advanced Research and a Fellow of the Association for the Advancement of Artificial Intelligence.

Her areas of speciality are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

PhD - McGill University
PhD - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Principal supervisor :
Research Intern - McGill University
Research Intern - McGill University
PhD - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Co-supervisor :
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
Research Intern - McGill University
PhD - McGill University
PhD - McGill University
Master's Research - Université de Montréal
Principal supervisor :
PhD - McGill University
Postdoctorate - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
PhD - McGill University
Master's Research - McGill University
Principal supervisor :
Master's Research - McGill University
Collaborating researcher - McGill University
Master's Research - Université de Montréal
PhD - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Collaborating Alumni - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :
Research Intern - McGill University
Research Intern - McGill University
Undergraduate - McGill University
PhD - McGill University
PhD - McGill University
Co-supervisor :

Publications

The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation
Kushal Arora
Jason Aaron Edward Weston
Jackie C.K.Cheung
State-of-the-art language generation models can degenerate when applied to open-ended generation problems such as text completion, story gen… (see more)eration, or dialog modeling. This degeneration usually shows up in the form of incoherence, lack of vocabulary diversity, and self-repetition or copying from the context. In this paper, we postulate that ``human-like'' generations usually lie in a narrow and nearly flat entropy band, and violation of these entropy bounds correlates with degenerate behavior. Our experiments show that this stable narrow entropy zone exists across models, tasks, and domains and confirm the hypothesis that violations of this zone correlate with degeneration. We then use this insight to propose an entropy-aware decoding algorithm that respects these entropy bounds resulting in less degenerate, more contextual, and"human-like"language generation in open-ended text generation settings.
Multitask Reinforcement Learning by Optimizing Neural Pathways
Samin Yeasar Arnob
Riyasat Ohib
Amy Zhang
Sergey Plis
Reinforcement learning (RL) algorithms have achieved great success in learning specific tasks, as evidenced by examples such as AlphaGo or f… (see more)usion control. However, it is still difficult for an RL agent to learn how to solve multiple tasks. In this paper, we propose a novel multitask learning framework, in which multiple specialized pathways through a single network are trained simultaneously, with each pathway focusing on a single task. We show that this approach achieves competitive performance with existing multitask RL methods, while using only 5% of the number of neurons per task. We demonstrate empirically the success of our approach on several continuous control tasks, in both online and offline training.
Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning
Safa Alver
On the Challenges of using Reinforcement Learning in Precision Drug Dosing: Delay and Prolongedness of Action Effects
Sumana Basu
M. Legault
Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify … (see more)two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process), a subclass of POMDPs in which the Markov assumption does not hold specifically due to prolonged effects of actions. Motivated by the pharmacology literature, we propose a simple and effective approach to converting drug dosing PAE-POMDPs into MDPs, enabling the use of the existing RL algorithms to solve such problems. We validate the proposed approach on a toy task, and a challenging glucose control task, for which we devise a clinically-inspired reward function. Our results demonstrate that: (1) the proposed method to restore the Markov assumption leads to significant improvements over a vanilla baseline; (2) the approach is competitive with recurrent policies which may inherently capture the prolonged affect of actions; (3) it is remarkably more time and memory efficient than the recurrent baseline and hence more suitable for real-time dosing control systems; and (4) it exhibits favourable qualitative behavior in our policy analysis.
Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning
Maziar Gomrokchi
Susan Amin
Hossein Aboutalebi
Alexander Wong
While significant research advances have been made in the field of deep reinforcement learning, there have been no concrete adversarial atta… (see more)ck strategies in literature tailored for studying the vulnerability of deep reinforcement learning algorithms to membership inference attacks. In such attacking systems, the adversary targets the set of collected input data on which the deep reinforcement learning algorithm has been trained. To address this gap, we propose an adversarial attack framework designed for testing the vulnerability of a state-of-the-art deep reinforcement learning algorithm to a membership inference attack. In particular, we design a series of experiments to investigate the impact of temporal correlation, which naturally exists in reinforcement learning training data, on the probability of information leakage. Moreover, we compare the performance of collective and individual membership attacks against the deep reinforcement learning algorithm. Experimental results show that the proposed adversarial attack framework is surprisingly effective at inferring data with an accuracy exceeding 84% in individual and 97% in collective modes in three different continuous control Mujoco tasks, which raises serious privacy concerns in this regard. Finally, we show that the learning state of the reinforcement learning algorithm influences the level of privacy breaches significantly.
Multi-Environment Pretraining Enables Transfer to Action Limited Datasets
David Venuto
Sherry Yang
Pieter Abbeel
Igor Mordatch
Ofir Nachum
Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language an… (see more)d vision applications. In reinforcement learning, however, a key challenge is that available data of sequential decision making is often not annotated with actions - for example, videos of game-play are much more available than sequences of frames paired with the logged game controls. We propose to circumvent this challenge by combining large but sparsely-annotated datasets from a \emph{target} environment of interest with fully-annotated datasets from various other \emph{source} environments. Our method, Action Limited PreTraining (ALPT), leverages the generalization capabilities of inverse dynamics modelling (IDM) to label missing action data in the target environment. We show that utilizing even one additional environment dataset of labelled data during IDM pretraining gives rise to substantial improvements in generating action labels for unannotated sequences. We evaluate our method on benchmark game-playing environments and show that we can significantly improve game performance and generalization capability compared to other approaches, even when using annotated datasets equivalent to only 12 minutes of gameplay.
Multi-Environment Pretraining Enables Transfer to Action Limited Datasets
David Venuto
Sherry Yang
Pieter Abbeel
Igor Mordatch
Ofir Nachum
Optimism and Adaptivity in Policy Optimization
Veronica Chelu
Tom Zahavy
Arthur Guez
Sebastian Flennerhag
Temporal Abstraction in Reinforcement Learning with the Successor Representation
Marlos C. Machado
Andre Barreto
Michael Bowling
When Do Graph Neural Networks Help with Node Classification: Investigating the Homophily Principle on Node Distinguishability
Sitao Luan
Chenqing Hua
Minkai Xu
Qincheng Lu
Jiaqi Zhu
Xiao-Wen Chang
Jie Fu
Jure Leskovec
Homophily principle, i.e., nodes with the same labels are more likely to be connected, was believed to be the main reason for the performanc… (see more)e superiority of Graph Neural Networks (GNNs) over Neural Networks (NNs) on Node Classification (NC) tasks. Recently, people have developed theoretical results arguing that, even though the homophily principle is broken, the advantage of GNNs can still hold as long as nodes from the same class share similar neighborhood patterns [29], which questions the validity of homophily. However, this argument only considers intra-class Node Distinguishability (ND) and ignores inter-class ND, which is insufficient to study the effect of homophily. In this paper, we first demonstrate the aforementioned insufficiency with examples and argue that an ideal situation for ND is to have smaller intra-class ND than inter-class ND. To formulate this idea and have a better understanding of homophily, we propose Contextual Stochastic Block Model for Homophily (CSBM-H) and define two metrics, Probabilistic Bayes Error (PBE) and Expected Negative KL-divergence (ENKL), to quantify ND, through which we can also find how intra- and inter-class ND influence ND together. We visualize the results and give detailed analysis. Through experiments, we verified that the superiority of GNNs is
Towards Continual Reinforcement Learning: A Review and Perspectives
Low-Rank Representation of Reinforcement Learning Policies
We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional… (see more) embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.