Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Leader, Google DeepMind
Research Topics
Medical Machine Learning
Molecular Modeling
Probabilistic Models
Reasoning
Reinforcement Learning

Biography

Doina Precup combines teaching at McGill University with fundamental research on reinforcement learning, focusing in particular on AI applications in areas of significant social impact, such as health care. She is interested in machine decision-making in situations where uncertainty is high.

In addition to heading the Montreal office of Google DeepMind, Precup is a Senior Fellow of the Canadian Institute for Advanced Research and a Fellow of the Association for the Advancement of Artificial Intelligence.

Her areas of speciality are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and their applications.

Current Students

Doina Precup supervises or co-supervises PhD, master's research and undergraduate students, a postdoctoral researcher, research interns, and collaborating researchers at McGill University and Université de Montréal.

Publications

Understanding Decision-Time vs. Background Planning in Model-Based Reinforcement Learning
Safa Alver
In model-based reinforcement learning, an agent can leverage a learned model to improve its behavior in several ways. Two prevalent approaches are decision-time planning and background planning. In this study, we are interested in understanding under what conditions and in which settings one of these two planning styles will perform better than the other in domains that require fast responses. After viewing them through the lens of dynamic programming, we first consider the classical instantiations of these planning styles and provide theoretical results and hypotheses on which one will perform better in the pure planning, planning & learning, and transfer learning settings. We then consider the modern instantiations of these planning styles and provide hypotheses on which one will perform better in the last two of these settings. Lastly, we perform several illustrative experiments to empirically validate both our theoretical results and hypotheses. Overall, our findings suggest that even though decision-time planning does not perform as well as background planning in their classical instantiations, in their modern instantiations it can perform on par with or better than background planning in both the planning & learning and transfer learning settings.
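The distinction can be made concrete with a small sketch. Below is a minimal, illustrative contrast between the two classical planning styles on a tabular MDP, assuming a learned deterministic model stored as a dictionary; the function and variable names are hypothetical and not taken from the paper.

```python
import random

N_ACTIONS = 4  # e.g., the four moves of a gridworld; illustrative only

def background_planning_step(Q, model, gamma=0.99, alpha=0.1, n_updates=10):
    """Dyna-style background planning: between interactions with the
    environment, replay transitions from the learned model to refine Q."""
    for _ in range(n_updates):
        (s, a), (r, s2) = random.choice(list(model.items()))
        target = r + gamma * max(Q[(s2, b)] for b in range(N_ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])

def decision_time_act(Q, model, s, gamma=0.99):
    """Decision-time planning: unroll the model one step at the moment of
    acting and bootstrap with the current value estimates."""
    def lookahead(a):
        r, s2 = model[(s, a)]
        return r + gamma * max(Q[(s2, b)] for b in range(N_ACTIONS))
    return max(range(N_ACTIONS), key=lookahead)
```

The trade-off the abstract discusses is visible even here: background planning spends compute between decisions to improve Q everywhere, while decision-time planning spends compute at the moment of acting, which matters in domains that require fast responses.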
Deep Learning Prediction of Response to Disease Modifying Therapy in Primary Progressive Multiple Sclerosis (P1-1.Virtual)
Jean-Pierre R. Falet
Joshua D. Durso-Finley
Brennan Nichyporuk
Julien Schroeter
Francesca Bovis
Maria-Pia Sormani
Douglas Arnold
Don't Freeze Your Embedding: Lessons from Policy Finetuning in Environment Transfer
Victoria Dean
Daniel Toyama
A common occurrence in reinforcement learning (RL) research is making use of a pretrained vision stack that converts image observations to latent vectors. Using a visual embedding in this way leaves open questions, though: should the vision stack be updated with the policy? In this work, we evaluate the effectiveness of such decisions in RL transfer settings. We introduce policy update formulations for use after pretraining in a different environment and analyze the performance of such formulations. Through this evaluation, we also detail emergent metrics of benchmark suites and present results on Atari and AndroidEnv.
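The design choice the paper studies can be sketched in a few lines of PyTorch. This is a minimal illustration, assuming a pretrained encoder that outputs 512-dimensional latents; the class and its layer sizes are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TransferPolicy(nn.Module):
    """A policy head on top of a vision stack pretrained elsewhere."""

    def __init__(self, encoder: nn.Module, n_actions: int, freeze_encoder: bool):
        super().__init__()
        self.encoder = encoder                  # pretrained in the source environment
        self.head = nn.Linear(512, n_actions)   # assumes 512-dim latents (illustrative)
        if freeze_encoder:
            for p in self.encoder.parameters():
                p.requires_grad = False         # "freeze your embedding"

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.encoder(obs)    # image observation -> latent vector
        return self.head(z)      # policy logits for the transfer environment
```

With `freeze_encoder=False`, policy gradients flow back into the vision stack during finetuning in the new environment, which is the alternative the title argues for.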
Learning how to Interact with a Complex Interface using Hierarchical Reinforcement Learning
Gheorghe Comanici
Amelia Glaese
Anita Gergely
Daniel Toyama
Zafarali Ahmed
Tyler Jackson
Philippe Hamel
Hierarchical Reinforcement Learning (HRL) allows interactive agents to decompose complex problems into a hierarchy of sub-tasks. Higher-level tasks can invoke the solutions of lower-level tasks as if they were primitive actions. In this work, we study the utility of hierarchical decompositions for learning an appropriate way to interact with a complex interface. Specifically, we train HRL agents that can interface with applications in a simulated Android device. We introduce a Hierarchical Distributed Deep Reinforcement Learning architecture that learns (1) subtasks corresponding to simple finger gestures, and (2) how to combine these gestures to solve several Android tasks. Our approach relies on goal conditioning and can be used more generally to convert any base RL agent into an HRL agent. We use the AndroidEnv environment to evaluate our approach. For the experiments, the HRL agent uses a distributed version of the popular DQN algorithm to train different components of the hierarchy. While the native action space is completely intractable for simple DQN agents, our architecture can be used to establish an effective way to interact with different tasks, significantly improving the performance of the same DQN agent over different levels of abstraction.
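The goal-conditioning idea — a high level that selects gestures and a low level that executes them as primitive touch actions — can be sketched as follows. This is an illustrative skeleton under assumed agent interfaces (`act`, `goal_reached`), not the paper's distributed DQN implementation.

```python
def hrl_step(high_agent, low_agent, env, obs, gestures, max_low_steps=20):
    """One high-level decision: pick a gesture, execute it with the low level."""
    goal = gestures[high_agent.act(obs)]           # e.g., "tap", "swipe-left"
    total_reward, done = 0.0, False
    for _ in range(max_low_steps):
        action = low_agent.act(obs, goal)          # goal-conditioned primitive action
        obs, reward, done, _ = env.step(action)    # raw touch events in AndroidEnv
        total_reward += reward
        if done or low_agent.goal_reached(obs, goal):
            break
    return obs, total_reward, done
```

The key point is that `high_agent` can be any base RL agent: from its perspective, each gesture is a single (temporally extended) action, which tames an otherwise intractable native action space.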
Selective Credit Assignment
Veronica Chelu
Diana Borsa
Hado Philip van Hasselt
Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers
Amir Ardalan Kalantari
Mohammad Saeed Amini
Much of recent deep reinforcement learning success is owed to the neural architecture's potential to learn and use effective internal representations of the world. While many current algorithms access a simulator to train with a large amount of data, in realistic settings, including games played against people, collecting experience can be quite costly. In this paper, we introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance. We design this architecture by incorporating recent advances from natural language processing and computer vision. Specifically, we propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation, while simultaneously optimizing return. We demonstrate empirically that this architecture improves sample efficiency for several Atari environments, while also achieving better performance in some of the games.
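A minimal sketch of this kind of architecture, assuming Atari-style stacked frames and illustrative layer sizes (not the paper's exact network): a CNN produces feature maps, each spatial position becomes a token for a transformer encoder, and Q-values are predicted from the attended features.

```python
import torch
import torch.nn as nn

class AttentiveQNetwork(nn.Module):
    """Self-attention over the spatial positions of CNN feature maps."""

    def __init__(self, n_actions: int, d_model: int = 64):
        super().__init__()
        self.conv = nn.Sequential(                 # 4 stacked grayscale frames
            nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, d_model, 4, stride=2), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.attention = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(d_model, n_actions)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        f = self.conv(frames)                      # (B, C, H, W) feature maps
        tokens = f.flatten(2).transpose(1, 2)      # (B, H*W, C): one token per position
        attended = self.attention(tokens)          # self-attention across positions
        return self.q_head(attended.mean(dim=1))   # pooled features -> Q-values
```

Training such a network with a standard return-maximizing loss (e.g., the DQN temporal-difference objective) is what the abstract means by learning the attention mechanism "while simultaneously optimizing return."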
Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates
Safa Alver
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks with no or very little new data. Specifically, we consider the framework of generalized policy evaluation and improvement, in which the rewards for all tasks of interest are assumed to be expressible as a linear combination of a fixed set of features. We show theoretically that, under certain assumptions, having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance on all possible downstream tasks which are typically more complex than the ones on which the agent was trained. Based on this theoretical analysis, we propose a simple algorithm that iteratively constructs this set of policies. In addition to empirically validating our theoretical results, we compare our approach with recently proposed diverse policy set construction methods and show that, while others fail, our approach is able to build a behavior basis that enables instantaneous transfer to all possible downstream tasks. We also show empirically that having access to a set of independent policies can better bootstrap the learning process on downstream tasks where the new reward function cannot be described as a linear combination of the features. Finally, we demonstrate how this policy set can be useful in a lifelong reinforcement learning setting.
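The generalized policy improvement step at the heart of this framework can be written compactly. The sketch below assumes tabular successor features psi for each policy in the basis and a new task described by a weight vector w, so that the task's rewards are r = w · phi; the data layout is illustrative, not the paper's implementation.

```python
import numpy as np

def gpi_action(psi, w, s, n_actions):
    """Act greedily w.r.t. the best policy in the set for the new task `w`.

    psi: list of dicts, psi[i][(s, a)] -> np.ndarray of successor features
         for policy i. Because rewards are linear in the features, the value
         of policy i on the new task is Q_i(s, a) = psi[i][(s, a)] @ w,
         with no new learning required.
    """
    q = np.array([[psi_i[(s, a)] @ w for a in range(n_actions)]
                  for psi_i in psi])          # (n_policies, n_actions)
    return int(q.max(axis=0).argmax())        # max over policies, then actions
```

The paper's contribution is in choosing which policies to put in this set: an "independent" basis makes the outer max cover all downstream tasks at once.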
COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation
Jongmin Lee
Cosmin Paduraru
Daniel J Mankowitz
Nicolas Heess
Kee-Eung Kim
Arthur Guez
We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset. This problem setting is appealing in many real-world scenarios, where direct interaction with the environment is costly or risky, and where the resulting policy should comply with safety constraints. However, it is challenging to compute a policy that guarantees satisfying the cost constraints in the offline RL setting, since the off-policy evaluation inherently has an estimation error. In this paper, we present an offline constrained RL algorithm that optimizes the policy in the space of the stationary distribution. Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction. Experimental results show that COptiDICE attains better policies in terms of constraint satisfaction and return-maximization, outperforming baseline algorithms.
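To make the stationary-distribution view concrete: once a correction ratio x(s, a) = d_pi(s, a) / d_data(s, a) is estimated, both the return and the cost of the target policy become simple averages over the offline dataset. The sketch below shows only these estimators and a penalized objective; COptiDICE's regularizer and Bellman flow constraints are omitted, and all names are illustrative.

```python
import numpy as np

def off_policy_estimates(x, rewards, costs):
    """x, rewards, costs: arrays over the dataset's (s, a) transitions.

    Weighting dataset samples by the correction ratio x turns on-dataset
    averages into estimates for the *target* policy's return and cost."""
    est_return = np.mean(x * rewards)   # policy return, estimated offline
    est_cost = np.mean(x * costs)       # policy cost, to be kept below a budget
    return est_return, est_cost

def penalized_objective(x, rewards, costs, lam, cost_limit):
    """Maximize return while pricing constraint violation at multiplier lam."""
    ret, cost = off_policy_estimates(x, rewards, costs)
    return ret - lam * (cost - cost_limit)
```

Optimizing over x directly, rather than over the policy, is what lets the algorithm control the cost upper bound despite the estimation error inherent to off-policy evaluation.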
The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning
Andrei Cristian Nica
Attention Option-Critic
Raviteja Chunduru