Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Leader, Google DeepMind
Research Topics
Medical Machine Learning
Molecular Modeling
Probabilistic Models
Reasoning
Reinforcement Learning

Biography

Doina Precup combines teaching at McGill University with fundamental research on reinforcement learning, in particular AI applications in areas of significant social impact, such as health care. She is interested in machine decision-making in situations where uncertainty is high.

In addition to heading the Montreal office of Google DeepMind, Precup is a Senior Fellow of the Canadian Institute for Advanced Research and a Fellow of the Association for the Advancement of Artificial Intelligence.

Her areas of speciality are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

Research Intern - McGill University
PhD - McGill University
Collaborating Alumni - McGill University
Co-supervisor :
Collaborating Alumni - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Principal supervisor :
Collaborating researcher - McGill University
Collaborating researcher - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :
Collaborating researcher - Birla Institute of Technology
Master's Research - McGill University
PhD - McGill University
Collaborating Alumni - McGill University
Master's Research - McGill University
PhD - Polytechnique Montréal
PhD - McGill University
Postdoctorate - McGill University
Collaborating Alumni - McGill University
Collaborating Alumni - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Collaborating Alumni - McGill University
Master's Research - McGill University
Principal supervisor :
Collaborating researcher - McGill University
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - McGill University
Co-supervisor :
Research Intern - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :
Research Intern - McGill University
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Collaborating Alumni - McGill University
Co-supervisor :

Publications

When Waiting is not an Option: Learning Options with a Deliberation Cost
Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of "how" to learn options is increasingly well understood, the question of "what" good options should be has remained elusive. We formulate our answer to what "good" options should be in the bounded rationality framework (Simon, 1957) through the notion of deliberation cost. We then derive practical gradient-based learning algorithms to implement this objective. Our results in the Arcade Learning Environment (ALE) show increased performance and interpretability.
Constructing Temporal Abstractions Autonomously in Reinforcement Learning
Learning Robust Options
Daniel J. Mankowitz
Timothy A. Mann
Shie Mannor
Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty. We utilize ROPI to learn robust options with the Robust Options Deep Q Network (RO-DQN) that solves multiple tasks and mitigates model misspecification due to model uncertainty. We present experimental results which suggest that policy iteration with linear features may have an inherent form of robustness when using coarse feature representations. In addition, we present experimental results which demonstrate that robustness helps policy iteration implemented on top of deep neural networks to generalize over a much broader range of dynamics than non-robust policy iteration.
Patterns of reintubation in extremely preterm infants: a longitudinal cohort study
Wissam Shalish
Lara Kanbar
Martin Keszler
Sanjay Chawla
Lajos Kovacs
Smita Rao
Bogdan A Panaitescu
Alyse Laliberte
Karen Brown
Robert E Kearney
Guilherme M Sant'Anna
Analyzing Alzheimer’s Disease Progression from Sequential Magnetic Resonance Imaging Scans Using Deep 3D Convolutional Neural Networks
Konrad Wagstyl
Azar Zandifar
Adriana Romero
Alzheimer’s is a progressive, neurodegenerative disease, that causes irreversible damage to the brain tissue. It impairs the ability to form and retrieve memory, and eventually disrupts the natural flow of life, by affecting the ability to carry out even day to day activities. The disease is typically diagnosed from the symptoms (Mini Mental State Examination, [8]), such as decline in cognitive abilities, visual and/or speech impairment, loss of memory, rather than the structural changes in the brain (biomarker) that causes it. But the pathological changes in the brain start decades before the manifestation of the symptoms [7]. Magnetic Resonance Imaging (MRI) is capable of capturing the complex changes in the brain, even if it is difficult for humans to extract those features from the low contrast, multi-dimensional MRIs [1]. There is a considerable amount of work on analyzing Alzheimer’s disease. However, the vast majority intends to predict the state of the disease at the current time step.
Disentangling the independently controllable factors of variation by interacting with the world
Valentin Thomas
Philippe Beaudoin
William Fedus
It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors, and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.
Learning Safe Policies with Expert Guidance
Je-chun Huang
Fa Wu
Yang Cai
We propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify. In order to do this, we rely on the existence of demonstrations from expert policies, and we provide a theoretical framework for the agent to optimize in the space of rewards consistent with its existing knowledge. We propose two methods to solve the resulting optimization: an exact ellipsoid-based method and a method in the spirit of the "follow-the-perturbed-leader" algorithm. Our experiments demonstrate the behavior of our algorithm in both discrete and continuous problems. The trained agent safely avoids states with potential negative effects while imitating the behavior of the expert in the other states.
Optimizing Home Energy Management and Electric Vehicle Charging with Reinforcement Learning
Smart grids are advancing the management efficiency and security of power grids with the integration of energy storage, distributed controllers, and advanced meters. In particular, with the increasing prevalence of residential automation devices and distributed renewable energy generation, residential energy management is now drawing more attention. Meanwhile, the increasing adoption of electric vehicle (EV) brings more challenges and opportunities for smart residential energy management. This paper formalizes energy management for the residential home with EV charging as a Markov Decision Process and proposes reinforcement learning (RL) based control algorithms to address it. The objective of the proposed algorithms is to minimize the long-term operating cost. We further use a recurrent neural network (RNN) to model the electricity demand as a preprocessing step. Both the RNN prediction and latent representations are used as additional state features for the RL based control algorithms. Experiments on real-world data show that the proposed algorithms can significantly reduce the operating cost and peak power consumption compared to baseline control algorithms.
Temporal Regularization for Markov Decision Process
Several applications of Reinforcement Learning suffer from instability due to high variance. This is especially prevalent in high dimensional domains. Regularization is a commonly used technique in machine learning to reduce variance, at the cost of introducing some bias. Most existing regularization techniques focus on spatial (perceptual) regularization. Yet in reinforcement learning, due to the nature of the Bellman equation, there is an opportunity to also exploit temporal regularization based on smoothness in value estimates over trajectories. This paper explores a class of methods for temporal regularization. We formally characterize the bias induced by this technique using Markov chain concepts. We illustrate the various characteristics of temporal regularization via a sequence of simple discrete and continuous MDPs, and show that the technique provides improvement even in high-dimensional Atari games.
Boosting Based Multiple Kernel Learning and Transfer Regression for Electricity Load Forecasting
Boyu Wang
Benoit Boulet
Predicting Extubation Readiness in Extreme Preterm Infants based on Patterns of Breathing
Charles C. Onu
Lara J. Kanbar
Wissam Shalish
Karen A. Brown
Guilherme M. Sant'Anna
Robert E. Kearney
Extremely preterm infants commonly require intubation and invasive mechanical ventilation after birth. While the duration of mechanical ventilation should be minimized in order to avoid complications, extubation failure is associated with increases in morbidities and mortality. As part of a prospective observational study aimed at developing an accurate predictor of extubation readiness, Markov and semi-Markov chain models were applied to gain insight into the respiratory patterns of these infants, with more robust time-series modeling using semi-Markov models. This model revealed interesting similarities and differences between newborns who succeeded extubation and those who failed. The parameters of the model were further applied to predict extubation readiness via generative (joint likelihood) and discriminative (support vector machine) approaches. Results showed that up to 84% of infants who failed extubation could have been accurately identified prior to extubation.
Learnings Options End-to-End for Continuous Action Tasks
We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Mujoco domains are promising, but lead to interesting questions about when a given option should be used, an issue directly connected to the use of initiation sets.