Portrait of Doina Precup

Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Leader, Google DeepMind
Research Topics
Medical Machine Learning
Molecular Modeling
Probabilistic Models
Reasoning
Reinforcement Learning

Biography

Doina Precup combines teaching at McGill University with fundamental research on reinforcement learning, in particular AI applications in areas of significant social impact, such as health care. She is interested in machine decision-making in situations where uncertainty is high.

In addition to heading the Montreal office of Google DeepMind, Precup is a Senior Fellow of the Canadian Institute for Advanced Research and a Fellow of the Association for the Advancement of Artificial Intelligence.

Her areas of speciality are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

PhD - McGill University
Collaborating Alumni - McGill University
Co-supervisor :
Collaborating Alumni - McGill University
Collaborating Alumni - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Principal supervisor :
Collaborating researcher - McGill University
Co-supervisor :
Collaborating researcher - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :
Collaborating researcher - Birla Institute of Technology
PhD - McGill University
Collaborating Alumni - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University
PhD - Polytechnique Montréal
PhD - McGill University
Postdoctorate - McGill University
Collaborating Alumni - McGill University
Collaborating Alumni - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Collaborating Alumni - McGill University
Master's Research - McGill University
Principal supervisor :
Collaborating researcher - McGill University
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Research Intern - McGill University
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Collaborating Alumni - McGill University
Co-supervisor :

Publications

Shaping representations through communication
Olivier Tieleman
Angeliki Lazaridou
Shibl Mourad
Charles Blundell
Where Off-Policy Deep Reinforcement Learning Fails
This work examines batch reinforcement learning–the task of maximally exploiting a given batch of off-policy data, without further data co… (see more)llection. We demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are only capable of learning with data correlated to their current policy, making them ineffective for most off-policy applications. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space to force the agent towards behaving on-policy with respect to a subset of the given data. We extend this notion to deep reinforcement learning, and to the best of our knowledge, present the first continuous control deep reinforcement learning algorithm which can learn effectively from uncorrelated off-policy data.
Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation
Tanya Nair
Douglas L. Arnold
Deep learning (DL) networks have recently been shown to outperform other segmentation methods on various public, medical-image challenge dat… (see more)asets [3,11,16], especially for large pathologies. However, in the context of diseases such as Multiple Sclerosis (MS), monitoring all the focal lesions visible on MRI sequences, even very small ones, is essential for disease staging, prognosis, and evaluating treatment efficacy. Moreover, producing deterministic outputs hinders DL adoption into clinical routines. Uncertainty estimates for the predictions would permit subsequent revision by clinicians. We present the first exploration of multiple uncertainty estimates based on Monte Carlo (MC) dropout [4] in the context of deep networks for lesion detection and segmentation in medical images. Specifically, we develop a 3D MS lesion segmentation CNN, augmented to provide four different voxel-based uncertainty measures based on MC dropout. We train the network on a proprietary, large-scale, multi-site, multi-scanner, clinical MS dataset, and compute lesion-wise uncertainties by accumulating evidence from voxel-wise uncertainties within detected lesions. We analyze the performance of voxel-based segmentation and lesion-level detection by choosing operating points based on the uncertainty. Empirical evidence suggests that uncertainty measures consistently allow us to choose superior operating points compared only using the network's sigmoid output as a probability.
Attend Before you Act: Leveraging human visual attention for continual learning
When humans perform a task, such as playing a game, they selectively pay attention to certain parts of the visual input, gathering relevant … (see more)information and sequentially combining it to build a representation from the sensory data. In this work, we explore leveraging where humans look in an image as an implicit indication of what is salient for decision making. We build on top of the UNREAL architecture in DeepMind Lab's 3D navigation maze environment. We train the agent both with original images and foveated images, which were generated by overlaying the original images with saliency maps generated using a real-time spectral residual technique. We investigate the effectiveness of this approach in transfer learning by measuring performance in the context of noise in the environment.
Undersampling and Bagging of Decision Trees in the Analysis of Cardiorespiratory Behavior for the Prediction of Extubation Readiness in Extremely Preterm Infants
Lara Kanbar
Wissam Shalish
Karen A. Brown
Guilherme M. Sant’Anna
Robert E. Kearney
Extremely preterm infants often require endotracheal intubation and mechanical ventilation during the first days of life. Due to the detrime… (see more)ntal effects of prolonged invasive mechanical ventilation (IMV), clinicians aim to extubate infants as soon as they deem them ready.Unfortunately, existing strategies for prediction of extubation readiness vary across clinicians and institutions, and lead to high reintubation rates. We present an approach using Random Forest classifiers for the analysis of cardiorespiratory variability to predict extubation readiness. We address the issue of data imbalance by employing random undersampling of examples from the majority class before training each Decision Tree in a bag. By incorporating clinical domain knowledge, we further demonstrate that our classifier could have identified 71% of infants who failed extubation, while maintaining a success detection rate of 78%.
Eligibility Traces for Options
Ayush Jain
Temporally extended actions not only represent knowledge in the hierarchical setup in reinforcement learning, they also improve exploration … (see more)while reducing the complexity of choosing actions. The option framework provides a concrete way to implement and reason about temporal abstraction. This work attempts to test the utility of eligibility traces with options and find good ways of doing multi-step intra-option updates. Three algorithms, based on off-policy methods - importance sampling, tree-backup and retrace, are proposed for using eligibility traces with options.
Convergent Tree Backup and Retrace with Function Approximation
Off-policy learning is key to scaling up reinforcement learning as it allows to learn about a target policy from the experience generated by… (see more) a different behavior policy. Unfortunately, it has been challenging to combine off-policy learning with function approximation and multi-step bootstrapping in a way that leads to both stable and efficient algorithms. In this work, we show that the \textsc{Tree Backup} and \textsc{Retrace} algorithms are unstable with linear function approximation, both in theory and in practice with specific examples. Based on our analysis, we then derive stable and efficient gradient-based algorithms using a quadratic convex-concave saddle-point formulation. By exploiting the problem structure proper to these algorithms, we are able to provide convergence guarantees and finite-sample bounds. The applicability of our new analysis also goes beyond \textsc{Tree Backup} and \textsc{Retrace} and allows us to provide new convergence rates for the GTD and GTD2 algorithms without having recourse to projections or Polyak averaging.
Diffusion-Based Approximate Value Functions
We present a novel model-based framework inspired by spectral graph theory and deep geometric learning: the Diffusion-based Approximate Valu… (see more)e Function. Our approach efficiently approximates the graph Laplacian of an MDP’s underlying graph by using Graph Convolutional Networks (GCN). By generating an approximate value function, we diffuse the reward signal much faster than traditional Reinforcement Learning algorithms such as TD(0). This leads to substantial improvements on sparse rewards environments where efficient credit assignment is most demanding.
Resolving Event Coreference with Supervised Representation Learning and Clustering-Oriented Regularization
Jackie CK Cheung
We present an approach to event coreference resolution by developing a general framework for clustering that uses supervised representation … (see more)learning. We propose a neural network architecture with novel Clustering-Oriented Regularization (CORE) terms in the objective function. These terms encourage the model to create embeddings of event mentions that are amenable to clustering. We then use agglomerative clustering on these embeddings to build event coreference chains. For both within- and cross-document coreference on the ECB+ corpus, our model obtains better results than models that require significantly more pre-annotated information. This work provides insight and motivating results for a new general approach to solving coreference and clustering problems with representation learning.
Dyna Planning using a Feature Based Generative Model
Ryan Faulkner
Dyna-style reinforcement learning is a powerful approach for problems where not much real data is available. The main idea is to supplement … (see more)real trajectories, or sequences of sampled states over time, with simulated ones sampled from a learned model of the environment. However, in large state spaces, the problem of learning a good generative model of the environment has been open so far. We propose to use deep belief networks to learn an environment model for use in Dyna. We present our approach and validate it empirically on problems where the state observations consist of images. Our results demonstrate that using deep belief networks, which are full generative models, significantly outperforms the use of linear expectation models, proposed in Sutton et al. (2008)
Deep Reinforcement Learning that Matters
In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning… (see more) (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results tough to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.
Imitation Upper Confidence Bound for Bandits on a Graph
We consider a graph of interconnected agents implementing a common policy and each playing a bandit problem with identical reward distributi… (see more)ons. We restrict the information propagated in the graph such that agents can uniquely observe each other's actions. We propose an extension of the Upper Confidence Bound (UCB) algorithm to this setting and empirically demonstrate that our solution improves the performance over UCB according to multiple metrics and within various graph configurations.