
Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Leader, Google DeepMind
Research Topics
Medical Machine Learning
Molecular Modeling
Probabilistic Models
Reasoning
Reinforcement Learning

Biography

Doina Precup combines teaching at McGill University with fundamental research on reinforcement learning, in particular AI applications in areas of significant social impact, such as health care. She is interested in machine decision-making in situations where uncertainty is high.

In addition to heading the Montreal office of Google DeepMind, Precup is a Senior Fellow of the Canadian Institute for Advanced Research and a Fellow of the Association for the Advancement of Artificial Intelligence.

Her areas of speciality are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

PhD - McGill University
Collaborating Alumni - McGill University
Co-supervisor :
Collaborating Alumni - McGill University
Collaborating Alumni - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Principal supervisor :
Collaborating researcher - McGill University
Co-supervisor :
Collaborating researcher - Université de Montréal
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :
Collaborating researcher - Birla Institute of Technology
PhD - McGill University
Collaborating Alumni - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University
PhD - Polytechnique Montréal
PhD - McGill University
Postdoctorate - McGill University
Collaborating Alumni - McGill University
Collaborating Alumni - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Collaborating Alumni - McGill University
Master's Research - McGill University
Principal supervisor :
Collaborating researcher - McGill University
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :
Research Intern - McGill University
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Collaborating Alumni - McGill University
Co-supervisor :

Publications

Assessment of Extubation Readiness Using Spontaneous Breathing Trials in Extremely Preterm Neonates
Wissam Shalish
Lara Kanbar
Lajos Kovacs
Sanjay Chawla
Martin Keszler
Smita Rao
Samantha Latremouille
Karen Brown
Robert E. Kearney
Guilherme M. Sant’Anna
Importance: Spontaneous breathing trials (SBTs) are used to determine extubation readiness in extremely preterm neonates (gestational age ≤28 weeks), but these trials rely on empirical combinations of clinical events during endotracheal continuous positive airway pressure (ET-CPAP). Objectives: To describe clinical events during ET-CPAP and to assess accuracy of comprehensive clinical event combinations in predicting successful extubation compared with clinical judgment alone. Design, Setting, and Participants: This multicenter diagnostic study used data from 259 neonates seen at 5 neonatal intensive care units from the prospective Automated Prediction of Extubation Readiness (APEX) study from September 1, 2013, through August 31, 2018. Neonates with birth weight less than 1250 g who required mechanical ventilation were eligible. Neonates deemed to be ready for extubation and who underwent ET-CPAP before extubation were included. Interventions: In the APEX study, cardiorespiratory signals were recorded during 5-minute ET-CPAP, and signs of clinical instability were monitored. Main Outcomes and Measures: Four clinical events were documented during ET-CPAP: apnea requiring stimulation, presence and cumulative durations of bradycardia and desaturation, and increased supplemental oxygen. Clinical event occurrence was assessed and compared between extubation pass and fail (defined as reintubation within 7 days). An automated algorithm was developed to generate SBT definitions using all clinical event combinations and to compute diagnostic accuracies of an SBT in predicting extubation success. Results: Of 259 neonates (139 [54%] male) with a median gestational age of 26.1 weeks (interquartile range [IQR], 24.9-27.4 weeks) and median birth weight of 830 g (IQR, 690-1019 g), 147 (57%) had at least 1 clinical event during ET-CPAP. Apneas occurred in 10% (26 of 259) of neonates, bradycardias in 19% (48), desaturations in 53% (138), and increased oxygen needs in 41% (107). Neonates with successful extubation (71% [184 of 259]) had significantly fewer clinical events (51% [93 of 184] vs 72% [54 of 75], P = .002), shorter cumulative bradycardia duration (median, 0 seconds [IQR, 0 seconds] vs 0 seconds [IQR, 0-9 seconds], P < .001), shorter cumulative desaturation duration (median, 0 seconds [IQR, 0-59 seconds] vs 25 seconds [IQR, 0-90 seconds], P = .003),
Shaping representations through communication: community size effect in artificial learning systems
Olivier Tieleman
Angeliki Lazaridou
Shibl Mourad
Charles Blundell
Motivated by theories of language and communication that explain why communities with large numbers of speakers have, on average, simpler languages with more regularity, we cast the representation learning problem in terms of learning to communicate. Our starting point sees the traditional autoencoder setup as a single encoder with a fixed decoder partner that must learn to communicate. Generalizing from there, we introduce community-based autoencoders in which multiple encoders and decoders collectively learn representations by being randomly paired up on successive training iterations. We find that increasing community sizes reduce idiosyncrasies in the learned codes, resulting in representations that better encode concept categories and correlate with human feature norms.
Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods
The policy gradient theorem is defined based on an objective with respect to the initial distribution over states. In the discounted case, this results in policies that are optimal for one distribution over initial states, but may not be uniformly optimal for others, no matter where the agent starts from. Furthermore, to obtain unbiased gradient estimates, the starting point of the policy gradient estimator requires sampling states from a normalized discounted weighting of states. However, the difficulty of estimating the normalized discounted weighting of states, or the stationary state distribution, is quite well-known. Additionally, the large sample complexity of policy gradient methods is often attributed to insufficient exploration, and to remedy this, it is often assumed that the restart distribution provides sufficient exploration in these algorithms. In this work, we propose exploration in policy gradient methods based on maximizing entropy of the discounted future state distribution. The key contribution of our work includes providing a practically feasible algorithm to estimate the normalized discounted weighting of states, i.e., the discounted future state distribution. We propose that exploration can be achieved by entropy regularization with the discounted state distribution in policy gradients, where a metric for maximal coverage of the state space can be based on the entropy of the induced state distribution. The proposed approach can be considered as a three time-scale algorithm and under some mild technical conditions, we prove its convergence to a locally optimal policy. Experimentally, we demonstrate usefulness of regularization with the discounted future state distribution in terms of increased state space coverage and faster learning on a range of complex tasks.
Marginalized State Distribution Entropy Regularization in Policy Optimization
Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning
We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step to estimate the value of the new policy after every policy gradient update. Despite the enormous success of off-policy policy gradients on control tasks, existing general methods suffer from high variance and instability, partly because the policy improvement depends on the gradient of the estimated value function. In this work, we present a new way of off-policy policy evaluation in actor-critic, based on doubly robust estimators. We extend the doubly robust estimator from off-policy policy evaluation (OPE) to actor-critic algorithms that consist of a reward estimator performance model. We find that doubly robust estimation of the critic can significantly improve performance in continuous control tasks. Furthermore, in cases where a stochastic reward function leads to high variance, doubly robust critic estimation can improve performance under corrupted, stochastic reward signals, indicating its usefulness for robust and safe reinforcement learning.
Actor Critic with Differentially Private Critic
William Hamilton
Borja Balle
Reinforcement learning algorithms are known to be sample inefficient, and often performance on one task can be substantially improved by leveraging information (e.g., via pre-training) on other related tasks. In this work, we propose a technique to achieve such knowledge transfer in cases where agent trajectories contain sensitive or private information, such as in the healthcare domain. Our approach leverages a differentially private policy evaluation algorithm to initialize an actor-critic model and improve the effectiveness of learning in downstream tasks. We empirically show this technique increases sample efficiency in resource-constrained control problems while preserving the privacy of trajectories collected in an upstream task.
Improving Pathological Structure Segmentation via Transfer Learning Across Diseases
Paul Lemaitre
Raghav Mehta
Douglas Arnold
Early Prediction of Alzheimer's Disease Progression Using Variational Autoencoders
Konrad Wagstyl
Azar Zandifar
D. Collins
Adriana Romero
Augmenting learning using symmetry in a biologically-inspired domain
Abbas Abdolmaleki
Arthur Guez
Piotr Trochim
Invariances to translation, rotation and other spatial transformations are a hallmark of the laws of motion, and have widespread use in the natural sciences to reduce the dimensionality of systems of equations. In supervised learning, such as in image classification tasks, rotation, translation and scale invariances are used to augment training datasets. In this work, we use data augmentation in a similar way, exploiting symmetry in the quadruped domain of the DeepMind control suite (Tassa et al. 2018) to add to the trajectories experienced by the actor in the actor-critic algorithm of Abdolmaleki et al. (2018). In a data-limited regime, the agent using a set of experiences augmented through symmetry is able to learn faster. Our approach can be used to inject knowledge of invariances in the domain and task to augment learning in robots, and more generally, to speed up learning in realistic robotics applications.
Assessing Generalization in TD methods for Deep Reinforcement Learning
Revisit Policy Optimization in Matrix Form
Xiao-Wen Chang
Neural Transfer Learning for Cry-based Diagnosis of Perinatal Asphyxia
Charles C. Onu
William L. Hamilton
Despite continuing medical advances, the rate of newborn morbidity and mortality globally remains high, with over 6 million casualties every year. The prediction of pathologies affecting newborns based on their cry is thus of significant clinical interest, as it would facilitate the development of accessible, low-cost diagnostic tools. However, the inadequacy of clinically annotated datasets of infant cries limits progress on this task. This study explores a neural transfer learning approach to developing accurate and robust models for identifying infants that have suffered from perinatal asphyxia. In particular, we explore the hypothesis that representations learned from adult speech could inform and improve performance of models developed on infant speech. Our experiments show that models based on such representation transfer are resilient to different types and degrees of noise, as well as to signal loss in time and frequency domains.