Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Lead, Google DeepMind
Research Topics
Medical Machine Learning
Reinforcement Learning
Probabilistic Models
Molecular Modeling
Reasoning

Biography

Doina Precup teaches at McGill University while conducting fundamental research on reinforcement learning, with particular attention to AI applications in areas of social impact, such as health care. She is interested in automated decision-making under conditions of high uncertainty.

She is a Fellow of the Canadian Institute for Advanced Research (CIFAR) and of the Association for the Advancement of Artificial Intelligence (AAAI), and she leads the Montreal office of DeepMind.

Her specialties include artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and their applications.

Current Students

Alumni collaborator - McGill
Co-supervisor:
Alumni collaborator - McGill
Alumni collaborator - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
Research master's - McGill
Principal supervisor:
Research collaborator - McGill
Co-supervisor:
Research collaborator - UdeM
PhD - McGill
Principal supervisor:
PhD - McGill
Principal supervisor:
Research collaborator - Birla Institute of Technology
PhD - McGill
Alumni collaborator - McGill
Research master's - McGill
Alumni collaborator - McGill
PhD - Polytechnique
Postdoctorate - McGill
Alumni collaborator - McGill
Alumni collaborator - McGill
PhD - McGill
Principal supervisor:
PhD - McGill
Alumni collaborator - McGill
Research master's - McGill
Principal supervisor:
Research collaborator - McGill
Co-supervisor:
PhD - UdeM
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Research intern - McGill
Research master's - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
PhD - McGill
Alumni collaborator - McGill
Co-supervisor:

Publications

Assessment of Extubation Readiness Using Spontaneous Breathing Trials in Extremely Preterm Neonates
Wissam Shalish
Lara Kanbar
Lajos Kovacs
Sanjay Chawla
Martin Keszler
Smita Rao
Samantha Latremouille
Karen Brown
Robert E. Kearney
Guilherme M. Sant’Anna
Importance: Spontaneous breathing trials (SBTs) are used to determine extubation readiness in extremely preterm neonates (gestational age ≤28 weeks), but these trials rely on empirical combinations of clinical events during endotracheal continuous positive airway pressure (ET-CPAP).

Objectives: To describe clinical events during ET-CPAP and to assess the accuracy of comprehensive clinical event combinations in predicting successful extubation, compared with clinical judgment alone.

Design, Setting, and Participants: This multicenter diagnostic study used data from 259 neonates seen at 5 neonatal intensive care units in the prospective Automated Prediction of Extubation Readiness (APEX) study from September 1, 2013, through August 31, 2018. Neonates with birth weight less than 1250 g who required mechanical ventilation were eligible. Neonates deemed ready for extubation who underwent ET-CPAP before extubation were included.

Interventions: In the APEX study, cardiorespiratory signals were recorded during 5-minute ET-CPAP, and signs of clinical instability were monitored.

Main Outcomes and Measures: Four clinical events were documented during ET-CPAP: apnea requiring stimulation, presence and cumulative duration of bradycardia and desaturation, and increased supplemental oxygen. Clinical event occurrence was assessed and compared between extubation pass and fail (defined as reintubation within 7 days). An automated algorithm was developed to generate SBT definitions from all clinical event combinations and to compute the diagnostic accuracy of each SBT in predicting extubation success.

Results: Of 259 neonates (139 [54%] male) with a median gestational age of 26.1 weeks (interquartile range [IQR], 24.9-27.4 weeks) and median birth weight of 830 g (IQR, 690-1019 g), 147 (57%) had at least 1 clinical event during ET-CPAP. Apneas occurred in 10% (26 of 259) of neonates, bradycardias in 19% (48), desaturations in 53% (138), and increased oxygen needs in 41% (107). Neonates with successful extubation (71% [184 of 259]) had significantly fewer clinical events (51% [93 of 184] vs 72% [54 of 75], P = .002), shorter cumulative bradycardia duration (median, 0 seconds [IQR, 0 seconds] vs 0 seconds [IQR, 0-9 seconds], P < .001), shorter cumulative desaturation duration (median, 0 seconds [IQR, 0-59 seconds] vs 25 seconds [IQR, 0-90 seconds], P = .003),
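The abstract above describes an automated search over clinical event combinations to define SBTs. A minimal sketch of that idea, with entirely made-up toy records (the real APEX algorithm and data are not reproduced here):

```python
from itertools import combinations

# Illustrative sketch only: enumerate every combination of the four clinical
# events and score each candidate SBT definition ("pass" = none of the chosen
# events occurred) against extubation outcomes. The records are fabricated.

EVENTS = ["apnea", "bradycardia", "desaturation", "increased_oxygen"]

def sbt_definitions():
    # all non-empty subsets of the four events: 15 candidate definitions
    for k in range(1, len(EVENTS) + 1):
        for combo in combinations(EVENTS, k):
            yield combo

def accuracy(definition, records):
    # fraction of records where the SBT verdict matches the extubation outcome
    correct = 0
    for events, extubation_success in records:
        predicted_pass = not any(e in events for e in definition)
        correct += (predicted_pass == extubation_success)
    return correct / len(records)

# toy (events observed during ET-CPAP, extubation succeeded) records
records = [
    (set(), True),
    ({"desaturation"}, True),
    ({"bradycardia", "desaturation"}, False),
    ({"apnea", "increased_oxygen"}, False),
]
best = max(sbt_definitions(), key=lambda d: accuracy(d, records))
print(best, accuracy(best, records))
```

A real analysis would compute sensitivity and specificity per definition rather than raw accuracy, but the combinatorial search structure is the same.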
Shaping representations through communication: community size effect in artificial learning systems
Olivier Tieleman
Angeliki Lazaridou
Shibl Mourad
Charles Blundell
Motivated by theories of language and communication that explain why communities with large numbers of speakers have, on average, simpler languages with more regularity, we cast the representation learning problem in terms of learning to communicate. Our starting point sees the traditional autoencoder setup as a single encoder with a fixed decoder partner that must learn to communicate. Generalizing from there, we introduce community-based autoencoders, in which multiple encoders and decoders collectively learn representations by being randomly paired up on successive training iterations. We find that increasing community sizes reduce idiosyncrasies in the learned codes, resulting in representations that better encode concept categories and correlate with human feature norms.
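The community-based pairing scheme described above can be sketched as a training loop; the Encoder and Decoder classes here are illustrative stand-ins, not the paper's neural models:

```python
import random

# Sketch of the community-based training loop: on each iteration a random
# encoder is paired with a random decoder, so no encoder can overfit to the
# idiosyncrasies of a single fixed partner.

class Encoder:
    def __init__(self, eid):
        self.eid = eid
    def encode(self, x):
        return [v * 0.5 for v in x]   # placeholder mapping

class Decoder:
    def __init__(self, did):
        self.did = did
    def decode(self, z):
        return [v * 2.0 for v in z]   # placeholder inverse mapping

def train_community(encoders, decoders, data, iterations, seed=0):
    rng = random.Random(seed)
    pairings = []
    for _ in range(iterations):
        enc, dec = rng.choice(encoders), rng.choice(decoders)
        for x in data:
            x_hat = dec.decode(enc.encode(x))
            # a real implementation would backpropagate the reconstruction
            # loss ||x - x_hat|| through both paired modules here
        pairings.append((enc.eid, dec.did))
    return pairings

community = train_community([Encoder(i) for i in range(3)],
                            [Decoder(i) for i in range(3)],
                            data=[[1.0, 2.0]], iterations=5)
print(community)
```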
Entropy Regularization with Discounted Future State Distribution in Policy Gradient Methods
The policy gradient theorem is defined based on an objective with respect to the initial distribution over states. In the discounted case, this results in policies that are optimal for one distribution over initial states but may not be uniformly optimal for others, no matter where the agent starts from. Furthermore, to obtain unbiased gradient estimates, the policy gradient estimator requires sampling states from a normalized discounted weighting of states. However, the difficulty of estimating the normalized discounted weighting of states, or the stationary state distribution, is well known. Additionally, the large sample complexity of policy gradient methods is often attributed to insufficient exploration, and to remedy this it is often assumed that the restart distribution provides sufficient exploration. In this work, we propose exploration in policy gradient methods based on maximizing the entropy of the discounted future state distribution. The key contribution of our work is a practically feasible algorithm to estimate the normalized discounted weighting of states, i.e., the discounted future state distribution. We propose that exploration can be achieved by entropy regularization with the discounted state distribution in policy gradients, where a metric for maximal coverage of the state space can be based on the entropy of the induced state distribution. The proposed approach can be considered a three-time-scale algorithm, and under mild technical conditions we prove its convergence to a locally optimal policy. Experimentally, we demonstrate the usefulness of regularization with the discounted future state distribution in terms of increased state-space coverage and faster learning on a range of complex tasks.
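For a small MDP whose transitions are known, the normalized discounted state-visitation distribution and its entropy can be computed directly; this is an illustrative calculation of the quantity being regularized, not the paper's estimation algorithm:

```python
import math

# Compute d_pi(s) = (1 - gamma) * sum_t gamma^t * P(s_t = s) for a toy
# two-state Markov chain induced by a fixed policy, then its entropy.
# The transition matrix P below is a made-up example.

def discounted_state_distribution(P, start, gamma=0.9, horizon=200):
    n = len(P)
    p_t = list(start)          # P(s_0 = s)
    d = [0.0] * n
    for t in range(horizon):
        for s in range(n):
            d[s] += (1.0 - gamma) * (gamma ** t) * p_t[s]
        # one step of the chain: p_{t+1} = p_t @ P
        p_t = [sum(p_t[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

P = [[0.9, 0.1],
     [0.2, 0.8]]
d = discounted_state_distribution(P, start=[1.0, 0.0])
print([round(x, 3) for x in d], round(entropy(d), 3))
```

Entropy regularization then adds a bonus proportional to this entropy to the policy gradient objective, encouraging broader state-space coverage.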
Marginalized State Distribution Entropy Regularization in Policy Optimization
Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning
We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step to estimate the value of the new policy after every policy gradient update. Despite the enormous success of off-policy policy gradients on control tasks, existing general methods suffer from high variance and instability, partly because the policy improvement depends on the gradient of the estimated value function. In this work, we present a new approach to off-policy critic evaluation in actor-critic methods, based on doubly robust estimators. We extend the doubly robust estimator from off-policy policy evaluation (OPE) to actor-critic algorithms that incorporate a learned reward model. We find that doubly robust estimation of the critic can significantly improve performance in continuous control tasks. Furthermore, in cases where the reward function is stochastic, which can lead to high variance, doubly robust critic estimation can improve performance under corrupted, stochastic reward signals, indicating its usefulness for robust and safe reinforcement learning.
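The single-step (bandit-style) doubly robust estimator that this line of work builds on can be sketched as follows; the logged data, policies, and reward model below are all illustrative assumptions, not the paper's setup:

```python
# Doubly robust off-policy value estimate: a model-based (direct) term plus an
# importance-weighted correction of the model's residual:
#   V_DR = mean over logged (s, a, r) of
#          sum_a' pi(a'|s) * q_hat(s, a')  +  (pi(a|s) / b(a|s)) * (r - q_hat(s, a))
# The estimate stays unbiased if either the model q_hat or the behavior
# policy b is correct, hence "doubly robust".

def doubly_robust(logged, pi, b, q_hat, actions):
    total = 0.0
    for s, a, r in logged:
        direct = sum(pi(ap, s) * q_hat(s, ap) for ap in actions)   # model term
        correction = (pi(a, s) / b(a, s)) * (r - q_hat(s, a))      # IS residual
        total += direct + correction
    return total / len(logged)

# toy example: one state, two actions
actions = [0, 1]
pi = lambda a, s: 0.8 if a == 1 else 0.2       # target policy
b = lambda a, s: 0.5                           # uniform behavior policy
q_hat = lambda s, a: 1.0 if a == 1 else 0.0    # slightly wrong reward model
logged = [("s0", 1, 1.2), ("s0", 0, 0.1), ("s0", 1, 0.9)]
print(round(doubly_robust(logged, pi, b, q_hat, actions), 4))
```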
Actor Critic with Differentially Private Critic
William Hamilton
Borja Balle
Reinforcement learning algorithms are known to be sample inefficient, and performance on one task can often be substantially improved by leveraging information (e.g., via pre-training) on other related tasks. In this work, we propose a technique to achieve such knowledge transfer in cases where agent trajectories contain sensitive or private information, such as in the healthcare domain. Our approach leverages a differentially private policy evaluation algorithm to initialize an actor-critic model and improve the effectiveness of learning in downstream tasks. We empirically show that this technique increases sample efficiency in resource-constrained control problems while preserving the privacy of trajectories collected in an upstream task.
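One standard way to release value estimates with differential privacy is the Laplace mechanism; this sketch is a generic illustration, not the paper's algorithm, and the epsilon and sensitivity values are assumptions:

```python
import random

# Generic illustration: a differentially private release of per-state value
# estimates via the Laplace mechanism, which could then initialize a critic
# for a downstream task. Epsilon and the sensitivity bound are assumptions.

def laplace_noise(rng, scale):
    # the difference of two independent Exp(1/scale) draws is Laplace(0, scale)
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_values(values, epsilon, sensitivity, seed=0):
    rng = random.Random(seed)
    scale = sensitivity / epsilon   # Laplace mechanism noise scale
    return {s: v + laplace_noise(rng, scale) for s, v in values.items()}

critic_init = private_values({"s0": 1.0, "s1": 0.5}, epsilon=1.0, sensitivity=0.1)
print(critic_init)
```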
Improving Pathological Structure Segmentation via Transfer Learning Across Diseases
Paul Lemaitre
Raghav Mehta
Douglas Arnold
Early Prediction of Alzheimer's Disease Progression Using Variational Autoencoders
Konrad Wagstyl
Azar Zandifar
D. Collins
Adriana Romero
Augmenting learning using symmetry in a biologically-inspired domain
Abbas Abdolmaleki
Arthur Guez
Piotr Trochim
Invariances to translation, rotation and other spatial transformations are a hallmark of the laws of motion and have widespread use in the natural sciences to reduce the dimensionality of systems of equations. In supervised learning, such as in image classification tasks, rotation, translation and scale invariances are used to augment training datasets. In this work, we use data augmentation in a similar way, exploiting symmetry in the quadruped domain of the DeepMind Control Suite (Tassa et al. 2018) to add to the trajectories experienced by the actor in the actor-critic algorithm of Abdolmaleki et al. (2018). In a data-limited regime, an agent using a set of experiences augmented through symmetry is able to learn faster. Our approach can be used to inject knowledge of invariances in the domain and task to augment learning in robots and, more generally, to speed up learning in realistic robotics applications.
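The symmetry-based augmentation idea can be sketched in a few lines; the state layout (a left/right joint pair plus a lateral component) is a made-up stand-in for the quadruped's real observation and action spaces:

```python
# For a left/right-symmetric body, mirroring a transition yields a second
# valid transition for free: swap the left/right joint readings and negate
# the lateral component, leaving the reward unchanged.

LEFT, RIGHT, LATERAL = 0, 1, 2   # hypothetical state/action layout

def mirror(vec):
    return (vec[RIGHT], vec[LEFT], -vec[LATERAL])

def augment(transitions):
    # double the replay data by appending the mirrored copy of each transition
    out = list(transitions)
    for s, a, r, s2 in transitions:
        out.append((mirror(s), mirror(a), r, mirror(s2)))  # reward is invariant
    return out

batch = [((0.1, 0.4, 0.2), (1.0, -1.0, 0.5), 1.0, (0.2, 0.3, 0.1))]
print(augment(batch))
```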
Assessing Generalization in TD methods for Deep Reinforcement Learning
Revisit Policy Optimization in Matrix Form
Xiao-Wen Chang
Neural Transfer Learning for Cry-based Diagnosis of Perinatal Asphyxia
Charles C. Onu
William L. Hamilton
Despite continuing medical advances, the rate of newborn morbidity and mortality globally remains high, with over 6 million casualties every year. The prediction of pathologies affecting newborns based on their cry is thus of significant clinical interest, as it would facilitate the development of accessible, low-cost diagnostic tools. However, the inadequacy of clinically annotated datasets of infant cries limits progress on this task. This study explores a neural transfer learning approach to developing accurate and robust models for identifying infants that have suffered from perinatal asphyxia. In particular, we explore the hypothesis that representations learned from adult speech can inform and improve the performance of models developed on infant speech. Our experiments show that models based on such representation transfer are resilient to different types and degrees of noise, as well as to signal loss in the time and frequency domains.
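The representation-transfer recipe (frozen source-domain features plus a small classifier fit on scarce target data) can be sketched as follows; the feature extractor and data here are illustrative stand-ins, not a real speech network or clinical recordings:

```python
import math

# Sketch of representation transfer: reuse a "pretrained" feature extractor
# (here a toy summary-statistics function standing in for a network trained on
# adult speech) and fit only a nearest-centroid head on target-domain examples.

def pretrained_features(signal):
    # stand-in for frozen source-domain representations
    n = len(signal)
    mean = sum(signal) / n
    energy = math.sqrt(sum(x * x for x in signal) / n)
    return (mean, energy)

def fit_centroids(examples):
    # tiny classifier "head" trained only on the target domain
    sums, counts = {}, {}
    for signal, label in examples:
        f = pretrained_features(signal)
        s = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(v / counts[lab] for v in s) for lab, s in sums.items()}

def predict(centroids, signal):
    f = pretrained_features(signal)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(f, centroids[lab])))

train = [([0.0, 0.1, -0.1], "healthy"), ([0.9, -0.8, 1.0], "asphyxia")]
centroids = fit_centroids(train)
print(predict(centroids, [0.05, -0.05, 0.0]))
```

The design point the sketch illustrates: because the feature extractor is frozen, only the small head needs target-domain labels, which is exactly what makes transfer attractive when annotated infant-cry data is scarce.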