Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Lead, Google DeepMind
Research Topics
Medical Machine Learning
Reinforcement Learning
Probabilistic Models
Molecular Modeling
Reasoning

Biography

Doina Precup teaches at McGill University while conducting fundamental research on reinforcement learning, including applications of AI in areas with social impact, such as health care. She is interested in automated decision-making under high uncertainty.

She is a member of the Canadian Institute for Advanced Research (CIFAR) and of the Association for the Advancement of Artificial Intelligence (AAAI), and she heads the Montreal office of DeepMind.

Her areas of expertise are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

Research Intern - McGill
Alumni Collaborator - McGill
Co-supervisor:
Alumni Collaborator - McGill
PhD - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
Research Master's - McGill
Principal supervisor:
Research Collaborator - McGill
Co-supervisor:
Research Collaborator - UdeM
PhD - McGill
Principal supervisor:
PhD - McGill
Principal supervisor:
Research Collaborator - Birla Institute of Technology
Research Master's - McGill
PhD - McGill
Alumni Collaborator - McGill
Research Master's - McGill
PhD - Polytechnique
Postdoctorate - McGill
Alumni Collaborator - McGill
Alumni Collaborator - McGill
PhD - McGill
Principal supervisor:
PhD - McGill
Alumni Collaborator - McGill
Research Master's - McGill
Principal supervisor:
Research Collaborator - McGill
Co-supervisor:
PhD - UdeM
Co-supervisor:
PhD - McGill
Co-supervisor:
Research Intern - McGill
PhD - McGill
Principal supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
Research Intern - McGill
PhD - McGill
Research Master's - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
PhD - McGill
Alumni Collaborator - McGill
Co-supervisor:

Publications

Marginalized State Distribution Entropy Regularization in Policy Optimization
Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning
We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new policy after every policy gradient update. Despite the enormous success of off-policy policy gradients on control tasks, existing general methods suffer from high variance and instability, partly because the policy improvement depends on the gradient of the estimated value function. In this work, we present a new way of off-policy policy evaluation in actor-critic, based on the doubly robust estimators. We extend the doubly robust estimator from off-policy policy evaluation (OPE) to actor-critic algorithms that consist of a reward estimator performance model. We find that doubly robust estimation of the critic can significantly improve performance in continuous control tasks. Furthermore, in cases where the reward function is stochastic, which can lead to high variance, doubly robust critic estimation can improve performance under corrupted, stochastic reward signals, indicating its usefulness for robust and safe reinforcement learning.
Actor Critic with Differentially Private Critic
William Hamilton
Borja Balle
Reinforcement learning algorithms are known to be sample inefficient, and often performance on one task can be substantially improved by leveraging information (e.g., via pre-training) on other related tasks. In this work, we propose a technique to achieve such knowledge transfer in cases where agent trajectories contain sensitive or private information, such as in the healthcare domain. Our approach leverages a differentially private policy evaluation algorithm to initialize an actor-critic model and improve the effectiveness of learning in downstream tasks. We empirically show this technique increases sample efficiency in resource-constrained control problems while preserving the privacy of trajectories collected in an upstream task.
Improving Pathological Structure Segmentation via Transfer Learning Across Diseases
Paul Lemaitre
Raghav Mehta
Douglas Arnold
Early Prediction of Alzheimer's Disease Progression Using Variational Autoencoders
Konrad Wagstyl
Azar Zandifar
D. Collins
Adriana Romero
Augmenting learning using symmetry in a biologically-inspired domain
Abbas Abdolmaleki
Arthur Guez
Piotr Trochim
Invariances to translation, rotation and other spatial transformations are a hallmark of the laws of motion, and have widespread use in the natural sciences to reduce the dimensionality of systems of equations. In supervised learning, such as in image classification tasks, rotation, translation and scale invariances are used to augment training datasets. In this work, we use data augmentation in a similar way, exploiting symmetry in the quadruped domain of the DeepMind control suite (Tassa et al. 2018) to add to the trajectories experienced by the actor in the actor-critic algorithm of Abdolmaleki et al. (2018). In a data-limited regime, the agent using a set of experiences augmented through symmetry is able to learn faster. Our approach can be used to inject knowledge of invariances in the domain and task to augment learning in robots, and more generally, to speed up learning in realistic robotics applications.
Assessing Generalization in TD methods for Deep Reinforcement Learning
Revisit Policy Optimization in Matrix Form
Xiao-Wen Chang
Neural Transfer Learning for Cry-based Diagnosis of Perinatal Asphyxia
Charles C. Onu
William L. Hamilton
Despite continuing medical advances, the rate of newborn morbidity and mortality globally remains high, with over 6 million casualties every year. The prediction of pathologies affecting newborns based on their cry is thus of significant clinical interest, as it would facilitate the development of accessible, low-cost diagnostic tools. However, the inadequacy of clinically annotated datasets of infant cries limits progress on this task. This study explores a neural transfer learning approach to developing accurate and robust models for identifying infants that have suffered from perinatal asphyxia. In particular, we explore the hypothesis that representations learned from adult speech could inform and improve performance of models developed on infant speech. Our experiments show that models based on such representation transfer are resilient to different types and degrees of noise, as well as to signal loss in time and frequency domains.
Combined Reinforcement Learning via Abstract Representations
In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages. In this paper we propose a new way of explicitly bridging both approaches via a shared low-dimensional learned encoding of the environment, meant to capture summarizing abstractions. We show that the modularity brought by this approach leads to good generalization while being computationally efficient, with planning happening in a smaller latent state space. In addition, this approach recovers a sufficient low-dimensional representation of the environment, which opens up new strategies for interpretable AI, exploration and transfer learning.
Learning Options with Interest Functions
Learning temporal abstractions which are partial solutions to a task and could be reused for solving other tasks is an ingredient that can help agents to plan and learn efficiently. In this work, we tackle this problem in the options framework. We aim to autonomously learn options which are specialized in different state space regions by proposing a notion of interest functions, which generalizes initiation sets from the options framework for function approximation. We build on the option-critic framework to derive policy gradient theorems for interest functions, leading to a new interest-option-critic architecture.
Leveraging Observations in Bandits: Between Risks and Benefits
Imitation learning has been widely used to speed up learning in novice agents, by allowing them to leverage existing data from experts. Allowing an agent to be influenced by external observations can benefit the learning process, but it also puts the agent at risk of following sub-optimal behaviours. In this paper, we study this problem in the context of bandits. More specifically, we consider that an agent (learner) is interacting with a bandit-style decision task, but can also observe a target policy interacting with the same environment. The learner observes only the target’s actions, not the rewards obtained. We introduce a new bandit optimism modifier that uses conditional optimism contingent on the actions of the target in order to guide the agent’s exploration. We analyze the effect of this modification on the well-known Upper Confidence Bound algorithm by proving that it preserves a regret upper-bound of order O(ln T), even in the presence of a very poor target, and we derive the dependency of the expected regret on the general target policy. We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application.