Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Leader, Google DeepMind
Research Topics
Medical Machine Learning
Molecular Modeling
Probabilistic Models
Reasoning
Reinforcement Learning

Biography

Doina Precup combines teaching at McGill University with fundamental research on reinforcement learning, focusing in particular on AI applications in areas of significant social impact, such as health care. She is interested in machine decision-making in situations where uncertainty is high.

In addition to heading the Montreal office of Google DeepMind, Precup is a Senior Fellow of the Canadian Institute for Advanced Research and a Fellow of the Association for the Advancement of Artificial Intelligence.

Her areas of speciality are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Publications

When Do We Need GNN for Node Classification?
Sitao Luan
Chenqing Hua
Qincheng Lu
Jiaqi Zhu
Xiao-Wen Chang
Low-Rank Representation of Reinforcement Learning Policies
We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The use of RKHS-based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.
Simulating Human Gaze with Neural Visual Attention
Leo Schwinn
Bjoern Eskofier
Dario Zanca
The Paradox of Choice: On the Role of Attention in Hierarchical Reinforcement Learning
Andrei Cristian Nica
Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to having many choices. Hierarchical reinforcement learning methods aim to solve the first problem by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we first characterize "affordances" as a "hard" attention mechanism that strictly limits the available choices of temporally extended options. We then investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. To this end, we present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. Finally, we identify and empirically demonstrate the settings in which the "paradox of choice" arises, i.e., when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.
Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning
Flemming Kondrup
Thomas Jiralerspong
Elaine Lau
Nathan de Lara
Jacob A. Shkrob
My Duc Tran
Sumana Basu
Estimating individual treatment effect on disability progression in multiple sclerosis using deep learning
Jean-Pierre R. Falet
Joshua D. Durso-Finley
Brennan Nichyporuk
Julien Schroeter
Francesca Bovis
Maria-Pia Sormani
Douglas Arnold
Assessing Intrapartum Risk of Hypoxic Ischemic Encephalopathy Using Fetal Heart Rate With Long Short-Term Memory Networks
Derek Kweku Degbedzui
Michael W Kuzniewicz
Marie-Coralie Cornet
Yvonne Wu
Heather Forquer
Lawrence Gerstley
Emily F. Hamilton
P. Warrick
Robert E. Kearney
This study investigated the prediction of the risk of hypoxic ischemic encephalopathy (HIE) using intrapartum cardiotocography records with a long short-term memory recurrent neural network. Across the 12 hours of labour, HIE sensitivity rose from 0.25 to 0.56 as delivery approached, while specificity remained approximately constant with a mean of 0.71 and a standard deviation of 0.04. The results show that classification improves as delivery approaches but that performance needs improvement. Future work will address the limitations of this preliminary study by investigating input signal transformations and the use of other network architectures to improve the model performance.
Deep learning, reinforcement learning, and world models
Yu Matsuo
Yann LeCun
Maneesh Sahani
David Silver
Masashi Sugiyama
Eiji Uchibe
J. Morimoto
Automated prediction of extubation success in extremely preterm infants: the APEX multicenter study
Lara Kanbar
Wissam Shalish
Charles Onu
Samantha Latremouille
Lajos Kovacs
Martin Keszler
Sanjay Chawla
Karen A. Brown
R. Kearney
Guilherme M. Sant’Anna
On the Expressivity of Markov Reward (Extended Abstract)
David Abel
Will Dabney
Anna Harutyunyan
Mark K. Ho
Michael L. Littman
Satinder Singh
Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
Scott Fujimoto
Ofir Nachum
Shixiang Shane Gu
In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.
PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information
Faizy Ahsan
Zichao Yan
Motivation: The computational prediction of regulatory function associated with a genomic sequence is of utmost importance in -omics studies, as it facilitates our understanding of the mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA–protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods. Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle and easy to implement, and yet yields impressive results. Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM. Supplementary information: Supplementary data are available at Bioinformatics online.