Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Leader, Google DeepMind

Biography

Doina Precup combines teaching at McGill University with fundamental research on reinforcement learning, in particular AI applications in areas of significant social impact, such as health care. She is interested in machine decision-making in situations where uncertainty is high.

In addition to heading the Montreal office of Google DeepMind, Precup is a Senior Fellow of the Canadian Institute for Advanced Research and a Fellow of the Association for the Advancement of Artificial Intelligence.

Her areas of speciality are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

Master's Research - McGill University
Co-supervisor :
PhD - McGill University
Master's Research - McGill University
Postdoctorate - McGill University
Master's Research - McGill University
Research Intern - McGill University
PhD - McGill University
Postdoctorate - Université de Montréal
Principal supervisor :
PhD - McGill University
Master's Research - McGill University
Principal supervisor :
Research Intern - McGill University
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :
Research Intern - McGill University
PhD - McGill University
Principal supervisor :
Collaborating researcher - McGill University
Master's Research - McGill University
Master's Research - Université de Montréal
PhD - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :
Collaborating researcher - McGill University
Principal supervisor :
PhD - McGill University
Undergraduate - McGill University
Master's Research - Université de Montréal
Principal supervisor :
PhD - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :

Publications

The Paradox of Choice: On the Role of Attention in Hierarchical Reinforcement Learning
Andrei Cristian Nica
Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to having many choices. Hierarchical reinforcement learning methods aim to solve the first problem, by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we first characterize "affordances" as a "hard" attention mechanism that strictly limits the available choices of temporally extended options. We then investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. To this end, we present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. Finally, we identify and empirically demonstrate the settings in which the "paradox of choice" arises, i.e. when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.
Estimating individual treatment effect on disability progression in multiple sclerosis using deep learning
Jean-Pierre R. Falet
Joshua D. Durso-Finley
Brennan Nichyporuk
Julien Schroeter
Francesca Bovis
Maria-Pia Sormani
Douglas Arnold
Assessing Intrapartum Risk of Hypoxic Ischemic Encephalopathy Using Fetal Heart Rate With Long Short-Term Memory Networks
Derek Kweku Degbedzui
Michael W Kuzniewicz
Marie-Coralie Cornet
Yvonne Wu
Heather Forquer
Lawrence Gerstley
Emily F. Hamilton
P. Warrick
Robert E. Kearney
This study investigated the prediction of the risk of hypoxic ischemic encephalopathy (HIE) using intrapartum cardiotocography records with a long short-term memory recurrent neural network. Across the 12 hours of labour, HIE sensitivity rose from 0.25 to 0.56 as delivery approached while specificity remained approximately constant with a mean of 0.71 and standard deviation of 0.04. The results show that classification improves as delivery approaches but that performance needs improvement. Future work will address the limitations of this preliminary study by investigating input signal transformations and the use of other network architectures to improve the model performance.
PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information
Faizy Ahsan
Zichao Yan
Motivation: The computational prediction of regulatory function associated with a genomic sequence is of utter importance in -omics study, which facilitates our understanding of the underlying mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA–protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods. Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor that is known as FactorNet. PhyloPGM is simple in principle, easy to implement and yet, yields impressive results. Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM. Supplementary information: Supplementary data are available at Bioinformatics online.
Deep Learning Prediction of Response to Disease Modifying Therapy in Primary Progressive Multiple Sclerosis (P1-1.Virtual)
Jean-Pierre R. Falet
Joshua D. Durso-Finley
Brennan Nichyporuk
Julien Schroeter
Francesca Bovis
Maria-Pia Sormani
Douglas Arnold
Behind the Machine's Gaze: Neural Networks with Biologically-inspired Constraints Exhibit Human-like Visual Attention
Leo Schwinn
Bjoern Eskofier
Dario Zanca
By and large, existing computational models of visual attention tacitly assume perfect vision and full access to the stimulus and thereby deviate from foveated biological vision. Moreover, modeling top-down attention is generally reduced to the integration of semantic features without incorporating the signal of high-level visual tasks that have been shown to partially guide human attention. We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner. With our method, we explore the ability of neural networks on which we impose a biologically-inspired foveated vision constraint to generate human-like scanpaths without directly training for this objective. The loss of a neural network performing a downstream visual task (i.e., classification or reconstruction) flexibly provides top-down guidance to the scanpath. Extensive experiments show that our method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths. Additionally, the flexibility of the framework allows us to quantitatively investigate the role of different tasks in the generated visual behaviors. Finally, we demonstrate the superiority of the approach in a novel experiment that investigates the utility of scanpaths in real-world applications, where imperfect viewing conditions are given.
Towards Painless Policy Optimization for Constrained MDPs
Arushi Jain
Sharan Vaswani
Reza Babanezhad Harikandeh
Csaba Szepesvari
We study policy optimization in an infinite horizon, …
Estimating treatment effect for individuals with progressive multiple sclerosis using deep learning
Jean-Pierre R. Falet
Joshua D. Durso-Finley
Brennan Nichyporuk
Julien Schroeter
Francesca Bovis
Maria-Pia Sormani
Douglas Arnold
Self-Supervised Attention-Aware Reinforcement Learning
Visual saliency has emerged as a major visualization tool for interpreting deep reinforcement learning (RL) agents. However, much of the existing research uses it as an analysis tool rather than an inductive bias for policy learning. In this work, we use visual attention as an inductive bias for RL agents. We propose a novel self-supervised attention learning approach which can (1) learn to select regions of interest without explicit annotations, and (2) act as a plug-in for existing deep RL methods to improve the learning performance. We empirically show that the self-supervised attention-aware deep RL methods outperform the baselines in terms of both the rate of convergence and performance. Furthermore, the proposed self-supervised attention is not tied to specific policies, nor restricted to a specific scene. We posit that the proposed approach is a general self-supervised attention module for multi-task learning and transfer learning, and empirically validate the generalization ability of the proposed method. Finally, we show that our method learns meaningful object keypoints, highlighting improvements both qualitatively and quantitatively.
Variance Penalized On-Policy and Off-Policy Actor-Critic
Arushi Jain
Gandharv Patil
Ayush Jain
Safe option-critic: learning safety in the option-critic architecture
Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for practical applications but also facilitates a better understanding of an agent’s decisions. We tackle this problem in the options framework (Sutton, Precup & Singh, 1999), a particular way to specify temporally abstract actions which allow an agent to use sub-policies with start and end conditions. We consider a behaviour as safe that avoids regions of state space with high uncertainty in the outcomes of actions. We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency. The proposed objective results in a trade-off between maximizing the standard expected return and minimizing the effect of model uncertainty in the return. We propose a policy gradient algorithm to optimize the constrained objective function. We examine the quantitative and qualitative behaviours of the proposed approach in a tabular grid world, continuous-state puddle world, and three games from the Arcade Learning Environment: Ms. Pacman, Amidar, and Q*Bert. Our approach achieves a reduction in the variance of return, boosts performance in environments with intrinsic variability in the reward structure, and compares favourably both with primitive actions and with risk-neutral options.
Optimal Spectral-Norm Approximate Minimization of Weighted Finite Automata
We address the approximate minimization problem for weighted finite automata (WFAs) with weights in …