Portrait of Doina Precup

Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Leader, Google DeepMind
Research Topics
Medical Machine Learning
Molecular Modeling
Probabilistic Models
Reasoning
Reinforcement Learning

Biography

Doina Precup combines teaching at McGill University with fundamental research on reinforcement learning, in particular AI applications in areas of significant social impact, such as health care. She is interested in machine decision-making in situations where uncertainty is high.

In addition to heading the Montreal office of Google DeepMind, Precup is a Senior Fellow of the Canadian Institute for Advanced Research and a Fellow of the Association for the Advancement of Artificial Intelligence.

Her areas of speciality are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

PhD - McGill University
PhD - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Principal supervisor :
Research Intern - McGill University
Master's Research - McGill University
Research Intern - Université de Montréal
PhD - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
Master's Research - Université de Montréal
Principal supervisor :
PhD - McGill University
Postdoctorate - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University
PhD - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
PhD - McGill University
Master's Research - McGill University
Principal supervisor :
Master's Research - McGill University
Collaborating researcher - McGill University
PhD - Université de Montréal
Co-supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Collaborating Alumni - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
PhD - McGill University
Co-supervisor :
Research Intern - McGill University
Research Intern - McGill University
Undergraduate - McGill University
Co-supervisor :
Collaborating researcher - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :

Publications

Discrete Probabilistic Inference as Control in Multi-path Environments
Tristan Deleu
Padideh Nouri
Nikolay Malkin
We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to … (see more)find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the marginal distribution induced by the optimal MaxEnt RL policy is proportional to the original reward, regardless of the structure of the underlying MDP. We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward. Finally, we study empirically the performance of multiple MaxEnt RL and GFlowNet algorithms on multiple problems involving sampling from discrete distributions.
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Jonathan Colaco Carr
On learning history-based policies for controlling Markov decision processes
Gandharv Patil
On the Privacy of Selection Mechanisms with Gaussian Noise
Jonathan Lebensold
Borja Balle
Report Noisy Max and Above Threshold are two classical differentially private (DP) selection mechanisms. Their output is obtained by adding … (see more)noise to a sequence of low-sensitivity queries and reporting the identity of the query whose (noisy) answer satisfies a certain condition. Pure DP guarantees for these mechanisms are easy to obtain when Laplace noise is added to the queries. On the other hand, when instantiated using Gaussian noise, standard analyses only yield approximate DP guarantees despite the fact that the outputs of these mechanisms lie in a discrete space. In this work, we revisit the analysis of Report Noisy Max and Above Threshold with Gaussian noise and show that, under the additional assumption that the underlying queries are bounded, it is possible to provide pure ex-ante DP bounds for Report Noisy Max and pure ex-post DP bounds for Above Threshold. The resulting bounds are tight and depend on closed-form expressions that can be numerically evaluated using standard methods. Empirically we find these lead to tighter privacy accounting in the high privacy, low data regime. Further, we propose a simple privacy filter for composing pure ex-post DP guarantees, and use it to derive a fully adaptive Gaussian Sparse Vector Technique mechanism. Finally, we provide experiments on mobility and energy consumption datasets demonstrating that our Sparse Vector Technique is practically competitive with previous approaches and requires less hyper-parameter tuning.
CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
David Budaghyan
Arsenii Gorin
Charles Onu
This paper describes the Ubenwa CryCeleb dataset - a labeled collection of infant cries - and the accompanying CryCeleb 2023 task, which is … (see more)a public speaker verification challenge based on cry sounds. We released more than 6 hours of manually segmented cry sounds from 786 newborns for academic use, aiming to encourage research in infant cry analysis. The inaugural public competition attracted 59 participants, 11 of whom improved the baseline performance. The top-performing system achieved a significant improvement scoring 25.8% equal error rate, which is still far from the performance of state-of-the-art adult speaker verification systems. Therefore, we believe there is room for further research on this dataset, potentially extending beyond the verification task.
Training Matters: Unlocking Potentials of Deeper Graph Convolutional Neural Networks
Sitao Luan
Mingde Zhao
Xiao-Wen Chang
When Do We Need Graph Neural Networks for Node Classification?
Sitao Luan
Chenqing Hua
Qincheng Lu
Jiaqi Zhu
Xiao-Wen Chang
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Samir Obando Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob Nicolaus Foerster
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Samir Obando Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob Nicolaus Foerster
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance s… (see more)cales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Samir Obando Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob Nicolaus Foerster
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance s… (see more)cales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
On the Privacy of Selection Mechanisms with Gaussian Noise
Jonathan Lebensold
Borja Balle
Code as Reward: Empowering Reinforcement Learning with VLMs
David Venuto
Mohammad Sami Nur Islam
Martin Klissarov
Sherry Yang
Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and p… (see more)rovide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slowdown the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments, and can be more effective in training RL policies than the original sparse environment rewards.