Portrait of Doina Precup

Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Leader, Google DeepMind
Research Topics
Medical Machine Learning
Molecular Modeling
Probabilistic Models
Reasoning
Reinforcement Learning

Biography

Doina Precup combines teaching at McGill University with fundamental research on reinforcement learning, in particular AI applications in areas of significant social impact, such as health care. She is interested in machine decision-making in situations where uncertainty is high.

In addition to heading the Montreal office of Google DeepMind, Precup is a Senior Fellow of the Canadian Institute for Advanced Research and a Fellow of the Association for the Advancement of Artificial Intelligence.

Her areas of speciality are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

PhD - McGill University
PhD - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Principal supervisor :
Research Intern - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
Master's Research - Université de Montréal
Principal supervisor :
PhD - McGill University
Postdoctorate - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University
PhD - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Master's Research - McGill University
Principal supervisor :
Master's Research - McGill University
Collaborating researcher - McGill University
PhD - Université de Montréal
Co-supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Collaborating Alumni - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
PhD - McGill University
Co-supervisor :
Research Intern - McGill University
Research Intern - McGill University
Undergraduate - McGill University
Co-supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :

Publications

Training Matters: Unlocking Potentials of Deeper Graph Convolutional Neural Networks
Sitao Luan
Mingde Zhao
Xiao-Wen Chang
When Do We Need Graph Neural Networks for Node Classification?
Sitao Luan
Chenqing Hua
Qincheng Lu
Jiaqi Zhu
Xiao-Wen Chang
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Samir Obando Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob Nicolaus Foerster
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance s… (see more)cales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Samir Obando Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob Nicolaus Foerster
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance s… (see more)cales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Samir Obando Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob Nicolaus Foerster
On the Privacy of Selection Mechanisms with Gaussian Noise
Jonathan Lebensold
Borja Balle
Code as Reward: Empowering Reinforcement Learning with VLMs
David Venuto
Mohammad Sami Nur Islam
Martin Klissarov
Sherry Yang
Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and p… (see more)rovide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slowdown the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments, and can be more effective in training RL policies than the original sparse environment rewards.
Effective Protein-Protein Interaction Exploration with PPIretrieval
Chenqing Hua
Connor Coley
Shuangjia Zheng
Effective Protein-Protein Interaction Exploration with PPIretrieval
Chenqing Hua
Connor W. Coley
Shuangjia Zheng
Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transportation, an… (see more)d immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast complex universe to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration, which leverages existing PPI data to effectively search for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces. When provided with an unseen query protein with its associated binding site, PPIretrieval effectively identifies a potential binding partner along with its corresponding binding site in an embedding space, facilitating the formation of protein-protein complexes.
Effective Protein-Protein Interaction Exploration with PPIretrieval
Chenqing Hua
Connor W. Coley
Shuangjia Zheng
Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transportation, an… (see more)d immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast complex universe to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration, which leverages existing PPI data to effectively search for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces. When provided with an unseen query protein with its associated binding site, PPIretrieval effectively identifies a potential binding partner along with its corresponding binding site in an embedding space, facilitating the formation of protein-protein complexes.
Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning
Harry Zhao 0001
Harry Zhao
Mingde Zhao
Safa Alver
Harm van Seijen
Romain Laroche
Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstracti… (see more)ons to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies on the extraction of an abstracted proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end from hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper’s significant advantage in zero-shot generalization, compared to some existing state-of-the-art hierarchical planning methods.
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Haque Ishfaq
Qingfeng Lan
Pan Xu
A. Rupam Mahmood
Animashree Anandkumar
Kamyar Azizzadenesheli
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcom… (see more)ings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo, an efficient type of Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of