Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Leader, Google DeepMind

Biography

Doina Precup combines teaching at McGill University with fundamental research on reinforcement learning, focusing in particular on AI applications in areas of significant social impact, such as health care. She is interested in machine decision-making in situations where uncertainty is high.

In addition to heading the Montreal office of Google DeepMind, Precup is a Senior Fellow of the Canadian Institute for Advanced Research and a Fellow of the Association for the Advancement of Artificial Intelligence.

Her areas of speciality are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

PhD - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Principal supervisor :
Research Intern - McGill University
PhD - McGill University
PhD - McGill University
Co-supervisor :
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
Master's Research - Université de Montréal
Principal supervisor :
Postdoctorate - McGill University
Master's Research - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
Collaborating researcher - McGill University
Principal supervisor :
Master's Research - McGill University
Collaborating researcher - McGill University
Master's Research - Université de Montréal
PhD - McGill University
Principal supervisor :
Postdoctorate - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
Master's Research - McGill University
PhD - McGill University
Research Intern - McGill University
Research Intern - McGill University
Undergraduate - McGill University
PhD - McGill University
PhD - McGill University
Co-supervisor :

Publications

When Do We Need Graph Neural Networks for Node Classification?
Sitao Luan
Chenqing Hua
Qincheng Lu
Jiaqi Zhu
Xiao-Wen Chang
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Samir Obando Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob Nicolaus Foerster
On the Privacy of Selection Mechanisms with Gaussian Noise
Jonathan Lebensold
Borja Balle
Effective Protein-Protein Interaction Exploration with PPIretrieval
Chenqing Hua
Connor Coley
Shuangjia Zheng
Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning
Mingde Zhao
Safa Alver
Harm van Seijen
Romain Laroche
Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies on the extraction of an abstracted proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end from hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper’s significant advantage in zero-shot generalization, compared to some existing state-of-the-art hierarchical planning methods.
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Haque Ishfaq
Qingfeng Lan
Pan Xu
A. Rupam Mahmood
Animashree Anandkumar
Kamyar Azizzadenesheli
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo, an efficient type of Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of
Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning
Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Sahand Rezaei-Shoshtari
Rosie Zhao
Nash Learning from Human Feedback
Rémi Munos
Michal Valko
Daniele Calandriello
M. G. Azar
Mark Rowland
Zhaohan Daniel Guo
Yunhao Tang
Matthieu Geist
Thomas Mesnard
Andrea Michi
Marco Selvi
Sertan Girgin
Nikola Momchev
Olivier Bachem
Daniel J Mankowitz
Bilal Piot
Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, often expressed as preferences between pairs of text generations produced by a pre-trained LLM. Subsequently, the LLM's policy is fine-tuned by optimizing it to maximize the reward model through a reinforcement learning algorithm. However, an inherent limitation of current reward models is their inability to fully represent the richness of human preferences and their dependency on the sampling distribution. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a preference model, which is conditioned on two inputs given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. To demonstrate the effectiveness of our approach, we present experimental results involving the fine-tuning of a LLM for a text summarization task. We believe NLHF offers a compelling avenue for preference learning and policy optimization with the potential of advancing the field of aligning LLMs with human preferences.
Learning domain-invariant classifiers for infant cry sounds
Charles Onu
Hemanth K. Sheetha
Arsenii Gorin
MUDiff: Unified Diffusion for Complete Molecule Generation
Chenqing Hua
Sitao Luan
Minkai Xu
Zhitao Ying
Rex Ying
Jie Fu
Stefano Ermon
DGFN: Double Generative Flow Networks
Elaine Lau
Nikhil Murali Vemgal
Emmanuel Bengio