Portrait of Doina Precup

Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Lead, Google DeepMind
Research Topics
Medical Machine Learning
Reinforcement Learning
Probabilistic Models
Molecular Modeling
Reasoning

Biography

Doina Precup teaches at McGill University while conducting fundamental research on reinforcement learning, in particular applications of AI in areas with social impact, such as health care. She is interested in automatic decision-making in situations of high uncertainty.

She is a member of the Canadian Institute for Advanced Research (CIFAR) and of the Association for the Advancement of Artificial Intelligence (AAAI), and leads the Montreal office of DeepMind.

Her areas of specialization are: artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

PhD - McGill
PhD - McGill
Co-supervisor:
PhD - McGill
Master's Research - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
Master's Research - McGill
Principal supervisor:
Research Intern - McGill
Research Intern - McGill
PhD - McGill
PhD - McGill
Principal supervisor:
PhD - McGill
Co-supervisor:
Master's Research - McGill
Co-supervisor:
PhD - McGill
Research Intern - McGill
Master's Research - UdeM
Principal supervisor:
Postdoctorate - McGill
Master's Research - McGill
PhD - McGill
PhD - McGill
Principal supervisor:
PhD - McGill
PhD - McGill
Master's Research - McGill
Principal supervisor:
Master's Research - McGill
Collaborating Researcher - McGill
Master's Research - UdeM
PhD - McGill
Principal supervisor:
Collaborating Alumni - UdeM
PhD - McGill
Co-supervisor:
Master's Research - McGill
PhD - McGill
PhD - McGill
Co-supervisor:
Research Intern - McGill
Research Intern - McGill
Undergraduate - McGill
PhD - McGill
PhD - McGill
Co-supervisor:

Publications

Towards AI-designed genomes using a variational autoencoder
N.K. Dudek
Synthetic biology holds great promise for bioengineering applications such as environmental bioremediation, probiotic formulation, and production of renewable biofuels. Humans’ capacity to design biological systems from scratch is limited by their sheer size and complexity. We introduce a framework for training a machine learning model to learn the basic genetic principles underlying the gene composition of bacterial genomes. Our variational autoencoder model, DeepGenomeVector, was trained to take as input corrupted bacterial genetic blueprints (i.e. complete gene sets, henceforth ‘genome vectors’) in which most genes had been “removed”, and re-create the original. The resulting model effectively captures the complex dependencies in genomic networks, as evaluated by both qualitative and quantitative metrics. An in-depth functional analysis of a generated gene vector shows that its encoded pathways are interconnected and nearly complete. On the test set, where the model’s ability to re-generate the original, uncorrupted genome vector was evaluated, an AUC score of 0.98 and an F1 score of 0.82 provide support for the model’s ability to generate diverse, high-quality genome vectors. This work showcases the power of machine learning approaches for synthetic biology and highlights the possibility that just as humans can design an AI that animates a robot, AIs may one day be able to design a genomic blueprint that animates a carbon-based cell. SIGNIFICANCE STATEMENT Genomes serve as the blueprints for life, encoding complex networks of genes whose products must seamlessly interact to result in living organisms. In this work, we develop a framework for training a machine learning algorithm to learn the basic genetic principles that underlie genome composition. This innovation may eventually lead to improvements in the genome design process, increasing the speed and reliability of designs while decreasing cost. It further suggests that AI agents may one day have the potential to design blueprints for carbon-based life.
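As a rough illustration of the approach described in the abstract, the sketch below shows a denoising-style variational autoencoder over binary gene-presence vectors, trained to reconstruct the complete vector from a heavily corrupted copy. All layer sizes, the corruption rate, and the class and variable names are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingGenomeVAE(nn.Module):
    """Illustrative VAE over binary gene-presence vectors (not the paper's architecture)."""
    def __init__(self, n_genes, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 512), nn.ReLU())
        self.to_mu = nn.Linear(512, latent_dim)
        self.to_logvar = nn.Linear(512, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, n_genes))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        return self.decoder(z), mu, logvar

def loss_fn(logits, target, mu, logvar):
    # Reconstruct the *uncorrupted* genome vector; the KL term regularizes the latent code.
    recon = F.binary_cross_entropy_with_logits(logits, target, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Training step: feed a corrupted copy (most genes masked out), predict the original.
model = DenoisingGenomeVAE(n_genes=2000)
genomes = (torch.rand(32, 2000) < 0.3).float()                    # toy gene-presence data
corrupted = genomes * (torch.rand_like(genomes) > 0.8).float()    # "remove" most present genes
logits, mu, logvar = model(corrupted)
loss = loss_fn(logits, genomes, mu, logvar)
loss.backward()
```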
A cry for help: Early detection of brain injury in newborns
Charles Onu
Samantha Latremouille
Arsenii Gorin
Junhao Wang
Uchenna Ekwochi
P. Ubuane
O. Kehinde
Muhammad A. Salisu
Datonye Briggs
Hybrid Scattering Transform - Long Short-Term Memory Networks for Intrapartum Fetal Heart Rate Classification
"Derek Kweku DEGBEDZUI
Michael W Kuzniewicz
Marie-Coralie Cornet
Yvonne Wu
Heather Forquer
Lawrence Gerstley
Emily F. Hamilton
P. Warrick
Robert E. Kearney
This study assessed the early detection of the increased risk of hypoxic ischemic encephalopathy using the raw fetal heart rate signal and its transformation with the scattering transform, combined with a long short-term memory recurrent neural network. There was no significant difference between the two approaches. However, the use of the scattering transform produced lower computational demands. Considering scalability to the large data in our database and computational efficiency, the experiments involving scattering transform coefficients will be selected to conduct subsequent experiments. Future work will address the limitations of this study, including the low model performance.
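The following sketch illustrates the general setup: an LSTM classifier that can consume either the raw fetal heart rate samples or a shorter sequence of scattering-transform coefficients. The architecture, dimensions, and names are assumptions for illustration only; the scattering coefficients would be computed by a separate front-end (e.g. a 1-D scattering transform library).

```python
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    """LSTM classifier over a feature sequence: either the raw FHR samples
    (feature_dim=1) or precomputed scattering-transform coefficient vectors."""
    def __init__(self, feature_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)    # risk logit (binary outcome)

    def forward(self, x):                       # x: (batch, time, feature_dim)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1]).squeeze(-1)     # one logit per recording

# Raw-signal variant: one sample per time step.
raw_model = SequenceClassifier(feature_dim=1)
fhr = torch.randn(8, 4800, 1)                   # e.g. 8 recordings of 4800 samples
logits = raw_model(fhr)

# Scattering variant: the signal is first reduced to a shorter sequence of
# coefficient vectors, which lowers sequence length and LSTM compute cost.
scat_model = SequenceClassifier(feature_dim=32)
coeffs = torch.randn(8, 300, 32)                # placeholder scattering coefficients
logits = scat_model(coeffs)
```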
A Definition of Continual Reinforcement Learning
David Abel
Andre Barreto
Benjamin Van Roy
Hado van Hasselt
Satinder Singh
For SALE: State-Action Representation Learning for Deep Reinforcement Learning
Scott Fujimoto
Wei-Di Chang
Edward J. Smith
Shixiang Shane Gu
In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.
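A minimal sketch of the core idea, learned state and state-action embeddings feeding a critic, is given below. The layer sizes, activation choices, and the way the embeddings are combined with the raw inputs are illustrative assumptions rather than the paper's exact TD7 architecture.

```python
import torch
import torch.nn as nn

class SALEEmbedding(nn.Module):
    """Illustrative state / state-action embeddings in the spirit of SALE:
    z_s summarizes the state, z_sa models the state-action interaction."""
    def __init__(self, state_dim, action_dim, embed_dim=256):
        super().__init__()
        self.f_state = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ELU(),
                                     nn.Linear(embed_dim, embed_dim))
        self.f_state_action = nn.Sequential(nn.Linear(embed_dim + action_dim, embed_dim),
                                            nn.ELU(),
                                            nn.Linear(embed_dim, embed_dim))

    def forward(self, state, action):
        z_s = self.f_state(state)
        z_sa = self.f_state_action(torch.cat([z_s, action], dim=-1))
        return z_s, z_sa

class QNetwork(nn.Module):
    """Critic conditioned on both the raw inputs and the learned embeddings."""
    def __init__(self, state_dim, action_dim, embed_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 2 * embed_dim, 256), nn.ELU(),
            nn.Linear(256, 1))

    def forward(self, state, action, z_s, z_sa):
        return self.net(torch.cat([state, action, z_s, z_sa], dim=-1))

# Toy usage; the embedding itself is typically trained with a dynamics-style
# objective (e.g. predicting the next state's embedding from z_sa).
enc = SALEEmbedding(state_dim=17, action_dim=6)
q = QNetwork(state_dim=17, action_dim=6)
s, a = torch.randn(4, 17), torch.randn(4, 6)
z_s, z_sa = enc(s, a)
q_value = q(s, a, z_s, z_sa)
```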
Prediction and Control in Continual Reinforcement Learning
Nishanth Anand
Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components which update at different timescales: a _permanent_ value function, which holds general knowledge that persists over time, and a _transient_ value function, which allows quick adaptation to new situations. We establish theoretical results showing that our approach is well suited for continual learning and draw connections to the complementary learning systems (CLS) theory from neuroscience. Empirically, this approach improves performance significantly on both prediction and control problems.
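A tabular sketch of the permanent/transient decomposition follows; the update rule, learning rates, and consolidation scheme are simplified assumptions meant only to convey the two-timescale idea, not the paper's exact algorithm.

```python
import numpy as np

class PermanentTransientValues:
    """Tabular sketch of a value estimate split into a slowly-updated permanent
    component and a quickly-adapting transient component (illustrative rates)."""
    def __init__(self, n_states, alpha_transient=0.5, alpha_permanent=0.1):
        self.v_perm = np.zeros(n_states)    # long-lived general knowledge
        self.v_trans = np.zeros(n_states)   # fast adaptation to the current task
        self.alpha_t = alpha_transient
        self.alpha_p = alpha_permanent

    def value(self, s):
        return self.v_perm[s] + self.v_trans[s]

    def td_update(self, s, r, s_next, gamma=0.99):
        # TD error against the combined estimate, absorbed by the transient part.
        delta = r + gamma * self.value(s_next) - self.value(s)
        self.v_trans[s] += self.alpha_t * delta

    def consolidate(self):
        # One simple choice: fold part of the transient knowledge into the
        # permanent component, then reset the transient part for the next task.
        self.v_perm += self.alpha_p * self.v_trans
        self.v_trans[:] = 0.0

pt = PermanentTransientValues(n_states=5)
pt.td_update(s=0, r=1.0, s_next=1)
print(pt.value(0))
```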
Policy composition in reinforcement learning via multi-objective policy optimization
Shruti Mishra
Ankit Anand
Jordan Hoffmann
Nicolas Heess
Martin A. Riedmiller
Abbas Abdolmaleki
We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using the Multi-Objective Maximum a Posteriori Policy Optimization algorithm (Abdolmaleki et al. 2020), we show that teacher policies can help speed up learning, particularly in the absence of shaping rewards. In two domains with continuous observation and action spaces, our agents successfully compose teacher policies in sequence and in parallel, and are also able to further extend the policies of the teachers in order to solve the task. Depending on the specified combination of task and teacher(s), teacher(s) may naturally act to limit the final performance of an agent. The extent to which agents are required to adhere to teacher policies is determined by hyperparameters, which govern both the effect of teachers on learning speed and the eventual performance of the agent on the task. In the humanoid domain (Tassa et al. 2018), we also equip agents with the ability to control the selection of teachers. With this ability, agents are able to meaningfully compose the teacher policies to achieve a higher task reward on the walk task than in cases without access to the teacher policies. We show the resemblance of composed task policies to the corresponding teacher policies through videos.
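The toy sketch below conveys the flavor of treating teachers as additional objectives, here by adding weighted KL terms toward frozen teacher policies to a policy-gradient loss. This is not the MO-MPO algorithm used in the paper; the function, weights, and discrete action space are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def composed_policy_loss(policy_logits, teacher_logits_list, advantages, actions,
                         teacher_weights):
    """Simplified sketch: combine a task objective with per-teacher KL objectives."""
    log_probs = F.log_softmax(policy_logits, dim=-1)
    chosen = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    task_loss = -(advantages * chosen).mean()

    teacher_losses = []
    for teacher_logits, w in zip(teacher_logits_list, teacher_weights):
        teacher_log_probs = F.log_softmax(teacher_logits, dim=-1)
        # KL(pi || teacher): pulls the agent toward each teacher, weighted per objective.
        kl = (log_probs.exp() * (log_probs - teacher_log_probs)).sum(-1).mean()
        teacher_losses.append(w * kl)

    return task_loss + sum(teacher_losses)

# Example with two (frozen) teachers on a discrete action space.
B, A = 32, 6
policy_logits = torch.randn(B, A, requires_grad=True)
teachers = [torch.randn(B, A), torch.randn(B, A)]
loss = composed_policy_loss(policy_logits, teachers,
                            advantages=torch.randn(B),
                            actions=torch.randint(0, A, (B,)),
                            teacher_weights=[0.1, 0.1])
loss.backward()
```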
Acceleration in Policy Optimization
Veronica Chelu
Tom Zahavy
Arthur Guez
Sebastian Flennerhag
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) through predictive and adaptive directions of (functional) policy ascent. Leveraging the connection between policy iteration and policy gradient methods, we view policy optimization algorithms as iteratively solving a sequence of surrogate objectives, local lower bounds on the original objective. We define optimism as predictive modelling of the future behavior of a policy, and hindsight adaptation as taking immediate and anticipatory corrective actions to mitigate accumulating errors from overshooting predictions or delayed responses to change. We use this shared lens to jointly express other well-known algorithms, including model-based policy improvement based on forward search, and optimistic meta-learning algorithms. We show connections with Anderson acceleration, Nesterov's accelerated gradient, extra-gradient methods, and linear extrapolation in the update rule. We analyze properties of the formulation, design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
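A minimal sketch of an "optimistic" update, evaluating the gradient at an extrapolated point in the spirit of the Nesterov and extra-gradient connections mentioned above, is shown on a toy surrogate objective. It is illustrative only and is not the paper's meta-gradient algorithm; the step size and momentum values are arbitrary assumptions.

```python
import torch

def extrapolated_step(params, prev_params, grad_fn, lr=0.05, momentum=0.9):
    """Evaluate the ascent direction at a predicted (extrapolated) point rather
    than at the current iterate (illustrative optimistic update)."""
    lookahead = params + momentum * (params - prev_params)  # predict where we are headed
    grad = grad_fn(lookahead)                               # gradient at the predicted point
    new_params = params + lr * grad                         # ascent step on the surrogate
    return new_params, params

# Toy usage on a quadratic surrogate objective J(theta) = -||theta - target||^2.
target = torch.tensor([1.0, -2.0])
grad_fn = lambda theta: -2.0 * (theta - target)
theta, prev = torch.zeros(2), torch.zeros(2)
for _ in range(100):
    theta, prev = extrapolated_step(theta, prev, grad_fn)
print(theta)  # approaches the target
```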
On the Convergence of Bounded Agents
David Abel
Andre Barreto
Hado Philip van Hasselt
Benjamin Van Roy
Satinder Singh
When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly less clear. In this paper, we propose two complementary accounts of agent convergence in a framing of the reinforcement learning problem that centers around bounded agents. The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease. The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes. We establish basic properties of these two definitions, show that they accommodate typical views of convergence in standard settings, and prove several facts about their nature and relationship. We take these perspectives, definitions, and analysis to bring clarity to a central idea of the field.
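Purely as an illustration of the second view, the toy check below scans a recorded trace of (internal state, performance) pairs and flags a performance change that occurs without an internal-state change. The data structure and function are hypothetical, not a construction from the paper.

```python
def violates_second_view(trace):
    """Return True if performance changed while the agent's internal state did not,
    i.e. the trace is inconsistent with the second notion of convergence above."""
    for (s_prev, perf_prev), (s_next, perf_next) in zip(trace, trace[1:]):
        if s_prev == s_next and perf_prev != perf_next:
            return True   # performance moved without an internal-state change
    return False

# Example: the last transition changes performance without changing internal state.
trace = [("m0", 0.2), ("m1", 0.5), ("m1", 0.5), ("m1", 0.7)]
print(violates_second_view(trace))  # True
```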
Accelerating exploration and representation learning with offline pre-training
Bogdan Mazoure
Jake Bruce
Rob Fergus
Ankit Anand
Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward separately from a single collection of human demonstrations can significantly improve the sample efficiency on the challenging NetHack benchmark. We also ablate various components of our experimental setting and highlight crucial insights.
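The sketch below illustrates the two pre-training objectives at a high level: a noise-contrastive (InfoNCE) loss over state embeddings and a separate auxiliary reward model fit on the same offline data. Encoder sizes, the choice of positives, and the reward targets are placeholder assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce_loss(anchor_emb, positive_emb, temperature=0.1):
    """InfoNCE: each state embedding should be close to a nearby state from the
    same trajectory and far from the other states in the batch (the negatives)."""
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(anchor.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Pre-training from offline demonstrations (toy tensors standing in for game states):
encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
reward_model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 1))

states = torch.randn(32, 128)        # s_t from offline data
next_states = torch.randn(32, 128)   # s_{t+k} from the same trajectories
repr_loss = info_nce_loss(encoder(states), encoder(next_states))

# Separately, an auxiliary reward model is fit on the same offline data;
# both models are then reused to aid online exploration and learning.
aux_targets = torch.randn(32, 1)     # placeholder auxiliary reward targets
reward_loss = F.mse_loss(reward_model(states), aux_targets)
(repr_loss + reward_loss).backward()
```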
An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets
Nikhil Murali Vemgal
Elaine Lau
Reinforcement Learning (RL) algorithms aim to learn an optimal policy by iteratively sampling actions to learn how to maximize the total expected return, …
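As background for the paper's topic, the sketch below shows a minimal trajectory replay buffer of the kind that can be used for off-policy GFlowNet training, where completed trajectories and their rewards are stored and re-sampled so that previously discovered high-reward modes keep contributing to the training signal. The class and its fields are hypothetical and not taken from the paper.

```python
import random
from collections import deque

class TrajectoryReplayBuffer:
    """Minimal sketch of a replay buffer for off-policy GFlowNet training."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, trajectory, reward):
        # Store a completed trajectory together with the reward of its terminal object.
        self.buffer.append((trajectory, reward))

    def sample(self, batch_size):
        # Uniformly re-sample stored trajectories to mix with freshly sampled ones.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buffer = TrajectoryReplayBuffer()
buffer.add(trajectory=["s0", "s1", "terminal_x"], reward=3.2)
batch = buffer.sample(batch_size=16)
```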
When Do Graph Neural Networks Help with Node Classification? Investigating the Impact of Homophily Principle on Node Distinguishability
Sitao Luan
Chenqing Hua
Minkai Xu
Qincheng Lu
Jiaqi Zhu
Xiao-Wen Chang
Jie Fu
Jure Leskovec