Portrait de Doina Precup

Doina Precup

Membre académique principal
Chaire en IA Canada-CIFAR
Professeure agrégée, McGill University, École d'informatique
Chef d'équipe de recherche, Google DeepMind
Sujets de recherche
Apprentissage automatique médical
Apprentissage par renforcement
Modèles probabilistes
Modélisation moléculaire
Raisonnement

Biographie

Doina Precup enseigne à l'Université McGill tout en menant des recherches fondamentales sur l'apprentissage par renforcement, notamment les applications de l'IA dans des domaines ayant des répercussions sociales, tels que les soins de santé. Elle s'intéresse à la prise de décision automatique dans des situations d'incertitude élevée.

Elle est membre de l'Institut canadien de recherches avancées (CIFAR) et de l'Association pour l'avancement de l'intelligence artificielle (AAAI), et dirige le bureau montréalais de DeepMind.

Ses spécialités sont les suivantes : intelligence artificielle, apprentissage machine, apprentissage par renforcement, raisonnement et planification sous incertitude, applications.

Étudiants actuels

Doctorat - McGill
Co-superviseur⋅e :
Collaborateur·rice alumni - McGill
Maîtrise recherche - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Maîtrise recherche - McGill
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - McGill
Co-superviseur⋅e :
Stagiaire de recherche - UdeM
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Collaborateur·rice alumni - McGill
Maîtrise recherche - McGill
Collaborateur·rice alumni - McGill
Doctorat - Polytechnique
Postdoctorat - McGill
Maîtrise recherche - McGill
Collaborateur·rice alumni - McGill
Baccalauréat - McGill
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Collaborateur·rice alumni - McGill
Maîtrise recherche - McGill
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - McGill
Co-superviseur⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Stagiaire de recherche - McGill
Maîtrise recherche - McGill
Co-superviseur⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Collaborateur·rice alumni - McGill
Co-superviseur⋅e :

Publications

Phylogenetic Manifold Regularization: A semi-supervised approach to predict transcription factor binding sites
The computational prediction of transcription factor binding sites remains a challenging problems in bioinformatics, despite significant met… (voir plus)hodological developments from the field of machine learning. Such computational models are essential to help interpret the non-coding portion of human genomes, and to learn more about the regulatory mechanisms controlling gene expression. In parallel, massive genome sequencing efforts have produced assembled genomes for hundred of vertebrate species, but this data is underused. We present PhyloReg, a new semi-supervised learning approach that can be used for a wide variety of sequence-to-function prediction problems, and that takes advantage of hundreds of millions of years of evolution to regularize predictors and improve accuracy. We demonstrate that PhyloReg can be used to better train a previously proposed deep learning model of transcription factor binding. Simulation studies further help delineate the benefits of the a pproach. G ains in prediction accuracy are obtained over a broad set of transcription factors and cell types.
Interference and Generalization in Temporal Difference Learning
We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product… (voir plus) of two different gradients, representing their alignment. This quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD(
Invariant Causal Prediction for Block MDPs
Clare Lyle
Angelos Filos
Marta Z. Kwiatkowska
Yarin Gal
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. … (voir plus)In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.
What can I do here? A Theory of Affordances in Reinforcement Learning
Gheorghe Comanici
David Abel
Gradient Starvation: A Learning Proclivity in Neural Networks
We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks… (voir plus). Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks. Using tools from Dynamical Systems theory, we identify simple properties of learning dynamics during gradient descent that lead to this imbalance, and prove that such a situation can be expected given certain statistical structure in training data. Based on our proposed formalism, we develop guarantees for a novel regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation. We illustrate our findings with simple and real-world out-of-distribution (OOD) generalization experiments.
A Study of Policy Gradient on a Class of Exactly Solvable Models
Colin Daniels
Anna M. Brandenberger
Policy gradient methods are extensively used in reinforcement learning as a way to optimize expected return. In this paper, we explore the e… (voir plus)volution of the policy parameters, for a special class of exactly solvable POMDPs, as a continuous-state Markov chain, whose transition probabilities are determined by the gradient of the distribution of the policy's value. Our approach relies heavily on random walk theory, specifically on affine Weyl groups. We construct a class of novel partially observable environments with controllable exploration difficulty, in which the value distribution, and hence the policy parameter evolution, can be derived analytically. Using these environments, we analyze the probabilistic convergence of policy gradient to different local maxima of the value function. To our knowledge, this is the first approach developed to analytically compute the landscape of policy gradient in POMDPs for a class of such environments, leading to interesting insights into the difficulty of this problem.
Diversity-Enriched Option-Critic
Temporal abstraction allows reinforcement learning agents to represent knowledge and develop strategies over different temporal scales. The … (voir plus)option-critic framework has been demonstrated to learn temporally extended actions, represented as options, end-to-end in a model-free setting. However, feasibility of option-critic remains limited due to two major challenges, multiple options adopting very similar behavior, or a shrinking set of task relevant options. These occurrences not only void the need for temporal abstraction, they also affect performance. In this paper, we tackle these problems by learning a diverse set of options. We introduce an information-theoretic intrinsic reward, which augments the task reward, as well as a novel termination objective, in order to encourage behavioral diversity in the option set. We show empirically that our proposed method is capable of learning options end-to-end on several discrete and continuous control tasks, outperforms option-critic by a wide margin. Furthermore, we show that our approach sustainably generates robust, reusable, reliable and interpretable options, in contrast to option-critic.
A Fully Tensorized Recurrent Neural Network
Keynote Lecture - Building Knowledge For AI AgentsWith Reinforcement Learning
Summary form only given, as follows. The complete presentation was not made available for publication as part of the conference proceedings.… (voir plus) Reinforcement learning allows autonomous agents to learn how to act in a stochastic, unknown environment, with which they can interact. Deep reinforcement learning, in particular, has achieved great success in well-defined application domains, such as Go or chess, in which an agent has to learn how to act and there is a clear success criterion. In this talk, I will focus on the potential role of reinforcement learning as a tool for building knowledge representations in AI agents whose goal is to perform continual learning. I will examine a key concept in reinforcement learning, the value function, and discuss its generalization to support various forms of predictive knowledge. I will also discuss the role of temporally extended actions, and their associated predictive models, in learning procedural knowledge. In order to tame the possible complexity of learning knowledge representations, reinforcement learning agents can use the concepts of intents (ie intended consequences of courses of actions) and affordances (which capture knowlege about where actions can be applied). Finally, I will discuss the challenge of how to evaluate reinforcement learning agents whose goal is not just to control their environment, but also to build knowledge about their world.
Fast reinforcement learning with generalized policy updates
Andre Barreto
Shaobo Hou
Diana Borsa
David Silver
The combination of reinforcement learning with deep learning is a promising approach to tackle important sequential decision-making problems… (voir plus) that are currently intractable. One obstacle to overcome is the amount of data needed by learning systems of this type. In this article, we propose to address this issue through a divide-and-conquer approach. We argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel. By associating each task with a reward function, this problem decomposition can be seamlessly accommodated within the standard reinforcement-learning formalism. The specific way we do so is through a generalization of two fundamental operations in reinforcement learning: policy improvement and policy evaluation. The generalized version of these operations allow one to leverage the solution of some tasks to speed up the solution of others. If the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved, we can reduce a reinforcement-learning problem to a simpler linear regression. When this is not the case, the agent can still exploit the task solutions by using them to interact with and learn about the environment. Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
A Brief Look at Generalization in Visual Meta-Reinforcement Learning
Due to the realization that deep reinforcement learning algorithms trained on high-dimensional tasks can strongly overfit to their training … (voir plus)environments, there have been several studies that investigated the generalization performance of these algorithms. However, there has been no similar study that evaluated the generalization performance of algorithms that were specifically designed for generalization, i.e. meta-reinforcement learning algorithms. In this paper, we assess the generalization performance of these algorithms by leveraging high-dimensional, procedurally generated environments. We find that these algorithms can display strong overfitting when they are evaluated on challenging tasks. We also observe that scalability to high-dimensional tasks with sparse rewards remains a significant problem among many of the current meta-reinforcement learning algorithms. With these results, we highlight the need for developing meta-reinforcement learning algorithms that can both generalize and scale.
Learning to Prove from Synthetic Theorems
Eser Aygün
Vlad Firoiu
Laurent Orseau
Shibl Mourad
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in t… (voir plus)raining successful deep learning models. To tackle this problem, we propose an approach that relies on training with synthetic theorems, generated from a set of axioms. We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems. We demonstrate that a prover trained exclusively on synthetic theorems can solve a substantial fraction of problems in TPTP, a benchmark dataset that is used to compare state-of-the-art heuristic provers. Our approach outperforms a model trained on human-generated problems in most axiom sets, thereby showing the promise of using synthetic data for this task.