
Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Lead, Google DeepMind
Research Topics
Medical machine learning
Reinforcement learning
Probabilistic models
Molecular modeling
Reasoning

Biography

Doina Precup teaches at McGill University while carrying out fundamental research on reinforcement learning, notably on applications of AI in areas with social impact, such as health care. She is interested in automatic decision-making in situations of high uncertainty.

She is a member of the Canadian Institute for Advanced Research (CIFAR) and of the Association for the Advancement of Artificial Intelligence (AAAI), and leads DeepMind's Montreal office.

Her areas of specialization are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and applications.

Current Students

Collaborating Alumni - McGill
Co-supervisor:
Collaborating Alumni - McGill
Collaborating Alumni - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
Master's research - McGill
Principal supervisor:
Collaborating researcher - McGill
Co-supervisor:
Collaborating researcher - UdeM
PhD - McGill
Principal supervisor:
PhD - McGill
Principal supervisor:
Collaborating researcher - Birla Institute of Technology
PhD - McGill
Collaborating Alumni - McGill
Master's research - McGill
Collaborating Alumni - McGill
PhD - Polytechnique
Postdoctorate - McGill
Collaborating Alumni - McGill
Collaborating Alumni - McGill
PhD - McGill
Principal supervisor:
PhD - McGill
Collaborating Alumni - McGill
Master's research - McGill
Principal supervisor:
Collaborating researcher - McGill
Co-supervisor:
PhD - UdeM
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Research Intern - McGill
Master's research - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
PhD - McGill
Collaborating Alumni - McGill
Co-supervisor:

Publications

The Termination Critic
Anna Harutyunyan
Will Dabney
Diana Borsa
Nicolas Heess
Remi Munos
In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination function, as opposed to - as is common - the policy. The termination function is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option's encoding - arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a "critic" for the termination function. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning.
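The sketch below is a heavily simplified, tabular illustration of this idea rather than the paper's actual algorithm or gradient derivation: on a toy chain environment, a learned categorical model of where the option terminates plays the role of the critic, and a REINFORCE-style update pushes the termination probabilities toward states that this model predicts well. All names and the specific update rule are illustrative assumptions.

```python
# Toy sketch (not the paper's algorithm): a tabular option on a chain whose
# termination probabilities are pushed, REINFORCE-style, toward states that a
# learned "critic" model of termination outcomes can predict well.
import numpy as np

rng = np.random.default_rng(0)
N = 10                       # chain states 0..N-1; the option policy always moves right
beta_logits = np.zeros(N)    # termination logits, beta(s) = sigmoid(beta_logits[s])
critic_logits = np.zeros(N)  # categorical model q(s_f) over termination states
lr_beta, lr_critic = 0.5, 0.5

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

baseline = 0.0
for it in range(2000):
    # Roll out the option from state 0 until it terminates (forced at N-1).
    s, visited = 0, []
    while True:
        if s == N - 1:                       # forced termination at the end of the chain
            s_f = s
            break
        terminate = rng.random() < sigmoid(beta_logits[s])
        visited.append((s, terminate))
        if terminate:
            s_f = s
            break
        s += 1
    # Critic update: maximum likelihood on the observed termination state.
    q = softmax(critic_logits)
    critic_logits += lr_critic * (np.eye(N)[s_f] - q)
    # Termination update: REINFORCE with reward = log q(s_f), i.e. predictability.
    reward = np.log(q[s_f] + 1e-8)
    baseline += 0.05 * (reward - baseline)
    adv = reward - baseline
    for state, terminated in visited:
        b = sigmoid(beta_logits[state])
        grad_logp = (1.0 - b) if terminated else -b   # d/dlogit of the Bernoulli log-prob
        beta_logits[state] += lr_beta * adv * grad_logp

print("termination probabilities:", np.round(sigmoid(beta_logits), 2))
```

Run on this toy chain, the termination probabilities typically collapse onto a single state, i.e. where the option ends becomes easy to compress, which is the intuition the abstract describes.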
The Impact of Time Interval between Extubation and Reintubation on Death or Bronchopulmonary Dysplasia in Extremely Preterm Infants
Wissam Shalish
Lara Kanbar
Lajos Kovacs
Sanjay Chawla
Martin Keszler
Smita Rao
Bogdan Panaitescu
Alyse Laliberte
Karen Brown
Robert E. Kearney
Guilherme M. Sant'Anna
Searching for Big-Oh in the Data: Inferring Asymptotic Complexity from Experiments
Catherine McGeoch
Peter Sanders
Rudolf Fleischer
Paul R. Cohen
Avoidance Learning Using Observational Reinforcement Learning
David Venuto
Léonard Boussioux
Junhao Wang
Rola Dali
Imitation learning seeks to learn an expert policy from sampled demonstrations. However, in the real world, it is often difficult to find a perfect expert, and avoiding dangerous behaviors becomes relevant for safety reasons. We present the idea of \textit{learning to avoid}, an objective opposite to imitation learning in some sense, where an agent learns to avoid a demonstrator policy given an environment. We define avoidance learning as the process of optimizing the agent's reward while avoiding dangerous behaviors given by a demonstrator. In this work we develop a framework of avoidance learning by defining a suitable objective function for these problems which involves the \emph{distance} of state occupancy distributions of the expert and demonstrator policies. We use density estimates for state occupancy measures and use the aforementioned distance as the reward bonus for avoiding the demonstrator. We validate our theory with experiments using a wide range of partially observable environments. Experimental results show that we are able to improve sample efficiency during training compared to state-of-the-art policy optimization and safety methods.
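A minimal sketch of the reward-shaping step described above, assuming a Gaussian kernel density estimate of the demonstrator's state occupancy and a negative-log-density bonus; the exact distance and estimator used in the paper may differ, and the function names are illustrative.

```python
# Minimal sketch of the reward-shaping idea: estimate the demonstrator's state
# occupancy with a kernel density estimate and reward the agent for staying in
# low-density (i.e. "avoided") regions. The KDE choice and the -log-density
# bonus are illustrative assumptions, not the paper's exact objective.
import numpy as np
from scipy.stats import gaussian_kde

def make_avoidance_bonus(demonstrator_states, scale=0.1):
    """demonstrator_states: array of shape (num_samples, state_dim)."""
    kde = gaussian_kde(demonstrator_states.T)    # density over demonstrator occupancy
    def bonus(state):
        density = kde(np.asarray(state).reshape(-1, 1))[0]
        return -scale * np.log(density + 1e-8)   # larger bonus far from the demonstrator
    return bonus

# Usage: shaped reward = environment reward + avoidance bonus of the visited state.
demo_states = np.random.randn(500, 2) + np.array([2.0, 2.0])
bonus = make_avoidance_bonus(demo_states)
shaped_reward = 1.0 + bonus(np.array([-2.0, -2.0]))  # env_reward + bonus(state)
```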
Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks
Mingde Zhao
Xiao-Wen Chang
Recently, neural network based approaches have achieved significant improvement for solving large, complex, graph-structured problems. However, their bottlenecks still need to be addressed, and the advantages of multi-scale information and deep architectures have not been sufficiently exploited. In this paper, we theoretically analyze how existing Graph Convolutional Networks (GCNs) have limited expressive power due to the constraint of the activation functions and their architectures. We generalize spectral graph convolution and deep GCN in block Krylov subspace forms and devise two architectures, both with the potential to be scaled deeper but each making use of the multi-scale information in different ways. We further show that the equivalence of these two architectures can be established under certain conditions. On several node classification tasks, with or without the help of validation, the two new architectures achieve better performance compared to many state-of-the-art methods.
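As a rough illustration of the multi-scale ingredient only (not the paper's full deep architectures), the sketch below builds the block Krylov basis [X, AX, A²X, ...] from the symmetrically normalized adjacency and concatenates it into one feature matrix that any linear classifier head could consume; sizes and names are illustrative.

```python
# Simplified multi-scale graph convolution sketch: stack the block Krylov basis
# [X, AX, A^2 X, ...] built from the symmetrically normalized adjacency and feed
# the concatenation to a learned linear layer.
import numpy as np

def normalized_adjacency(A):
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt       # D^{-1/2} (A + I) D^{-1/2}

def krylov_features(A, X, depth=3):
    A_norm = normalized_adjacency(A)
    blocks, H = [X], X
    for _ in range(depth):
        H = A_norm @ H                           # one more hop of propagation
        blocks.append(H)
    return np.concatenate(blocks, axis=1)        # shape: (nodes, (depth+1) * features)

# Usage: logits = krylov_features(A, X) @ W, with W learned by any classifier head.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.random.randn(3, 4)
Z = krylov_features(A, X, depth=2)               # (3, 12) multi-scale node features
```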
Community size effect in artificial learning systems
Olivier Tieleman
Angeliki Lazaridou
Shibl Mourad
Charles Blundell
Motivated by theories of language and communication that explain why communities with large numbers of speakers have, on average, simpler languages with more regularity, we cast the representation learning problem in terms of learning to communicate. Our starting point sees the traditional autoencoder setup as a single encoder with a fixed decoder partner that must learn to communicate. Generalizing from there, we introduce community-based autoencoders in which multiple encoders and decoders collectively learn representations by being randomly paired up on successive training iterations. We find that increasing community sizes reduce idiosyncrasies in the learned codes, resulting in representations that better encode concept categories and correlate with human feature norms.
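A minimal PyTorch sketch of the community-based autoencoder training loop, assuming simple MLP encoders and decoders and a mean-squared reconstruction loss; the architecture and sizes are illustrative only.

```python
# Minimal sketch of community-based autoencoders: keep pools of encoders and
# decoders and pair them up at random on every training step.
import random
import torch
import torch.nn as nn

community_size, x_dim, z_dim = 4, 784, 32
encoders = nn.ModuleList([nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))
                          for _ in range(community_size)])
decoders = nn.ModuleList([nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))
                          for _ in range(community_size)])
opt = torch.optim.Adam(list(encoders.parameters()) + list(decoders.parameters()), lr=1e-3)

def training_step(x):
    enc = random.choice(encoders)        # random encoder/decoder pairing each iteration
    dec = random.choice(decoders)
    recon = dec(enc(x))
    loss = nn.functional.mse_loss(recon, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage: loss = training_step(torch.randn(64, x_dim))
```

Larger communities mean each encoder must produce codes that many different decoders can read, which is the mechanism the abstract credits for reducing idiosyncratic codes.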
Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning (Supplementary Material)
More precisely, the WFA $A = \langle \alpha, \{A^\sigma\}_{\sigma \in \Sigma}, \Omega \rangle$ with $n$ states and the linear 2-RNN $M = (\alpha, \mathcal{A}, \Omega)$ with $n$ hidden units, where $\mathcal{A} \in \mathbb{R}^{n \times \Sigma \times n}$ is defined by $\mathcal{A}_{:,\sigma,:} = A^\sigma$ for all $\sigma \in \Sigma$, are such that $f_A(\sigma_1 \sigma_2 \cdots \sigma_k) = f_M(x_1, x_2, \cdots, x_k)$ for all sequences of input symbols $\sigma_1, \cdots, \sigma_k \in \Sigma$, where for each $i \in [k]$ the input vector $x_i \in \mathbb{R}^\Sigma$ is the one-hot encoding of the symbol $\sigma_i$. Proof. We first show by induction on $k$ that, for any sequence $\sigma_1 \cdots \sigma_k \in \Sigma^*$, the hidden state $h_k$ computed by $M$ (see Eq. (1)) on the corresponding one-hot encoded sequence $x_1, \cdots, x_k \in \mathbb{R}^\Sigma$ satisfies $h_k = (A^{\sigma_1} \cdots A^{\sigma_k})^\top \alpha$. The case $k = 0$ is immediate. Suppose the result is true for sequences of length up to $k$. One can easily check that $\mathcal{A} \bullet_2 x_i = A^{\sigma_i}$ for any index $i$. Using the induction hypothesis, it then follows that $h_{k+1} = \mathcal{A} \bullet_1 h_k \bullet_2 x_{k+1} = A^{\sigma_{k+1}} \bullet_1 h_k = (A^{\sigma_{k+1}})^\top h_k = (A^{\sigma_{k+1}})^\top (A^{\sigma_1} \cdots A^{\sigma_k})^\top \alpha = (A^{\sigma_1} \cdots A^{\sigma_{k+1}})^\top \alpha$.
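The construction can be checked numerically on random matrices: the short sketch below builds a WFA, forms the tensor with slices A[:, σ, :] = A^σ, runs the linear 2-RNN recursion on one-hot inputs, and verifies that both computations agree on a sample word. Sizes and the word are arbitrary.

```python
# Numeric check of the stated equivalence: a WFA and the corresponding linear
# 2-RNN (tensor slices A[:, sigma, :] = A^sigma) compute the same value on a word.
import numpy as np

rng = np.random.default_rng(0)
n, n_symbols = 4, 3
alpha, omega = rng.standard_normal(n), rng.standard_normal(n)
A_sigma = rng.standard_normal((n_symbols, n, n))     # one transition matrix per symbol
A_tensor = np.transpose(A_sigma, (1, 0, 2))          # A_tensor[:, sigma, :] = A^sigma

word = [0, 2, 1, 1]                                  # a word over the alphabet

# WFA value: alpha^T A^{sigma_1} ... A^{sigma_k} omega
M = np.eye(n)
for sigma in word:
    M = M @ A_sigma[sigma]
f_wfa = alpha @ M @ omega

# Linear 2-RNN value: h_0 = alpha, h_t = A ._1 h_{t-1} ._2 x_t, output = <omega, h_k>
h = alpha
for sigma in word:
    x = np.eye(n_symbols)[sigma]                     # one-hot encoding of the symbol
    h = np.einsum('i,ijl,j->l', h, A_tensor, x)      # contract modes 1 and 2
f_rnn = omega @ h

assert np.allclose(f_wfa, f_rnn)
```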
Data-driven Chance Constrained Programming based Electric Vehicle Penetration Analysis
Transportation electrification has been growing rapidly in recent years. The adoption of electric vehicles (EVs) could help ease the dependency on oil and reduce greenhouse gas emissions. However, increasing EV adoption will also impose a high demand on the power grid and may jeopardize the grid network infrastructure. For certain high EV penetration areas, the EV charging demand may lead to transformer overloading at peak hours, which makes maximal EV penetration analysis an urgent problem to solve. This paper proposes a data-driven chance constrained programming based framework for maximal EV penetration analysis. Simulation results are presented for a real-world neighborhood-level network. The proposed framework could serve as guidance for utility companies to schedule infrastructure upgrades.
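As a hedged illustration of the data-driven chance-constraint idea (not the paper's actual formulation), the sketch below replaces the probabilistic transformer-loading constraint with an empirical-quantile approximation over historical load samples and bisects for the largest feasible penetration; the data, capacity figure, and quantile approximation are all assumptions.

```python
# Illustrative sketch of a data-driven chance constraint: approximate
#   P(base_load + p * ev_demand <= capacity) >= 1 - eps
# with an empirical quantile over historical samples, and bisect for the largest
# feasible penetration p.
import numpy as np

def max_penetration(base_load, ev_demand, capacity, eps=0.05, tol=1e-4):
    """base_load, ev_demand: arrays of jointly sampled historical/simulated scenarios."""
    def feasible(p):
        total = base_load + p * ev_demand
        # the empirical (1 - eps)-quantile of total load must stay under capacity
        return np.quantile(total, 1.0 - eps) <= capacity
    lo, hi = 0.0, 1.0
    if not feasible(lo):
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo

rng = np.random.default_rng(1)
base = rng.normal(60.0, 8.0, size=5000)    # kW, transformer-level base load scenarios
ev = rng.normal(80.0, 20.0, size=5000)     # kW, demand if every household charged an EV
print("max EV penetration:", round(max_penetration(base, ev, capacity=100.0), 3))
```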
An Empirical Study of Batch Normalization and Group Normalization in Conditional Computation
Batch normalization has been widely used to improve optimization in deep neural networks. While the uncertainty in batch statistics can act as a regularizer, using these dataset statistics specific to the training set impairs generalization in certain tasks. Recently, alternative methods for normalizing feature activations in neural networks have been proposed. Among them, group normalization has been shown to yield similar, and in some domains even superior, performance to batch normalization. All these methods utilize a learned affine transformation after the normalization operation to increase representational power. Methods used in conditional computation define the parameters of these transformations as learnable functions of conditioning information. In this work, we study whether and where the conditional formulation of group normalization can improve generalization compared to conditional batch normalization. We evaluate performance on the tasks of visual question answering, few-shot learning, and conditional image generation.
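A short PyTorch sketch of conditional group normalization as studied here, assuming the common pattern of a non-affine GroupNorm followed by a per-channel scale and shift predicted from the conditioning vector; layer sizes are illustrative.

```python
# Sketch of conditional group normalization: normalize with a non-affine GroupNorm,
# then apply a per-channel scale and shift predicted from the conditioning vector
# (the same pattern as conditional batch norm, with GN as the normalizer).
import torch
import torch.nn as nn

class ConditionalGroupNorm(nn.Module):
    def __init__(self, num_channels, cond_dim, num_groups=8):
        super().__init__()
        self.norm = nn.GroupNorm(num_groups, num_channels, affine=False)
        self.to_gamma = nn.Linear(cond_dim, num_channels)
        self.to_beta = nn.Linear(cond_dim, num_channels)

    def forward(self, x, cond):
        # x: (batch, channels, H, W); cond: (batch, cond_dim)
        h = self.norm(x)
        gamma = self.to_gamma(cond).unsqueeze(-1).unsqueeze(-1)   # (batch, channels, 1, 1)
        beta = self.to_beta(cond).unsqueeze(-1).unsqueeze(-1)
        return (1.0 + gamma) * h + beta                           # scale around identity

# Usage: ConditionalGroupNorm(64, cond_dim=128)(torch.randn(2, 64, 8, 8), torch.randn(2, 128))
```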
Hindsight Credit Assignment
Anna Harutyunyan
Will Dabney
Mohammad Gheshlaghi Azar
Bilal Piot
Nicolas Heess
Hado van Hasselt
Greg Wayne
Satinder Singh
Remi Munos
Learning proposals for sequential importance samplers using reinforced variational inference
Arjun Karuvally
Simon Gravel
The problem of inferring unobserved values in a partially observed trajectory from a stochastic process can be considered as a structured prediction problem. Traditionally, inference is conducted using heuristic-based Monte Carlo methods. This work considers learning heuristics by leveraging a connection between policy-optimization reinforcement learning and approximate inference. In particular, we learn proposal distributions used in importance samplers by casting it as a variational inference problem. We then rewrite the variational lower bound as a policy optimization problem, similar to Weber et al. (2015), allowing us to transfer techniques from reinforcement learning. We apply this technique to a simple stochastic process as a proof of concept and show that while it is viable, it will require more engineering effort to scale inference for rare observations.
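A toy sketch of the idea on a one-dimensional example, assuming a Gaussian proposal and using the log importance weight as the REINFORCE reward; the target, proposal family, and hyperparameters are illustrative and much simpler than the sequential setting in the paper.

```python
# Toy sketch of learning an importance-sampling proposal by "reinforced" variational
# inference: samples from the proposal play the role of actions, the log importance
# weight is the reward, and the proposal is updated with a REINFORCE-style gradient
# of the variational lower bound.
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):                           # unnormalized target: N(3, 0.5^2)
    return -0.5 * ((x - 3.0) / 0.5) ** 2

mu, log_std, lr = 0.0, 0.0, 0.05             # proposal q = N(mu, exp(log_std)^2)
for step in range(3000):
    std = np.exp(log_std)
    x = mu + std * rng.standard_normal(64)   # sample a batch of proposals
    log_q = -0.5 * ((x - mu) / std) ** 2 - np.log(std)
    reward = log_target(x) - log_q           # log importance weight (up to a constant)
    adv = (reward - reward.mean()) / (reward.std() + 1e-8)   # baseline + scaling
    # score-function (REINFORCE) gradients of log q w.r.t. mu and log_std
    d_mu = (x - mu) / std ** 2
    d_log_std = ((x - mu) / std) ** 2 - 1.0
    mu += lr * np.mean(adv * d_mu)
    log_std += lr * np.mean(adv * d_log_std)

print("learned proposal: mean %.2f, std %.2f" % (mu, np.exp(log_std)))  # ~ (3.0, 0.5)
```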
Learning Reliable Policies in the Bandit Setting with Application to Adaptive Clinical Trials
The stochastic multi-armed bandit problem is a well-known model for studying the exploration-exploitation trade-off. It has significant possible applications in adaptive clinical trials, which allow for a dynamic change of patient allocation ratios. However, most bandit learning algorithms are designed with the goal of minimizing the expected regret. While this approach is useful in many areas, in clinical trials it can be sensitive to outlier data, especially when the sample size is small. In this article, we propose a modification of the BESA algorithm [Baransi, Maillard, and Mannor, 2014] which takes into account the variance in the action outcomes in addition to the mean. We present a regret bound for our approach and evaluate it empirically both on synthetic problems as well as on a dataset from the clinical trial literature. Our approach compares favorably to a suite of standard bandit algorithms.
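A rough two-armed sketch of a BESA-style comparison with a variance-aware score: as in BESA, the longer history is subsampled to match the shorter one, and the arms are then compared on mean minus a variance penalty. The penalty form and parameters are illustrative assumptions, not necessarily the article's exact criterion.

```python
# Sketch of a BESA-style two-armed comparison with a variance-aware score:
# subsample to equalize sample sizes (as in BESA), then compare mean minus a
# variance penalty.
import numpy as np

rng = np.random.default_rng(0)

def besa_choice(history_a, history_b, risk_weight=1.0):
    n = min(len(history_a), len(history_b))
    sub_a = rng.choice(history_a, size=n, replace=False)   # subsample without replacement
    sub_b = rng.choice(history_b, size=n, replace=False)
    score_a = sub_a.mean() - risk_weight * sub_a.std()
    score_b = sub_b.mean() - risk_weight * sub_b.std()
    return 0 if score_a >= score_b else 1

# Usage on a toy two-armed trial: arm 1 has a higher mean but much higher variance.
arms = [lambda: rng.normal(0.5, 0.1), lambda: rng.normal(0.6, 0.8)]
history = [np.array([arms[0]()]), np.array([arms[1]()])]
for t in range(200):
    a = besa_choice(history[0], history[1])
    history[a] = np.append(history[a], arms[a]())
print("pulls per arm:", len(history[0]), len(history[1]))
```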