Portrait de Doina Precup

Doina Precup

Membre académique principal
Chaire en IA Canada-CIFAR
Professeure agrégée, McGill University, École d'informatique
Chef d'équipe de recherche, Google DeepMind
Sujets de recherche
Apprentissage automatique médical
Apprentissage par renforcement
Modèles probabilistes
Modélisation moléculaire
Raisonnement

Biographie

Doina Precup enseigne à l'Université McGill tout en menant des recherches fondamentales sur l'apprentissage par renforcement, notamment les applications de l'IA dans des domaines ayant des répercussions sociales, tels que les soins de santé. Elle s'intéresse à la prise de décision automatique dans des situations d'incertitude élevée.

Elle est membre de l'Institut canadien de recherches avancées (CIFAR) et de l'Association pour l'avancement de l'intelligence artificielle (AAAI), et dirige le bureau montréalais de DeepMind.

Ses spécialités sont les suivantes : intelligence artificielle, apprentissage machine, apprentissage par renforcement, raisonnement et planification sous incertitude, applications.

Étudiants actuels

Collaborateur·rice alumni - McGill
Co-superviseur⋅e :
Collaborateur·rice alumni - McGill
Collaborateur·rice alumni - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Maîtrise recherche - McGill
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - McGill
Co-superviseur⋅e :
Collaborateur·rice de recherche - UdeM
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - Birla Institute of Technology
Doctorat - McGill
Collaborateur·rice alumni - McGill
Maîtrise recherche - McGill
Collaborateur·rice alumni - McGill
Doctorat - Polytechnique
Postdoctorat - McGill
Collaborateur·rice alumni - McGill
Collaborateur·rice alumni - McGill
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Collaborateur·rice alumni - McGill
Maîtrise recherche - McGill
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - McGill
Co-superviseur⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Stagiaire de recherche - McGill
Maîtrise recherche - McGill
Co-superviseur⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Collaborateur·rice alumni - McGill
Co-superviseur⋅e :

Publications

Policy Gradients Incorporating the Future
David Venuto
Ofir Nachum
Reasoning about the future -- understanding how decisions in the present time affect outcomes in the future -- is one of the central challen… (voir plus)ges for reinforcement learning (RL), especially in highly-stochastic or partially observable environments. While predicting the future directly is hard, in this work we introduce a method that allows an agent to "look into the future" without explicitly predicting it. Namely, we propose to allow an agent, during its training on past experience, to observe what \emph{actually} happened in the future at that time, while enforcing an information bottleneck to avoid the agent overly relying on this privileged information. This gives our agent the opportunity to utilize rich and useful information about the future trajectory dynamics in addition to the present. Our method, Policy Gradients Incorporating the Future (PGIF), is easy to implement and versatile, being applicable to virtually any policy gradient algorithm. We apply our proposed method to a number of off-the-shelf RL algorithms and show that PGIF is able to achieve higher reward faster in a variety of online and offline RL domains, as well as sparse-reward and partially observable environments.
The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning
Attention Option-Critic
Appendix: On the Expressivity of Markov Reward
David Abel
Will Dabney
Anna Harutyunyan
Mark K. Ho
Michael L. Littman
Satinder Singh
(Q1) What does it mean for Bob to *solve* one of these tasks? That is, if Alice chooses a SOAP, PO, or TO for Bob to learn to solve, when ca… (voir plus)n Alice determine Bob has solved the task? A: Bob can be said to be doing better on a given task if his behavior improves, as is typical in evaluating behavior under reward. The difference with SOAPs, POs, and TOs is that we measure improvement relative to the task rather than reward. For instance, given a SOAP, we might say that Bob has solved the task once he has found one of the good policies, and we might measure Bob’s progress on a task in terms of the distance of his greedy policy to one of the good policies (as done in our learning experiments). The same reasoning applies to POs and TOs: Bob is doing better on a task in so far as his greedy policy (or trajectories) is (are) higher up the ordering.
Behind the Machine's Gaze: Biologically Constrained Neural Networks Exhibit Human-like Visual Attention
B. Eskofier
Dario Zanca
.
Behind the Machine's Gaze: Neural Networks with Biologically-inspired Constraints Exhibit Human-like Visual Attention
Bjoern Eskofier
Dario Zanca
By and large, existing computational models of visual attention tacitly assume perfect vision and full access to the stimulus and thereby de… (voir plus)viate from foveated biological vision. Moreover, modeling top-down attention is generally reduced to the integration of semantic features without incorporating the signal of a high-level visual tasks that have been shown to partially guide human attention. We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner. With our method, we explore the ability of neural networks on which we impose a biologically-inspired foveated vision constraint to generate human-like scanpaths without directly training for this objective. The loss of a neural network performing a downstream visual task (i.e., classification or reconstruction) flexibly provides top-down guidance to the scanpath. Extensive experiments show that our method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths. Additionally, the flexibility of the framework allows to quantitatively investigate the role of different tasks in the generated visual behaviors. Finally, we demonstrate the superiority of the approach in a novel experiment that investigates the utility of scanpaths in real-world applications, where imperfect viewing conditions are given.
Continuous MDP Homomorphisms and Homomorphic Policy Gradient
Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms. In this pape… (voir plus)r, we study abstraction in the continuous-control setting. We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces. We derive a policy gradient theorem on the abstract MDP, which allows us to leverage approximate symmetries of the environment for policy optimization. Based on this theorem, we propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. We demonstrate the effectiveness of our method on benchmark tasks in the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance when learning from pixel observations.
Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification
Leon Bungert
A. Nguyen
Ren'e Raab
Falk Pulsmeyer
B. Eskofier
Dario Zanca
The reliability of neural networks is essential for their use in safety-critical applications. Existing approaches generally aim at improvin… (voir plus)g the robustness of neural networks to either real-world distribution shifts (e.g., common corruptions and perturbations, spatial transformations, and natural adversarial examples) or worst-case distribution shifts (e.g., optimized adversarial examples). In this work, we propose the Decision Region Quantification (DRQ) algorithm to improve the robustness of any differentiable pre-trained model against both real-world and worst-case distribution shifts in the data. DRQ analyzes the robustness of local decision regions in the vicinity of a given data point to make more reliable predictions. We theoretically motivate the DRQ algorithm by showing that it effectively smooths spurious local extrema in the decision surface. Furthermore, we propose an implementation using targeted and untargeted adversarial attacks. An extensive empirical evaluation shows that DRQ increases the robustness of adversarially and non-adversarially trained models against real-world and worst-case distribution shifts on several computer vision benchmark datasets.
Low-Rank Representation of Reinforcement Learning Policies
We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional… (voir plus) embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.
Proving Theorems using Incremental Learning and Hindsight Experience Replay
Maxwell Crouse
Eser Aygün
Laurent Orseau
Bassem Makni
Vernon Ralph Austel
Cristina Cornelio
Shajith Ikbal
Stephen McAleer
Vlad Firoiu
Pavan Kapanipathi
Lei Zhang
Ndivhuwo Makondo
Shibl Mourad
Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are design… (voir plus)ed to work best over a wide range of domains. Machine learning approaches in literature either depend on these traditional provers to bootstrap themselves or fall short on reaching comparable performance. In this paper, we propose a general incremental learning algorithm for training domain specific provers for first-order logic without equality, based only on a basic given-clause algorithm, but using a learned clause-scoring function. Clauses are represented as graphs and presented to transformer networks with spectral features. To address the sparsity and the initial lack of training data as well as the lack of a natural curriculum, we adapt hindsight experience replay to theorem proving, so as to be able to learn even when no proof can be found. We show that provers trained this way can match and sometimes surpass state-of-the-art traditional provers on the TPTP dataset in terms of both quantity and quality of the proofs.
Revisiting Heterophily For Graph Neural Networks
Qincheng Lu
Jiaqi Zhu
Mingde Zhao
Xiao-Wen Chang
Graph Neural Networks (GNNs) extend basic Neural Networks (NNs) by using graph structures based on the relational inductive bias (homophily … (voir plus)assumption). While GNNs have been commonly believed to outperform NNs in real-world tasks, recent work has identified a non-trivial set of datasets where their performance compared to NNs is not satisfactory. Heterophily has been considered the main cause of this empirical observation and numerous works have been put forward to address it. In this paper, we first revisit the widely used homophily metrics and point out that their consideration of only graph-label consistency is a shortcoming. Then, we study heterophily from the perspective of post-aggregation node similarity and define new homophily metrics, which are potentially advantageous compared to existing ones. Based on this investigation, we prove that some harmful cases of heterophily can be effectively addressed by local diversification operation. Then, we propose the Adaptive Channel Mixing (ACM), a framework to adaptively exploit aggregation, diversification and identity channels node-wisely to extract richer localized information for diverse node heterophily situations. ACM is more powerful than the commonly used uni-channel framework for node classification tasks on heterophilic graphs and is easy to be implemented in baseline GNN layers. When evaluated on 10 benchmark node classification tasks, ACM-augmented baselines consistently achieve significant performance gain, exceeding state-of-the-art GNNs on most tasks without incurring significant computational burden.
Towards Painless Policy Optimization for Constrained MDPs
We study policy optimization in an infinite horizon, …