Portrait of Doina Precup

Doina Precup

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Team Lead, Google DeepMind
Research Topics
Medical Machine Learning
Reinforcement Learning
Probabilistic Models
Molecular Modeling
Reasoning

Biography

Doina Precup teaches at McGill University while pursuing fundamental research on reinforcement learning, in particular AI applications in areas of social impact, such as health care. She is interested in automated decision-making under high uncertainty.

She is a Fellow of the Canadian Institute for Advanced Research (CIFAR) and of the Association for the Advancement of Artificial Intelligence (AAAI), and leads DeepMind's Montreal office.

Her specialties are artificial intelligence, machine learning, reinforcement learning, reasoning and planning under uncertainty, and their applications.

Current Students

PhD - McGill
PhD - McGill
Co-supervisor:
PhD - McGill
Master's research - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
Master's research - McGill
Principal supervisor:
Research collaborator - McGill
Research intern - UdeM
PhD - McGill
Principal supervisor:
PhD - McGill
Principal supervisor:
PhD - McGill
Master's research - McGill
Postdoctorate - McGill
Master's research - McGill
Alumni collaborator - McGill
Bachelor's - McGill
PhD - McGill
Principal supervisor:
PhD - McGill
Master's research - McGill
Principal supervisor:
Research collaborator - McGill
Co-supervisor:
PhD - UdeM
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Principal supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
PhD - McGill
Co-supervisor:
Research intern - McGill
Master's research - McGill
Co-supervisor:
PhD - McGill
Co-supervisor:
PhD - McGill
PhD - McGill
Co-supervisor:

Publications

Appendix: On the Expressivity of Markov Reward
David Abel
Will Dabney
Anna Harutyunyan
Mark K. Ho
Michael L. Littman
Satinder Singh
(Q1) What does it mean for Bob to *solve* one of these tasks? That is, if Alice chooses a SOAP, PO, or TO for Bob to learn to solve, when can Alice determine Bob has solved the task? A: Bob can be said to be doing better on a given task if his behavior improves, as is typical in evaluating behavior under reward. The difference with SOAPs, POs, and TOs is that we measure improvement relative to the task rather than reward. For instance, given a SOAP, we might say that Bob has solved the task once he has found one of the good policies, and we might measure Bob’s progress on a task in terms of the distance of his greedy policy to one of the good policies (as done in our learning experiments). The same reasoning applies to POs and TOs: Bob is doing better on a task in so far as his greedy policy (or trajectories) is (are) higher up the ordering.
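A minimal sketch of the progress measure described in the answer above, assuming a SOAP is given as a set of acceptable deterministic policies over a small tabular MDP (all names and shapes here are illustrative, not taken from the paper):

```python
import numpy as np

def greedy_policy(q_values):
    """Greedy policy from a Q-table of shape (n_states, n_actions)."""
    return np.argmax(q_values, axis=1)

def soap_progress(q_values, good_policies):
    """Distance from Bob's greedy policy to the nearest policy in the SOAP.

    good_policies: list of arrays of shape (n_states,), one action per
    state. Progress is the minimum number of states on which the greedy
    policy disagrees with some acceptable policy; the task counts as
    solved when this distance reaches zero.
    """
    pi = greedy_policy(q_values)
    return min(int(np.sum(pi != good)) for good in good_policies)

# Toy example: 3 states, 2 actions, one acceptable policy [0, 1, 0].
q = np.array([[1.0, 0.2], [0.1, 0.9], [0.5, 0.4]])
print(soap_progress(q, [np.array([0, 1, 0])]))  # -> 0: task solved
```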
Behind the Machine's Gaze: Neural Networks with Biologically-inspired Constraints Exhibit Human-like Visual Attention
Leo Schwinn
Bjoern Eskofier
Dario Zanca
By and large, existing computational models of visual attention tacitly assume perfect vision and full access to the stimulus, and thereby deviate from foveated biological vision. Moreover, modeling top-down attention is generally reduced to the integration of semantic features, without incorporating the signal of high-level visual tasks that has been shown to partially guide human attention. We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner. With our method, we explore the ability of neural networks, on which we impose a biologically-inspired foveated vision constraint, to generate human-like scanpaths without directly training for this objective. The loss of a neural network performing a downstream visual task (i.e., classification or reconstruction) flexibly provides top-down guidance to the scanpath. Extensive experiments show that our method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths. Additionally, the flexibility of the framework allows us to quantitatively investigate the role of different tasks in the generated visual behaviors. Finally, we demonstrate the superiority of the approach in a novel experiment that investigates the utility of scanpaths in real-world applications, where imperfect viewing conditions are given.
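The abstract's core loop can be illustrated as a greedy fixation search: foveate the image around a candidate fixation, score the resulting view with the downstream task loss, and keep the fixation that helps most. The sketch below is a simplified stand-in for NeVA rather than the paper's actual method; `foveate`, `task_loss`, and `candidates` are hypothetical names:

```python
import numpy as np

def foveate(image, cx, cy, sigma=20.0):
    """Toy stand-in for foveated vision: keep a Gaussian window of high
    acuity around the fixation (cx, cy) and attenuate the periphery."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    acuity = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return image * acuity[..., None] if image.ndim == 3 else image * acuity

def neva_scanpath(image, task_loss, candidates, n_fixations=5):
    """Greedily pick fixations that most reduce the downstream task loss.

    task_loss: callable mapping a foveated image to a scalar loss
    (classification or reconstruction). Views of all fixations so far
    are accumulated, so each new fixation reveals additional content.
    """
    seen = np.zeros_like(image, dtype=float)
    scanpath = []
    for _ in range(n_fixations):
        best = min(candidates,
                   key=lambda p: task_loss(np.maximum(seen, foveate(image, *p))))
        seen = np.maximum(seen, foveate(image, *best))
        scanpath.append(best)
    return scanpath
```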
Continuous MDP Homomorphisms and Homomorphic Policy Gradient
Sahand Rezaei-Shoshtari
Rosie Zhao
Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification
Leo Schwinn
Leon Bungert
A. Nguyen
Ren'e Raab
Falk Pulsmeyer
B. Eskofier
Dario Zanca
The reliability of neural networks is essential for their use in safety-critical applications. Existing approaches generally aim at improving the robustness of neural networks to either real-world distribution shifts (e.g., common corruptions and perturbations, spatial transformations, and natural adversarial examples) or worst-case distribution shifts (e.g., optimized adversarial examples). In this work, we propose the Decision Region Quantification (DRQ) algorithm to improve the robustness of any differentiable pre-trained model against both real-world and worst-case distribution shifts in the data. DRQ analyzes the robustness of local decision regions in the vicinity of a given data point to make more reliable predictions. We theoretically motivate the DRQ algorithm by showing that it effectively smooths spurious local extrema in the decision surface. Furthermore, we propose an implementation using targeted and untargeted adversarial attacks. An extensive empirical evaluation shows that DRQ increases the robustness of adversarially and non-adversarially trained models against real-world and worst-case distribution shifts on several computer vision benchmark datasets.
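A hedged sketch of the decision-region idea: probe a neighbourhood of the input and aggregate the predictions found there, which suppresses spurious local extrema of the decision surface. The paper realizes the probing with targeted and untargeted adversarial attacks; uniform random probing is substituted here purely to keep the example self-contained, and all names are illustrative:

```python
import numpy as np

def drq_predict(model, x, eps=0.1, n_samples=64, rng=None):
    """Smooth a model's prediction by quantifying the local decision region.

    Rather than trusting the prediction at x alone, sample points in an
    eps-ball around x and return the majority class among them.

    model: callable mapping a batch of shape (n, d) to logits (n, k).
    """
    rng = rng or np.random.default_rng(0)
    noise = rng.uniform(-eps, eps, size=(n_samples,) + x.shape)
    logits = model(x[None] + noise)      # probe the neighbourhood of x
    votes = np.argmax(logits, axis=1)    # class decision per probe
    return np.bincount(votes).argmax()   # majority vote over the region
```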
Proving theorems using Incremental Learning and Hindsight Experience Replay
Maxwell Crouse
Eser Aygün
Bassem Makni
Laurent Orseau
Vernon Ralph Austel
Xavier Glorot
Cristina Cornelio
Shajith Ikbal
Stephen M Mcaleer
Pavan Kapanipathi
Vlad Firoiu
Ndivhuwo Makondo
Lei M Zhang
Shibl Mourad
The highest performing ATP systems (e.g., [7, 18]) in first order logic have been evolving for decades and have grown to use an increasing number of manually designed heuristics mixed with some machine learning, to obtain a large number of search strategies that are tried sequentially or in parallel. Some recent works [5, 13, 19] build on top of these provers, using modern machine learning techniques to augment, select or prioritize their already existing heuristics, with some success. Other recent works do not build on top of other provers, but still require existing proof examples as input (e.g., [9, 23]). Such machine-learning-based ATP systems can struggle to solve difficult problems when the training dataset does not provide problems of sufficiently diverse difficulties. In this paper, we propose an approach which can build a strong theorem prover without relying on existing domain-specific heuristics or on prior input data (in the form of proofs) to prime the learning. We strive to design a learning methodology for ATP that allows a system to improve even when there are large gaps in the difficulty of a given set of theorems. In particular, given a set of conjectures without proofs, our system trains itself based on its own attempts and (dis)proves an increasing number of conjectures, an approach which can be viewed as a form of incremental learning. Additionally, all the previous approaches [19, 1, 13] learn exclusively on successful proof attempts. When no new theorem can be proven, the learner may not be able to improve anymore and thus the system may not be able to obtain more training data. This could in principle happen even at the very start of training, if all the theorems available are too hard. To tackle this challenge, we adapt the idea of hindsight experience replay (HER) [3] to ATP: clauses reached during proof attempts (whether successful or not) are turned into goals in hindsight, producing a large number of ‘auxiliary’ theorems with proofs of varied difficulties for the learner, in principle even when no theorem from the original set can be proven initially. This leads to a smoother learning regime and a constantly improving learner. We evaluate our approach on two popular benchmarks, MPTP2078 [2] and M2k [17], and compare it with both TRAIL [1], a recent machine-learning prover, and E prover [24, 7], one of the leading heuristic provers. Our proposed approach substantially outperforms TRAIL [1] on both datasets, surpasses E in the auto configuration with a 100s time limit, and is competitive with E in the autoschedule configuration with a 7 days time limit. In addition, our approach almost always (99.5% of cases) finds shorter proofs than E.
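The hindsight-relabelling step can be sketched concretely: every clause derived during a proof attempt, successful or not, is reinterpreted as a goal in hindsight, with the derivation prefix that produced it as its proof. This is a toy illustration under the assumption that attempts are recorded as flat clause sequences; real provers work over structured derivation graphs:

```python
from dataclasses import dataclass

@dataclass
class Example:
    goal: str    # clause treated as the theorem to prove in hindsight
    proof: list  # derivation steps that reach it

def hindsight_examples(attempt):
    """Turn every clause derived during a (possibly failed) proof attempt
    into a training example, yielding auxiliary theorems of varied
    difficulty even when the original conjecture was never proven."""
    examples = []
    for i, clause in enumerate(attempt):
        examples.append(Example(goal=clause, proof=attempt[:i + 1]))
    return examples

# A failed attempt still produces three auxiliary training examples.
attempt = ["p(a)", "q(a) | ~p(a)", "q(a)"]
print(len(hindsight_examples(attempt)))  # -> 3
```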
Revisiting Heterophily For Graph Neural Networks
Sitao Luan
Chenqing Hua
Qincheng Lu
Jiaqi Zhu
Harry Zhao
Mingde Zhao
Shuyuan Zhang
Xiao-Wen Chang
Graph Neural Networks (GNNs) extend basic Neural Networks (NNs) by using graph structures based on the relational inductive bias (homophily assumption). While GNNs have been commonly believed to outperform NNs in real-world tasks, recent work has identified a non-trivial set of datasets where their performance compared to NNs is not satisfactory. Heterophily has been considered the main cause of this empirical observation, and numerous works have been put forward to address it. In this paper, we first revisit the widely used homophily metrics and point out that their consideration of only graph-label consistency is a shortcoming. Then, we study heterophily from the perspective of post-aggregation node similarity and define new homophily metrics, which are potentially advantageous compared to existing ones. Based on this investigation, we prove that some harmful cases of heterophily can be effectively addressed by a local diversification operation. We then propose Adaptive Channel Mixing (ACM), a framework that adaptively exploits aggregation, diversification, and identity channels node-wise to extract richer localized information for diverse node heterophily situations. ACM is more powerful than the commonly used uni-channel framework for node classification tasks on heterophilic graphs and is easy to implement in baseline GNN layers. When evaluated on 10 benchmark node classification tasks, ACM-augmented baselines consistently achieve significant performance gains, exceeding state-of-the-art GNNs on most tasks without incurring a significant computational burden.
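A minimal sketch of one ACM layer as the abstract describes it, assuming a low-pass aggregation channel, a high-pass diversification channel, and an identity channel combined by node-wise softmax weights (the weight matrices and shapes here are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def acm_layer(X, A_hat, W_low, W_high, W_id, W_mix):
    """One simplified Adaptive Channel Mixing layer.

    X: node features (n, d); A_hat: normalized adjacency (n, n);
    W_low, W_high, W_id: per-channel weights (d, d_out);
    W_mix: mixing-score weights (d_out, 1).

    A node-wise softmax over the mixing scores decides how much each
    node draws from each channel, so heterophilic nodes can favor the
    high-pass or identity channel over neighbor averaging.
    """
    n = X.shape[0]
    H_low = A_hat @ X @ W_low                   # low-pass: aggregation
    H_high = (np.eye(n) - A_hat) @ X @ W_high   # high-pass: diversification
    H_id = X @ W_id                             # identity: raw features
    channels = np.stack([H_low, H_high, H_id], axis=1)  # (n, 3, d_out)
    scores = np.stack([H_low @ W_mix, H_high @ W_mix, H_id @ W_mix],
                      axis=1)                   # (n, 3, 1)
    alpha = softmax(scores, axis=1)             # node-wise channel weights
    return (alpha * channels).sum(axis=1)       # (n, d_out)
```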
Towards Painless Policy Optimization for Constrained MDPs
Arushi Jain
Sharan Vaswani
Reza Babanezhad Harikandeh
Csaba Szepesvari
We study policy optimization in an infinite horizon, …
Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning
Samin Yeasar Arnob
Riashat Islam
Single-Shot Pruning for Offline Reinforcement Learning
Samin Yeasar Arnob
Riyasat Ohib
Sergey Plis
Flexible Option Learning
Martin Klissarov