Doina Precup

Jesse Farebrother

PhD - McGill

Principal supervisor:

Marc Gendron-Bellemare

PhD - McGill

Principal supervisor:

Research collaborator - Birla Institute of Technology

Jonathan Hu

Research Master's - McGill

Howard Huang

PhD - McGill

Haque Ishfaq

Alumni collaborator - McGill

Website

Mohammad Sami Nur Islam Islam

Research Master's - McGill

Hangzhan Jin

PhD - Polytechnique

PhD - McGill

Postdoctorate - McGill

Jonathan Lebensold

Alumni collaborator - McGill

Alumni collaborator - McGill

Ray Luo

PhD - McGill

Principal supervisor:

G McCracken

PhD - McGill

Nazanin Mohammadi Sepahvand

Alumni collaborator - McGill

Shahrad Mohammadzadeh

Research Master's - McGill

Principal supervisor:

Gabriela Moisescu-Pareja

Research collaborator - McGill

Co-supervisor:

Irina Rish

Padideh Nouri

PhD - UdeM

Co-supervisor:

PhD - McGill

Co-supervisor:

Research intern - McGill

Nate Rahn

PhD - McGill

Principal supervisor:

Marc Gendron-Bellemare

Manoosh Samiei

PhD - McGill

Co-supervisor:

PhD - McGill

Co-supervisor:

PhD - McGill

Nishanth Anand Vemgal

PhD - McGill

PhD - McGill

Co-supervisor:

Samira Ebrahimi Kahou

Research intern - McGill

Zihan Wang

PhD - McGill

Website

Skipper: Combining Spatial and Temporal Abstraction to Improve Generalization

Steve Wen

Research Master's - McGill

Co-supervisor:

Gregory Dudek

Zijing Wu

PhD - McGill

Principal supervisor:

PhD - McGill

Harry Zhao

Alumni collaborator - McGill

Co-supervisor:

Blog posts

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Read the article

Publications

Offline Policy Optimization in RL with Variance Regularizaton

Riashat Islam

Samarth Sinha

Homanga Bharadhwaj

Samin Yeasar Arnob

Zhuoran Yang

Animesh Garg

Zhaoran Wang

Lihong Li

2022-12-28

ArXiv (preprint)

Towards Continual Reinforcement Learning: A Review and Perspectives

Khimya Khetarpal

Matthew D Riemer

Irina Rish

2022-12-21

Journal of Artificial Intelligence Research (published)

Bayesian Q-learning With Imperfect Expert Demonstrations

Xiru Zhu

Guided exploration with expert demonstrations improves data efficiency for reinforcement learning, but current algorithms often overuse expert information. We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations. The algorithm avoids excessive reliance on expert data by relaxing the optimal expert assumption and gradually reducing the usage of uninformative expert data. Experimentally, we evaluate our approach on a sparse-reward chain environment and six more complicated Atari games with delayed rewards. With the proposed methods, we can achieve better results than Deep Q-learning from Demonstrations (Hester et al., 2017) in most environments.

2022-12-08

NeurIPS.cc/2022/Workshop/DeepRL (accepted)

Complete the Missing Half: Augmenting Aggregation Filtering with Diversification for Graph Convolutional Networks

Mingde Zhao

Xiao-Wen Chang

The core operation of current Graph Neural Networks (GNNs) is the aggregation enabled by the graph Laplacian or message passing, which filters the neighborhood node information. Though effective for various tasks, in this paper, we show that they are potentially a problematic factor underlying all GNN methods for learning on certain datasets, as they force the node representations similar, making the nodes gradually lose their identity and become indistinguishable. Hence, we augment the aggregation operations with their dual, i.e. diversification operators that make the node more distinct and preserve the identity. Such augmentation replaces the aggregation with a two-channel filtering process that, in theory, is beneficial for enriching the node representations. In practice, the proposed two-channel filters can be easily patched on existing GNN methods with diverse training strategies, including spectral and spatial (message passing) methods. In the experiments, we observe desired characteristics of the models and significant performance boost upon the baselines on 9 node classification tasks.

2022-11-21

NeurIPS.cc/2022/Workshop/GLFrontiers (accepted)

When Do We Need Graph Neural Networks for Node Classification?

Sitao Luan

Chenqing Hua

Qincheng Lu

Jiaqi Zhu

Xiao-Wen Chang

2022-10-29

arXiv.org (preprint)

Simulating Human Gaze with Neural Visual Attention

Leo Schwinn

Bjoern Eskofier

Dario Zanca

2022-10-19

NeurIPS.cc/2022/Workshop/GMML (oral presentation)

The Paradox of Choice: On the Role of Attention in Hierarchical Reinforcement Learning

Andrei Cristian Nica

Khimya Khetarpal

Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to having many choices. Hierarchical reinforcement learning methods aim to solve the first problem, by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we first characterize "affordances" as a "hard" attention mechanism that strictly limits the available choices of temporally extended options. We then investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. To this end, we present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. Finally, we identify and empirically demonstrate the settings in which the "paradox of choice" arises, i.e. when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.

2022-10-19

NeurIPS.cc/2022/Workshop/Attention (poster)

Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning

Jacob Shkrob

My Duc Tran

Sumana Basu

Mechanical ventilation is a key form of life support for patients with pulmonary impairment. Healthcare workers are required to continuously adjust ventilator settings for each patient, a challenging and time consuming task. Hence, it would be beneficial to develop an automated decision support tool to optimize ventilation treatment. We present DeepVent, a Conservative Q-Learning (CQL) based offline Deep Reinforcement Learning (DRL) agent that learns to predict the optimal ventilator parameters for a patient to promote 90 day survival. We design a clinically relevant intermediate reward that encourages continuous improvement of the patient vitals as well as addresses the challenge of sparse reward in RL. We find that DeepVent recommends ventilation parameters within safe ranges, as outlined in recent clinical trials. The CQL algorithm offers additional safety by mitigating the overestimation of the value estimates of out-of-distribution states/actions. We evaluate our agent using Fitted Q Evaluation (FQE) and demonstrate that it outperforms physicians from the MIMIC-III dataset.

2022-10-04

ArXiv (preprint)

Assessing Intrapartum Risk of Hypoxic-Ischemic Encephalopathy using Fetal Heart Rate with Long Short-term Memory Networks

Derek Kweku DEGBEDZUI

Michael Kuzniewicz

Cornet Marie-Coralie

Yvonne Wu

Heather Forquer

Lawrence Gerstley

Emily Hamilton

Philip Warrick

Robert Kearney

This study investigated the prediction of the risk of hypoxic ischemic encephalopathy using intrapartum cardiotocography records with a long short-term memory recurrent neural network. Across the 12 hours of labour, HIE sensitivity rose from 0.25 to 0.56 as delivery approached while specificity remained approximately constant with a mean of 0.71 and standard deviation of 0.04. The results show that classification improves as delivery approaches but that performance needs improvement. Future work will address the limitations of this preliminary study by investigating input signal transformations and the use of other network architectures to improve the model performance.

2022-09-03

2022 Computing in Cardiology (CinC) (published)

Automated prediction of extubation success in extremely preterm infants: the APEX multicenter study

Lara J. Kanbar

Wissam Shalish

Charles C. Onu

Samantha Latremouille

Lajos Kovacs

Martin Keszler

Sanjay Chawla

Karen A. Brown

Robert E. Kearney

Guilherme M. Sant’Anna

2022-07-28

Pediatric Research (published)

On the Expressivity of Markov Reward (Extended Abstract)

David Abel

Will Dabney

Anna Harutyunyan

Mark K. Ho

Michael L. Littman

Satinder Singh

2022-07-22

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (published)

Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

Scott Fujimoto

David Meger

Ofir Nachum

Shixiang Shane Gu

In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.

2022-06-27

Proceedings of the 39th International Conference on Machine Learning (published)