Doina Precup

Sumana Basu

PhD - McGill University

Co-supervisor :

Adriana Romero Soriano

PhD - McGill University

Lynn Cherif

Master's Research - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Principal supervisor :

David Meger

Jonathan Colaço Carr

Master's Research - McGill University

Principal supervisor :

Prakash Panangaden

Élodie Coté-Gauthier

Collaborating researcher - McGill University

Franco Del Balso

Research Intern - Université de Montréal

Jesse Farebrother

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

PhD - McGill University

Principal supervisor :

Eilif Benjamin Muller

PhD - McGill University

Haque Ishfaq

PhD - McGill University

Mohammad Sami Nur Islam

Master's Research - McGill University

Arushi Jain

PhD - McGill University

PhD - McGill University

Postdoctorate - McGill University

Elaine Lau

Master's Research - McGill University

Jonathan Lebensold

Collaborating Alumni - McGill University

Undergraduate - McGill University

Ray Luo

PhD - McGill University

Principal supervisor :

G McCracken

PhD - McGill University

Nazanin Mohammadi Sepahvand

PhD - McGill University

Shahrad Mohammadzadeh

Master's Research - McGill University

Principal supervisor :

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Co-supervisor :

Irina Rish

Padideh Nouri

PhD - Université de Montréal

Co-supervisor :

Charles Onu

PhD - McGill University

PhD - McGill University

Co-supervisor :

Nate Rahn

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

Sahand Rezaei-Shoshtari

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

Blake Richards

samiemandana@gmail.com

PhD - McGill University

Nishanth Anand Vemgal

PhD - McGill University

PhD - McGill University

Priyesh Vijayan

PhD - McGill University

Co-supervisor :

Samira Ebrahimi Kahou

Research Intern - McGill University

Steve Wen

Master's Research - McGill University

Co-supervisor :

Gregory Dudek

Zijing Wu

PhD - McGill University

Co-supervisor :

PhD - McGill University

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Harry Zhao

PhD - McGill University

Co-supervisor :

Blog Posts

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Publications

When Do We Need GNN for Node Classification?

Sitao Luan

Chenqing Hua

Qincheng Lu

Jiaqi Zhu

Xiao-Wen Chang

2022-10-30

arXiv.org (preprint)

Low-Rank Representation of Reinforcement Learning Policies

Bogdan Mazoure

Thang Doan

Tianyu Li

We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.

2022-10-27

Journal of Artificial Intelligence Research (published)

Simulating Human Gaze with Neural Visual Attention

Leo Schwinn

Bjoern Eskofier

Dario Zanca

2022-10-20

NeurIPS.cc/2022/Workshop/GMML (oral)

openreview.net

The Paradox of Choice: On the Role of Attention in Hierarchical Reinforcement Learning

Andrei Cristian Nica

Khimya Khetarpal

Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to having many choices. Hierarchical reinforcement learning methods aim to solve the first problem, by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we first characterize "affordances" as a "hard" attention mechanism that strictly limits the available choices of temporally extended options. We then investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. To this end, we present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. Finally, we identify and empirically demonstrate the settings in which the "paradox of choice" arises, i.e. when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.

2022-10-20

NeurIPS.cc/2022/Workshop/Attention (poster)

openreview.net

Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning

Flemming Kondrup

Thomas Jiralerspong

Elaine Lau

Nathan de Lara

Jacob A. Shkrob

My Duc Tran

Sumana Basu

2022-10-05

ArXiv (preprint)

arxiv.org

Estimating individual treatment effect on disability progression in multiple sclerosis using deep learning

Jean-Pierre R. Falet

Joshua D. Durso-Finley

Brennan Nichyporuk

Julien Schroeter

Francesca Bovis

Maria-Pia Sormani

Tal Arbel

Douglas Arnold

2022-09-26

Nature Communications (published)

Assessing Intrapartum Risk of Hypoxic Ischemic Encephalopathy Using Fetal Heart Rate With Long Short-Term Memory Networks

Derek Kweku Degbedzui

Michael W Kuzniewicz

Marie-Coralie Cornet

Yvonne Wu

Heather Forquer

Lawrence Gerstley

Emily F. Hamilton

P. Warrick

Robert E. Kearney

This study investigated the prediction of the risk of hypoxic ischemic encephalopathy (HIE) using intrapartum cardiotocography records with a long short-term memory recurrent neural network. Across the 12 hours of labour, HIE sensitivity rose from 0.25 to 0.56 as delivery approached while specificity remained approximately constant with a mean of 0.71 and standard deviation of 0.04. The results show that classification improves as delivery approaches but that performance needs improvement. Future work will address the limitations of this preliminary study by investigating input signal transformations and the use of other network architectures to improve the model performance.

2022-09-04

2022 Computing in Cardiology (CinC) (published)

Deep learning, reinforcement learning, and world models

Yu Matsuo

Yann LeCun

Maneesh Sahani

David Silver

Masashi Sugiyama

Eiji Uchibe

J. Morimoto

2022-08-01

Neural Networks (published)

Automated prediction of extubation success in extremely preterm infants: the APEX multicenter study

Lara Kanbar

Wissam Shalish

Charles Onu

Samantha Latremouille

Lajos Kovacs

Martin Keszler

Sanjay Chawla

Karen A. Brown

R. Kearney

Guilherme M. Sant’Anna

2022-07-29

Pediatric Research (published)

On the Expressivity of Markov Reward (Extended Abstract)

David Abel

Will Dabney

Anna Harutyunyan

Mark K. Ho

Michael L. Littman

Satinder Singh

2022-07-23

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (published)

Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

Scott Fujimoto

David Meger

Ofir Nachum

Shixiang Shane Gu

In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.

2022-06-28

Proceedings of the 39th International Conference on Machine Learning (published)

proceedings.mlr.press

arxiv.org

PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information

Faizy Ahsan

Zichao Yan

Mathieu Blanchette

Motivation: The computational prediction of regulatory function associated with a genomic sequence is of utter importance in -omics study, which facilitates our understanding of the underlying mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA–protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods. Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor that is known as FactorNet. PhyloPGM is simple in principle, easy to implement, and yet yields impressive results. Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM. Supplementary information: Supplementary data are available at Bioinformatics online.

2022-06-27

Bioinformatics (published)