Doina Precup

Sumana Basu

PhD - McGill University

Co-supervisor :

Adriana Romero Soriano

Collaborating Alumni - McGill University

Lynn Cherif

Master's Research - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Principal supervisor :

David Meger

Jonathan Colaço Carr

Master's Research - McGill University

Principal supervisor :

Prakash Panangaden

Élodie Coté-Gauthier

Collaborating researcher - McGill University

Co-supervisor :

Isabeau Prémont-Schwarz

Franco Del Balso

Research Intern - Université de Montréal

Jesse Farebrother

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

PhD - McGill University

Principal supervisor :

PhD - McGill University

Haque Ishfaq

Collaborating Alumni - McGill University

Mohammad Sami Nur Islam

Master's Research - McGill University

Arushi Jain

Collaborating Alumni - McGill University

PhD - Polytechnique Montréal

Flemming Kondrup

Postdoctorate - McGill University

Elaine Lau

Master's Research - McGill University

Jonathan Lebensold

Collaborating Alumni - McGill University

Undergraduate - McGill University

Ray Luo

PhD - McGill University

Principal supervisor :

G McCracken

PhD - McGill University

Nazanin Mohammadi Sepahvand

Collaborating Alumni - McGill University

Shahrad Mohammadzadeh

Master's Research - McGill University

Principal supervisor :

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Co-supervisor :

Irina Rish

Padideh Nouri

PhD - Université de Montréal

Co-supervisor :

PhD - McGill University

Co-supervisor :

Nate Rahn

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

Sahand Rezaei-Shoshtari

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Nishanth Anand Vemgal

PhD - McGill University

PhD - McGill University

Co-supervisor :

Samira Ebrahimi Kahou

Zihan Wang

PhD - McGill University

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Guangyuan Wang

Research Intern - McGill University

Steve Wen

Master's Research - McGill University

Co-supervisor :

Gregory Dudek

Zijing Wu

PhD - McGill University

Principal supervisor :

PhD - McGill University

Harry Zhao

Collaborating Alumni - McGill University

Co-supervisor :

Blog Posts

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Publications

Assessing Intrapartum Risk of Hypoxic Ischemic Encephalopathy Using Fetal Heart Rate With Long Short-Term Memory Networks

Derek Kweku Degbedzui

Michael W Kuzniewicz

Marie-Coralie Cornet

Yvonne Wu

Heather Forquer

Lawrence Gerstley

Emily F. Hamilton

P. Warrick

Robert E. Kearney

This study investigated the prediction of the risk of hypoxic ischemic encephalopathy (HIE) using intrapartum cardiotocography records with a long short-term memory recurrent neural network. Across the 12 hours of labour, HIE sensitivity rose from 0.25 to 0.56 as delivery approached while specificity remained approximately constant with a mean of 0.71 and standard deviation of 0.04. The results show that classification improves as delivery approaches but that performance needs improvement. Future work will address the limitations of this preliminary study by investigating input signal transformations and the use of other network architectures to improve the model performance.

2022-09-04

2022 Computing in Cardiology (CinC) (published)

Deep learning, reinforcement learning, and world models

Yu Matsuo

Yann LeCun

Maneesh Sahani

David Silver

Masashi Sugiyama

Eiji Uchibe

J. Morimoto

2022-08-01

Neural Networks (published)

Automated prediction of extubation success in extremely preterm infants: the APEX multicenter study

Lara Kanbar

Wissam Shalish

Charles Onu

Samantha Latremouille

Lajos Kovacs

Martin Keszler

Sanjay Chawla

Karen A. Brown

R. Kearney

Guilherme M. Sant’Anna

2022-07-29

Pediatric Research (published)

On the Expressivity of Markov Reward (Extended Abstract)

David Abel

Will Dabney

Anna Harutyunyan

Mark K. Ho

Michael L. Littman

Satinder Singh

2022-07-23

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (published)

Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

Scott Fujimoto

David Meger

Ofir Nachum

Shixiang Shane Gu

In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.

2022-06-28

Proceedings of the 39th International Conference on Machine Learning (published)

PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information

Motivation: The computational prediction of regulatory function associated with a genomic sequence is of utter importance in -omics study, which facilitates our understanding of the underlying mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA–protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods. Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor that is known as FactorNet. PhyloPGM is simple in principle, easy to implement and yet, yields impressive results. Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM. Supplementary information: Supplementary data are available at Bioinformatics online.

2022-06-27

Bioinformatics (published)

Understanding Decision-Time vs. Background Planning in Model-Based Reinforcement Learning

Safa Alver

In model-based reinforcement learning, an agent can leverage a learned model to improve its way of behaving in different ways. Two prevalent approaches are decision-time planning and background planning. In this study, we are interested in understanding under what conditions and in which settings one of these two planning styles will perform better than the other in domains that require fast responses. After viewing them through the lens of dynamic programming, we first consider the classical instantiations of these planning styles and provide theoretical results and hypotheses on which one will perform better in the pure planning, planning & learning, and transfer learning settings. We then consider the modern instantiations of these planning styles and provide hypotheses on which one will perform better in the last two of the considered settings. Lastly, we perform several illustrative experiments to empirically validate both our theoretical results and hypotheses. Overall, our findings suggest that even though decision-time planning does not perform as well as background planning in their classical instantiations, in their modern instantiations, it can perform on par or better than background planning in both the planning & learning and transfer learning settings.

2022-06-16

ArXiv (preprint)

Deep Learning Prediction of Response to Disease Modifying Therapy in Primary Progressive Multiple Sclerosis (P1-1.Virtual)

Jean-Pierre R. Falet

Joshua D. Durso-Finley

Brennan Nichyporuk

Julien Schroeter

Francesca Bovis

Maria-Pia Sormani

Tal Arbel

Douglas Arnold

2022-05-03

Neurology (published)

Don't Freeze Your Embedding: Lessons from Policy Finetuning in Environment Transfer

Victoria Dean

Daniel Toyama

A common occurrence in reinforcement learning (RL) research is making use of a pretrained vision stack that converts image observations to latent vectors. Using a visual embedding in this way leaves open questions, though: should the vision stack be updated with the policy? In this work, we evaluate the effectiveness of such decisions in RL transfer settings. We introduce policy update formulations for use after pretraining in a different environment and analyze the performance of such formulations. Through this evaluation, we also detail emergent metrics of benchmark suites and present results on Atari and AndroidEnv.

2022-04-27

ICLR.cc/2022/Workshop/GPL (poster)

Learning how to Interact with a Complex Interface using Hierarchical Reinforcement Learning

Gheorghe Comanici

Amelia Glaese

Anita Gergely

Daniel Toyama

Zafarali Ahmed

Tyler Jackson

Philippe Hamel

Hierarchical Reinforcement Learning (HRL) allows interactive agents to decompose complex problems into a hierarchy of sub-tasks. Higher-level tasks can invoke the solutions of lower-level tasks as if they were primitive actions. In this work, we study the utility of hierarchical decompositions for learning an appropriate way to interact with a complex interface. Specifically, we train HRL agents that can interface with applications in a simulated Android device. We introduce a Hierarchical Distributed Deep Reinforcement Learning architecture that learns (1) subtasks corresponding to simple finger gestures, and (2) how to combine these gestures to solve several Android tasks. Our approach relies on goal conditioning and can be used more generally to convert any base RL agent into an HRL agent. We use the AndroidEnv environment to evaluate our approach. For the experiments, the HRL agent uses a distributed version of the popular DQN algorithm to train different components of the hierarchy. While the native action space is completely intractable for simple DQN agents, our architecture can be used to establish an effective way to interact with different tasks, significantly improving the performance of the same DQN agent over different levels of abstraction.

2022-04-21

ArXiv (preprint)

Selective Credit Assignment

Veronica Chelu

Diana Borsa

Hado Philip van Hasselt

2022-02-20

ArXiv (preprint)

Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers

Amir Ardalan Kalantari

Mohammad Saeed Amini

Sarath Chandar

Much of recent Deep Reinforcement Learning success is owed to the neural architecture's potential to learn and use effective internal representations of the world. While many current algorithms access a simulator to train with a large amount of data, in realistic settings, including while playing games that may be played against people, collecting experience can be quite costly. In this paper, we introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance. We design this architecture by incorporating advances achieved in recent years in the field of Natural Language Processing and Computer Vision. Specifically, we propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation, while simultaneously optimizing return. We demonstrate empirically that this architecture improves sample complexity for several Atari environments, while also achieving better performance in some of the games.

2022-02-01

ArXiv (preprint)