Doina Precup

guangyuan.wang@mila.quebec

Guangyuan Wang

Research Intern - McGill University

Haque Ishfaq

PhD - McGill University

PhD - McGill University

huanghow@mila.quebec

Janarthanan Rajendran

Postdoctorate - Université de Montréal

Principal supervisor :

Sarath Chandar Anbil Parthipan

janarthanan.rajendran@mila.quebec

jonathan.colaco-carr@mila.quebec

Jaume Minano Masip

PhD - McGill University

masipmij@mila.quebec

Jesse Farebrother

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

Master's Research - McGill University

Principal supervisor :

Prakash Panangaden

Jonathan Lebensold

PhD - McGill University

Research Intern - McGill University

keyu.wang@mila.quebec

Kushal Arora

PhD - McGill University

Principal supervisor :

Lynn Cherif

Master's Research - McGill University

Co-supervisor :

lynn.cherif@mila.quebec

Mohammad Sami Nur Islam

Mandana Samiei

PhD - McGill University

Co-supervisor :

PhD - McGill University

delvermm@mila.quebec

Martin Klissarov

PhD - McGill University

Harry Zhao

PhD - McGill University

Co-supervisor :

Research Intern - McGill University

mohammad-sami-nur.islam@mila.quebec

nathan.de-lara@mila.quebec

Nathan de Lara

Research Intern - McGill University

Nate Rahn

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

nathan.rahn@mila.quebec

Neil Girdhar

Collaborating researcher - McGill University

neil.girdhar@mila.quebec

Nikhil Vemgal

Master's Research - McGill University

nikhil-murali.vemgal@mila.quebec

padideh.nouri@mila.quebec

Nishanth Anand Vemgal

PhD - McGill University

Master's Research - Université de Montréal

PhD - McGill University

Ray Chua

PhD - McGill University

Co-supervisor :

Blake Richards

chuaraym@mila.quebec

Riashat Islam

PhD - McGill University

Safa Alver

PhD - McGill University

alversaf@mila.quebec

Sahand Rezaei-Shoshtari

PhD - McGill University

Co-supervisor :

David Meger

sahand.rezaei-shoshtari@mila.quebec

PhD - McGill University

PhD - McGill University

Co-supervisor :

David Meger

fujimots@mila.quebec

Shahrad Mohammadzadeh

Collaborating researcher - McGill University

Principal supervisor :

Reihaneh Rabbany

shahrad.mohammadzadeh@mila.quebec

PhD - McGill University

PhD - McGill University

shuyuan.zhang@mila.quebec

Sitao Luan

PhD - McGill University

Steve Wen

Undergraduate - McGill University

steve.wen@mila.quebec

Sumana Basu

PhD - McGill University

Co-supervisor :

Adriana Romero Soriano

Master's Research - Université de Montréal

Principal supervisor :

Yoshua Bengio

thomas.jiralerspong@mila.quebec

PhD - McGill University

cheluver@mila.quebec

Wesley Chung

PhD - McGill University

Principal supervisor :

David Meger

chungwes@mila.quebec

Ray Luo

PhD - McGill University

Principal supervisor :

Xujie Si

luo.ziyan@mila.quebec

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Blog Posts


February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio


Publications

The Paradox of Choice: On the Role of Attention in Hierarchical Reinforcement Learning

Andrei Cristian Nica

Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to having many choices. Hierarchical reinforcement learning methods aim to solve the first problem by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we first characterize "affordances" as a "hard" attention mechanism that strictly limits the available choices of temporally extended options. We then investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. To this end, we present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. Finally, we identify and empirically demonstrate the settings in which the "paradox of choice" arises, i.e., when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.

2022-10-20

NeurIPS.cc/2022/Workshop/Attention (poster)

openreview.net
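The abstract contrasts hard attention (affordances that strictly rule options out) with soft attention. A minimal sketch of that distinction, assuming a Boltzmann policy over options and a per-option attention weight in [0, 1]; the option names, values, and the function `attended_policy` are hypothetical, and this is not the paper's affordance-learning algorithm:

```python
import math

def attended_policy(option_values, attention, temperature=1.0):
    """Boltzmann policy over options, reweighted by per-option attention in [0, 1].

    Hard attention (affordances): weights are exactly 0 or 1, so options
    judged infeasible in this state get zero probability.
    Soft attention: weights strictly between 0 and 1 down-weight options
    without ruling them out.
    """
    best = max(option_values.values())  # subtract the max for numerical stability
    prefs = {o: attention[o] * math.exp((v - best) / temperature)
             for o, v in option_values.items()}
    total = sum(prefs.values())
    if total == 0:
        raise ValueError("attention rules out every option")
    return {o: p / total for o, p in prefs.items()}

# Hypothetical option values in some state.
values = {"go_to_door": 1.0, "go_to_key": 0.5, "go_to_wall": 0.8}
hard = {"go_to_door": 1, "go_to_key": 1, "go_to_wall": 0}        # affordance mask
soft = {"go_to_door": 0.9, "go_to_key": 0.7, "go_to_wall": 0.1}  # graded attention

print(attended_policy(values, hard))
print(attended_policy(values, soft))
```

With the hard mask, `go_to_wall` receives probability 0 and the choice set shrinks; with the soft weights every option keeps some probability and is only down-weighted.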

Estimating individual treatment effect on disability progression in multiple sclerosis using deep learning

Jean-Pierre R. Falet

Joshua D. Durso-Finley

Brennan Nichyporuk

Julien Schroeter

Francesca Bovis

Maria-Pia Sormani

Tal Arbel

Douglas Arnold

2022-09-26

Nature Communications (published)

Assessing Intrapartum Risk of Hypoxic Ischemic Encephalopathy Using Fetal Heart Rate With Long Short-Term Memory Networks

Derek Kweku Degbedzui

Michael W Kuzniewicz

Marie-Coralie Cornet

Yvonne Wu

Heather Forquer

Lawrence Gerstley

Emily F. Hamilton

P. Warrick

Robert E. Kearney

This study investigated the prediction of the risk of hypoxic ischemic encephalopathy (HIE) using intrapartum cardiotocography records with a long short-term memory recurrent neural network. Across the 12 hours of labour, HIE sensitivity rose from 0.25 to 0.56 as delivery approached, while specificity remained approximately constant with a mean of 0.71 and a standard deviation of 0.04. The results show that classification improves as delivery approaches but that performance needs improvement. Future work will address the limitations of this preliminary study by investigating input signal transformations and other network architectures to improve model performance.

2022-09-04

2022 Computing in Cardiology (CinC) (published)
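For readers less familiar with the reported metrics: sensitivity is the fraction of true HIE cases the model flags, and specificity is the fraction of non-HIE cases it correctly leaves unflagged. A minimal sketch of both; the counts in the usage lines are hypothetical, chosen only so the ratios match the reported 0.56 and 0.71, and are not data from the study:

```python
def sensitivity(tp, fn):
    """True-positive rate: share of actual HIE cases the model flags."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True-negative rate: share of non-HIE cases the model correctly does not flag."""
    return tn / (tn + fp)

def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP from parallel lists of 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp

# Hypothetical counts (not from the study), chosen to give the reported ratios.
print(sensitivity(tp=14, fn=11))  # 14 / 25
print(specificity(tn=71, fp=29))  # 71 / 100
```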

PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information

Faizy Ahsan

Zichao Yan

Mathieu Blanchette

Motivation: The computational prediction of regulatory function associated with a genomic sequence is of utmost importance in -omics studies, as it facilitates our understanding of the mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include predicting transcription factor binding in DNA regulatory regions and predicting RNA–protein interactions in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of orthologous data available across multitudes of extant and ancestral genomes, which presents an opportunity to improve their accuracy.

Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. In our experiments, PhyloPGM showed significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle and easy to implement, yet yields impressive results.

Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM.

Supplementary information: Supplementary data are available at Bioinformatics online.

2022-06-27

Bioinformatics (published)
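PhyloPGM aggregates the outputs of an existing per-sequence predictor across orthologous regions. The sketch below only illustrates that general idea with a naive baseline, averaging log-odds over orthologs and mixing them with the human prediction; it is not PhyloPGM's probabilistic model, and `aggregate_ortholog_predictions` and `human_weight` are hypothetical names:

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def aggregate_ortholog_predictions(human_prob, ortholog_probs, human_weight=0.5):
    """Naive baseline, NOT PhyloPGM: combine log-odds of the human prediction
    with the mean log-odds of the same predictor applied to orthologous regions."""
    if not ortholog_probs:
        return human_prob  # no evolutionary evidence: keep the original prediction
    mean_orth = sum(logit(p) for p in ortholog_probs) / len(ortholog_probs)
    combined = human_weight * logit(human_prob) + (1.0 - human_weight) * mean_orth
    return sigmoid(combined)

# Hypothetical binding probabilities from a pre-trained predictor.
print(aggregate_ortholog_predictions(0.55, [0.9, 0.85, 0.8]))  # orthologs agree: pushed up
print(aggregate_ortholog_predictions(0.55, [0.1, 0.2]))        # orthologs disagree: pushed down
```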

Deep Learning Prediction of Response to Disease Modifying Therapy in Primary Progressive Multiple Sclerosis (P1-1.Virtual)

Jean-Pierre R. Falet

Joshua D. Durso-Finley

Brennan Nichyporuk

Julien Schroeter

Francesca Bovis

Maria-Pia Sormani

Tal Arbel

Douglas Arnold

2022-05-03

Neurology (published)

Behind the Machine's Gaze: Neural Networks with Biologically-inspired Constraints Exhibit Human-like Visual Attention

Leo Schwinn

Bjoern Eskofier

Dario Zanca

By and large, existing computational models of visual attention tacitly assume perfect vision and full access to the stimulus, and thereby deviate from foveated biological vision. Moreover, modeling top-down attention is generally reduced to the integration of semantic features, without incorporating the signal of high-level visual tasks that have been shown to partially guide human attention. We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner. With our method, we explore the ability of neural networks on which we impose a biologically-inspired foveated vision constraint to generate human-like scanpaths without directly training for this objective. The loss of a neural network performing a downstream visual task (i.e., classification or reconstruction) flexibly provides top-down guidance to the scanpath. Extensive experiments show that our method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths. Additionally, the flexibility of the framework allows us to quantitatively investigate the role of different tasks in the generated visual behaviors. Finally, we demonstrate the superiority of the approach in a novel experiment that investigates the utility of scanpaths in real-world applications under imperfect viewing conditions.

2022-01-01

Trans. Mach. Learn. Res. (published)

openreview.net
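The foveated vision constraint means the network sees the stimulus sharply only near the current fixation. One simple way to express such a constraint, assumed here for illustration and not taken from NeVA's code, is to blend the image with a blurred copy using a Gaussian weight centred on the gaze point; `foveate`, `box_blur`, and `sigma` are hypothetical:

```python
import math

def box_blur(image):
    """3x3 mean filter; near the border it averages only in-bounds neighbours."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [image[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def foveate(image, fixation, sigma):
    """Sharp near the fixation point, blurred in the periphery.

    image: 2-D list of grayscale values; fixation: (row, col); sigma: fovea radius in pixels.
    """
    blurred = box_blur(image)
    fr, fc = fixation
    out = []
    for r, (sharp_row, blur_row) in enumerate(zip(image, blurred)):
        row = []
        for c, (s, b) in enumerate(zip(sharp_row, blur_row)):
            # Gaussian acuity weight: 1 at the fixation, decaying towards 0 in the periphery.
            w = math.exp(-((r - fr) ** 2 + (c - fc) ** 2) / (2 * sigma ** 2))
            row.append(w * s + (1 - w) * b)
        out.append(row)
    return out

img = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(foveate(img, (1, 1), sigma=1.0)[1][1])  # fixated pixel stays sharp
print(foveate(img, (0, 0), sigma=0.5)[1][1])  # same pixel, now peripheral: blurred
```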

Towards Painless Policy Optimization for Constrained MDPs

Arushi Jain

Sharan Vaswani

Reza Babanezhad Harikandeh

Csaba Szepesvari

We study policy optimization in an infinite horizon, …

2022-01-01

UAI (published)

openreview.net

Estimating treatment effect for individuals with progressive multiple sclerosis using deep learning

JR Falet

Joshua D. Durso-Finley

Brennan Nichyporuk

Jan Schroeter

Francesca Bovis

Maria-Pia Sormani

Tal Arbel

Douglas Arnold

2021-11-01

medRxiv (preprint)

Self-Supervised Attention-Aware Reinforcement Learning

Haiping Wu

Visual saliency has emerged as a major visualization tool for interpreting deep reinforcement learning (RL) agents. However, much of the existing research uses it as an analysis tool rather than as an inductive bias for policy learning. In this work, we use visual attention as an inductive bias for RL agents. We propose a novel self-supervised attention learning approach that can (1) learn to select regions of interest without explicit annotations, and (2) act as a plug-in for existing deep RL methods to improve learning performance. We empirically show that the self-supervised attention-aware deep RL methods outperform the baselines in terms of both convergence rate and performance. Furthermore, the proposed self-supervised attention is not tied to specific policies, nor restricted to a specific scene. We posit that the proposed approach is a general self-supervised attention module for multi-task learning and transfer learning, and we empirically validate its generalization ability. Finally, we show that our method learns meaningful object keypoints, highlighting improvements both qualitatively and quantitatively.

2021-05-18

AAAI Conference on Artificial Intelligence (published)
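To make "visual attention as an inductive bias" concrete: a soft spatial attention layer scores each location of a feature map, normalises the scores with a softmax, and pools features by those weights. The sketch below is a generic version of that layer under these assumptions, not the paper's self-supervised module, and `spatial_attention` is a hypothetical name:

```python
import math

def spatial_attention(features, scores):
    """Generic soft spatial attention (not the paper's module).

    features: one feature vector per spatial location (H*W lists of floats)
    scores:   one unnormalised attention logit per location
    Returns (pooled feature vector, attention weights summing to 1).
    """
    top = max(scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    return pooled, weights

# Hypothetical 2x2 feature map flattened to 4 locations, 2 channels each.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
print(spatial_attention(feats, [0.0, 0.0, 0.0, 0.0]))   # uniform attention: plain average
print(spatial_attention(feats, [10.0, 0.0, 0.0, 0.0]))  # attention concentrated on location 0
```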

Variance Penalized On-Policy and Off-Policy Actor-Critic

Arushi Jain

Gandharv Patil

Ayush Jain

2021-05-18

Proceedings of the AAAI Conference on Artificial Intelligence (published)

arxiv.org

Safe option-critic: learning safety in the option-critic architecture

Arushi Jain

Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for practical applications but also facilitates a better understanding of an agent’s decisions. We tackle this problem in the options framework (Sutton, Precup & Singh, 1999), a particular way to specify temporally abstract actions which allow an agent to use sub-policies with start and end conditions. We consider a behaviour safe if it avoids regions of the state space with high uncertainty in the outcomes of actions. We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency. The proposed objective results in a trade-off between maximizing the standard expected return and minimizing the effect of model uncertainty in the return. We propose a policy gradient algorithm to optimize the constrained objective function. We examine the quantitative and qualitative behaviours of the proposed approach in a tabular grid world, a continuous-state puddle world, and three games from the Arcade Learning Environment: Ms. Pacman, Amidar, and Q*Bert. Our approach reduces the variance of the return, boosts performance in environments with intrinsic variability in the reward structure, and compares favourably both with primitive actions and with risk-neutral options.

2021-04-07

The Knowledge Engineering Review (published)

arxiv.org
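The abstract describes a trade-off between expected return and the effect of uncertainty in the return. As a hedged illustration, the sketch below uses a mean-variance objective (sample mean of the return minus `psi` times its variance) as a stand-in for the paper's actual uncertainty term; `psi`, `penalized_objective`, `pick_option`, and the return samples are hypothetical:

```python
import statistics

def penalized_objective(returns, psi):
    """Sample estimate of E[G] - psi * Var[G] (mean-variance stand-in)."""
    return statistics.fmean(returns) - psi * statistics.pvariance(returns)

def pick_option(option_returns, psi):
    """Return the option whose sampled returns score highest under the penalty."""
    return max(option_returns, key=lambda o: penalized_objective(option_returns[o], psi))

# Hypothetical returns observed when executing each option.
option_returns = {
    "risky": [12.0, -4.0, 12.0, -4.0],  # mean 4, variance 64
    "safe": [3.0, 3.0, 3.0, 3.0],       # mean 3, variance 0
}
print(pick_option(option_returns, psi=0.0))  # risk-neutral choice
print(pick_option(option_returns, psi=0.1))  # variance-penalised choice
```

With `psi = 0` the higher-mean, high-variance option wins; with `psi = 0.1` the penalty flips the preference to the consistent option.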

Optimal Spectral-Norm Approximate Minimization of Weighted Finite Automata

Borja Balle

Clara Lacroce

Prakash Panangaden