Doina Precup

Samin Yeasar Arnob

PhD - McGill University

Sumana Basu

Collaborating Alumni - McGill University

Co-supervisor :

Adriana Romero Soriano

Collaborating Alumni - McGill University

Raymond Chua

PhD - McGill University

Co-supervisor :

PhD - McGill University

Principal supervisor :

David Meger

Jonathan Colaço Carr

Master's Research - McGill University

Principal supervisor :

Prakash Panangaden

Élodie Coté-Gauthier

Collaborating researcher - McGill University

Franco Del Balso

Collaborating researcher - Université de Montréal

Jesse Farebrother

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

PhD - McGill University

Principal supervisor :

Collaborating researcher - Birla Institute of Technology

Jonathan Hu

Master's Research - McGill University

Howard Huang

PhD - McGill University

Haque Ishfaq

Collaborating Alumni - McGill University

Mohammad Sami Nur Islam Islam

Master's Research - McGill University

Hangzhan Jin

PhD - Polytechnique Montréal

Martin Klissarov

PhD - McGill University

Postdoctorate - McGill University

Jonathan Lebensold

Collaborating Alumni - McGill University

Collaborating Alumni - McGill University

Ray Luo

PhD - McGill University

Principal supervisor :

G McCracken

PhD - McGill University

Nazanin Mohammadi Sepahvand

Collaborating Alumni - McGill University

Shahrad Mohammadzadeh

Master's Research - McGill University

Principal supervisor :

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Co-supervisor :

Irina Rish

Padideh Nouri

PhD - Université de Montréal

Co-supervisor :

PhD - McGill University

Co-supervisor :

Research Intern - McGill University

Nate Rahn

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

Manoosh Samiei

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Nishanth Anand Vemgal

PhD - McGill University

PhD - McGill University

Co-supervisor :

Samira Ebrahimi Kahou

Research Intern - McGill University

Zihan Wang

PhD - McGill University

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Steve Wen

Master's Research - McGill University

Co-supervisor :

Gregory Dudek

Zijing Wu

PhD - McGill University

Principal supervisor :

PhD - McGill University

Harry Zhao

Collaborating Alumni - McGill University

Co-supervisor :

Blog Posts

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Publications

When Waiting is not an Option: Learning Options with a Deliberation Cost

Jean Harb

Martin Klissarov

Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of "how" to learn options is increasingly well understood, the question of "what" good options should be has remained elusive. We formulate our answer to what "good" options should be in the bounded rationality framework (Simon, 1957) through the notion of deliberation cost. We then derive practical gradient-based learning algorithms to implement this objective. Our results in the Arcade Learning Environment (ALE) show increased performance and interpretability.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Constructing Temporal Abstractions Autonomously in Reinforcement Learning

2018-02-28

AI Magazine (published)

Learning Robust Options

Daniel J. Mankowitz

Timothy A. Mann

Shie Mannor

Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty. We utilize ROPI to learn robust options with the Robust Options Deep Q Network (RO-DQN) that solves multiple tasks and mitigates model misspecification due to model uncertainty. We present experimental results which suggest that policy iteration with linear features may have an inherent form of robustness when using coarse feature representations. In addition, we present experimental results which demonstrate that robustness helps policy iteration implemented on top of deep neural networks to generalize over a much broader range of dynamics than non-robust policy iteration.

2018-02-08

ArXiv (preprint)

Patterns of reintubation in extremely preterm infants: a longitudinal cohort study

Wissam Shalish

Lara Kanbar

Martin Keszler

Sanjay Chawla

Lajos Kovacs

Smita Rao

Bogdan A Panaitescu

Alyse Laliberte

Karen Brown

Robert E Kearney

Guilherme M Sant'Anna

2018-01-30

Pediatric Research (published)

Analyzing Alzheimer’s Disease Progression from Sequential Magnetic Resonance Imaging Scans Using Deep 3D Convolutional Neural Networks

Sumana Basu

Konrad Wagstyl

Azar Zandifar

Louis Collins

Adriana Romero

Alzheimer’s is a progressive, neurodegenerative disease, that causes irreversible damage to the brain tissue. It impairs the ability to form and retrieve memory, and eventually disrupts the natural flow of life, by affecting the ability to carry out even day to day activities. The disease is typically diagnosed from the symptoms (Mini Mental State Examination, [8]), such as decline in cognitive abilities, visual and/or speech impairment, loss of memory, rather than the structural changes in the brain (biomarker) that causes it. But the pathological changes in the brain start decades before the manifestation of the symptoms [7]. Magnetic Resonance Imaging (MRI) is capable of capturing the complex changes in the brain, even if it is difficult for humans to extract those features from the low contrast, multi-dimensional MRIs [1]. There is a considerable amount of work on analyzing Alzheimer’s disease. However, the vast majority intends to predict the state of the disease at the current time step.

2017-12-31

(published)

Disentangling the independently controllable factors of variation by interacting with the world

Valentin Thomas

Philippe Beaudoin

William Fedus

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors, and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.

2017-12-31

arXiv (preprint)

Learning Safe Policies with Expert Guidance

Jessie Huang

Je-chun Huang

Fa Wu

Yang Cai

We propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify. In order to do this, we rely on the existence of demonstrations from expert policies, and we provide a theoretical framework for the agent to optimize in the space of rewards consistent with its existing knowledge. We propose two methods to solve the resulting optimization: an exact ellipsoid-based method and a method in the spirit of the "follow-the-perturbed-leader" algorithm. Our experiments demonstrate the behavior of our algorithm in both discrete and continuous problems. The trained agent safely avoids states with potential negative effects while imitating the behavior of the expert in the other states.

2017-12-31

Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (published)

Optimizing Home Energy Management and Electric Vehicle Charging with Reinforcement Learning

Di Wu

Guillaume Rabusseau

Vincent François-Lavet

Benoit Boulet

Smart grids are advancing the management efficiency and security of power grids with the integration of energy storage, distributed controllers, and advanced meters. In particular, with the increasing prevalence of residential automation devices and distributed renewable energy generation, residential energy management is now drawing more attention. Meanwhile, the increasing adoption of electric vehicle (EV) brings more challenges and opportunities for smart residential energy management. This paper formalizes energy management for the residential home with EV charging as a Markov Decision Process and proposes reinforcement learning (RL) based control algorithms to address it. The objective of the proposed algorithms is to minimize the long-term operating cost. We further use a recurrent neural network (RNN) to model the electricity demand as a preprocessing step. Both the RNN prediction and latent representations are used as additional state features for the RL based control algorithms. Experiments on real-world data show that the proposed algorithms can significantly reduce the operating cost and peak power consumption compared to baseline control algorithms.

2017-12-31

(published)

Temporal Regularization for Markov Decision Process

Several applications of Reinforcement Learning suffer from instability due to high variance. This is especially prevalent in high dimensional domains. Regularization is a commonly used technique in machine learning to reduce variance, at the cost of introducing some bias. Most existing regularization techniques focus on spatial (perceptual) regularization. Yet in reinforcement learning, due to the nature of the Bellman equation, there is an opportunity to also exploit temporal regularization based on smoothness in value estimates over trajectories. This paper explores a class of methods for temporal regularization. We formally characterize the bias induced by this technique using Markov chain concepts. We illustrate the various characteristics of temporal regularization via a sequence of simple discrete and continuous MDPs, and show that the technique provides improvement even in high-dimensional Atari games.

2017-12-31

Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (published)

Boosting Based Multiple Kernel Learning and Transfer Regression for Electricity Load Forecasting

Di Wu

Boyu Wang

Benoit Boulet

2017-12-29

Machine Learning and Knowledge Discovery in Databases (published)

Predicting Extubation Readiness in Extreme Preterm Infants based on Patterns of Breathing

Charles C. Onu

Lara J. Kanbar

Wissam Shalish

Karen A. Brown

Guilherme M. Sant'Anna

Robert E. Kearney

Extremely preterm infants commonly require intubation and invasive mechanical ventilation after birth. While the duration of mechanical ventilation should be minimized in order to avoid complications, extubation failure is associated with increases in morbidities and mortality. As part of a prospective observational study aimed at developing an accurate predictor of extubation readiness, Markov and semi-Markov chain models were applied to gain insight into the respiratory patterns of these infants, with more robust time-series modeling using semi-Markov models. This model revealed interesting similarities and differences between newborns who succeeded extubation and those who failed. The parameters of the model were further applied to predict extubation readiness via generative (joint likelihood) and discriminative (support vector machine) approaches. Results showed that up to 84% of infants who failed extubation could have been accurately identified prior to extubation.

2017-11-30

2017 IEEE Symposium Series on Computational Intelligence (SSCI) (published)

Learnings Options End-to-End for Continuous Action Tasks

Martin Klissarov

Jean Harb

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Mujoco domains are promising, but lead to interesting questions about when a given option should be used, an issue directly connected to the use of initiation sets.

2017-11-29

ArXiv (preprint)