Doina Precup

Jesse Farebrother

PhD - McGill

Principal supervisor:

Marc Gendron-Bellemare

PhD - McGill

Principal supervisor:

Research collaborator - Birla Institute of Technology

Jonathan Hu

Research Master's - McGill

Howard Huang

PhD - McGill

Haque Ishfaq

Alumni collaborator - McGill

Website

Mohammad Sami Nur Islam Islam

Research Master's - McGill

Hangzhan Jin

PhD - Polytechnique

PhD - McGill

Postdoctorate - McGill

Jonathan Lebensold

Alumni collaborator - McGill

Alumni collaborator - McGill

Ray Luo

PhD - McGill

Principal supervisor:

G McCracken

PhD - McGill

Nazanin Mohammadi Sepahvand

Alumni collaborator - McGill

Shahrad Mohammadzadeh

Research Master's - McGill

Principal supervisor:

Gabriela Moisescu-Pareja

Research collaborator - McGill

Co-supervisor:

Irina Rish

Padideh Nouri

PhD - UdeM

Co-supervisor:

PhD - McGill

Co-supervisor:

Research intern - McGill

Nate Rahn

PhD - McGill

Principal supervisor:

Marc Gendron-Bellemare

Manoosh Samiei

PhD - McGill

Co-supervisor:

PhD - McGill

Co-supervisor:

PhD - McGill

Nishanth Anand Vemgal

PhD - McGill

PhD - McGill

Co-supervisor:

Samira Ebrahimi Kahou

Research intern - McGill

Zihan Wang

PhD - McGill

Website

Skipper: Combining spatial and temporal abstraction to improve generalization

Steve Wen

Research Master's - McGill

Co-supervisor:

Gregory Dudek

Zijing Wu

PhD - McGill

Principal supervisor:

PhD - McGill

Harry Zhao

Alumni collaborator - McGill

Co-supervisor:

Blog posts

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Read the article

Publications

When Waiting is not an Option: Learning Options with a Deliberation Cost

Jean Harb

Martin Klissarov

Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of "how" to learn options is increasingly well understood, the question of "what" good options should be has remained elusive. We formulate our answer to what "good" options should be in the bounded rationality framework (Simon, 1957) through the notion of deliberation cost. We then derive practical gradient-based learning algorithms to implement this objective. Our results in the Arcade Learning Environment (ALE) show increased performance and interpretability.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Constructing Temporal Abstractions Autonomously in Reinforcement Learning

2018-02-28

AI Magazine (published)

Learning Robust Options

Daniel J. Mankowitz

Timothy A. Mann

Shie Mannor

Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Policy Iteration (ROPI) algorithm with convergence guarantees, which learns options that are robust to model uncertainty. We utilize ROPI to learn robust options with the Robust Options Deep Q Network (RO-DQN) that solves multiple tasks and mitigates model misspecification due to model uncertainty. We present experimental results which suggest that policy iteration with linear features may have an inherent form of robustness when using coarse feature representations. In addition, we present experimental results which demonstrate that robustness helps policy iteration implemented on top of deep neural networks to generalize over a much broader range of dynamics than non-robust policy iteration.

2018-02-08

ArXiv (preprint)

Patterns of reintubation in extremely preterm infants: a longitudinal cohort study

Wissam Shalish

Lara Kanbar

Martin Keszler

Sanjay Chawla

Lajos Kovacs

Smita Rao

Bogdan A Panaitescu

Alyse Laliberte

Karen Brown

Robert E Kearney

Guilherme M Sant'Anna

2018-01-30

Pediatric Research (published)

Analyzing Alzheimer’s Disease Progression from Sequential Magnetic Resonance Imaging Scans Using Deep 3D Convolutional Neural Networks

Sumana Basu

Konrad Wagstyl

Azar Zandifar

Louis Collins

Adriana Romero

Alzheimer’s is a progressive, neurodegenerative disease, that causes irreversible damage to the brain tissue. It impairs the ability to form and retrieve memory, and eventually disrupts the natural flow of life, by affecting the ability to carry out even day to day activities. The disease is typically diagnosed from the symptoms (Mini Mental State Examination, [8]), such as decline in cognitive abilities, visual and/or speech impairment, loss of memory, rather than the structural changes in the brain (biomarker) that causes it. But the pathological changes in the brain start decades before the manifestation of the symptoms [7]. Magnetic Resonance Imaging (MRI) is capable of capturing the complex changes in the brain, even if it is difficult for humans to extract those features from the low contrast, multi-dimensional MRIs [1]. There is a considerable amount of work on analyzing Alzheimer’s disease. However, the vast majority intends to predict the state of the disease at the current time step.

2017-12-31

(published)

www.semanticscholar.org

Disentangling the independently controllable factors of variation by interacting with the world

Valentin Thomas

Philippe Beaudoin

William Fedus

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors, and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.

2017-12-31

arXiv (preprint)

Learning Safe Policies with Expert Guidance

Jessie Huang

Je-chun Huang

Fa Wu

Yang Cai

We propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify. In order to do this, we rely on the existence of demonstrations from expert policies, and we provide a theoretical framework for the agent to optimize in the space of rewards consistent with its existing knowledge. We propose two methods to solve the resulting optimization: an exact ellipsoid-based method and a method in the spirit of the "follow-the-perturbed-leader" algorithm. Our experiments demonstrate the behavior of our algorithm in both discrete and continuous problems. The trained agent safely avoids states with potential negative effects while imitating the behavior of the expert in the other states.

2017-12-31

Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (published)

Optimizing Home Energy Management and Electric Vehicle Charging with Reinforcement Learning

Di Wu

Guillaume Rabusseau

Vincent François-Lavet

Benoit Boulet

Smart grids are advancing the management efficiency and security of power grids with the integration of energy storage, distributed controllers, and advanced meters. In particular, with the increasing prevalence of residential automation devices and distributed renewable energy generation, residential energy management is now drawing more attention. Meanwhile, the increasing adoption of electric vehicle (EV) brings more challenges and opportunities for smart residential energy management. This paper formalizes energy management for the residential home with EV charging as a Markov Decision Process and proposes reinforcement learning (RL) based control algorithms to address it. The objective of the proposed algorithms is to minimize the long-term operating cost. We further use a recurrent neural network (RNN) to model the electricity demand as a preprocessing step. Both the RNN prediction and latent representations are used as additional state features for the RL based control algorithms. Experiments on real-world data show that the proposed algorithms can significantly reduce the operating cost and peak power consumption compared to baseline control algorithms.

2017-12-31

(published)

www.semanticscholar.org

Temporal Regularization for Markov Decision Process

Several applications of Reinforcement Learning suffer from instability due to high variance. This is especially prevalent in high dimensional domains. Regularization is a commonly used technique in machine learning to reduce variance, at the cost of introducing some bias. Most existing regularization techniques focus on spatial (perceptual) regularization. Yet in reinforcement learning, due to the nature of the Bellman equation, there is an opportunity to also exploit temporal regularization based on smoothness in value estimates over trajectories. This paper explores a class of methods for temporal regularization. We formally characterize the bias induced by this technique using Markov chain concepts. We illustrate the various characteristics of temporal regularization via a sequence of simple discrete and continuous MDPs, and show that the technique provides improvement even in high-dimensional Atari games.

2017-12-31

Advances in Neural Information Processing Systems 31 (NeurIPS 2018) (published)

dblp.uni-trier.de

Boosting Based Multiple Kernel Learning and Transfer Regression for Electricity Load Forecasting

Di Wu

Boyu Wang

Benoit Boulet

2017-12-29

Machine Learning and Knowledge Discovery in Databases (published)

Predicting Extubation Readiness in Extreme Preterm Infants based on Patterns of Breathing*

Charles C. Onu

Lara J. Kanbar

Wissam Shalish

Karen A. Brown

Guilherme M. Sant'Anna

Robert E. Kearney

Extremely preterm infants commonly require intubation and invasive mechanical ventilation after birth. While the duration of mechanical ventilation should be minimized in order to avoid complications, extubation failure is associated with increases in morbidities and mortality. As part of a prospective observational study aimed at developing an accurate predictor of extubation readiness, Markov and semi-Markov chain models were applied to gain insight into the respiratory patterns of these infants, with more robust time-series modeling using semi-Markov models. This model revealed interesting similarities and differences between newborns who succeeded extubation and those who failed. The parameters of the model were further applied to predict extubation readiness via generative (joint likelihood) and discriminative (support vector machine) approaches. Results showed that up to 84% of infants who failed extubation could have been accurately identified prior to extubation.

2017-11-30

2017 IEEE Symposium Series on Computational Intelligence (SSCI) (published)

Learnings Options End-to-End for Continuous Action Tasks

Martin Klissarov

Jean Harb

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Mujoco domains are promising, but lead to interesting questions about when a given option should be used, an issue directly connected to the use of initiation sets.

2017-11-29

ArXiv (preprint)