Doina Precup

Samin Yeasar Arnob

PhD - McGill University

Sumana Basu

Collaborating Alumni - McGill University

Co-supervisor :

Adriana Romero Soriano

Collaborating Alumni - McGill University

Raymond Chua

PhD - McGill University

Co-supervisor :

PhD - McGill University

Principal supervisor :

David Meger

Jonathan Colaço Carr

Master's Research - McGill University

Principal supervisor :

Prakash Panangaden

Élodie Coté-Gauthier

Collaborating researcher - McGill University

Franco Del Balso

Collaborating researcher - Université de Montréal

Jesse Farebrother

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

PhD - McGill University

Principal supervisor :

Collaborating researcher - Birla Institute of Technology

Jonathan Hu

Master's Research - McGill University

Howard Huang

PhD - McGill University

Haque Ishfaq

Collaborating Alumni - McGill University

Mohammad Sami Nur Islam Islam

Master's Research - McGill University

Hangzhan Jin

PhD - Polytechnique Montréal

Martin Klissarov

PhD - McGill University

Postdoctorate - McGill University

Jonathan Lebensold

Collaborating Alumni - McGill University

Collaborating Alumni - McGill University

Ray Luo

PhD - McGill University

Principal supervisor :

G McCracken

PhD - McGill University

Nazanin Mohammadi Sepahvand

Collaborating Alumni - McGill University

Shahrad Mohammadzadeh

Master's Research - McGill University

Principal supervisor :

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Co-supervisor :

Irina Rish

Padideh Nouri

PhD - Université de Montréal

Co-supervisor :

PhD - McGill University

Co-supervisor :

Research Intern - McGill University

Nate Rahn

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

Manoosh Samiei

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Nishanth Anand Vemgal

PhD - McGill University

PhD - McGill University

Co-supervisor :

Samira Ebrahimi Kahou

Research Intern - McGill University

Zihan Wang

PhD - McGill University

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Steve Wen

Master's Research - McGill University

Co-supervisor :

Gregory Dudek

Zijing Wu

PhD - McGill University

Principal supervisor :

PhD - McGill University

Harry Zhao

Collaborating Alumni - McGill University

Co-supervisor :

Blog Posts

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Publications

Recurrent Value Functions

Pierre Thodoroff

Nishanth Anand

Lucas Caccia

Joelle Pineau

Despite recent successes in Reinforcement Learning, value-based methods often suffer from high variance hindering performance. In this paper, we illustrate this in a continuous control setting where state-of-the-art methods perform poorly whenever sensor noise is introduced. To overcome this issue, we introduce Recurrent Value Functions (RVFs) as an alternative to estimate the value function of a state. We propose to estimate the value function of the current state using the value function of past states visited along the trajectory. Due to the nature of their formulation, RVFs have a natural way of learning an emphasis function that selectively emphasizes important states. First, we establish RVF's asymptotic convergence properties in tabular settings. We then demonstrate their robustness on a partially observable domain and continuous control tasks. Finally, we provide a qualitative interpretation of the learned emphasis function.

2018-12-31

arXiv (preprint)

Temporally Extended Metrics for Markov Decision Processes.

Philip Amortila

Bellemare Marc-Emmanuel

Prakash Panangaden

2018-12-31

SafeAI@AAAI (published)

dblp.uni-trier.de

The Option Keyboard: Combining Skills in Reinforcement Learning

Andre Barreto

Diana Borsa

Shaobo Hou

Gheorghe Comanici

Eser Aygün

Philippe Hamel

Daniel Toyama

Jonathan J. Hunt

Shibl Mourad

David Silver

The ability to combine known skills to create new ones may be crucial in the solution of complex reinforcement learning problems that unfold over extended periods. We argue that a robust way of combining skills is to define and manipulate them in the space of pseudo-rewards (or "cumulants"). Based on this premise, we propose a framework for combining skills using the formalism of options. We show that every deterministic option can be unambiguously represented as a cumulant defined in an extended domain. Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options. This means that, once we have learned options associated with a set of cumulants, we can instantaneously synthesise options induced by any linear combination of them, without any learning involved. We describe how this framework provides a hierarchical interface to the environment whose abstract actions correspond to combinations of basic skills. We demonstrate the practical benefits of our approach in a resource management problem and a navigation task involving a quadrupedal simulated robot.

2018-12-31

NeurIPS (published)

The Termination Critic

Anna Harutyunyan

Will Dabney

Diana Borsa

Nicolas Heess

Remi Munos

In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination function, as opposed to - as is common - the policy. The termination function is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option’s encoding - arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a “critic” for the termination function. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning.

2018-12-31

AISTATS (published)

Uncertainty Aware Learning from Demonstrations in Multiple Contexts using Bayesian Neural Networks

Sanjay Thakur

Herke van Hoof

Juan Camilo Gamboa Higuera

David Meger

Diversity of environments is a key challenge that causes learned robotic controllers to fail due to the discrepancies between the training and evaluation conditions. Training from demonstrations in various conditions can mitigate, but not completely prevent, such failures. Learned controllers such as neural networks typically do not have a notion of uncertainty that allows diagnosing an offset between training and testing conditions, and potentially intervening. In this work, we propose to use Bayesian Neural Networks, which have such a notion of uncertainty. We show that uncertainty can be leveraged to consistently detect situations in high-dimensional simulated and real robotic domains in which the performance of the learned controller would be sub-par. Also, we show that such an uncertainty-based solution allows making an informed decision about when to invoke a fallback strategy. One fallback strategy is to request more data. We empirically show that providing data only when requested results in increased data-efficiency.

2018-12-31

ICRA (published)

Clustering-Oriented Representation Learning with Attractive-Repulsive Loss

Lucas Caccia

Jackie CK Cheung

The standard loss function used to train neural network classifiers, categorical cross-entropy (CCE), seeks to maximize accuracy on the training data; building useful representations is not a necessary byproduct of this objective. In this work, we propose clustering-oriented representation learning (COREL) as an alternative to CCE in the context of a generalized attractive-repulsive loss framework. COREL has the consequence of building latent representations that collectively exhibit the quality of natural clustering within the latent space of the final hidden layer, according to a predefined similarity function. Despite being simple to implement, COREL variants outperform or perform equivalently to CCE in a variety of scenarios, including image and news article classification using both feed-forward and convolutional neural networks. Analysis of the latent spaces created with different similarity functions facilitates insights on the different use cases COREL variants can satisfy, where the Cosine-COREL variant makes a consistently clusterable latent space, while Gaussian-COREL consistently obtains better classification accuracy than CCE.

2018-12-17

arXiv (preprint)

Environments for Lifelong Reinforcement Learning

To achieve general artificial intelligence, reinforcement learning (RL) agents should learn not only to optimize returns for one specific task but also to constantly build more complex skills and scaffold their knowledge about the world, without forgetting what has already been learned. In this paper, we discuss the desired characteristics of environments that can support the training and evaluation of lifelong reinforcement learning agents, review existing environments from this perspective, and propose recommendations for devising suitable environments in the future.

2018-11-25

arXiv (preprint)

The Barbados 2018 List of Open Issues in Continual Learning

Tom Schaul

Hado van Hasselt

Joseph Modayil

Martha White

Adam White

Pierre-Luc Bacon

Jean Harb

Shibl Mourad

Bellemare Marc-Emmanuel

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments. The purpose of this report is to sketch a research outline, share some of the most important open issues we are facing, and stimulate further discussion in the community. The content is based on some of our discussions during a week-long workshop held in Barbados in February 2018.

2018-11-15

arXiv (preprint)

Knowledge Representation for Reinforcement Learning using General Value Functions

Gheorghe Comanici

Andre Barreto

Daniel Toyama

Eser Aygün

Philippe Hamel

Sasha Vezhnevets

Shaobo Hou

Shibl Mourad

2018-09-26

(published)

openreview.net

Shaping representations through communication

Olivier Tieleman

Angeliki Lazaridou

Shibl Mourad

Charles Blundell

2018-09-26

(published)

openreview.net

Where Off-Policy Deep Reinforcement Learning Fails

Scott Fujimoto

David Meger

This work examines batch reinforcement learning, the task of maximally exploiting a given batch of off-policy data without further data collection. We demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are only capable of learning with data correlated to their current policy, making them ineffective for most off-policy applications. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space to force the agent towards behaving on-policy with respect to a subset of the given data. We extend this notion to deep reinforcement learning, and to the best of our knowledge, present the first continuous control deep reinforcement learning algorithm which can learn effectively from uncorrelated off-policy data.

2018-09-26

(published)

openreview.net

Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation

Tanya Nair

Douglas L. Arnold

Tal Arbel

Deep learning (DL) networks have recently been shown to outperform other segmentation methods on various public, medical-image challenge datasets [3,11,16], especially for large pathologies. However, in the context of diseases such as Multiple Sclerosis (MS), monitoring all the focal lesions visible on MRI sequences, even very small ones, is essential for disease staging, prognosis, and evaluating treatment efficacy. Moreover, producing deterministic outputs hinders DL adoption into clinical routines. Uncertainty estimates for the predictions would permit subsequent revision by clinicians. We present the first exploration of multiple uncertainty estimates based on Monte Carlo (MC) dropout [4] in the context of deep networks for lesion detection and segmentation in medical images. Specifically, we develop a 3D MS lesion segmentation CNN, augmented to provide four different voxel-based uncertainty measures based on MC dropout. We train the network on a proprietary, large-scale, multi-site, multi-scanner, clinical MS dataset, and compute lesion-wise uncertainties by accumulating evidence from voxel-wise uncertainties within detected lesions. We analyze the performance of voxel-based segmentation and lesion-level detection by choosing operating points based on the uncertainty. Empirical evidence suggests that uncertainty measures consistently allow us to choose superior operating points compared to using only the network's sigmoid output as a probability.

2018-09-25

Medical Image Computing and Computer Assisted Intervention – MICCAI 2018 (published)