Doina Precup

Jesse Farebrother

PhD - McGill

Principal supervisor:

Marc Gendron-Bellemare

PhD - McGill

Principal supervisor:

Research collaborator - Birla Institute of Technology

Jonathan Hu

Research Master's - McGill

Howard Huang

PhD - McGill

Haque Ishfaq

Alumni collaborator - McGill

Website

Mohammad Sami Nur Islam Islam

Research Master's - McGill

Hangzhan Jin

PhD - Polytechnique

PhD - McGill

Postdoc - McGill

Jonathan Lebensold

Alumni collaborator - McGill

Alumni collaborator - McGill

Ray Luo

PhD - McGill

Principal supervisor:

G McCracken

PhD - McGill

Nazanin Mohammadi Sepahvand

Alumni collaborator - McGill

Shahrad Mohammadzadeh

Research Master's - McGill

Principal supervisor:

Gabriela Moisescu-Pareja

Research collaborator - McGill

Co-supervisor:

Irina Rish

Padideh Nouri

PhD - UdeM

Co-supervisor:

PhD - McGill

Co-supervisor:

Research intern - McGill

Nate Rahn

PhD - McGill

Principal supervisor:

Marc Gendron-Bellemare

Manoosh Samiei

PhD - McGill

Co-supervisor:

PhD - McGill

Co-supervisor:

PhD - McGill

Nishanth Anand Vemgal

PhD - McGill

PhD - McGill

Co-supervisor:

Samira Ebrahimi Kahou

Research intern - McGill

Zihan Wang

PhD - McGill

Website

Skipper: Combining spatial and temporal abstraction to improve generalization

Steve Wen

Research Master's - McGill

Co-supervisor:

Gregory Dudek

Zijing Wu

PhD - McGill

Principal supervisor:

PhD - McGill

Harry Zhao

Alumni collaborator - McGill

Co-supervisor:

Blog posts

February 22, 2024

by

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Read the article

Publications

Attend Before you Act: Leveraging human visual attention for continual learning

Khimya Khetarpal

When humans perform a task, such as playing a game, they selectively pay attention to certain parts of the visual input, gathering relevant information and sequentially combining it to build a representation from the sensory data. In this work, we explore leveraging where humans look in an image as an implicit indication of what is salient for decision making. We build on top of the UNREAL architecture in DeepMind Lab's 3D navigation maze environment. We train the agent both with original images and foveated images, which were generated by overlaying the original images with saliency maps generated using a real-time spectral residual technique. We investigate the effectiveness of this approach in transfer learning by measuring performance in the context of noise in the environment.

2018-07-24

arXiv (preprint)

Undersampling and Bagging of Decision Trees in the Analysis of Cardiorespiratory Behavior for the Prediction of Extubation Readiness in Extremely Preterm Infants

Lara Kanbar

Charles Onu

Wissam Shalish

Karen A. Brown

Guilherme M. Sant’Anna

Robert E. Kearney

Extremely preterm infants often require endotracheal intubation and mechanical ventilation during the first days of life. Due to the detrimental effects of prolonged invasive mechanical ventilation (IMV), clinicians aim to extubate infants as soon as they deem them ready. Unfortunately, existing strategies for prediction of extubation readiness vary across clinicians and institutions, and lead to high reintubation rates. We present an approach using Random Forest classifiers for the analysis of cardiorespiratory variability to predict extubation readiness. We address the issue of data imbalance by employing random undersampling of examples from the majority class before training each Decision Tree in a bag. By incorporating clinical domain knowledge, we further demonstrate that our classifier could have identified 71% of infants who failed extubation, while maintaining a success detection rate of 78%.

2018-07-17

2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (published)
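The class-balancing step mentioned in the abstract above (random undersampling of the majority class before each tree in the bag is trained) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `balanced_bags`, sampling without replacement, and the toy labels are all hypothetical, and the tree training itself is left out.

```python
import random
from collections import defaultdict


def balanced_bags(labels, n_bags, seed=0):
    """Return n_bags lists of example indices, each balanced across classes.

    For every bag, each class is randomly undersampled (without replacement,
    an assumption of this sketch) down to the size of the smallest class.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    minority_size = min(len(idxs) for idxs in by_class.values())

    bags = []
    for _ in range(n_bags):
        bag = []
        for idxs in by_class.values():
            bag.extend(rng.sample(idxs, minority_size))
        rng.shuffle(bag)
        bags.append(bag)
    return bags


# Made-up toy labels: 8 examples of class 0 and 2 examples of class 1.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
bags = balanced_bags(labels, n_bags=3)
```

Each index list would then be used to fit one decision tree, so that every tree sees both classes in equal proportion while different trees see different majority-class examples.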

Eligibility Traces for Options

Ayush Jain

Temporally extended actions not only represent knowledge in the hierarchical setup in reinforcement learning, they also improve exploration while reducing the complexity of choosing actions. The option framework provides a concrete way to implement and reason about temporal abstraction. This work attempts to test the utility of eligibility traces with options and find good ways of doing multi-step intra-option updates. Three algorithms, based on off-policy methods (importance sampling, tree-backup and retrace), are proposed for using eligibility traces with options.

2018-07-08

International Joint Conference on Autonomous Agents and Multiagent Systems (published)

Convergent Tree Backup and Retrace with Function Approximation

Off-policy learning is key to scaling up reinforcement learning as it allows learning about a target policy from the experience generated by a different behavior policy. Unfortunately, it has been challenging to combine off-policy learning with function approximation and multi-step bootstrapping in a way that leads to both stable and efficient algorithms. In this work, we show that the Tree Backup and Retrace algorithms are unstable with linear function approximation, both in theory and in practice with specific examples. Based on our analysis, we then derive stable and efficient gradient-based algorithms using a quadratic convex-concave saddle-point formulation. By exploiting the problem structure proper to these algorithms, we are able to provide convergence guarantees and finite-sample bounds. The applicability of our new analysis also goes beyond Tree Backup and Retrace and allows us to provide new convergence rates for the GTD and GTD2 algorithms without having recourse to projections or Polyak averaging.

2018-07-02

Proceedings of the 35th International Conference on Machine Learning (published)

proceedings.mlr.press

Diffusion-Based Approximate Value Functions

Martin Klissarov

We present a novel model-based framework inspired by spectral graph theory and deep geometric learning: the Diffusion-based Approximate Value Function. Our approach efficiently approximates the graph Laplacian of an MDP’s underlying graph by using Graph Convolutional Networks (GCN). By generating an approximate value function, we diffuse the reward signal much faster than traditional Reinforcement Learning algorithms such as TD(0). This leads to substantial improvements on sparse rewards environments where efficient credit assignment is most demanding.

2018-06-26

ICML.cc/2018/ECA (accepted)

openreview.net
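To make the TD(0) baseline named in the abstract above concrete, here is a minimal sketch of tabular TD(0) on a sparse-reward chain. It only illustrates why one-step updates propagate a sparse reward slowly; it is not the paper's method. The function name `td0_chain`, the chain environment, and the step-size and discount values are assumptions made for this example.

```python
def td0_chain(n_states, episodes, alpha=0.5, gamma=1.0):
    """Tabular TD(0) on a deterministic left-to-right chain.

    The agent starts in state 0 and always moves right. The only reward (1.0)
    is received on the transition out of the last state into the terminal
    state, whose value is fixed at 0.
    """
    values = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            is_last = s == n_states - 1
            reward = 1.0 if is_last else 0.0
            next_value = 0.0 if is_last else values[s + 1]
            # One-step TD(0) update toward reward + gamma * V(next state).
            values[s] += alpha * (reward + gamma * next_value - values[s])
    return values


after_one = td0_chain(n_states=5, episodes=1)
after_two = td0_chain(n_states=5, episodes=2)
```

In this toy setting the reward information moves back by only one state per episode, which is the kind of slow credit assignment the abstract contrasts with diffusing the reward signal over the graph.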

Resolving Event Coreference with Supervised Representation Learning and Clustering-Oriented Regularization

Kian Kenyon-Dean

Jackie CK Cheung

We present an approach to event coreference resolution by developing a general framework for clustering that uses supervised representation learning. We propose a neural network architecture with novel Clustering-Oriented Regularization (CORE) terms in the objective function. These terms encourage the model to create embeddings of event mentions that are amenable to clustering. We then use agglomerative clustering on these embeddings to build event coreference chains. For both within- and cross-document coreference on the ECB+ corpus, our model obtains better results than models that require significantly more pre-annotated information. This work provides insight and motivating results for a new general approach to solving coreference and clustering problems with representation learning.

2018-05-31

Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics (published)

Dyna Planning using a Feature Based Generative Model

Ryan Faulkner

Dyna-style reinforcement learning is a powerful approach for problems where not much real data is available. The main idea is to supplement real trajectories, or sequences of sampled states over time, with simulated ones sampled from a learned model of the environment. However, in large state spaces, the problem of learning a good generative model of the environment has been open so far. We propose to use deep belief networks to learn an environment model for use in Dyna. We present our approach and validate it empirically on problems where the state observations consist of images. Our results demonstrate that using deep belief networks, which are full generative models, significantly outperforms the use of linear expectation models, proposed in Sutton et al. (2008).

2018-05-22

arXiv (preprint)

Deep Reinforcement Learning that Matters

Philip Bachman

In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results tough to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Imitation Upper Confidence Bound for Bandits on a Graph

Andrei Lupu

We consider a graph of interconnected agents implementing a common policy and each playing a bandit problem with identical reward distributions. We restrict the information propagated in the graph such that agents can uniquely observe each other's actions. We propose an extension of the Upper Confidence Bound (UCB) algorithm to this setting and empirically demonstrate that our solution improves the performance over UCB according to multiple metrics and within various graph configurations.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)
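For readers unfamiliar with the UCB baseline that the abstract above extends, a minimal single-agent sketch of the standard UCB1 rule is given below. It does not implement the paper's imitation-based extension or any graph structure. The function names `ucb1_select` and `run_ucb1`, the Bernoulli rewards, and the arm means are assumptions chosen for illustration.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Pick an arm with the UCB1 rule at round t (1-indexed).

    counts[a] is how often arm a was played, sums[a] its total reward.
    """
    # Play every arm once before using the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms and return the pull count per arm."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


pulls = run_ucb1(means=[0.2, 0.8], horizon=2000)
```

The bonus term shrinks as an arm is played more often, so pulls concentrate on the arm with the higher empirical mean while the other arm is still revisited occasionally.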

Learning Predictive State Representations From Non-Uniform Sampling

Yuri Grinberg

Hossein Aboutalebi

Melanie Lyman-Abramovitch

Borja Balle

Predictive state representations (PSR) have emerged as a powerful method for modelling partially observable environments. PSR learning algorithms can build models for predicting all observable variables, or predicting only some of them conditioned on others (e.g., actions or exogenous variables). In the latter case, which we call conditional modelling, the accuracy of different estimates of the conditional probabilities for a fixed dataset can vary significantly, due to the limited sampling of certain conditions. This can have negative consequences on the PSR parameter estimation process, which are not taken into account by the current state-of-the-art PSR spectral learning algorithms. In this paper, we examine closely conditional modelling within the PSR framework. We first establish a new positive but surprisingly non-trivial result: a conditional model can never be larger than the complete model. Then, we address the core shortcoming of existing PSR spectral learning methods for conditional models by incorporating an additional step in the process, which can be seen as a type of matrix denoising. We further refine this objective by adding penalty terms for violations of the system dynamics matrix structure, which improves the PSR predictive performance. Empirical evaluations on both synthetic and real datasets highlight the advantages of the proposed approach.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Learning with Options that Terminate Off-Policy

Anna Harutyunyan

Peter Vrancx

Pierre-Luc Bacon

Ann Nowé

A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optimal policy exactly, shorter options offer more flexibility and can yield a better solution. Thus, the termination condition puts learning efficiency at odds with solution quality. We propose to resolve this dilemma by decoupling the behavior and target terminations, just like it is done with policies in off-policy learning. To this end, we give a new algorithm, Q(β), that learns the solution with respect to any termination condition, regardless of how the options actually terminate. We derive Q(β) by casting learning with options into a common framework with well-studied multi-step off-policy learning. We validate our algorithm empirically, and show that it holds up to its motivating claims.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)

OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

Wei-Di Chang

Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one. Thus, in inverse reinforcement learning, it is useful to consider such a decomposition. The options framework in reinforcement learning is specifically designed to decompose policies in a similar light. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. We show that this approach works well in both simple and complex continuous control tasks and shows significant performance increases in one-shot transfer learning.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)