Doina Precup

guangyuan.wang@mila.quebec

Guangyuan Wang

Stagiaire de recherche - McGill University

Haque Ishfaq

Doctorat - McGill University

Doctorat - McGill University

huanghow@mila.quebec

Janarthanan Rajendran

Postdoctorat - Université de Montréal

Superviseur⋅e principal⋅e :

Sarath Chandar Anbil Parthipan

janarthanan.rajendran@mila.quebec

Jaume Minano Masip

Doctorat - McGill University

masipmij@mila.quebec

Jesse Farebrother

Doctorat - McGill University

Superviseur⋅e principal⋅e :

jonathan.colaco-carr@mila.quebec

Maîtrise recherche - McGill University

Superviseur⋅e principal⋅e :

Prakash Panangaden

Jonathan Lebensold

Doctorat - McGill University

Stagiaire de recherche - McGill University

keyu.wang@mila.quebec

Kushal Arora

Doctorat - McGill University

Superviseur⋅e principal⋅e :

Lynn Cherif

Maîtrise recherche - McGill University

Co-superviseur⋅e :

lynn.cherif@mila.quebec

Mohammad Sami Nur Islam Islam

Mandana Samiei

Doctorat - McGill University

Co-superviseur⋅e :

Doctorat - McGill University

delvermm@mila.quebec

Martin Klissarov

Doctorat - McGill University

Harry Zhao

Doctorat - McGill University

Co-superviseur⋅e :

Stagiaire de recherche - McGill University

mohammad-sami-nur.islam@mila.quebec

nathan.de-lara@mila.quebec

Nathan de Lara

Stagiaire de recherche - McGill University

Nate Rahn

Doctorat - McGill University

Superviseur⋅e principal⋅e :

nikhil-murali.vemgal@mila.quebec

nathan.rahn@mila.quebec

Girdhar Neil Girdhar

Collaborateur·rice de recherche - McGill University

neil.girdhar@mila.quebec

Nikhil Vemgal

Maîtrise recherche - McGill University

padideh.nouri@mila.quebec

Nishanth Anand Vemgal

Doctorat - McGill University

Maîtrise recherche - Université de Montréal

Doctorat - McGill University

Ray Chua

Doctorat - McGill University

Co-superviseur⋅e :

Blake Richards

chuaraym@mila.quebec

Riashat Islam

Doctorat - McGill University

Safa Alver

Doctorat - McGill University

alversaf@mila.quebec

Sahand Rezaei-Shoshtari

Doctorat - McGill University

Co-superviseur⋅e :

David Meger

sahand.rezaei-shoshtari@mila.quebec

Doctorat - McGill University

Doctorat - McGill University

Co-superviseur⋅e :

David Meger

fujimots@mila.quebec

Shahrad Mohammadzadeh

Collaborateur·rice de recherche - McGill University

Superviseur⋅e principal⋅e :

Reihaneh Rabbany

shahrad.mohammadzadeh@mila.quebec

Doctorat - McGill University

Doctorat - McGill University

shuyuan.zhang@mila.quebec

Sitao Luan

Doctorat - McGill University

Steve Wen

Baccalauréat - McGill University

steve.wen@mila.quebec

Sumana Basu

Doctorat - McGill University

Co-superviseur⋅e :

Adriana Romero Soriano

Maîtrise recherche - Université de Montréal

Superviseur⋅e principal⋅e :

Yoshua Bengio

thomas.jiralerspong@mila.quebec

Doctorat - McGill University

cheluver@mila.quebec

Wesley Chung

Doctorat - McGill University

Superviseur⋅e principal⋅e :

David Meger

chungwes@mila.quebec

Ray Luo

Doctorat - McGill University

Superviseur⋅e principal⋅e :

Xujie Si

luo.ziyan@mila.quebec

Skipper : combiner l’abstraction spatiale et temporelle afin d’améliorer la généralisation

Billets de blogue

Generic thumbnail for Mila Blog articles.

22 février 2024

par

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Lire l'article

Publications

Temporally Abstract Partial Models

Zafarali Ahmed

Gheorghe Comanici

Humans and animals have the ability to reason and make predictions about different courses of action at many time scales. In reinforcement l… (voir plus)earning, option models (Sutton, Precup \& Singh, 1999; Precup, 2000) provide the framework for this kind of temporally abstract prediction and reasoning. Natural intelligent agents are also able to focus their attention on courses of action that are relevant or feasible in a given situation, sometimes termed affordable actions. In this paper, we define a notion of affordances for options, and develop temporally abstract partial option models, that take into account the fact that an option might be affordable only in certain situations. We analyze the trade-offs between estimation and approximation error in planning and learning when using such models, and identify some interesting special cases. Additionally, we empirically demonstrate the ability to learn both affordances and partial option models online resulting in improved sample efficiency and planning time in the Taxi domain.

openreview.net

Phylogenetic Manifold Regularization: A semi-supervised approach to predict transcription factor binding sites

Faizy Ahsan

Alexandre Drouin

François Laviolette

Mathieu Blanchette

The computational prediction of transcription factor binding sites remains a challenging problems in bioinformatics, despite significant met… (voir plus)hodological developments from the field of machine learning. Such computational models are essential to help interpret the non-coding portion of human genomes, and to learn more about the regulatory mechanisms controlling gene expression. In parallel, massive genome sequencing efforts have produced assembled genomes for hundred of vertebrate species, but this data is underused. We present PhyloReg, a new semi-supervised learning approach that can be used for a wide variety of sequence-to-function prediction problems, and that takes advantage of hundreds of millions of years of evolution to regularize predictors and improve accuracy. We demonstrate that PhyloReg can be used to better train a previously proposed deep learning model of transcription factor binding. Simulation studies further help delineate the benefits of the a pproach. G ains in prediction accuracy are obtained over a broad set of transcription factors and cell types.

2020-12-16

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (publié)

doi.org

What can I do here? A Theory of Affordances in Reinforcement Learning

Zafarali Ahmed

Gheorghe Comanici

David Abel

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (publié)

Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

Tianyu Li

Bogdan Mazoure

Guillaume Rabusseau

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (publié)

Value Preserving State-Action Abstractions

David Abel

Nathan Umbanhowar

Dilip Arumugam

Michael L. Littman

Abstraction can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information… (voir plus), potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.ion can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information, potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.

2020-06-03

International Conference on Artificial Intelligence and Statistics (publié)

Value Preserving State-Action Abstractions

David Abel

Nathan Umbanhowar

Dilip Arumugam

Michael L. Littman

2020-06-03

International Conference on Artificial Intelligence and Statistics (published)

dblp.uni-trier.de

Options of Interest: Temporal Abstraction with Interest Functions

Martin Klissarov

Maxime Chevalier-Boisvert

Pierre-Luc Bacon

Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. Th… (voir plus)e options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.

2020-04-03

Proceedings of the AAAI Conference on Artificial Intelligence (publié)

doi.org

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Philip. Amortila

Prakash Panangaden

We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate it… (voir plus)s effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(

2020-03-27

ArXiv (preprint)

Provably efficient reconstruction of policy networks

Bogdan Mazoure

Thang Doan

Tianyu Li

Recent research has shown that learning poli-cies parametrized by large neural networks can achieve significant success on challenging reinf… (voir plus)orcement learning problems. However, when memory is limited, it is not always possible to store such models exactly for inference, and com-pressing the policy into a compact representation might be necessary. We propose a general framework for policy representation, which reduces this problem to finding a low-dimensional embedding of a given density function in a separable inner product space. Our framework allows us to de-rive strong theoretical guarantees, controlling the error of the reconstructed policies. Such guaran-tees are typically lacking in black-box models, but are very desirable in risk-sensitive tasks. Our experimental results suggest that the reconstructed policies can use less than 10%of the number of parameters in the original networks, while incurring almost no decrease in rewards.

2020-02-07

ArXiv (prépublication)

Representation of Reinforcement Learning Policies in Reproducing Kernel Hilbert Spaces.

Bogdan Mazoure

Thang Doan

Tianyu Li

We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional… (voir plus) embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly embedded in a low-dimensional space while the embedded policy incurs almost no decrease in return.

2020-02-07

(publié)

www.semanticscholar.org

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Philip Amortila

Prakash Panangaden

2020-01-01

AISTATS (publié)

Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

Tianyu Li

Bogdan Mazoure

Guillaume Rabusseau

Learning and planning in partially-observable domains is one of the most difficult problems in reinforcement learning. Traditional methods c… (voir plus)onsider these two problems as independent, resulting in a classical two-stage paradigm: first learn the environment dynamics and then plan accordingly. This approach, however, disconnects the two problems and can consequently lead to algorithms that are sample inefficient and time consuming. In this paper, we propose a novel algorithm that combines learning and planning together. Our algorithm is closely related to the spectral learning algorithm for predicitive state representations and offers appealing theoretical guarantees and time complexity. We empirically show on two domains that our approach is more sample and time efficient compared to classical methods.

2019-11-01

ArXiv (preprint)