Adrien Ali Taiga

A Geometric Perspective on Optimal Representations for Reinforcement Learning

Marc Gendron-Bellemare

Will Dabney

Robert Dadashi

Adrien Ali Taiga

Pablo Samuel Castro

Nicolas Le Roux

Dale Schuurmans

Tor Lattimore

Clare Lyle

We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary policies for a given environment. We show that this optimization reduces to making accurate predictions regarding a special class of value functions which we call adversarial value functions (AVFs). We demonstrate that using value functions as auxiliary tasks corresponds to an expected-error relaxation of our formulation, with AVFs a natural candidate, and identify a close relationship with proto-value functions (Mahadevan, 2005). We highlight characteristics of AVFs and their usefulness as auxiliary tasks in a series of experiments on the four-room domain.

2019-01-31

ArXiv (preprint)

arxiv.org

openreview.net
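As a rough illustration of what "linear approximation of the value function" means for a single fixed policy, the sketch below computes a policy's value function on a toy chain and measures how well two feature matrices can fit it by least squares. Everything here is an assumption made for illustration: the 3-state chain, the feature matrices, the unweighted Euclidean norm, and the helper names `policy_value` and `linear_approximation_error`. It is not the paper's method, which considers the error over all stationary policies.

```python
import numpy as np


def policy_value(P, r, gamma):
    """Solve the Bellman equation V = r + gamma * P @ V for a fixed policy."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def linear_approximation_error(Phi, V):
    """Euclidean error of the best least-squares fit of V using features Phi."""
    w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return float(np.linalg.norm(Phi @ w - V))


# Hypothetical 3-state chain under some fixed policy; state 2 is absorbing.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.0]])
r = np.array([0.0, 0.0, 1.0])
gamma = 0.9

V = policy_value(P, r, gamma)

# Two hypothetical representations: a constant feature, and constant + state index.
Phi_const = np.ones((3, 1))
Phi_rich = np.column_stack([np.ones(3), np.arange(3.0)])

err_const = linear_approximation_error(Phi_const, V)
err_rich = linear_approximation_error(Phi_rich, V)
print(f"constant features: {err_const:.4f}, richer features: {err_rich:.4f}")
```

Because the richer feature matrix contains the constant column, its least-squares error can never exceed that of the constant representation; a representation-learning objective of the kind described in the abstract would look for features that keep this error small across many policies rather than one.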

Sim-to-Real Transfer with Neural-Augmented Robot Simulation

Despite the recent successes of deep reinforcement learning, teaching complex motor skills to a physical robot remains a hard problem. While learning directly on a real system is usually impractical, doing so in simulation has proven to be fast and safe. Nevertheless, because of the "reality gap," policies trained in simulation often perform poorly when deployed on a real system. In this work, we introduce a method for training a recurrent neural network on the differences between simulated and real robot trajectories and then using this model to augment the simulator. This Neural-Augmented Simulation (NAS) can be used to learn control policies that transfer significantly better to real environments than policies learned on existing simulators. We demonstrate the potential of our approach through a set of experiments on the Mujoco simulator with added backlash and the Poppy Ergo Jr robot. NAS allows us to learn policies that are competitive with ones that would have been learned directly on the real robot.

2018-10-23

Proceedings of The 2nd Conference on Robot Learning (published)

proceedings.mlr.press

Approximate Exploration through State Abstraction

Adrien Ali Taiga

Aaron Courville

Marc Gendron-Bellemare

Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. Our main goal is to further our theoretical understanding of pseudo-count based exploration bonuses (Bellemare et al., 2016), a practical exploration scheme based on density modelling. As a warm-up, we quantify the performance of an exploration algorithm, MBIE-EB (Strehl and Littman, 2008), when explicitly combined with state aggregation. This allows us to confirm that, as might be expected, approximation allows the agent to trade off between learning speed and quality of the learned policy. Next, we show how a given density model can be related to an abstraction and that the corresponding pseudo-count bonus can act as a substitute in MBIE-EB combined with this abstraction, but may lead to either under- or over-exploration. Then, we show that a given density model also defines an implicit abstraction, and find a surprising mismatch between pseudo-counts derived either implicitly or explicitly. Finally, we derive a new pseudo-count bonus alleviating this issue.

2018-08-29

ArXiv (preprint)

arxiv.org
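The relationship between counts, state aggregation, and pseudo-counts mentioned in the abstract above can be sketched in a few lines. The example assumes the standard forms of the two quantities: an MBIE-EB style bonus of beta / sqrt(n), and the pseudo-count of Bellemare et al. (2016), rho * (1 - rho') / (rho' - rho), where rho is the density of an observation before it is seen once more and rho' the density after. The aggregation map `phi`, the value of `beta`, the visit sequence, and the use of state-only counts are all hypothetical simplifications for illustration, not the paper's setup.

```python
import math
from collections import Counter


def count_bonus(count, beta=0.05):
    """MBIE-EB style exploration bonus: beta / sqrt(visit count)."""
    return beta / math.sqrt(count)


def pseudo_count(rho, rho_next):
    """Pseudo-count from a density before (rho) and after (rho_next) one more observation."""
    return rho * (1.0 - rho_next) / (rho_next - rho)


def phi(state):
    """Hypothetical state aggregation: pairs of neighbouring states share a cell."""
    return state // 2


# Hypothetical visit sequence over ground states 0..3.
visits = [0, 1, 1, 2, 3, 3, 3]
abstract_counts = Counter(phi(s) for s in visits)
total = len(visits)

# Empirical density model over abstract states, before and after one more visit.
cell = phi(0)
rho = abstract_counts[cell] / total
rho_next = (abstract_counts[cell] + 1) / (total + 1)

n_hat = pseudo_count(rho, rho_next)
print(f"aggregated count: {abstract_counts[cell]}, pseudo-count: {n_hat:.6f}")
print(f"bonus: {count_bonus(n_hat):.6f}")
```

With an empirical density over the aggregated states, the pseudo-count coincides with the aggregated visit count, so the bonus matches what explicit counting on the abstraction would give. For other density models the two can differ, which is the kind of mismatch the abstract refers to.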

PixelVAE: A Latent Variable Model for Natural Images

Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64 × 64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.

2017-01-01

ICLR.cc/2017/conference (poster)

openreview.net
