Philippe Hamel

Hierarchical Reinforcement Learning (HRL) allows interactive agents to decompose complex problems into a hierarchy of sub-tasks. Higher-level tasks can invoke the solutions of lower-level tasks as if they were primitive actions. In this work, we study the utility of hierarchical decompositions for learning an appropriate way to interact with a complex interface. Specifically, we train HRL agents that can interface with applications in a simulated Android device. We introduce a Hierarchical Distributed Deep Reinforcement Learning architecture that learns (1) subtasks corresponding to simple finger gestures, and (2) how to combine these gestures to solve several Android tasks. Our approach relies on goal conditioning and can be used more generally to convert any base RL agent into an HRL agent. We use the AndroidEnv environment to evaluate our approach. For the experiments, the HRL agent uses a distributed version of the popular DQN algorithm to train different components of the hierarchy. While the native action space is completely intractable for simple DQN agents, our architecture can be used to establish an effective way to interact with different tasks, significantly improving the performance of the same DQN agent over different levels of abstraction.

2022-04-20

arXiv (preprint)

doi.org

AndroidEnv: A Reinforcement Learning Platform for Android

Daniel Toyama

Anita Gergely

Gheorghe Comanici

Amelia Glaese

Zafarali Ahmed

Tyler Jackson

Shibl Mourad

We introduce AndroidEnv, an open-source platform for Reinforcement Learning (RL) research built on top of the Android ecosystem. AndroidEnv allows RL agents to interact with a wide variety of apps and services commonly used by humans through a universal touchscreen interface. Since agents train on a realistic simulation of an Android device, they have the potential to be deployed on real devices. In this report, we give an overview of the environment, highlighting the significant features it provides for research, and we present an empirical evaluation of some popular reinforcement learning agents on a set of tasks built on this platform.

2021-05-26

arXiv (preprint)

Per-Decision Option Discounting

Anna Harutyunyan

Peter Vrancx

Ann Nowé

In order to solve complex problems an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modeled through options, offers the ability to reason at many timescales, but the horizon length is still determined by the discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that naturally scales the agent’s horizon with option length. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.

2019-05-23

Proceedings of the 36th International Conference on Machine Learning (published)

proceedings.mlr.press

The Option Keyboard: Combining Skills in Reinforcement Learning

Andre Barreto

Diana Borsa

Shaobo Hou

Gheorghe Comanici

Eser Aygün

Daniel Toyama

Jonathan J. Hunt

Shibl Mourad

David Silver

The ability to combine known skills to create new ones may be crucial in the solution of complex reinforcement learning problems that unfold over extended periods. We argue that a robust way of combining skills is to define and manipulate them in the space of pseudo-rewards (or "cumulants"). Based on this premise, we propose a framework for combining skills using the formalism of options. We show that every deterministic option can be unambiguously represented as a cumulant defined in an extended domain. Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options. This means that, once we have learned options associated with a set of cumulants, we can instantaneously synthesise options induced by any linear combination of them, without any learning involved. We describe how this framework provides a hierarchical interface to the environment whose abstract actions correspond to combinations of basic skills. We demonstrate the practical benefits of our approach in a resource management problem and a navigation task involving a quadrupedal simulated robot.

2018-12-31

NeurIPS (published)
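
The idea in the abstract above, synthesising behaviour for a linear combination of known cumulants without further learning, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a single state, hand-written action values, and no option termination, and the names `q`, `combine_q`, and `gpi_action` are hypothetical. Because action values are linear in the cumulant, the values for a weighted cumulant under a fixed policy are the same weighted sum of the per-cumulant values; the action is then chosen greedily over all known policies.

```python
# Toy, hand-written values (assumption, not data from the paper):
# q[i][j][a] = value of action a in the current state, measured with
# cumulant j, when following the policy of known option i afterwards.
q = [
    [[1.0, 0.0, 0.5], [0.0, 0.2, 0.1]],  # policy of option 0
    [[0.1, 0.3, 0.0], [0.0, 1.0, 0.4]],  # policy of option 1
]


def combine_q(q_per_cumulant, w):
    """Action values for the cumulant sum_j w[j] * c_j under one fixed policy.

    Uses linearity of the value function in the cumulant.
    """
    n_actions = len(q_per_cumulant[0])
    return [
        sum(w[j] * q_per_cumulant[j][a] for j in range(len(w)))
        for a in range(n_actions)
    ]


def gpi_action(q, w):
    """Pick the action maximising the best combined value over known policies."""
    combined = [combine_q(q_i, w) for q_i in q]
    n_actions = len(combined[0])
    # For each action, keep the best value any known policy can guarantee.
    best = [max(c[a] for c in combined) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: best[a])


if __name__ == "__main__":
    print(gpi_action(q, [1.0, 1.0]))   # weights both cumulants equally
    print(gpi_action(q, [1.0, -1.0]))  # seeks cumulant 0, avoids cumulant 1
```

Changing the weight vector `w` changes the selected action immediately, with no update to the stored values, which is the sense in which new skills are obtained "without any learning involved".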

Knowledge Representation for Reinforcement Learning using General Value Functions

Gheorghe Comanici

Andre Barreto

Daniel Toyama

Eser Aygün

Nicolas Boulanger-Lewandowski

Sasha Vezhnevets

Shaobo Hou

Shibl Mourad

2018-09-26

(published)

openreview.net

Theano: A Python framework for fast computation of mathematical expressions

Rami Al-Rfou

Guillaume Alain

Amjad Almahairi

Christof Angermueller

Dzmitry Bahdanau

Nicolas Ballas

Frédéric Bastien

Justin Bayer

Anatoly Belikov

Alexander Belopolsky

Josh Bleecher Snyder

Xavier Bouthillier

Alexandre De Brébisson

Olivier Breuleux

Pierre-Luc Carrier

Paul Christiano

Myriam Côté

Yann N. Dauphin

Julien Demouth

Sander Dieleman

Samira Ebrahimi Kahou

Ziye Fan

Mathieu Germain

Matt Graham

Balázs Hidasi

Arjun Jain

Kai Jia

Mikhail Korobov

Vivek Kulkarni

Alex Lamb

Pascal Lamblin

Eric Larsen

César Laurent

Sean Lee

Simon Lefrancois

Simon Lemieux

Nicholas Léonard

Zhouhan Lin

Jesse A. Livezey

Cory Lorenz

Jeremiah Lowin

Qianli Ma

Pierre-Antoine Manzagol

Robert T. McGibbon

Mehdi Mirza

Alberto Orlandi

Christopher Pal

Razvan Pascanu

Mohammad Pezeshki

Colin Raffel

Daniel Renshaw

Matthew Rocklin

Adriana Romero

Markus Roth

Peter Sadowski

John Salvatier

François Savard

Jan Schlüter

John Schulman

Gabriel Schwartz

Iulian Vlad Serban

Dmitriy Serdyuk

Samira Shabanian

Etienne Simon

Sigurd Spieckermann

S. Ramana Subramanyam

Gijs van Tulder

Sebastian Urban

Dustin J. Webb

Matthew Willson

Lijun Xue

Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.

2015-12-31

arXiv (preprint)

doi.org