Zafarali Ahmed

Tianqi Liu

Richard Powell

Vijay Bolina

Mariko Iinuma

Polina Zablotskaia

James Besley

Da-Woon Chung

Timothy Dozat

Ramona Comanescu

Xiance Si

Jeremy Greer

Guolong Su

M. Polacek

Raphael Lopez Kaufman

Simon Tokumine

Hexiang Hu

Elena Buchatskaya

Yingjie Miao

Mohamed Elhawaty

Aditya Siddhant

Nenad Tomasev

Jinwei Xing

Christina Greer

Helen Miller

Shereen Ashraf

Aurko Roy

Zizhao Zhang

Ada Ma

Angelos Filos

Milos Besta

Rory Blevins

Ted Klimenko

Chih-Kuan Yeh

Soravit Changpinyo

Jiaqi Mu

Oscar Chang

Mantas Pajarskas

Carrie Muir

Vered Cohen

Charline Le Lan

Krishna S Haridasan

Amit Marathe

Steven Stenberg Hansen

Sholto Douglas

Rajkumar Samuel

Mingqiu Wang

Sophia Austin

Chang Lan

Jiepu Jiang

Justin Chiu

Jaime Alonso Lorenzo

Lars Lowe Sjosund

S'ebastien Cevey

Zach Gleicher

Thi Avrahami

Anudhyan Boral

Hansa Srinivasan

Vittorio Selo

Rhys May

Konstantinos Aisopos

L'eonard Hussenot

Livio Baldini Soares

Kate Baumli

Michael B. Chang

Adria Recasens

Benjamin Caine

Alexander Pritzel

Filip Pavetic

Fabio Pardo

Anita Gergely

Justin Frye

Vinay Venkatesh Ramasesh

Dan Horgan

Kartikeya Badola

Nora Kassner

Subhrajit Roy

Ethan Dyer

V'ictor Campos

Alex Tomala

Yunhao Tang

Dalia El Badawy

Elspeth White

Basil Mustafa

Oran Lang

Abhishek Jindal

Sharad Mandyam Vikram

Zhitao Gong

Sergi Caelles

Ross Hemsley

Gregory Thornton

Fangxiaoyu Feng

Wojciech Stokowiec

Ce Zheng

Phoebe Thacker

cCauglar Unlu

Zhishuai Zhang

Mohammad Saleh

James Svensson

Maxwell L. Bileschi

Piyush Patil

Ankesh Anand

Roman Ring

Katerina Tsihlas

Arpi Vezer

Marco Selvi

Toby Shevlane

Mikel Rodriguez

Tom Kwiatkowski

Samira Daruki

Keran Rong

Allan Dafoe

Nicholas Fitzgerald

Keren Gu-Lemberg

Mina Khan

Lisa Anne Hendricks

Marie Pellat

Vladimir Feinberg

James Cobon-Kerr

Tara N. Sainath

Maribeth Rauh

Sayed Hadi Hashemi

Richard Ives

Yana Hasson

YaGuang Li

Eric Noland

Yuan Cao

Nathan Byrd

Le Hou

Qingze Wang

Thibault Sottiaux

Michela Paganini

Jean-Baptiste Lespiau

Alexandre Moufarek

Samer Hassan

Kaushik Shivakumar

Joost Van Amersfoort

Amol Mandhane

Pratik M. Joshi

Anirudh Goyal

Matthew Tung

Andy Brock

Hannah Rachel Sheahan

Vedant Misra

Cheng Li

Nemanja Raki'cevi'c

Mostafa Dehghani

Fangyu Liu

Sid Mittal

Junhyuk Oh

Seb Noury

Eren Sezener

Fantine Huot

Matthew Lamm

Nicola De Cao

Charlie Chen

Gamaleldin Elsayed

Ed Huai-hsin Chi

Mahdis Mahdieh

Ian F. Tenney

Nan Hua

Ivan Petrychenko

Patrick Kane

Dylan Scandinaro

Rishub Jain

Jonathan Uesato

Romina Datta

Adam Sadovsky

Oskar Bunyan

Dominik Rabiej

Shimu Wu

John Zhang

Gautam Vasudevan

Edouard Leurent

Mahmoud Alnahlawi

Ionut-Razvan Georgescu

Nan Wei

Ivy Zheng

Betty Chan

Pam G Rabinovitch

Piotr Stańczyk

Ye Zhang

David Steiner

Subhajit Naskar

Michael Azzam

Matthew Johnson

Adam Paszke

Chung-Cheng Chiu

Jaume Sanchez Elias

Afroz Mohiuddin

Faizan Muhammad

Jin Miao

Andrew Lee

Nino Vieillard

Sahitya Potluri

Jane Park

Elnaz Davoodi

Jiageng Zhang

Jeff Stanway

Drew Garmon

Abhijit Karmarkar

Zhe Dong

2023-12-18

ArXiv (preprint)

Learning how to Interact with a Complex Interface using Hierarchical Reinforcement Learning

Gheorghe Comanici

Amelia Glaese

Anita Gergely

Daniel Toyama

Tyler Jackson

Philippe Hamel

Hierarchical Reinforcement Learning (HRL) allows interactive agents to decompose complex problems into a hierarchy of sub-tasks. Higher-leve… (see more)l tasks can invoke the solutions of lower-level tasks as if they were primitive actions. In this work, we study the utility of hierarchical decompositions for learning an appropriate way to interact with a complex interface. Specifically, we train HRL agents that can interface with applications in a simulated Android device. We introduce a Hierarchical Distributed Deep Reinforcement Learning architecture that learns (1) subtasks corresponding to simple finger gestures, and (2) how to combine these gestures to solve several Android tasks. Our approach relies on goal conditioning and can be used more generally to convert any base RL agent into an HRL agent. We use the AndroidEnv environment to evaluate our approach. For the experiments, the HRL agent uses a distributed version of the popular DQN algorithm to train different components of the hierarchy. While the native action space is completely intractable for simple DQN agents, our architecture can be used to establish an effective way to interact with different tasks, significantly improving the performance of the same DQN agent over different levels of abstraction.

2022-04-20

ArXiv (preprint)

AndroidEnv: A Reinforcement Learning Platform for Android

Daniel Toyama

Philippe Hamel

Anita Gergely

Gheorghe Comanici

Amelia Glaese

Tyler Jackson

Shibl Mourad

We introduce AndroidEnv, an open-source platform for Reinforcement Learning (RL) research built on top of the Android ecosystem. AndroidEnv … (see more)allows RL agents to interact with a wide variety of apps and services commonly used by humans through a universal touchscreen interface. Since agents train on a realistic simulation of an Android device, they have the potential to be deployed on real devices. In this report, we give an overview of the environment, highlighting the significant features it provides for research, and we present an empirical evaluation of some popular reinforcement learning agents on a set of tasks built on this platform.

2021-05-26

ArXiv (preprint)

Training a First-Order Theorem Prover from Synthetic Data

Vlad Firoiu

Eser Aygün

Ankit Anand

Xavier Glorot

Laurent Orseau

Lei Zhang

Shibl Mourad

2021-03-04

ArXiv (preprint)

Temporally Abstract Partial Models

Khimya Khetarpal

Gheorghe Comanici

Humans and animals have the ability to reason and make predictions about different courses of action at many time scales. In reinforcement l… (see more)earning, option models (Sutton, Precup \& Singh, 1999; Precup, 2000) provide the framework for this kind of temporally abstract prediction and reasoning. Natural intelligent agents are also able to focus their attention on courses of action that are relevant or feasible in a given situation, sometimes termed affordable actions. In this paper, we define a notion of affordances for options, and develop temporally abstract partial option models, that take into account the fact that an option might be affordable only in certain situations. We analyze the trade-offs between estimation and approximation error in planning and learning when using such models, and identify some interesting special cases. Additionally, we demonstrate empirically the potential impact of partial option models on the efficiency of planning.

2020-12-31

Advances in Neural Information Processing Systems 34 (NeurIPS 2021) (published)

openreview.net

What can I do here? A Theory of Affordances in Reinforcement Learning

Khimya Khetarpal

Gheorghe Comanici

David Abel

Reinforcement learning algorithms usually assume that all actions are always available to an agent. However, both people and animals underst… (see more)and the general link between the features of their environment and the actions that are feasible. Gibson (1977) coined the term "affordances" to describe the fact that certain states enable an agent to do certain actions, in the context of embodied agents. In this paper, we develop a theory of affordances for agents who learn and plan in Markov Decision Processes. Affordances play a dual role in this case. On one hand, they allow faster planning, by reducing the number of actions available in any given situation. On the other hand, they facilitate more efficient and precise learning of transition models from data, especially when such models require function approximation. We establish these properties through theoretical results as well as illustrative examples. We also propose an approach to learn affordances and use it to estimate transition models that are simpler and generalize better.

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

Learning to Prove from Synthetic Theorems

Eser Aygün

Vlad Firoiu

Laurent Orseau

Shibl Mourad

A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in t… (see more)raining successful deep learning models. To tackle this problem, we propose an approach that relies on training with synthetic theorems, generated from a set of axioms. We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems. We demonstrate that a prover trained exclusively on synthetic theorems can solve a substantial fraction of problems in TPTP, a benchmark dataset that is used to compare state-of-the-art heuristic provers. Our approach outperforms a model trained on human-generated problems in most axiom sets, thereby showing the promise of using synthetic data for this task.

2020-06-18

ArXiv (preprint)

Marginalized State Distribution Entropy Regularization in Policy Optimization

Riashat Islam

2019-12-10

ArXiv (preprint)

Understanding the impact of entropy on policy optimization

Nicolas Roux

Mohammad Norouzi

Dale Schuurmans

Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with \emph{explorat… (see more)ion} by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. Then, we qualitatively show that in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms.

2019-05-23

Proceedings of the 36th International Conference on Machine Learning (published)

proceedings.mlr.press

Learning proposals for sequential importance samplers using reinforced variational inference

Arjun Karuvally

Simon Gravel

The problem of inferring unobserved values in a partially observed trajectory from a stochastic process can be considered as a structured pr… (see more)ediction problem. Traditionally inference is conducted using heuristic-based Monte Carlo methods. This work considers learning heuristics by leveraging a connection between policy optimization reinforcement learning and approximate inference. In particular, we learn proposal distributions used in importance samplers by casting it as a variational inference problem. We then rewrite the variational lower bound as a policy optimization problem similar to Weber et al. (2015) allowing us to transfer techniques from reinforcement learning. We apply this technique to a simple stochastic process as a proof-of-concept and show that while it is viable, it will require more engineering effort to scale inference for rare observations 1 .

2019-03-15

DeepRLStructPred@ICLR (published)

openreview.net

InfoBot: Structured Exploration in ReinforcementLearning Using Information Bottleneck

D. Strouse

Matthew Botvinick

Sergey Levine

2018-12-31

(published)

www.semanticscholar.org

InfoBot: Transfer and Exploration via the Information Bottleneck

Daniel Strouse

Matthew Botvinick

Sergey Levine

A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed. We postula… (see more)te that in the absence of useful reward signals, an effective exploration strategy should seek out {\it decision states}. These states lie at critical junctions in the state space from where the agent can transition to new, potentially unexplored regions. We propose to learn about decision states from prior experience. By training a goal-conditioned policy with an information bottleneck, we can identify decision states by examining where the model actually leverages the goal state. We find that this simple mechanism effectively identifies decision states, even in partially observed settings. In effect, the model learns the sensory cues that correlate with potential subgoals. In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.

2018-12-31

ICLR.cc/2019/Conference (poster)