Charline Le Lan

Krishna S Haridasan

Amit Marathe

Steven Hansen

Sholto Douglas

Rajkumar Samuel

Mingqiu Wang

Sophia Austin

Chang Lan

Jiepu Jiang

Justin Chiu

Jaime Alonso Lorenzo

Lars Lowe Sjosund

S'ebastien Cevey

Zach Gleicher

Thi Avrahami

Anudhyan Boral

Hansa Srinivasan

Vittorio Selo

Rhys May

Konstantinos Aisopos

L'eonard Hussenot

Livio Baldini Soares

Kate Baumli

Michael B. Chang

Adria Recasens

Benjamin Caine

Alexander Pritzel

Filip Pavetic

Fabio Pardo

Anita Gergely

Justin Frye

Vinay Venkatesh Ramasesh

Dan Horgan

Kartikeya Badola

Nora Kassner

Subhrajit Roy

Ethan Dyer

V'ictor Campos

Alex Tomala

Yunhao Tang

Dalia El Badawy

Elspeth White

Basil Mustafa

Oran Lang

Abhishek Jindal

Sharad Mandyam Vikram

Zhitao Gong

Sergi Caelles

Ross Hemsley

Gregory Thornton

Fangxiaoyu Feng

Wojciech Stokowiec

Ce Zheng

Phoebe Thacker

cCauglar Unlu

Zhishuai Zhang

Mohammad Saleh

James Svensson

Maxwell L. Bileschi

Piyush Patil

Ankesh Anand

Roman Ring

Katerina Tsihlas

Arpi Vezer

Marco Selvi

Toby Shevlane

Mikel Rodriguez

Tom Kwiatkowski

Samira Daruki

Keran Rong

Allan Dafoe

Nicholas Fitzgerald

Keren Gu-Lemberg

Mina Khan

Lisa Anne Hendricks

Marie Pellat

Vladimir Feinberg

James Cobon-Kerr

Tara N. Sainath

Maribeth Rauh

Sayed Hadi Hashemi

Richard Ives

Yana Hasson

YaGuang Li

Eric Noland

Yuan Cao

Nathan Byrd

Le Hou

Qingze Wang

Thibault Sottiaux

Michela Paganini

Jean-Baptiste Lespiau

Alexandre Moufarek

Samer Hassan

Kaushik Shivakumar

Joost Van Amersfoort

Amol Mandhane

Pratik M. Joshi

Anirudh Goyal

Matthew Tung

Andy Brock

Hannah Rachel Sheahan

Vedant Misra

Cheng Li

Nemanja Raki'cevi'c

Mostafa Dehghani

Fangyu Liu

Sid Mittal

Junhyuk Oh

Seb Noury

Eren Sezener

Fantine Huot

Matthew Lamm

Nicola De Cao

Charlie Chen

Gamaleldin Elsayed

Ed Huai-hsin Chi

Mahdis Mahdieh

Ian F. Tenney

Nan Hua

Ivan Petrychenko

Patrick Kane

Dylan Scandinaro

Rishub Jain

Jonathan Uesato

Romina Datta

Adam Sadovsky

Oskar Bunyan

Dominik Rabiej

Shimu Wu

John Zhang

Gautam Vasudevan

Edouard Leurent

Mahmoud Alnahlawi

Ionut-Razvan Georgescu

Nan Wei

Ivy Zheng

Betty Chan

Pam G Rabinovitch

Piotr Stańczyk

Ye Zhang

David Steiner

Subhajit Naskar

Michael Azzam

Matthew Johnson

Adam Paszke

Chung-Cheng Chiu

Jaume Sanchez Elias

Afroz Mohiuddin

Faizan Muhammad

Jin Miao

Andrew Lee

Nino Vieillard

Sahitya Potluri

Jane Park

Elnaz Davoodi

Jiageng Zhang

Jeff Stanway

Drew Garmon

Abhijit Karmarkar

Zhe Dong

2023-12-19

ArXiv (preprint)

arxiv.org

Bootstrapped Representations in Reinforcement Learning

Stephen Tu

Mark Rowland

Anna Harutyunyan

Rishabh Agarwal

Will Dabney

In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of… (see more) deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated into the learning process and help shape the learnt state representation. Bootstrapping methods are today's method of choice to make these additional predictions. Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). Surprisingly, we find that this representation differs from the features learned by Monte Carlo and residual gradient algorithms for most transition structures of the environment in the policy evaluation setting. We describe the efficacy of these representations for policy evaluation, and use our theoretical analysis to design new auxiliary learning rules. We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al, 1999) and Mountain Car (Moore, 1990).

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

openreview.net

A Novel Stochastic Gradient Descent Algorithm for LearningPrincipal Subspaces

Joshua Greaves

Jesse Farebrother

Mark Rowland

Fabian Pedregosa

Rishabh Agarwal

In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace i… (see more)s represented by a neural network, and hence can bescaled to datasets with an effectively infinite number of rows and columns. Our method consistsin defining a loss function whose minimizer is the desired principal subspace, and constructing agradient estimate of this loss whose bias can be controlled.

2023-04-11

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics (published)

openreview.net

Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

Joshua Greaves

Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well-und… (see more)erstood; in practice, how-ever, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent’s network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)’s proto-value functions to deep reinforcement learning – accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment’s reward function.

2023-02-01

ICLR.cc/2023/Conference (poster)

openreview.net

Metrics and continuity in reinforcement learning

Pablo Samuel Castro

2021-05-18

Proceedings of the AAAI Conference on Artificial Intelligence (published)