Scott Fujimoto

An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay

Prioritized Experience Replay (PER) is a deep reinforcement learning technique in which agents learn from transitions sampled with non-uniform probability proportionate to their temporal-difference error. We show that any loss function evaluated with non-uniformly sampled data can be transformed into another uniformly sampled loss function with the same expected gradient. Surprisingly, we find in some environments PER can be replaced entirely by this new loss function without impact to empirical performance. Furthermore, this relationship suggests a new branch of improvements to PER by correcting its uniformly sampled loss function equivalent. We demonstrate the effectiveness of our proposed modifications to PER and the equivalent loss function in several MuJoCo and Atari environments.

2019-12-31

Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (published)

doi.org

arxiv.org
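The equivalence stated in the abstract above can be illustrated with a small numerical check. This is only a minimal sketch of the underlying importance-weighting identity, not the paper's derivation: it assumes a squared TD-error loss, treats the sampling probabilities as constants when differentiating, and uses the hypothetical function name `expected_gradients`.

```python
import numpy as np

def expected_gradients(td_errors, priorities):
    """Compare the expected gradient under non-uniform sampling with the
    expected gradient of a reweighted loss under uniform sampling.

    Assumption: per-transition loss is 0.5 * delta^2, whose gradient with
    respect to the value estimate is delta itself.
    """
    delta = np.asarray(td_errors, dtype=float)
    p = np.asarray(priorities, dtype=float)
    p = p / p.sum()          # normalize priorities into sampling probabilities
    n = len(delta)

    per_sample_grad = delta  # gradient of 0.5 * delta^2

    # Expectation when transition i is drawn with probability p[i].
    grad_prioritized = np.sum(p * per_sample_grad)

    # Expectation when transitions are drawn uniformly and each gradient
    # is scaled by n * p[i] (the weight is held constant, not differentiated).
    grad_uniform = np.mean(n * p * per_sample_grad)

    return grad_prioritized, grad_uniform

g_per, g_uni = expected_gradients([1.0, -2.0, 0.5], [1.0, 2.0, 0.5])
print(g_per, g_uni)
```

The two values agree because the sum of `p[i] * grad[i]` equals the uniform average of `n * p[i] * grad[i]`; the priorities passed in here are arbitrary illustrative numbers.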

GeoMetrics: Exploiting Geometric Structure for Graph-Encoded Objects

Edward J. Smith

Scott Fujimoto

Adriana Romero

David Meger

Mesh models are a promising approach for encoding the structure of 3D objects. Current mesh reconstruction systems predict uniformly distributed vertex locations of a predetermined graph through a series of graph convolutions, leading to compromises with respect to performance or resolution. In this paper, we argue that the graph representation of geometric objects allows for additional structure, which should be leveraged for enhanced reconstruction. Thus, we propose a system which properly benefits from the advantages of the geometric structure of graph-encoded objects by introducing (1) a graph convolutional update preserving vertex information; (2) an adaptive splitting heuristic allowing detail to emerge; and (3) a training objective operating both on the local surfaces defined by vertices as well as the global structure defined by the mesh. Our proposed method is evaluated on the task of 3D object reconstruction from images with the ShapeNet dataset, where we demonstrate state-of-the-art performance, both visually and numerically, while having far smaller space requirements by generating adaptive meshes.

2019-05-23

Proceedings of the 36th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Off-Policy Deep Reinforcement Learning without Exploration

Scott Fujimoto

David Meger

Doina Precup

Many practical applications of reinforcement learning constrain agents to learn from a fixed batch of data which has already been gathered, without offering further possibility for data collection. In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data. We present the first continuous control deep reinforcement learning algorithm which can learn effectively from arbitrary, fixed batch data, and empirically demonstrate the quality of its behavior in several tasks.

2019-05-23

Proceedings of the 36th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Where Off-Policy Deep Reinforcement Learning Fails

Scott Fujimoto

David Meger

Doina Precup

This work examines batch reinforcement learning: the task of maximally exploiting a given batch of off-policy data, without further data collection. We demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are only capable of learning with data correlated to their current policy, making them ineffective for most off-policy applications. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space to force the agent towards behaving on-policy with respect to a subset of the given data. We extend this notion to deep reinforcement learning, and to the best of our knowledge, present the first continuous control deep reinforcement learning algorithm which can learn effectively from uncorrelated off-policy data.

2018-09-26

(published)

openreview.net

Addressing Function Approximation Error in Actor-Critic Methods

Scott Fujimoto

Herke van Hoof

David Meger

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.

2018-07-02

Proceedings of the 35th International Conference on Machine Learning (published)

proceedings.mlr.press
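The abstract above describes taking the minimum value between a pair of critics to limit overestimation. A minimal sketch of how such a target might be computed is given below; the function name `clipped_double_q_target`, the discount factor `gamma`, and the terminal-state masking are assumptions for illustration, and the critic outputs are passed in as plain arrays rather than produced by networks.

```python
import numpy as np

def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bootstrapped target using the smaller of two critic estimates.

    reward, done, q1_next, q2_next: arrays of shape (batch,).
    gamma: discount factor (assumed value, not taken from the abstract).
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)

    # Element-wise minimum of the two critics' next-state value estimates.
    min_q_next = np.minimum(np.asarray(q1_next, dtype=float),
                            np.asarray(q2_next, dtype=float))

    # No bootstrapping from terminal transitions (done == 1).
    return reward + gamma * (1.0 - done) * min_q_next

targets = clipped_double_q_target(
    reward=[1.0, 0.0], done=[0, 1],
    q1_next=[10.0, 5.0], q2_next=[8.0, 7.0], gamma=0.9,
)
print(targets)
```

Using the minimum rather than either single estimate biases the target downward, which is the mechanism the abstract credits with limiting overestimation; the delayed policy updates it also mentions are not shown here.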
