Despite recent successes in Reinforcement Learning, value-based methods often suffer from high variance, which hinders performance. In this paper, we illustrate this in a continuous control setting where state-of-the-art methods perform poorly whenever sensor noise is introduced. To overcome this issue, we introduce Recurrent Value Functions (RVFs) as an alternative way to estimate the value function of a state. We propose to estimate the value function of the current state using the value functions of past states visited along the trajectory. Due to the nature of their formulation, RVFs have a natural way of learning an emphasis function that selectively emphasizes important states. First, we establish the asymptotic convergence properties of RVFs in tabular settings. We then demonstrate their robustness on a partially observable domain and on continuous control tasks. Finally, we provide a qualitative interpretation of the learned emphasis function.
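To make the recurrence concrete, here is a minimal sketch, assuming the recurrent estimate is a convex mixture of the current state's value and the running estimate built from past states, weighted by a per-step emphasis beta_t. The function name, the exact recurrence, and the fact that the betas are given rather than learned are all illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def recurrent_value_estimate(values, betas):
    """Sketch of a recurrent value estimate (illustrative, not the
    authors' exact method).

    Assumed recurrence:
        v_rvf(s_t) = beta_t * v(s_t) + (1 - beta_t) * v_rvf(s_{t-1})

    values: per-step value estimates v(s_0), ..., v(s_T)
    betas:  per-step emphasis weights in [0, 1]; in the paper an
            emphasis function is learned, here they are inputs.
    """
    v_rvf = values[0]  # initialize with the first state's estimate
    for v_t, beta_t in zip(values[1:], betas[1:]):
        # High beta_t trusts the current state's estimate; low beta_t
        # leans on the smoothed estimate of previously visited states.
        v_rvf = beta_t * v_t + (1.0 - beta_t) * v_rvf
    return v_rvf
```

The intuition this sketch captures: under sensor noise, a small beta_t at an unreliable state lets the estimate fall back on past states along the trajectory, which is the role of the learned emphasis function.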
Several applications of Reinforcement Learning suffer from instability due to high variance. This is especially prevalent in high-dimensional domains. Regularization is a commonly used technique in machine learning to reduce variance, at the cost of introducing some bias. Most existing regularization techniques focus on spatial (perceptual) regularization. Yet in reinforcement learning, due to the nature of the Bellman equation, there is an opportunity to also exploit temporal regularization based on the smoothness of value estimates over trajectories. This paper explores a class of methods for temporal regularization. We formally characterize the bias induced by this technique using Markov chain concepts. We illustrate the various characteristics of temporal regularization on a sequence of simple discrete and continuous MDPs, and show that the technique provides improvement even in high-dimensional Atari games.
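As a rough illustration of the idea (one plausible variant, not necessarily the paper's exact operator), here is a minimal sketch that temporally regularizes TD(0) targets by mixing the bootstrap value with the value of the previously visited state along the trajectory; the function name and the parameter beta are assumptions introduced for this example.

```python
import numpy as np

def temporally_regularized_td_targets(rewards, values, gamma=0.99, beta=0.2):
    """Sketch of temporally regularized TD(0) targets.

    Standard TD(0) target:    r_t + gamma * v(s_{t+1})
    Regularized variant here: r_t + gamma * ((1 - beta) * v(s_{t+1})
                                             + beta * v(s_{t-1}))
    beta trades variance for bias by smoothing value estimates over
    the trajectory; beta = 0 recovers plain TD(0).

    rewards: r_0, ..., r_{T-1}
    values:  v(s_0), ..., v(s_T)  (one element longer than rewards)
    """
    targets = np.empty(len(rewards))
    for t in range(len(rewards)):
        prev_value = values[t - 1] if t > 0 else values[0]
        bootstrap = (1.0 - beta) * values[t + 1] + beta * prev_value
        targets[t] = rewards[t] + gamma * bootstrap
    return targets
```

Because value estimates of temporally adjacent states tend to be similar along a trajectory, pulling the target toward the previous state's value damps noise in the bootstrap at the cost of the bias characterized in the paper.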
Published in Advances in Neural Information Processing Systems 31 (NeurIPS 2018).