Nicolas Le Roux

A Geometric Perspective on Optimal Representations for Reinforcement Learning

Marc Gendron-Bellemare

Will Dabney

Robert Dadashi

Adrien Ali Taiga

Pablo Samuel Castro

Dale Schuurmans

Tor Lattimore

Clare Lyle

We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary policies for a given environment. We show that this optimization reduces to making accurate predictions regarding a special class of value functions which we call adversarial value functions (AVFs). We demonstrate that using value functions as auxiliary tasks corresponds to an expected-error relaxation of our formulation, with AVFs a natural candidate, and identify a close relationship with proto-value functions (Mahadevan, 2005). We highlight characteristics of AVFs and their usefulness as auxiliary tasks in a series of experiments on the four-room domain.

2019-01-31

ArXiv (preprint)

Reducing the variance in online optimization by transporting past gradients

Sébastien M. R. Arnold

Pierre-Antoine Manzagol

Reza Babanezhad Harikandeh

Ioannis Mitliagkas

Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.

Understanding the impact of entropy in policy learning

Zafarali Ahmed

Mohammad Norouzi

Dale Schuurmans

Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. Then, we qualitatively show that in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms.

Combining adaptive algorithms and hypergradient method: a performance and robustness study

Akram Erraqabi

2018-09-27

(published)

www.semanticscholar.org

Online Hyper-Parameter Optimization

Damien Vincent

Sylvain Gelly

Olivier Bousquet

2018-02-15

(published)

Online variance-reducing optimization

Reza Babanezhad Harikandeh

Reza Babanezhad

Pierre-Antoine Manzagol

2018-02-12

International Conference on Learning Representations (published)

Negative eigenvalues of the Hessian in deep neural networks

Guillaume Alain

Pierre-Antoine Manzagol

The loss function of deep networks is known to be non-convex but the precise nature of this nonconvexity is still an active area of research. In this work, we study the loss landscape of deep networks through the eigendecompositions of their Hessian matrix. In particular, we examine how important the negative eigenvalues are and the benefits one can observe in handling them appropriately.

2018-01-01

ICLR (Workshop) (published)

BOUNDS LEAD TO IMPROVED CLASSIFIERS