Nicolas Le Roux

Distributional reinforcement learning with linear function approximation

Subhodeep Moitra

Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramer distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramer distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramer-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramer-based distributional methods may perform worse than directly approximating the value function.

2019-02-08

ArXiv (preprint)

A Geometric Perspective on Optimal Representations for Reinforcement Learning

Will Dabney

Robert Dadashi

Adrien Ali Taiga

Dale Schuurmans

Tor Lattimore

Clare Lyle

We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary policies for a given environment. We show that this optimization reduces to making accurate predictions regarding a special class of value functions which we call adversarial value functions (AVFs). We demonstrate that using value functions as auxiliary tasks corresponds to an expected-error relaxation of our formulation, with AVFs a natural candidate, and identify a close relationship with proto-value functions (Mahadevan, 2005). We highlight characteristics of AVFs and their usefulness as auxiliary tasks in a series of experiments on the four-room domain.

2019-01-31

ArXiv (preprint)

Reducing the variance in online optimization by transporting past gradients

Sébastien M. R. Arnold

Pierre-Antoine Manzagol

Reza Babanezhad Harikandeh

Ioannis Mitliagkas

Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.

Understanding the impact of entropy in policy learning

Zafarali Ahmed

Mohammad Norouzi

Dale Schuurmans

Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. Then, we qualitatively show that in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms.

Combining adaptive algorithms and hypergradient method: a performance and robustness study

Akram Erraqabi

2018-09-27

(published)

www.semanticscholar.org

Online Hyper-Parameter Optimization

Damien Vincent

Sylvain Gelly

Olivier Bousquet

2018-02-15

(published)

Online variance-reducing optimization

Reza Babanezhad Harikandeh

Pierre-Antoine Manzagol

2018-02-12

International Conference on Learning Representations (published)

Negative eigenvalues of the Hessian in deep neural networks

Guillaume Alain

Pierre-Antoine Manzagol

The loss function of deep networks is known to be non-convex but the precise nature of this nonconvexity is still an active area of research. In this work, we study the loss landscape of deep networks through the eigendecompositions of their Hessian matrix. In particular, we examine how important the negative eigenvalues are and the benefits one can observe in handling them appropriately.

2018-01-01

ICLR (Workshop) (published)