Harley Wiltzer

PhD - McGill

Principal supervisor

David Meger

Co-supervisor

Marc Gendron-Bellemare

Research topics

Reinforcement learning

Probabilistic models

Dynamical systems

Website

Google Scholar

GitHub

Publications

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Bellemare Marc-Emmanuel

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.

2023-09-20

NeurIPS 2023 Conference (poster)

doi.org

openreview.net

Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning

Harley Wiltzer

David Meger

Bellemare Marc-Emmanuel

Continuous-time reinforcement learning offers an appealing formalism for describing control problems in which the passage of time is not naturally divided into discrete increments. Here we consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time, stochastic environment. Accurate return predictions have proven useful for determining optimal policies for risk-sensitive control, learning state representations, multiagent coordination, and more. We begin by establishing the distributional analogue of the Hamilton-Jacobi-Bellman (HJB) equation for Itô diffusions and the broader class of Feller-Dynkin processes. We then specialize this equation to the setting in which the return distribution is approximated by…

2022-06-27

Proceedings of the 39th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press
