Portrait de Harley Wiltzer n'est pas disponible

Harley Wiltzer

Doctorat - McGill University
Superviseur⋅e principal⋅e
Co-superviseur⋅e

Publications

A Distributional Analogue to the Successor Representation
Harley Wiltzer
Jesse Farebrother
Arthur Gretton
Yunhao Tang
Andre Barreto
Will Dabney
Mark Rowland
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure … (voir plus)and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Nathan Rahn
Pierluca D'Oro
Harley Wiltzer
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In th… (voir plus)is work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.