Tyler Kastner

Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics

Tyler Kastner

Mark Rowland

Yunhao Tang

Murat A Erdogdu

Amir-massoud Farahmand

We study the problem of distributional reinforcement learning using categorical parametrisations and a KL divergence loss. Previous work analyzing categorical distributional RL has done so using a Cramér distance-based loss, simplifying the analysis but creating a theory-practice gap. We introduce a preconditioned version of the algorithm, and prove that it is guaranteed to converge. We further derive the asymptotic variance of the categorical estimates under different learning rate regimes, and compare it to that of classical reinforcement learning. Finally, we validate our theoretical results empirically, investigate the relative strengths of KL losses, and derive a number of actionable insights for practitioners.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press
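The abstract above contrasts a KL divergence loss with the Cramér distance-based loss used in earlier analyses of categorical distributional RL. The sketch below is a rough illustration of those two ingredients and is not the paper's preconditioned algorithm. It shows a generic C51-style projection of a one-step distributional target onto a fixed, evenly spaced support, together with a KL loss and a squared Cramér loss against a predicted distribution. The function names, the uniform-grid assumption and the softmax parametrisation are our own assumptions.

```python
import math
from typing import List


def categorical_projection(support: List[float], probs: List[float],
                           reward: float, gamma: float) -> List[float]:
    """Project the law of reward + gamma * Z back onto a fixed, evenly spaced support.

    Assumption: support is sorted and uniformly spaced. Each shifted atom's mass
    is split between its two neighbouring support points in proportion to
    proximity (a C51-style projection).
    """
    n = len(support)
    v_min, v_max = support[0], support[-1]
    delta = (v_max - v_min) / (n - 1)
    projected = [0.0] * n
    for z, p in zip(support, probs):
        tz = min(max(reward + gamma * z, v_min), v_max)  # clip into [v_min, v_max]
        b = min(max((tz - v_min) / delta, 0.0), n - 1)    # fractional grid index, guarded against rounding
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:  # shifted atom lands exactly on a support point
            projected[lo] += p
        else:
            projected[lo] += p * (hi - b)
            projected[hi] += p * (b - lo)
    return projected


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax, used here as the (assumed) categorical parametrisation."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_loss(target: List[float], logits: List[float]) -> float:
    """KL(target || softmax(logits)); differs from cross-entropy only by a term constant in the logits."""
    pred = softmax(logits)
    return sum(t * (math.log(t) - math.log(p)) for t, p in zip(target, pred) if t > 0)


def cramer_loss(target: List[float], pred: List[float], delta: float) -> float:
    """Squared Cramér (l2) distance between two distributions on the same uniform grid.

    The CDF difference is piecewise constant between atoms, so the integral of
    (F - G)^2 reduces to delta times a sum of squared CDF gaps.
    """
    cdf_t = cdf_p = 0.0
    total = 0.0
    for t, p in zip(target, pred):
        cdf_t += t
        cdf_p += p
        total += (cdf_t - cdf_p) ** 2
    return delta * total


if __name__ == "__main__":
    support = [-2.0, -1.0, 0.0, 1.0, 2.0]
    probs = [0.0, 0.0, 1.0, 0.0, 0.0]  # point mass at 0
    target = categorical_projection(support, probs, reward=0.5, gamma=0.9)
    print(target)                      # [0.0, 0.0, 0.5, 0.5, 0.0]
    print(kl_loss(target, [0.0] * 5))  # log(2.5), about 0.916
```

In this toy example, the shifted atom at 0.5 falls halfway between the support points 0 and 1, so its mass is split evenly between them. The KL loss compares the projected target with a uniform prediction. Because the Cramér loss works on cumulative distributions, it penalises mass placed far from the target more heavily than mass placed nearby. The KL loss ignores the geometry of the support entirely. This difference is one reason the two losses call for different analyses.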

When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning

Claas Voelcker

Tyler Kastner

Igor Gilitschenski

Amir-massoud Farahmand

We investigate the impact of auxiliary learning tasks such as observation reconstruction and latent self-prediction on the representation learning problem in reinforcement learning. We also study how they interact with distractions and observation functions in the MDP. We provide a theoretical analysis of the learning dynamics of observation reconstruction, latent self-prediction, and TD learning in the presence of distractions and observation functions under linear model assumptions. With this formalization, we are able to explain why latent self-prediction is a helpful auxiliary task, while observation reconstruction can provide more useful features when used in isolation. Our empirical analysis shows that the insights obtained from our learning dynamics framework predict the behavior of these loss functions beyond the linear model assumption, in non-linear neural networks. This reinforces the usefulness of the linear model framework not only for theoretical analysis but also for practical benefit in applied problems.

2024-06-01

arXiv (published)

doi.org

arxiv.org

A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

Pablo Samuel Castro

Tyler Kastner

Prakash Panangaden

Mark Rowland

We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We define a new metric under this lens that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective enables us to provide new theoretical results, including value-function bounds and low-distortion finite-dimensional Euclidean embeddings, which are crucial when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.

2023-01-01

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net

MICo: Improved representations via sampling-based state similarity for Markov decision processes

Pablo Samuel Castro

Tyler Kastner

Prakash Panangaden

Mark Rowland

We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analyses, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.

openreview.net

MICo: Learning improved representations via sampling-based state similarity for Markov decision processes

Pablo Samuel Castro

Tyler Kastner

Prakash Panangaden

Mark Rowland

We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analysis…

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de
