Prakash Panangaden

Christopher G. Lucas

David Abel

Stefano V Albrecht

Extracting relevant information from a stream of high-dimensional observations is a central challenge for deep reinforcement learning agents. Actor-critic algorithms add further complexity to this challenge, as it is often unclear whether the same information will be relevant to both the actor and the critic. To this end, we here explore the principles that underlie effective representations for the actor and for the critic in on-policy algorithms. We focus our study on understanding whether the actor and critic will benefit from separate, rather than shared, representations. Our primary finding is that when separated, the representations for the actor and critic systematically specialise in extracting different types of information from the environment -- the actor's representation tends to focus on action-relevant information, while the critic's representation specialises in encoding value and dynamics information. We conduct a rigorous empirical study to understand how different representation learning approaches affect the actor and critic's specialisations and their downstream performance, in terms of sample efficiency and generation capabilities. Finally, we discover that a separated critic plays an important role in exploration and data collection during training. Our code, trained models and data are accessible at https://github.com/francelico/deac-rep.

2025-03-08

ArXiv (preprint)

Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning

Samuel Garcin

Trevor McInroe

Christopher G. Lucas

David Abel

Stefano V Albrecht

Extracting relevant information from a stream of high-dimensional observations is a central challenge for deep reinforcement learning agents. Actor-critic algorithms add further complexity to this challenge, as it is often unclear whether the same information will be relevant to both the actor and the critic. To this end, we here explore the principles that underlie effective representations for an actor and for a critic. We focus our study on understanding whether an actor and a critic will benefit from a decoupled, rather than shared, representation. Our primary finding is that when decoupled, the representations for the actor and critic systematically specialise in extracting different types of information from the environment -- the actor's representation tends to focus on action-relevant information, while the critic's representation specialises in encoding value and dynamics information. Finally, we demonstrate how these insights help select representation learning objectives that play into the actor's and critic's respective knowledge specialisations, and improve performance in terms of agent returns.

2025-01-22

ICLR.cc/2025/Conference (poster)

openreview.net

Optimal Approximate Minimization of One-Letter Weighted Finite Automata

Clara Lacroce

Borja Balle

Guillaume Rabusseau

2024-11-08

Mathematical Structures in Computer Science (published)

Conditions on Preference Relations that Guarantee the Existence of Optimal Policies

Jonathan Colaço Carr

Doina Precup

2024-04-18

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

Polynomial Lawvere Logic

Giorgio Bacci

Radu Mardare

Gordon D. Plotkin

2024-02-05

ArXiv (preprint)

Policy Gradient Methods in the Presence of Symmetries and State Abstractions

Sahand Rezaei-Shoshtari

Sum and Tensor of Quantitative Effects

Giorgio Bacci

Radu Mardare

Gordon Plotkin

2024-01-01

Log. Methods Comput. Sci. (published)

Behavioural pseudometrics for continuous-time diffusions

Linan Chen

Florence Clerc

2023-12-27

ArXiv (preprint)

Propositional Logics for the Lawvere Quantale

Giorgio Bacci

Radu Mardare

Gordon Plotkin

2023-11-23

Electronic Notes in Theoretical Informatics and Computer Science (published)

Behavioural equivalences for continuous-time Markov processes

Linan Chen

Florence Clerc

2023-03-30

Mathematical Structures in Computer Science (published)

A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

Tyler Kastner

Mark Rowland

We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We define a new metric under this lens that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective enables us to provide new theoretical results, including value-function bounds and low-distortion finite-dimensional Euclidean embeddings, which are crucial when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.

2023-01-01

Trans. Mach. Learn. Res. (published)