Somjit Nath

Vincent Michalski

2025-09-02

TMLR (accepted)

Behavioral Suite Analysis of Self-Supervised Learning in Atari

Rishav

Gopeshh Subbaraj

Derek Nowrouzezahrai

2025-06-20

rl-conference.cc/RLC/2025/Workshop/RLVG (accepted)

Spectral Temporal Contrastive Learning

Sacha Morin

Guy Wolf

Learning useful data representations without requiring labels is a cornerstone of modern deep learning. Self-supervised learning methods, particularly contrastive learning (CL), have proven successful by leveraging data augmentations to define positive pairs. This success has prompted a number of theoretical studies to better understand CL and to investigate theoretical bounds for downstream linear probing tasks. This work is concerned with the temporal contrastive learning (TCL) setting, where the sequential structure of the data is used instead to define positive pairs; this setting is more common in RL and robotics. In this paper, we adapt recent work on Spectral CL to formulate Spectral Temporal Contrastive Learning (STCL). We discuss a population loss based on a state graph derived from a time-homogeneous reversible Markov chain with a uniform stationary distribution. The STCL loss makes it possible to connect linear probing performance to the spectral properties of the graph, and it can be estimated by treating previously observed data sequences as an ensemble of MCMC chains.
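For intuition, the spectral contrastive loss of HaoChen et al. (2021) has the form $\mathcal{L}(f) = -2\,\mathbb{E}_{(x,x^+)}[f(x)^\top f(x^+)] + \mathbb{E}_{x,x'}[(f(x)^\top f(x'))^2]$. In the temporal setting, a positive pair is two consecutive states of a trajectory. The pure-Python sketch below estimates this loss from already-embedded trajectories. It is an illustration only, not the STCL estimator from the paper: the function names are hypothetical, and using mismatched pairs within the batch as stand-ins for independent samples is an assumption of the sketch.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def temporal_pairs(trajectory: Sequence[Vector]) -> List[Tuple[Vector, Vector]]:
    """Positive pairs are consecutive embeddings (f(x_t), f(x_{t+1}))."""
    return [(trajectory[t], trajectory[t + 1]) for t in range(len(trajectory) - 1)]


def spectral_temporal_loss(trajectories: Sequence[Sequence[Vector]]) -> float:
    """Hypothetical empirical spectral contrastive loss with temporal positives.

    Each trajectory is a list of embeddings; several trajectories are pooled,
    loosely mirroring the "ensemble of chains" view.
    """
    pairs = [p for traj in trajectories for p in temporal_pairs(traj)]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two positive pairs")
    # Attraction: -2 * mean similarity of temporally adjacent embeddings.
    attract = -2.0 * sum(dot(z, z_next) for z, z_next in pairs) / n
    # Repulsion: mean squared similarity over mismatched pairs (i != j).
    # Assumption: these approximate independent draws (x, x').
    repel = sum(
        dot(pairs[i][0], pairs[j][1]) ** 2
        for i in range(n)
        for j in range(n)
        if i != j
    ) / (n * (n - 1))
    return attract + repel
```

For example, a single trajectory of three identical embeddings `[1.0, 0.0]` gives perfectly aligned positive pairs, so the attraction term is -2, the repulsion term is 1, and the loss is -1.0.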

2023-12-01

ArXiv (preprint)

Prioritizing Samples in Reinforcement Learning with Reducible Loss

Shiva Kanth Sujit

Pedro Braga

Discovering Object-Centric Generalized Value Functions From Pixels

Gopeshh Subbaraj

Khimya Khetarpal

Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs, albeit using hand-crafted auxiliary tasks and pseudo rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them into temporally coherent "question" functions and leveraging the subsequently learned general value functions for control. We compare our approach with state-of-the-art techniques and with ablations, and show competitive performance in both stationary and non-stationary settings. Finally, we investigate the discovered general value functions and, through qualitative analysis, show that the learned representations are not only interpretable but also centered around objects that are invariant to changes across tasks, facilitating fast adaptation.
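The abstract does not say how the general value functions are learned. As a generic illustration only (not the paper's method), a general value function (GVF) answers a "question" specified by a cumulant signal and a discount, and can be learned with temporal-difference updates. The tabular TD(0) sketch below assumes a constant discount and a hypothetical cumulant; `td0_gvf` and the example question are illustrative names, not from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Tuple


def td0_gvf(
    transitions: Iterable[Tuple[Hashable, Hashable]],
    cumulant: Callable[[Hashable], float],
    gamma: float = 0.9,
    alpha: float = 0.1,
    sweeps: int = 1,
) -> Dict[Hashable, float]:
    """Hypothetical tabular TD(0) estimate of a GVF with constant discount gamma.

    cumulant(s_next) is the "question" signal received on entering s_next;
    v[s] estimates the expected discounted sum of that signal from s.
    """
    transitions = list(transitions)  # allow several passes over the data
    v: Dict[Hashable, float] = defaultdict(float)
    for _ in range(sweeps):
        for s, s_next in transitions:
            target = cumulant(s_next) + gamma * v[s_next]
            v[s] += alpha * (target - v[s])
    return dict(v)


# Hypothetical question on a two-state loop: "how soon, and how often, will
# the agent be in state 'b'?" (cumulant is 1 on entering 'b', else 0).
values = td0_gvf(
    [("a", "b"), ("b", "a")],
    cumulant=lambda s: 1.0 if s == "b" else 0.0,
    gamma=0.9,
    alpha=0.5,
    sweeps=2000,
)
```

On this loop the fixed point satisfies v(a) = 1 + 0.9 v(b) and v(b) = 0.9 v(a), so v(a) = 1 / (1 - 0.81), roughly 5.26.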

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

CAMMARL: Conformal Action Modeling in Multi Agent Reinforcement Learning

Nikunj Gupta

Before taking actions in an environment with more than one intelligent agent, an autonomous agent may benefit from reasoning about the other agents and from having a notion of guarantee or confidence about the behavior of the system. In this article, we propose a novel multi-agent reinforcement learning (MARL) algorithm, CAMMARL, which models the actions of other agents in different situations in the form of confidence sets, i.e., sets that contain their true actions with high probability. We then use these estimates to inform an agent's decision-making. To estimate such sets, we use conformal prediction, which not only provides an estimate of the most probable outcome but also quantifies the associated uncertainty. For instance, we can predict a set that provably covers the true action with high probability (e.g., 95%). Through several experiments in two fully cooperative multi-agent tasks, we show that CAMMARL improves the capabilities of an autonomous agent in MARL by modeling conformal prediction sets over the behavior of other agents in the environment and using these estimates to enhance its policy learning.
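To make the coverage idea concrete, here is a standard split-conformal recipe for a discrete action space, assuming a model that outputs a probability for each of the other agent's actions and using the nonconformity score 1 - p(true action). CAMMARL's actual score function and calibration procedure are not given in the abstract; `calibrate_threshold`, `prediction_set`, and the action names are hypothetical. Under exchangeability of calibration and test data, the resulting set contains the true action with probability at least 1 - alpha.

```python
import math
from typing import Dict, Sequence, Set


def calibrate_threshold(true_action_probs: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold from held-out calibration data.

    true_action_probs[i] is the (hypothetical) model's predicted probability
    of the action the other agent actually took in calibration example i.
    """
    n = len(true_action_probs)
    scores = sorted(1.0 - p for p in true_action_probs)  # nonconformity scores
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points: include every action
    return scores[k - 1]


def prediction_set(action_probs: Dict[str, float], threshold: float) -> Set[str]:
    """All actions whose nonconformity score is within the threshold."""
    return {a for a, p in action_probs.items() if 1.0 - p <= threshold}
```

For example, with ten calibration probabilities 0.1, 0.2, ..., 1.0 and `alpha=0.1`, the threshold is 0.9, and `prediction_set({"left": 0.7, "right": 0.25, "stay": 0.05}, 0.9)` returns `{"left", "right"}`.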

2023-06-19

ArXiv (preprint)