Glen Berseth

Curiosity-Driven Exploration via Temporal \\ Contrastive Learning

Faisal Mohamed

Catherine Ji

Benjamin Eysenbach

Exploration remains a key challenge in reinforcement learning (RL), especially in long-horizon tasks and environments with high-dimensional … (see more)observations. A common strategy for effective exploration is to promote state coverage or novelty, which often involves estimating the agent's state visitation distribution. In this paper, we propose \textbf{C}uriosity-Driven Exploration via \textbf{Te}mporal \textbf{C}ontrastive Learning (\methodName), an exploration method based on temporal contrastive learning that rewards agents for reaching states with unexpected futures. This incentivizes uncovering meaningful less-visited states. \methodName is simple and does not require explicit density or uncertainty estimation, while learning representations aligned with the RL objective. It consistently outperforms standard baselines in complex mazes using different embodiments (Ant and Humanoid) and robotic manipulation tasks, while also yielding more diverse behaviors in Craftax without requiring task-specific information.

2025-07-01

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

openreview.net

Curiosity-Driven Exploration via Temporal Contrastive Learning

Faisal Mohamed

Catherine Ji

Benjamin Eysenbach

Glen Berseth

Effective exploration in reinforcement learning requires keeping track not just of where the agent has been, but also of how the agent think… (see more)s about and represents the world: an agent should explore states that enable it to learn powerful representations. Temporal representations can include the information required to solve any potential task while avoiding the computational cost of reconstruction. In this paper, we propose an exploration method that uses temporal contrastive representations to drive exploration, maximizing coverage as seen through the lens of these temporal representations. We demonstrate complex exploration behaviors in locomotion, manipulation, and embodied-AI tasks, revealing previously unknown capabilities and behaviors once achievable only via extrinsic rewards.

2025-07-01

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

openreview.net

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

Daniel Lawson

Adriana Hugessen

Charlotte Cloutier

Glen Berseth

Khimya Khetarpal

Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in do… (see more)mains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,

2025-07-01

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

doi.org

openreview.net

Zero-Shot Constraint Satisfaction with Forward- Backward Representations

Adriana Hugessen

Harley Wiltzer

Cyrus Neary

Amy Zhang

Glen Berseth

Traditionally, constrained policy optimization with Reinforcement Learning (RL) requires learning a new policy from scratch for any new envi… (see more)ronment, goal or cost function, with limited generalization to new tasks and constraints. Given the sample inefficiency of many common deep RL methods, this procedure can be impractical for many real-world scenarios, particularly when constraints or tasks are changing. As an alternative, in the unconstrained setting, various works have sought to pre-train representations from offline datasets to accelerate policy optimization upon specification of a reward. Such methods can permit faster adaptation to new tasks in a given environment, dramatically improving sample efficiency. Recently, zero-shot policy optimization has been explored by leveraging a particular

2025-07-01

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

openreview.net

Scalable Tree Search over Graphs with Learned Action Pruning for Power Grid Control

Florence Cloutier

Cyrus Neary

Adriana Hugessen

Viktor Todosijević

Zina Kamel

Glen Berseth

As real-world infrastructure systems become increasingly complex and large-scale, there is a growing need for learning-based control strateg… (see more)ies that can make informed decisions in complex and dynamic environments. However, large-scale problems — such as power grid control — introduce high-dimensional action spaces and necessitate transferability across varying grid topologies. We introduce **H**ierarchical **E**xpert-Guided **R**econfiguration **O**ptimization for **G**raph **T**opologies, **HERO-GT**, a model-based planning approach that combines a pretrained graph neural network (GNN) for topology-aware action pruning with a Monte Carlo Tree Search (MCTS) planner for targeted, structured exploration. More specifically, the high-level GNN predicts a promising subset of actions, which the low-level MCTS agent uses to focus its search and reduce computational overhead while remaining adaptable to unseen graph structures. Furthermore, the MCTS planner leverages a given *default policy*---which may be defined, for example, by heuristics, problem relaxations, or rule-based methods---to bias the search and prioritize actions that are expected to improve performance over the default. We deploy HERO-GT in power grid environments, demonstrating that it not only improves over a strong default policy, but also scales to a realistic operational setting where exhaustive search becomes computationally infeasible.

2025-06-17

rl-conference.cc/RLC/2025/Workshop/RL4RS (published)

openreview.net

Exploration by Exploitation: Curriculum Learning for Reinforcement Learning Agents through Competence-Based Curriculum Policy Search

Tabitha Edith Lee

Nan Rosemary Ke

Sarvesh Patil

Annya Dahmani

Eunice Yiu

Esra'a Saleh

Alison Gopnik

Oliver Kroemer

Glen Berseth

2025-06-12

ICML.cc/2025/Workshop/EXAIT (poster)

openreview.net

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

Daniel Lawson

Adriana Hugessen

Charlotte Cloutier

Glen Berseth

Khimya Khetarpal

Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in do… (see more)mains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,

2025-06-11

ArXiv (preprint)

arxiv.org

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Roger Creus Castanyer

Johan Samir Obando Ceron

Lu Liu

Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure m… (see more)ode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlie the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.

2025-06-01

arXiv (published)

doi.org

arxiv.org

Efficient Morphology-Aware Policy Transfer to New Embodiments

Michael Przystupa

Hongyao Tang

Glen Berseth

Mariano Phielipp

Santiago Miret

Martin Jägersand

Matthew E. Taylor

Morphology-aware policy learning is a means of enhancing policy sample efficiency by aggregating data from multiple agents. These types of p… (see more)olicies have previously been shown to help generalize over dynamic, kinematic, and limb configuration variations between agent morphologies. Unfortunately, these policies still have sub-optimal zero-shot performance compared to end-to-end finetuning on morphologies at deployment. This limitation has ramifications in practical applications such as robotics because further data collection to perform end-to-end finetuning can be computationally expensive. In this work, we investigate combining morphology-aware pretraining with \textit{parameter efficient finetuning} (PEFT) techniques to help reduce the learnable parameters necessary to specialize a morphology-aware policy to a target embodiment. We compare directly tuning sub-sets of model weights, input learnable adapters, and prefix tuning techniques for online finetuning. Our analysis reveals that PEFT techniques in conjunction with policy pre-training generally help reduce the number of samples to necessary to improve a policy compared to training models end-to-end from scratch. We further find that tuning as few as less than 1\% of total parameters will improve policy performance compared the zero-shot performance of the base pretrained a policy.

2025-05-09

rl-conference.cc/RLC/2025/Conference (accepted)

openreview.net

Efficient Morphology-Aware Policy Transfer to New Embodiments

Michael Przystupa

Hongyao Tang

Glen Berseth

Mariano Phielipp

Santiago Miret

Martin Jägersand

Matthew E. Taylor

Morphology-aware policy learning is a means of enhancing policy sample efficiency by aggregating data from multiple agents. These types of p… (see more)olicies have previously been shown to help generalize over dynamic, kinematic, and limb configuration variations between agent morphologies. Unfortunately, these policies still have sub-optimal zero-shot performance compared to end-to-end finetuning on morphologies at deployment. This limitation has ramifications in practical applications such as robotics because further data collection to perform end-to-end finetuning can be computationally expensive. In this work, we investigate combining morphology-aware pretraining with \textit{parameter efficient finetuning} (PEFT) techniques to help reduce the learnable parameters necessary to specialize a morphology-aware policy to a target embodiment. We compare directly tuning sub-sets of model weights, input learnable adapters, and prefix tuning techniques for online finetuning. Our analysis reveals that PEFT techniques in conjunction with policy pre-training generally help reduce the number of samples to necessary to improve a policy compared to training models end-to-end from scratch. We further find that tuning as few as less than 1\% of total parameters will improve policy performance compared the zero-shot performance of the base pretrained a policy.

2025-05-09

rl-conference.cc/RLC/2025/Conference (published)

openreview.net

Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn

Hongyao Tang

Johan Samir Obando Ceron

Pablo Samuel Castro

Aaron Courville

Glen Berseth

Plasticity, or the ability of an agent to adapt to new tasks, environments, or distributions, is crucial for continual learning. In this pap… (see more)er, we study the loss of plasticity in deep continual RL from the lens of churn: network output variability for out-of-batch data induced by mini-batch training. We demonstrate that (1) the loss of plasticity is accompanied by the exacerbation of churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix; (2) reducing churn helps prevent rank collapse and adjusts the step size of regular RL gradients adaptively. Moreover, we introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate it improves learning performance and outperforms baselines in a diverse range of continual learning environments on OpenAI Gym Control, ProcGen, DeepMind Control Suite, and MinAtar benchmarks.

2025-05-01

ICML.cc/2025/Conference (poster)

doi.org

openreview.net

Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models

Siddarth Venkatraman

Mohsin Hasan

Minsu Kim

Luca Scimeca

Marcin Sendera

Yoshua Bengio

Glen Berseth

Nikolay Malkin

Any well-behaved generative model over a variable …

2025-05-01

ICML.cc/2025/Conference (poster)

doi.org

openreview.net

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Glen Berseth

Biography

Current Students

Blog Posts

Publications

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Popular keywords:

Glen Berseth

Biography

Current Students

Blog Posts

Publications