Pablo Samuel Castro

A Geometric Lens on RL Environment Complexity Based on Ricci Curvature

Ali Saheb Pasand

We introduce Ollivier-Ricci Curvature (ORC) as an information-geometric tool for analyzing the local structure of reinforcement learning (RL) environments. We establish a novel connection between ORC and the Successor Representation (SR), enabling a geometric interpretation of environment dynamics decoupled from reward signals. Our analysis shows that states with positive and negative ORC values correspond to regions where random walks converge and diverge respectively, which are often critical for effective exploration. ORC is highly correlated with established environment complexity metrics, yet integrates naturally with standard RL frameworks based on SR and provides both global and local complexity measures. Leveraging this property, we propose an ORC-based intrinsic reward that guides agents toward divergent regions and away from convergent traps. Empirical results demonstrate that our curvature-driven reward substantially improves exploration performance across diverse environments, outperforming both random and count-based intrinsic reward baselines.

2025-07-01

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

openreview.net

A Survey of State Representation Learning for Deep Reinforcement Learning

Ayoub Echchahed

Pablo Samuel Castro

Representation learning methods are an important tool for addressing the challenges posed by complex observations spaces in sequential decision making problems. Recently, many methods have used a wide variety of types of approaches for learning meaningful state representations in reinforcement learning, allowing better sample efficiency, generalization, and performance. This survey aims to provide a broad categorization of these methods within a model-free online setting, exploring how they tackle the learning of state representations differently. We categorize the methods into six main classes, detailing their mechanisms, benefits, and limitations. Through this taxonomy, our aim is to enhance the understanding of this field and provide a guide for new researchers. We also discuss techniques for assessing the quality of representations, and detail relevant future directions.

2025-06-23

TMLR (accepted)

openreview.net

Continual Learning in Vision-Language Models via Aligned Model Merging

Ghada Sokar

Gintare Karolina Dziugaite

Anurag Arnab

Ahmet Iscen

Pablo Samuel Castro

Cordelia Schmid

Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based on model merging to maintain stability while still retaining plasticity. Rather than just sequentially updating the model weights, we propose merging newly trained task parameters with previously learned ones, promoting a better balance. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs), and demonstrate its effectiveness in reducing forgetting, increasing robustness to various task orders and similarities, and improving generalization.

2025-06-01

arXiv (published)

doi.org

arxiv.org

Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

Roger Creus Castanyer

Johan Samir Obando Ceron

Lu Liu

Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlie the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.

2025-06-01

arXiv (published)

doi.org

arxiv.org

Continual Learning in Vision-Language Models via Aligned Model Merging

Ghada Sokar

Gintare Karolina Dziugaite

Anurag Arnab

Ahmet Iscen

Pablo Samuel Castro

Cordelia Schmid

Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based on model merging to maintain stability while still retaining plasticity. Rather than just sequentially updating the model weights, we propose merging newly trained task parameters with previously learned ones, promoting a better balance. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs), and demonstrate its effectiveness in reducing forgetting, increasing robustness to various task orders and similarities, and improving generalization.

2025-05-30

arXiv (preprint)

arxiv.org

Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

Max Schwarzer

Jesse Farebrother

Joshua Greaves

Ekin Dogus Cubuk

Rishabh Agarwal

Aaron Courville

Marc Gendron-Bellemare

Sergei Kalinin

Igor Mordatch

Pablo Samuel Castro

Kevin M Roccapriore

We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.

2025-05-20

Advanced Materials Interfaces (published)

doi.org

arxiv.org

Multi-Task Reinforcement Learning Enables Parameter Scaling

Reginald McLean

Evangelos Chatzaroulas

J K Terry

Isaac Woungang

Nariman Farsad

Pablo Samuel Castro

Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and these gains benefit most from scaling the critic over the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.

2025-05-09

rl-conference.cc/RLC/2025/Conference (accepted)

openreview.net

Multi-Task Reinforcement Learning Enables Parameter Scaling

Reginald McLean

Evangelos Chatzaroulas

J K Terry

Isaac Woungang

Nariman Farsad

Pablo Samuel Castro

Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and these gains benefit most from scaling the critic over the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.

2025-05-09

rl-conference.cc/RLC/2025/Conference (published)

openreview.net

Optimistic critics can empower small actors

Olya Mastikhina

Dhruv Sreenivas

Pablo Samuel Castro

Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use _symmetric_ architectures, whereby both actor and critic have the same network topology and number of parameters. However, recent works have argued for the advantages of _asymmetric_ setups, specifically with the use of smaller actors. We perform broad empirical investigations and analyses to better understand the implications of this and find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest _poor data collection_, due to value underestimation, as one of the main causes for this behavior, and further highlight the crucial role the critic can play in alleviating this pathology. We explore techniques to mitigate the observed value underestimation, which enables further research in asymmetric actor-critic methods.

2025-05-09

rl-conference.cc/RLC/2025/Conference (accepted)

openreview.net

Optimistic critics can empower small actors

Olya Mastikhina

Dhruv Sreenivas

Pablo Samuel Castro

Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use _symmetric_ architectures, whereby both actor and critic have the same network topology and number of parameters. However, recent works have argued for the advantages of _asymmetric_ setups, specifically with the use of smaller actors. We perform broad empirical investigations and analyses to better understand the implications of this and find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest _poor data collection_, due to value underestimation, as one of the main causes for this behavior, and further highlight the crucial role the critic can play in alleviating this pathology. We explore techniques to mitigate the observed value underestimation, which enables further research in asymmetric actor-critic methods.

2025-05-09

rl-conference.cc/RLC/2025/Conference (published)

openreview.net

Discovering Symbolic Cognitive Models from Human and Animal Behavior

Pablo Samuel Castro

Nenad Tomasev

Ankit Anand

Navodita Sharma

Rishika Mohanta

Aparna Dev

Kuba Perlin

Siddhant Jain

Kyle Levin

Noemi Elteto

Will Dabney

Alexander Novikov

Glenn C Turner

Maria K Eckstein

Nathaniel D. Daw

Kevin J Miller

Kim Stachenfeld

Symbolic models play a key role in cognitive science, expressing computationally precise hypotheses about how the brain implements a cognitive process. Identifying an appropriate model typically requires a great deal of effort and ingenuity on the part of a human scientist. Here, we adapt FunSearch (Romera-Paredes et al. 2024), a recently developed tool that uses Large Language Models (LLMs) in an evolutionary algorithm, to automatically discover symbolic cognitive models that accurately capture human and animal behavior. We consider datasets from three species performing a classic reward-learning task that has been the focus of substantial modeling effort, and find that the discovered programs outperform state-of-the-art cognitive models for each. The discovered programs can readily be interpreted as hypotheses about human and animal cognition, instantiating interpretable symbolic learning and decision-making algorithms. Broadly, these results demonstrate the viability of using LLM-powered program synthesis to propose novel scientific hypotheses regarding mechanisms of human and animal cognition.

2025-05-01

ICML.cc/2025/Conference (poster)

openreview.net

Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning

Jiashun Liu

Zihao Wu

Johan Samir Obando Ceron

Pablo Samuel Castro

Aaron Courville

Ling Pan

2025-05-01

arXiv (published)

doi.org

arxiv.org
