
Pablo Samuel Castro

Core Industry Member
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Scientist, Google DeepMind
Research Topics
Reinforcement Learning

Biography

Pablo Samuel Castro was born and raised in Quito, Ecuador, and moved to Montreal after high school to study at McGill University. There he obtained his PhD, focusing on reinforcement learning under the supervision of Doina Precup and Prakash Panangaden. He is a Research Scientist at Google DeepMind in Montreal. He is particularly interested in fundamental reinforcement learning research and regularly advocates for greater representation of people of Latin American origin in the research community. He is also an Adjunct Professor in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal. Beyond his interest in coding, artificial intelligence and mathematics, Pablo Samuel is an active musician.

Current Students

PhD - UdeM
Principal supervisor:
Independent visiting researcher - RWTH Aachen University
Research Master's - UdeM
Research Master's - UdeM
PhD - UdeM
Research collaborator
PhD - UdeM
Principal supervisor:
PhD - McGill
Principal supervisor:
PhD - McGill
Principal supervisor:
PhD - UdeM

Publications

Meta-World+: An Improved, Standardized, RL Benchmark
Reginald McLean
Evangelos Chatzaroulas
Luc McCutcheon
Frank Röder
Tianhe Yu
Zhanpeng He
K.R. Zentner
Ryan Julian
J K Terry
Isaac Woungang
Nariman Farsad
Multi-task reinforcement learning challenges agents to master diverse skills simultaneously, and Meta-World emerged as the gold standard benchmark for evaluating these algorithms. However, since the introduction of the Meta-World benchmark there have been numerous undocumented changes which inhibit fair comparison of multi-task and meta reinforcement learning algorithms. This work strives to disambiguate these results from the literature, while also producing an open-source version of Meta-World that has full reproducibility of past results.
Continual Learning in Vision-Language Models via Aligned Model Merging
Ghada Sokar
Anurag Arnab
Ahmet Iscen
Cordelia Schmid
Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based on model merging to maintain stability while still retaining plasticity. Rather than just sequentially updating the model weights, we propose merging newly trained task parameters with previously learned ones, promoting a better balance. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs), and demonstrate its effectiveness in reducing forgetting, increasing robustness to various task orders and similarities, and improving generalization.
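
As an illustration of the core idea, the hedged sketch below merges previously learned parameters with newly trained task parameters through a simple convex combination. The function name merge_weights and the coefficient alpha are illustrative choices, not the paper's exact procedure, which additionally encourages the new weights to be learned in alignment with the old ones so that merging causes less interference.

```python
import numpy as np

def merge_weights(prev_params, new_params, alpha=0.5):
    """Merge previously learned parameters with newly trained task parameters.

    A plain convex combination is used here purely for illustration; the paper
    additionally promotes alignment between new and old weights before merging.
    """
    return {
        name: (1.0 - alpha) * prev_params[name] + alpha * new_params[name]
        for name in prev_params
    }

# Toy example: two "checkpoints" of a single weight matrix.
prev = {"linear.weight": np.ones((4, 4))}
new = {"linear.weight": np.full((4, 4), 3.0)}
merged = merge_weights(prev, new, alpha=0.5)
print(merged["linear.weight"][0, 0])  # 2.0: balanced between old (stability) and new (plasticity)
```
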
Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning
Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the combination of non-stationarity with gradient pathologies, due to suboptimal architectural choices, underlie the challenges of scale. We propose a series of direct interventions that stabilize gradient flow, enabling robust performance across a range of network depths and widths. Our interventions are simple to implement and compatible with well-established algorithms, and result in an effective mechanism that enables strong performance even at large scales. We validate our findings on a variety of agents and suites of environments.
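
The kind of gradient analysis described above can be illustrated with a toy experiment. The sketch below is a minimal diagnostic, assuming a plain tanh MLP with illustrative names and sizes: it measures per-layer gradient norms as depth grows, the sort of gradient pathology the paper's interventions are designed to prevent.

```python
import numpy as np

def per_layer_grad_norms(depth, width, rng):
    """Toy diagnostic: backprop a unit upstream gradient through a plain tanh MLP
    and report the gradient norm entering each layer. As depth grows, norms in the
    earliest layers tend to shrink, a simple example of a gradient pathology."""
    x = rng.standard_normal(width)
    weights = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]
    # Forward pass, caching pre-activations.
    activations, pre = [x], []
    for W in weights:
        z = W @ activations[-1]
        pre.append(z)
        activations.append(np.tanh(z))
    # Backward pass with a unit upstream gradient.
    grad = np.ones(width)
    norms = []
    for W, z in zip(reversed(weights), reversed(pre)):
        grad = (grad * (1 - np.tanh(z) ** 2)) @ W  # chain rule through tanh and W
        norms.append(np.linalg.norm(grad))
    return list(reversed(norms))  # index 0 = earliest layer

rng = np.random.default_rng(0)
for depth in (4, 16, 64):
    norms = per_layer_grad_norms(depth, width=64, rng=rng)
    print(depth, f"first-layer grad norm: {norms[0]:.2e}")
```
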
Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Joshua Greaves
Ekin Dogus Cubuk
Sergei Kalinin
Igor Mordatch
Kevin M Roccapriore
We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
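
A minimal sketch of the control step is given below, under assumed names and a stubbed dynamics model: given learned transition probabilities conditioned on the beam placement, a greedy controller picks the placement most likely to move the silicon atom toward its target site. The neighbour offsets, stub model, and function names are illustrative, not the paper's implementation.

```python
import numpy as np

# Hypothetical setup: under each candidate beam placement, a learned model predicts
# a probability distribution over transitions to neighbouring lattice sites. Here that
# model is a random stub; in the paper it is a neural network trained on symbolic STEM data.
NEIGHBOR_OFFSETS = np.array([(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)])

def predicted_transition_probs(atom_pos, beam_action, rng):
    """Stand-in for the learned dynamics model: P(next neighbour | beam placement)."""
    logits = rng.standard_normal(len(NEIGHBOR_OFFSETS))
    logits[beam_action] += 2.0  # pretend the beam biases motion toward one neighbour
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def choose_beam_action(atom_pos, target_pos, rng):
    """Greedy controller: pick the beam placement whose predicted dynamics give
    the highest probability of stepping closer to the target site."""
    best_action, best_score = None, -np.inf
    for action in range(len(NEIGHBOR_OFFSETS)):
        probs = predicted_transition_probs(atom_pos, action, rng)
        next_positions = atom_pos + NEIGHBOR_OFFSETS
        closer = np.linalg.norm(next_positions - target_pos, axis=1) < np.linalg.norm(atom_pos - target_pos)
        score = probs[closer].sum()
        if score > best_score:
            best_action, best_score = action, score
    return best_action

rng = np.random.default_rng(1)
print(choose_beam_action(np.array([0, 0]), np.array([5, -2]), rng))
```
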
Multi-Task Reinforcement Learning Enables Parameter Scaling
Reginald McLean
Evangelos Chatzaroulas
J K Terry
Isaac Woungang
Nariman Farsad
Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and these gains benefit most from scaling the critic over the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.
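
To make the parameter-matching idea concrete, the sketch below (with illustrative dimensions and function names, not the paper's code) keeps a small actor fixed and widens a critic MLP until the combined parameter count reaches a target budget, reflecting the finding that most of the benefit comes from scaling the critic rather than the actor.

```python
def mlp_param_count(sizes):
    """Number of parameters in a fully connected MLP with the given layer sizes."""
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1] for i in range(len(sizes) - 1))

def widen_critic_to_budget(obs_dim, act_dim, actor_width, budget, max_width=8192):
    """Keep a small actor fixed and widen the critic until the combined parameter
    count reaches a target budget (e.g. the size of a larger architecture being
    compared against)."""
    actor_params = mlp_param_count([obs_dim, actor_width, actor_width, act_dim])
    for critic_width in range(64, max_width, 64):
        critic_params = mlp_param_count([obs_dim + act_dim, critic_width, critic_width, 1])
        if actor_params + critic_params >= budget:
            return critic_width, actor_params + critic_params
    return max_width, actor_params + mlp_param_count([obs_dim + act_dim, max_width, max_width, 1])

# Illustrative numbers only: a small actor plus a critic widened to a 2M-parameter budget.
width, total = widen_critic_to_budget(obs_dim=39, act_dim=4, actor_width=256, budget=2_000_000)
print(width, total)
```
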
Optimistic critics can empower small actors
Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use _symmetric_ architectures, whereby both actor and critic have the same network topology and number of parameters. However, recent works have argued for the advantages of _asymmetric_ setups, specifically with the use of smaller actors. We perform broad empirical investigations and analyses to better understand the implications of this and find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest _poor data collection_, due to value underestimation, as one of the main causes for this behavior, and further highlight the crucial role the critic can play in alleviating this pathology. We explore techniques to mitigate the observed value underestimation, which enables further research in asymmetric actor-critic methods.
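
One plausible mitigation of value underestimation, sketched below with assumed names and not necessarily the paper's exact technique, is to replace the usual pessimistic minimum over an ensemble of critic estimates with an interpolation toward the maximum.

```python
import numpy as np

def optimistic_target(ensemble_q, optimism=0.5):
    """Interpolate between the pessimistic minimum over critic estimates
    (the usual TD3/SAC-style target) and the optimistic maximum.
    optimism=0.0 recovers the standard pessimistic target; 1.0 is fully optimistic."""
    q_min = ensemble_q.min(axis=0)
    q_max = ensemble_q.max(axis=0)
    return (1.0 - optimism) * q_min + optimism * q_max

# Two critics' Q-value estimates for a batch of three state-action pairs.
q_values = np.array([[1.0, 2.0, 0.5],
                     [1.4, 1.5, 0.9]])
print(optimistic_target(q_values, optimism=0.0))  # [1.   1.5  0.5 ]
print(optimistic_target(q_values, optimism=0.5))  # [1.2  1.75 0.7 ]
```
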
Discovering Symbolic Cognitive Models from Human and Animal Behavior
Nenad Tomasev
Navodita Sharma
Rishika Mohanta
Aparna Dev
Kuba Perlin
Siddhant Jain
Kyle Levin
Noemi Elteto
Will Dabney
Alexander Novikov
Glenn C Turner
Maria K Eckstein
Nathaniel D. Daw
Kevin J Miller
Kim Stachenfeld
Symbolic models play a key role in cognitive science, expressing computationally precise hypotheses about how the brain implements a cognitive process. Identifying an appropriate model typically requires a great deal of effort and ingenuity on the part of a human scientist. Here, we adapt FunSearch (Romera-Paredes et al. 2024), a recently developed tool that uses Large Language Models (LLMs) in an evolutionary algorithm, to automatically discover symbolic cognitive models that accurately capture human and animal behavior. We consider datasets from three species performing a classic reward-learning task that has been the focus of substantial modeling effort, and find that the discovered programs outperform state-of-the-art cognitive models for each. The discovered programs can readily be interpreted as hypotheses about human and animal cognition, instantiating interpretable symbolic learning and decision-making algorithms. Broadly, these results demonstrate the viability of using LLM-powered program synthesis to propose novel scientific hypotheses regarding mechanisms of human and animal cognition.
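
The sketch below gives a skeleton of such an evolutionary loop under stated assumptions: llm_propose and score_model are stubs standing in for the LLM-driven mutation step and the behavioural fit measure (e.g. choice log-likelihood), and are not the FunSearch implementation.

```python
import random

def score_model(program, dataset):
    """Fit of a candidate symbolic model to behavioural data, e.g. the log-likelihood
    of the observed choices under the program's policy. Stubbed with a random score."""
    return random.random()

def llm_propose(parent_programs):
    """Stand-in for the LLM call in a FunSearch-style search: given high-scoring parent
    programs, ask for a mutated/recombined candidate. Here, a trivial string mutation."""
    return random.choice(parent_programs) + "  # variant"

def evolve_cognitive_models(seed_programs, dataset, generations=50, population=20):
    """Skeleton of an evolutionary loop over candidate symbolic cognitive models."""
    pool = list(seed_programs)
    for _ in range(generations):
        scored = sorted(pool, key=lambda p: score_model(p, dataset), reverse=True)
        parents = scored[: max(2, population // 4)]  # keep the best candidates
        children = [llm_propose(parents) for _ in range(population - len(parents))]
        pool = parents + children
    return max(pool, key=lambda p: score_model(p, dataset))

best = evolve_cognitive_models(["def update(q, choice, reward): return q"], dataset=None)
print(best)
```
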
Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning
Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning
Ghada Sokar