Portrait of Pablo Samuel Castro

Pablo Samuel Castro

Core Industry Member
Adjunct professor, Université de Montréal, Department of Computer Science and Operations Research
Research Scientist, Google DeepMind
Research Topics
Reinforcement Learning

Biography

Pablo Samuel Castro was born and raised in Quito, Ecuador, and moved to Montréal after high school to study at McGill University. For his PhD, he studied reinforcement learning with Doina Precup and Prakash Panangaden at McGill. Castro has been working at Google for over eleven years. He is currently a staff research scientist at Google DeepMind in Montreal, where he conducts fundamental reinforcement learning research and is a regular advocate for increasing LatinX representation in the research community.

He is also an adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. In addition to his interest in coding, AI and math, Castro is an active musician.

Current Students

PhD - Université de Montréal
Principal supervisor :
Master's Research - Université de Montréal
Collaborating researcher
PhD - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal

Publications

The Dormant Neuron Phenomenon in Deep Reinforcement Learning
Ghada Sokar
Utku Evci
In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing n… (see more)umber of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.
The Small Batch Size Anomaly in Multistep Deep Reinforcement Learning
Johan Samir Obando Ceron
Variance Double-Down: The Small Batch Size Anomaly in Multistep Deep Reinforcement Learning
Johan Samir Obando Ceron
In deep reinforcement learning, multi-step learning is almost unavoidable to achieve state-of-the-art performance. However, the increased va… (see more)riance that multistep learning brings makes it difficult to increase the update horizon beyond relatively small numbers. In this paper, we report the counterintuitive finding that decreasing the batch size parameter improves the performance of many standard deep RL agents that use multi-step learning. It is well-known that gradient variance decreases with increasing batch sizes, so obtaining improved performance by increasing variance on two fronts is a rather surprising finding. We conduct a broad set of experiments to better understand what we call the variance doubledown phenomenon.
A general class of surrogate functions for stable and efficient reinforcement learning
Sharan Vaswani
Olivier Bachem
Simone Totaro
Robert Lynn Mueller
Shivam Garg
Matthieu Geist
Marlos C. Machado
Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress
Metrics and continuity in reinforcement learning
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generali… (see more)zation, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high similarity to states for which the optimal policies in those states as well as in future states are similar. We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and Distracting DM Control Suite.
Deep Reinforcement Learning at the Edge of the Statistical Precipice
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. M… (see more)ost published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field. This work received an outstanding paper award at NeurIPS 2021.
MICo: Improved representations via sampling-based state similarity for Markov decision processes
Tyler Kastner
Mark Rowland
We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effe… (see more)ctive means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analyses, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.
MICo: Learning improved representations via sampling-based state similarity for Markov decision processes
Tyler Kastner
Mark Rowland
We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an eff… (see more)ective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analysis
Autonomous navigation of stratospheric balloons using reinforcement learning
S. Candido
Jun Gong
Marlos C. Machado
Subhodeep Moitra
Sameera S. Ponda
Ziyun Wang
An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
Felipe Petroski Such
Vashisht Madhavan
Rosanne Liu
Rui Wang
Yulun Li
Jiale Zhi
Ludwig Schubert
Jeff Clune
Joel Lehman
Much human and computational effort has aimed to improve how deep reinforcement learning (DRL) algorithms perform on benchmarks such as the … (see more)Atari Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and investigating and comparing the representations learned by different families of DRL algorithms. Sources of friction include the onerous computational requirements, and general logistical and architectural complications for running DRL algorithms at scale. We lessen this friction, by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous DRL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models. This paper introduces the Atari Zoo framework, which contains models trained across benchmark Atari games, in an easy-to-use format, as well as code that implements common modes of analysis and connects such models to a popular neural network visualization library. Further, to demonstrate the potential of this dataset and software package, we show initial quantitative and qualitative comparisons between the performance and representations of several DRL algorithms, highlighting interesting and previously unknown distinctions between them.