Pablo Samuel Castro

Core Industry Member
Adjunct professor, Université de Montréal, Department of Computer Science and Operations Research
Research Scientist, Google DeepMind
Research Topics
Reinforcement Learning

Biography

Pablo Samuel Castro was born and raised in Quito, Ecuador, and moved to Montréal after high school to study at McGill University. For his PhD, he studied reinforcement learning with Doina Precup and Prakash Panangaden at McGill. Castro has been working at Google for over eleven years. He is currently a staff research scientist at Google DeepMind in Montreal, where he conducts fundamental reinforcement learning research and is a regular advocate for increasing LatinX representation in the research community.

He is also an adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. In addition to his interest in coding, AI and math, Castro is an active musician.

Current Students

PhD - Université de Montréal
Principal supervisor :
Independent visiting researcher - RWTH Aachen University
Master's Research - Université de Montréal
PhD - Université de Montréal
Collaborating researcher
PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal

Publications

JaxPruner: A concise library for sparsity research
Joo Hyung Lee
Wonpyo Park
Nicole Elyse Mitchell
Han-Byul Kim
Namhoon Lee
Elias Frantar
Yun Long
Amir Yazdanbakhsh
Shivani Agrawal
Suvinay Subramanian
Sheng-Chun Kao
Xingyao Zhang
Trevor Gale
Aart J.C. Bik
Woohyun Han
Milen Ferev
Zhonglin Han
Hong-Seok Kim
Utku Evci
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX-based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine and FedJAX, and provide baseline experiments on popular benchmarks.
Mixture of Experts in a Mixture of RL settings
On the consistency of hyper-parameter selection in value-based deep reinforcement learning
Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters. Our findings not only help establish which hyper-parameters are most critical to tune, but also help clarify which tunings remain consistent across different training regimes.
Learning Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Joshua Greaves
Kevin Roccapriore
Ekin Dogus Cubuk
Bellemare Marc-Emmanuel
Sergei Kalinin
Igor Mordatch
We introduce a machine learning approach to determine the transition rates of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition rates. These rates are then applied to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks
Maxime Chevalier-Boisvert
Bolun Dai
Mark Towers
Rodrigo De Lazcano Perez-Vicente
Suman Pal
J K Terry
We present the Minigrid and Miniworld libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widescale adoption by the RL community, facilitating research in a wide range of areas. In this paper, we outline the design philosophy, environment details, and their world generation API. We also showcase the additional capabilities brought by the unified API between Minigrid and Miniworld through case studies on transfer learning (for both RL agents and humans) between the different observation spaces. The source code of Minigrid and Miniworld can be found at https://github.com/Farama-Foundation/Minigrid and https://github.com/Farama-Foundation/Miniworld along with their documentation at https://minigrid.farama.org/ and https://miniworld.farama.org/.
Small batch deep reinforcement learning
Johan Obando-Ceron
Bellemare Marc-Emmanuel
In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon.
Offline Reinforcement Learning with On-Policy Q-Function Regularization
Laixi Shi
Robert Dadashi
Yuejie Chi
Matthieu Geist
The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly/explicitly regularizing the learning policy towards the behavior policy, which is hard to estimate reliably in practice. In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate and handles the extrapolation error more straightforwardly. We propose two algorithms taking advantage of the estimated Q-function through regularizations, and demonstrate they exhibit strong performance on the D4RL benchmarks.
Discovering the Electron Beam Induced Transition Rates for Silicon Dopants in Graphene with Deep Neural Networks in the STEM
Kevin M Roccapriore
Joshua Greaves
Colton Bishop
Maxim Ziatdinov
Igor Mordatch
Ekin D Cubuk
Bellemare Marc-Emmanuel
Sergei V Kalinin
Journal article. Authors: Kevin M Roccapriore (Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, United States), Max Schwarzer (Mila - Québec AI Institute, Montréal, QC, Canada; Department of Computer Science and Operations Research, Université de Montréal, Montréal, QC, Canada; Google Research, Brain Team), Joshua Greaves (Google Research, Brain Team), Jesse Farebrother (Mila - Québec AI Institute, Montréal, QC, Canada; Google Research, Brain Team; School of Computer Science, McGill University, Montréal, QC, Canada), Rishabh Agarwal (Mila - Québec AI Institute, Montréal, QC, Canada; Department of Computer Science and Operations Research, Université de Montréal, Montréal, QC, Canada; Google Research, Brain Team), Colton Bishop (Google Research, Brain Team), Maxim Ziatdinov (Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, United States; Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States), Igor Mordatch (Google Research, Brain Team), Ekin D Cubuk (Google Research, Brain Team), Aaron Courville (Mila - Québec AI Institute, Montréal, QC, Canada; Department of Computer Science and Operations Research, Université de Montréal, Montréal, QC, Canada), Pablo Samuel Castro (Google Research, Brain Team), Marc G Bellemare (Mila - Québec AI Institute, Montréal, QC, Canada; Google Research, Brain Team; School of Computer Science, McGill University, Montréal, QC, Canada), Sergei V Kalinin (Department of Materials Science and Engineering, University of Tennessee, Knoxville TN, United States; corresponding author: sergei2@utk.edu). Microscopy and Microanalysis, Volume 29, Issue Supplement_1, 1 August 2023, Pages 1932–1933, https://doi.org/10.1093/micmic/ozad067.1000. Published: 22 July 2023.
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Johan Obando-Ceron
Bellemare Marc-Emmanuel
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent's network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function.
A Kernel Perspective on Behavioural Metrics for Markov Decision Processes
We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We define a new metric under this lens that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective enables us to provide new theoretical results, including value-function bounds and low-distortion finite-dimensional Euclidean embeddings, which are crucial when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.
The Dormant Neuron Phenomenon in Deep Reinforcement Learning
Ghada Sokar
Utku Evci
In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.
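
As a rough illustration of what counting inactive neurons could look like, the sketch below measures the fraction of dormant neurons in a single layer. It is not taken from the paper: the function name `dormant_fraction`, the threshold `tau`, and the rule itself (a neuron counts as dormant when its mean absolute activation, divided by the layer-wide average, is at or below `tau`) are assumptions made for this example.

```python
from typing import List


def dormant_fraction(activations: List[List[float]], tau: float = 0.1) -> float:
    """Return the fraction of neurons in one layer that look dormant.

    activations[b][i] is the activation of neuron i on input b.
    The normalization and the threshold tau are illustrative assumptions,
    not details stated in the abstract above.
    """
    num_inputs = len(activations)
    num_neurons = len(activations[0])

    # Mean absolute activation of each neuron over the batch of inputs.
    mean_abs = [
        sum(abs(row[i]) for row in activations) / num_inputs
        for i in range(num_neurons)
    ]

    # Average of those per-neuron values across the whole layer.
    layer_mean = sum(mean_abs) / num_neurons
    if layer_mean == 0.0:
        # Every neuron is silent on every input.
        return 1.0

    # Normalized score per neuron; low scores indicate inactivity.
    scores = [m / layer_mean for m in mean_abs]
    dormant = sum(1 for s in scores if s <= tau)
    return dormant / num_neurons


if __name__ == "__main__":
    # Two inputs, four neurons; neurons 1 and 3 never fire.
    batch = [[1.0, 0.0, 2.0, 0.0],
             [3.0, 0.0, 2.0, 0.0]]
    print(dormant_fraction(batch))
```

Tracking this quantity over the course of training is one way to observe whether the number of inactive neurons grows; how such neurons are then recycled is specific to ReDo and is not sketched here.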