Portrait of Jesse Farebrother

Jesse Farebrother

PhD - McGill
Principal supervisor
Co-supervisor
Research topics
Representation Learning
Reinforcement Learning
Deep Learning

Publications

Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Joshua Greaves
Ekin Dogus Cubuk
Marc G. Bellemare
Sergei Kalinin
Igor Mordatch
Kevin M Roccapriore
We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
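To make the pipeline concrete, here is a minimal sketch in the spirit of the abstract rather than the authors' code: a small network maps a symbolic neighborhood encoding of a silicon dopant to a distribution over candidate transition sites. The 8-neighbor encoding and all names are illustrative assumptions.

import torch
import torch.nn as nn

N_NEIGHBORS = 8  # assumed: candidate lattice sites surrounding the dopant

model = nn.Sequential(
    nn.Linear(N_NEIGHBORS, 64),
    nn.ReLU(),
    nn.Linear(64, N_NEIGHBORS),  # logits over which neighbor the atom moves to
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(symbolic_states, observed_transitions):
    """symbolic_states: (B, N_NEIGHBORS) features from filtered STEM frames;
    observed_transitions: (B,) index of the site the atom actually moved to."""
    logits = model(symbolic_states)
    loss = loss_fn(logits, observed_transitions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# The predicted transition probabilities can then score beam placements
# when steering an atom toward a target site.
probs = torch.softmax(model(torch.randn(1, N_NEIGHBORS)), dim=-1)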
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment. Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures. This game-solving approach is both computationally expensive and difficult to stabilize. In this work, we propose a novel approach to IRL by direct policy optimization: exploiting a linear factorization of the return as the inner product of successor features and a reward vector, we design an IRL algorithm by policy gradient descent on the gap between the learner and expert features. Our non-adversarial method does not require learning a reward function and can be solved seamlessly with existing actor-critic RL algorithms. Remarkably, our approach works in state-only settings without expert action labels, a setting which behavior cloning (BC) cannot solve. Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks.
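The linear factorization the abstract relies on can be stated directly; the notation below is a generic rendering, with the norm standing in for the paper's exact matching objective. With feature map \(\phi\) and reward \(r(s) = \langle w, \phi(s) \rangle\):

\[
J(\pi) \;=\; \mathbb{E}_\pi\Big[\sum_{t \ge 0} \gamma^t r(s_t)\Big]
\;=\; \Big\langle w,\; \mathbb{E}_\pi\Big[\sum_{t \ge 0} \gamma^t \phi(s_t)\Big] \Big\rangle
\;=\; \langle w,\, \psi^\pi \rangle,
\]

where \(\psi^\pi\) are the successor features of \(\pi\). Matching the expert then reduces to policy gradient descent on a feature gap such as \(\lVert \psi^\pi - \psi^{E} \rVert\), with no reward model or adversary in the loop.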
CALE: Continuous Arcade Learning Environment
We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. The CALE uses the same underlying emulator of the Atari 2600 gaming system (Stella), but adds support for continuous actions. This enables the benchmarking and evaluation of continuous-control agents (such as PPO [Schulman et al., 2017] and SAC [Haarnoja et al., 2018]) and value-based agents (such as DQN [Mnih et al., 2015] and Rainbow [Hessel et al., 2018]) on the same environment suite. We provide a series of open questions and research directions that CALE enables, as well as initial baseline results using Soft Actor-Critic. CALE is available as part of the ALE at https://github.com/Farama-Foundation/Arcade-Learning-Environment.
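A hypothetical usage sketch, assuming the CALE is exposed through the ALE's Gymnasium registration via a continuous-action flag; consult the Arcade-Learning-Environment repository for the exact kwarg names and defaults.

import gymnasium as gym

# "continuous=True" is an assumed flag switching the action space from
# Discrete to a continuous Box; verify against the ALE documentation.
env = gym.make("ALE/Breakout-v5", continuous=True)
obs, info = env.reset(seed=0)

for _ in range(100):
    action = env.action_space.sample()  # continuous action, not a discrete index
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()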
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Jordi Orbay
Quan Vuong
Yevgen Chebotar
Ted Xiao
Alex Irpan
Sergey Levine
Aleksandra Faust
Aviral Kumar
Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improve performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
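A minimal sketch (not the paper's code) of the core idea: replace MSE regression on a scalar TD target with cross-entropy against a categorical encoding of that target. This uses the "two-hot" projection onto fixed bins; the paper also studies a Gaussian-smoothed variant (HL-Gauss). The value support bounds and bin count are assumptions.

import torch
import torch.nn.functional as F

V_MIN, V_MAX, N_BINS = -10.0, 10.0, 51  # assumed support for the value range
bins = torch.linspace(V_MIN, V_MAX, N_BINS)

def two_hot(target):
    """Project scalar targets (B,) onto the two neighboring bins."""
    target = target.clamp(V_MIN, V_MAX)
    idx = torch.searchsorted(bins, target, right=True).clamp(1, N_BINS - 1)
    lo, hi = bins[idx - 1], bins[idx]
    w_hi = (target - lo) / (hi - lo)  # linear interpolation weight
    probs = torch.zeros(target.shape[0], N_BINS)
    probs.scatter_(1, (idx - 1).unsqueeze(1), (1 - w_hi).unsqueeze(1))
    probs.scatter_(1, idx.unsqueeze(1), w_hi.unsqueeze(1))
    return probs

def value_loss(logits, td_target):
    """Cross-entropy between predicted bin logits (B, N_BINS) and the
    two-hot encoding of the bootstrapped scalar target (B,)."""
    return F.cross_entropy(logits, two_hot(td_target))

# The scalar value estimate is recovered as the expectation over bins:
# value = (logits.softmax(-1) * bins).sum(-1)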
The Position Dependence of Electron Beam Induced Effects in 2D Materials with Deep Neural Networks
Kevin M. Roccapriore
Joshua Greaves
Riccardo Torsi
Colton Bishop
Igor Mordatch
Ekin D. Cubuk
Marc G. Bellemare
Joshua Robinson
Sergei V Kalinin
Revisiting Successor Features for Inverse Reinforcement Learning
A Distributional Analogue to the Successor Representation
Arthur Gretton
Yunhao Tang
Andre Barreto
Will Dabney
Marc G. Bellemare
Mark Rowland
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
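The contrast with the classical successor representation can be written compactly; this is a generic rendering of the objects the abstract describes, with normalization by \((1-\gamma)\) assumed so that occupancies are probability distributions:

\[
\Psi^\pi(x) \;=\; \mathbb{E}_\pi\Big[(1-\gamma)\sum_{t \ge 0} \gamma^t \delta_{X_t} \,\Big|\, X_0 = x\Big]
\quad \text{(expected occupancy: the SR/SM)},
\]
\[
\mu^\pi(x) \;=\; \mathrm{Law}\Big((1-\gamma)\sum_{t \ge 0} \gamma^t \delta_{X_t} \,\Big|\, X_0 = x\Big)
\quad \text{(distributional SM: a distribution over distributions)}.
\]

Return distributions for any fixed reward function then follow from \(\mu^\pi\) by pushforward, which is what enables zero-shot risk-sensitive policy evaluation.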
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Obando-Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jakob Foerster
Karolina Dziugaite
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
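A compact sketch of a Soft MoE layer in the spirit of Puigcerver et al. (2023), written from the paper's description rather than any official implementation; shapes, expert counts, and names are illustrative.

import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    def __init__(self, dim, n_experts=4, slots_per_expert=1, hidden=256):
        super().__init__()
        self.slots_per_expert = slots_per_expert
        self.n_slots = n_experts * slots_per_expert
        self.phi = nn.Parameter(torch.randn(dim, self.n_slots) * dim ** -0.5)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x):                     # x: (batch, tokens, dim)
        logits = x @ self.phi                 # (batch, tokens, slots)
        dispatch = logits.softmax(dim=1)      # each slot is a soft average over tokens
        combine = logits.softmax(dim=2)       # each token is a soft average over slots
        slots = dispatch.transpose(1, 2) @ x  # (batch, slots, dim)
        outs = []
        for i, expert in enumerate(self.experts):
            s = slots[:, i * self.slots_per_expert:(i + 1) * self.slots_per_expert]
            outs.append(expert(s))            # each expert processes only its slots
        return combine @ torch.cat(outs, dim=1)  # back to (batch, tokens, dim)

Because every token contributes to every slot with soft weights, the layer is fully differentiable and avoids the discrete routing instabilities of classical MoEs, which is part of what makes it attractive inside value-based RL networks.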
Learning Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Joshua Greaves
Kevin Roccapriore
Ekin Dogus Cubuk
Marc G. Bellemare
Sergei Kalinin
Igor Mordatch
We introduce a machine learning approach to determine the transition rates of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition rates. These rates are then applied to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
Discovering the Electron Beam Induced Transition Rates for Silicon Dopants in Graphene with Deep Neural Networks in the STEM
Kevin M Roccapriore
Joshua Greaves
Colton Bishop
Maxim Ziatdinov
Igor Mordatch
Ekin D Cubuk
Marc G. Bellemare
Sergei V Kalinin
Microscopy and Microanalysis, Volume 29, Issue Supplement_1, 1 August 2023, Pages 1932–1933. https://doi.org/10.1093/micmic/ozad067.1000. Published: 22 July 2023.
A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces
Joshua Greaves
Mark Rowland
Fabian Pedregosa
Marc G. Bellemare
In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns. Our method consists in defining a loss function whose minimizer is the desired principal subspace, and constructing a gradient estimate of this loss whose bias can be controlled.
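A toy illustration of the general idea, not the paper's estimator: parameterize a candidate subspace by a matrix Phi and run SGD on the projection-residual loss over sampled columns, whose minimizer spans the top principal subspace. The paper's contribution is a gradient estimate with controllable bias that also works when the subspace is represented by a neural network.

import torch

torch.manual_seed(0)
d, k, n = 50, 5, 2000
M = torch.randn(d, d) @ torch.randn(d, n)  # a random data matrix

Phi = torch.randn(d, k, requires_grad=True)
opt = torch.optim.SGD([Phi], lr=1e-3)

for step in range(2000):
    cols = M[:, torch.randint(n, (64,))]   # sample a minibatch of columns
    Q, _ = torch.linalg.qr(Phi)            # orthonormal basis for span(Phi)
    loss = ((cols - Q @ (Q.T @ cols)) ** 2).mean()  # projection residual
    opt.zero_grad()
    loss.backward()
    opt.step()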
Investigating Multi-Task Pretraining and Generalization in Reinforcement Learning
Marc G. Bellemare
Google Brain