Rishabh Agarwal

Associate Industry Member
Adjunct Professor, McGill University, School of Computer Science
Google DeepMind
Research Topics
Deep Learning
Large Language Models (LLM)
Reinforcement Learning

Biography

I am a research scientist on the Google DeepMind team in Montréal. I am also an Adjunct Professor at McGill University and an Associate Industry Member at Mila - Quebec Artificial Intelligence Institute. I finished my PhD at Mila under the guidance of Aaron Courville and Marc Bellemare. Previously, I spent a year on Geoffrey Hinton's amazing team at Google Brain, Toronto. Earlier, I graduated in Computer Science and Engineering from IIT Bombay.

My research mainly revolves around language models and deep reinforcement learning (RL), and has received an outstanding paper award at NeurIPS.

Current Students

PhD - Université de Montréal
Principal supervisor :

Publications

Learning Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Max Schwarzer
Jesse Farebrother
Joshua Greaves
Kevin Roccapriore
Ekin Dogus Cubuk
Sergei Kalinin
Igor Mordatch
We introduce a machine learning approach to determine the transition rates of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition rates. These rates are then applied to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
Discovering the Electron Beam Induced Transition Rates for Silicon Dopants in Graphene with Deep Neural Networks in the STEM
Kevin M Roccapriore
Max Schwarzer
Joshua Greaves
Jesse Farebrother
Colton Bishop
Maxim Ziatdinov
Igor Mordatch
Ekin Dogus Cubuk
Sergei V Kalinin
Bigger, Better, Faster: Human-level Atari with human-level efficiency
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Bootstrapped Representations in Reinforcement Learning
Charline Le Lan
Stephen Tu
Mark Rowland
Anna Harutyunyan
Will Dabney
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated into the learning process and help shape the learnt state representation. Bootstrapping methods are today's method of choice to make these additional predictions. Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). Surprisingly, we find that this representation differs from the features learned by Monte Carlo and residual gradient algorithms for most transition structures of the environment in the policy evaluation setting. We describe the efficacy of these representations for policy evaluation, and use our theoretical analysis to design new auxiliary learning rules. We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al., 1999) and Mountain Car (Moore, 1990).
A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces
Charline Le Lan
Joshua Greaves
Jesse Farebrother
Mark Rowland
Fabian Pedregosa
In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns. Our method consists in defining a loss function whose minimizer is the desired principal subspace, and constructing a gradient estimate of this loss whose bias can be controlled.
Investigating Multi-task Pretraining and Generalization in Reinforcement Learning
Adrien Ali Taiga
Jesse Farebrother
Google Brain
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
Jesse Farebrother
Joshua Greaves
Charline Le Lan
Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well-understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent’s network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)’s proto-value functions to deep reinforcement learning – accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment’s reward function.
The Dormant Neuron Phenomenon in Deep Reinforcement Learning
Ghada Sokar
Utku Evci
In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
Aviral Kumar
Tengyu Ma
George Tucker
Sergey Levine
Despite overparameterization, deep networks trained via supervised learning are surprisingly easy to optimize and exhibit excellent generalization. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit regularization induced by stochastic gradient descent, which favors parsimonious solutions that generalize well on test inputs. It is reasonable to surmise that deep reinforcement learning (RL) methods could also benefit from this effect. In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations. Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions with excessive aliasing, in stark contrast to the supervised learning case. We back up these findings empirically, showing that feature representations learned by a deep network value function trained via bootstrapping can indeed become degenerate, aliasing the representations for state-action pairs that appear on either side of the Bellman backup. To address this issue, we derive the form of this implicit regularizer and, inspired by this derivation, propose a simple and effective explicit regularizer, called DR3, that counteracts the undesirable effects of this implicit regularizer. When combined with existing offline RL methods, DR3 substantially improves performance and stability, alleviating unlearning in Atari 2600 games, D4RL domains, and robotic manipulation from images.
Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress