Portrait de Rishabh Agarwal n'est pas disponible

Rishabh Agarwal

Doctorat - Université de Montréal
Superviseur⋅e principal⋅e
Co-superviseur⋅e

Publications

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Jesse Farebrother
Jordi Orbay
Quan Vuong
Adrien Ali Taiga
Yevgen Chebotar
Ted Xiao
Alex Irpan
Sergey Levine
Aleksandra Faust
Aviral Kumar
Rishabh Agarwal
Many-Shot In-Context Learning
Rishabh Agarwal
Avi Singh
Lei M. Zhang
Bernd Bohnet
Stephanie Chan
Ankesh Anand
Zaheer Abbas
Azade Nova
John D. Co-Reyes
Eric Chu
Feryal M. P. Behbahani
Aleksandra Faust
Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, w… (voir plus)ithout any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases and can learn high-dimensional functions with numerical inputs. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Jesse Farebrother
Jordi Orbay
Quan Ho Vuong
Adrien Ali Taiga
Yevgen Chebotar
Ted Xiao
A. Irpan
Sergey Levine
Aleksandra Faust
Aviral Kumar
Rishabh Agarwal
Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained … (voir plus)using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improves performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini
Xingdi Yuan
Nikolay Malkin
Rishabh Agarwal
Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on sel… (voir plus)f-generated solutions to improve their problem-solving ability. However, these approaches discard the large amounts of incorrect solutions generated during this process, potentially neglecting valuable information in such solutions. To address this shortcoming, we propose V-STaR that utilizes both the correct and incorrect solutions generated during the self-improvement process to train a verifier using DPO that judges correctness of model-generated solutions. This verifier is used at inference time to select one solution among many candidate solutions. Running V-STaR for multiple iterations results in progressively better reasoners and verifiers, delivering a 4% to 17% test accuracy improvement over existing self-improvement and verification approaches on common code generation and math reasoning benchmarks with LLaMA2 models.
Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Max Schwarzer
Jesse Farebrother
Joshua Greaves
Ekin Dogus Cubuk
Rishabh Agarwal
Sergei V. Kalinin
Igor Mordatch
Kevin M Roccapriore
We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimu… (voir plus)lated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
Learning Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Max Schwarzer
Jesse Farebrother
Joshua Greaves
Kevin Roccapriore
Ekin Dogus Cubuk
Rishabh Agarwal
Sergei Kalinin
Igor Mordatch
We introduce a machine learning approach to determine the transition rates of silicon atoms on a single layer of carbon atoms, when stimulat… (voir plus)ed by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition rates. These rates are then applied to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
Learning Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Max Schwarzer
Jesse Farebrother
Joshua Greaves
Kevin Roccapriore
Ekin Dogus Cubuk
Rishabh Agarwal
Sergei Kalinin
Igor Mordatch
We introduce a machine learning approach to determine the transition rates of silicon atoms on a single layer of carbon atoms, when stimulat… (voir plus)ed by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition rates. These rates are then applied to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
Discovering the Electron Beam Induced Transition Rates for Silicon Dopants in Graphene with Deep Neural Networks in the STEM
Kevin M Roccapriore
Max Schwarzer
Joshua Greaves
Jesse Farebrother
Rishabh Agarwal
Colton Bishop
Maxim Ziatdinov
Igor Mordatch
Ekin Dogus Cubuk
Sergei V Kalinin
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Max Schwarzer
Johan Samir Obando Ceron
Rishabh Agarwal
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on sca… (voir plus)ling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Bootstrapped Representations in Reinforcement Learning
Charline Le Lan
Stephen Tu
Mark Rowland
Anna Harutyunyan
Rishabh Agarwal
Will Dabney
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of… (voir plus) deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated into the learning process and help shape the learnt state representation. Bootstrapping methods are today's method of choice to make these additional predictions. Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). Surprisingly, we find that this representation differs from the features learned by Monte Carlo and residual gradient algorithms for most transition structures of the environment in the policy evaluation setting. We describe the efficacy of these representations for policy evaluation, and use our theoretical analysis to design new auxiliary learning rules. We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al, 1999) and Mountain Car (Moore, 1990).
A Novel Stochastic Gradient Descent Algorithm for LearningPrincipal Subspaces
Charline Le Lan
Joshua Greaves
Jesse Farebrother
Mark Rowland
Fabian Pedregosa
Rishabh Agarwal
In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace i… (voir plus)s represented by a neural network, and hence can bescaled to datasets with an effectively infinite number of rows and columns. Our method consistsin defining a loss function whose minimizer is the desired principal subspace, and constructing agradient estimate of this loss whose bias can be controlled.
Investigating Multi-task Pretraining and Generalization in Reinforcement Learning
Adrien Ali Taiga
Rishabh Agarwal
Jesse Farebrother
Google Brain