Pablo Samuel Castro

The Position Dependence of Electron Beam Induced Effects in 2D Materials with Deep Neural Networks

Kevin M Roccapriore

Max Schwarzer

Joshua Greaves

Jesse Farebrother

Riccardo Torsi

Colton Bishop

Igor Mordatch

Ekin Dogus Cubuk

Aaron Courville

Marc Gendron-Bellemare

Joshua Robinson

Sergei V Kalinin

2024-07-01

Microscopy and Microanalysis (publié)

Mixture of Experts in a Mixture of RL settings

Timon Willi

Johan Samir Obando Ceron

Jakob Nicolaus Foerster

Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to … (voir plus)distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's learning capacity and ability to deal with non-stationarity. In this work, we shed more light on MoEs' ability to deal with non-stationarity and investigate MoEs in DRL settings with"amplified"non-stationarity via multi-task training, providing further evidence that MoEs improve learning capacity. In contrast to previous work, our multi-task results allow us to better understand the underlying causes for the beneficial effect of MoE in DRL training, the impact of the various MoE components, and insights into how best to incorporate them in actor-critic-based DRL networks. Finally, we also confirm results from previous work.

2024-06-26

ArXiv (prépublication)

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Samir Obando Ceron

Ghada Sokar

Timon Willi

Clare Lyle

Jesse Farebrother

Jakob Nicolaus Foerster

Doina Precup

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance s… (voir plus)cales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.

2024-05-01

ICML.cc/2024/Conference (spotlight)

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother

Jordi Orbay

Quan Vuong

Adrien Ali Taiga

Yevgen Chebotar

Ted Xiao

Alex Irpan

Sergey Levine

Aleksandra Faust

Aviral Kumar

2024-05-01

ICML.cc/2024/Conference (présentation orale)

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother

Jordi Orbay

Quan Vuong

Adrien Ali Taiga

Yevgen Chebotar

Ted Xiao

Alex Irpan

Sergey Levine

Aleksandra Faust

Aviral Kumar

Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained … (voir plus)using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improves performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.

2024-03-06

ArXiv (prépublication)

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother

Jordi Orbay

Quan Vuong

Adrien Ali Taiga

Yevgen Chebotar

Ted Xiao

Alex Irpan

Sergey Levine

Aleksandra Faust

Aviral Kumar

Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained … (voir plus)using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improves performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.

2024-03-06

ArXiv (prépublication)

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother

Jordi Orbay

Quan Vuong

Adrien Ali Taiga

Yevgen Chebotar

Ted Xiao

Alex Irpan

Sergey Levine

Aleksandra Faust

Aviral Kumar

Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained … (voir plus)using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improves performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.

2024-03-06

ArXiv (prépublication)

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother

Jordi Orbay

Quan Ho Vuong

Adrien Ali Taiga

Yevgen Chebotar

Ted Xiao

A. Irpan

Sergey Levine

Aleksandra Faust

Aviral Kumar

Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained … (voir plus)using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improves performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.

2024-03-06

ArXiv (prépublication)

A density estimation perspective on learning from pairwise human preferences

Vincent Dumoulin

Daniel D. Johnson

Hugo Larochelle

Yann Dauphin

Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in tr… (voir plus)aining large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation which centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference distribution. Finally, we discuss and present findings on"annotator misspecification"-- failure cases where wrong modeling assumptions are made about annotator behavior, resulting in poorly-adapted models -- suggesting that approaches that learn from pairwise human preferences could have trouble learning from a population of annotators with diverse viewpoints.

2024-02-27

TMLR (accepté)

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Samir Obando Ceron

Ghada Sokar

Timon Willi

Clare Lyle

Jesse Farebrother

Jakob Nicolaus Foerster

Doina Precup

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance s… (voir plus)cales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.

2024-02-13

ArXiv (prépublication)

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Samir Obando Ceron

Ghada Sokar

Timon Willi

Clare Lyle

Jesse Farebrother

Jakob Nicolaus Foerster

Doina Precup

2024-02-13

ArXiv (prépublication)