Portrait de walter Mayor

walter Mayor

Collaborateur·rice de recherche
Superviseur⋅e principal⋅e
Sujets de recherche
Apprentissage par renforcement
Apprentissage profond
Traitement du langage naturel
Vision par ordinateur

Publications

Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents
Recent works have proposed accelerating the wall-clock training time of actor-critic methods via the use of large-scale environment parallel… (voir plus)ization; unfortunately, these can sometimes still require large number of environment interactions to achieve a desired level of performance. Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layers that constrain embeddings to simplicial structures. This geometric inductive bias results in sparse and discrete features that stabilize critic bootstrapping and strengthen policy gradients. When applied to FastTD3, FastSAC, and PPO, simplicial embeddings consistently improve sample efficiency and final performance across a variety of continuous- and discrete-control environments, without any loss in runtime speed.
Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents
Recent works have proposed accelerating the wall-clock training time of actor-critic methods via the use of large-scale environment parallel… (voir plus)ization; unfortunately, these can sometimes still require large number of environment interactions to achieve a desired level of performance. Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layers that constrain embeddings to simplicial structures. This geometric inductive bias results in sparse and discrete features that stabilize critic bootstrapping and strengthen policy gradients. When applied to FastTD3, FastSAC, and PPO, simplicial embeddings consistently improve sample efficiency and final performance across a variety of continuous- and discrete-control environments, without any loss in runtime speed.
The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks
The use of parallel actors for data collection has been an effective technique used in reinforcement learning (RL) algorithms. The manner in… (voir plus) which data is collected in these algorithms, controlled via the number of parallel environments and the rollout length, induces a form of bias-variance trade-off; the number of training passes over the collected data, on the other hand, must strike a balance between sample efficiency and overfitting. We conduct an empirical analysis of these trade-offs on PPO, one of the most popular RL algorithms that uses parallel actors, and establish connections to network plasticity and, more generally, optimization stability. We examine its impact on network architectures, as well as the hyper-parameter sensitivity when scaling data. Our analyses indicate that larger dataset sizes can increase final performance across a variety of settings, and that scaling parallel environments is more effective than increasing rollout lengths. These findings highlight the critical role of data collection strategies in improving agent performance.
The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks
The use of parallel actors for data collection has been an effective technique used in reinforcement learning (RL) algorithms. The manner in… (voir plus) which data is collected in these algorithms, controlled via the number of parallel environments and the rollout length, induces a form of bias-variance trade-off; the number of training passes over the collected data, on the other hand, must strike a balance between sample efficiency and overfitting. We conduct an empirical analysis of these trade-offs on PPO, one of the most popular RL algorithms that uses parallel actors, and establish connections to network plasticity and, more generally, optimization stability. We examine its impact on network architectures, as well as the hyper-parameter sensitivity when scaling data. Our analyses indicate that larger dataset sizes can increase final performance across a variety of settings, and that scaling parallel environments is more effective than increasing rollout lengths. These findings highlight the critical role of data collection strategies in improving agent performance.