Homayoun Honari

Collaborating researcher - Université de Montréal
Research Topics
Reinforcement Learning
Representation Learning
Robotics

Publications

Training PPO-Clip with Parallelized Data Generation: A Case of Fixed-Point Convergence
In recent years, with the increase in the compute power of GPUs, parallelized data collection has become the dominant approach for training reinforcement learning (RL) agents. Proximal Policy Optimization (PPO) is one of the most widely used on-policy methods for training RL agents. In this paper, we focus on the training behavior of PPO-Clip as the number of parallel environments increases. In particular, we show that as the amount of data used to train PPO-Clip grows, the optimized policy converges to a fixed distribution. We use this result to study the behavior of PPO-Clip in two case studies: the effect of changing the minibatch size, and the effect of increasing the number of parallel environments versus increasing the rollout length. The experiments show that settings with high-return PPO runs converge more slowly to the fixed distribution and exhibit larger consecutive KL-divergence changes. Our results aim to offer a better understanding of how PPO's performance can be expected to scale with the number of parallel environments.
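
For readers unfamiliar with the objective the abstract refers to, below is a minimal PyTorch sketch of the standard PPO-Clip surrogate loss applied to a batch gathered from parallel environments. This is an illustrative sketch of the general technique, not the paper's implementation; the function name ppo_clip_loss, the clip_eps value, and the num_envs/rollout_len settings are assumptions made here for demonstration.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Standard PPO-Clip surrogate loss (illustrative sketch).

    log_probs_new / log_probs_old: log pi_theta(a|s) and log pi_theta_old(a|s)
    for a batch of transitions collected from parallel environments.
    """
    # Importance ratio r = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    # Clipping keeps the updated policy close to the data-collecting policy
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Maximize the surrogate objective, i.e. minimize its negation
    return -torch.min(unclipped, clipped).mean()

# Toy usage: the batch size is num_envs * rollout_len, the two scaling
# knobs (parallel environments vs. rollout length) compared in the paper.
num_envs, rollout_len = 8, 128
batch = num_envs * rollout_len
lp_new = torch.randn(batch, requires_grad=True)
lp_old = lp_new.detach() + 0.01 * torch.randn(batch)
adv = torch.randn(batch)
loss = ppo_clip_loss(lp_new, lp_old, adv)
loss.backward()
```

In this framing, increasing num_envs grows the batch per update without lengthening each trajectory, while increasing rollout_len does the opposite; the paper's fixed-point result concerns what happens to the optimized policy as this batch grows.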