Pierluca D'Oro

Affiliate Member
Research Scientist, Meta
Research Topics
AGI (Artificial General Intelligence)
Human-Centered AI
Large Language Models (LLM)
Reinforcement Learning

Publications

Unleashing The Potential of Data Sharing in Ensemble Deep Reinforcement Learning
This work studies a crucial but often overlooked element of ensemble methods in deep reinforcement learning: data sharing between ensemble members. We show that data sharing enables peer learning, a powerful learning process in which individual agents learn from each other's experience to significantly improve their performance. When given access to the experience of other ensemble members, even the worst agent can match or outperform the previously best agent, triggering a virtuous circle. However, we show that peer learning can be unstable when the agents' ability to learn is impaired due to overtraining on early data. We thus employ the recently proposed solution of periodic resets and show that it ensures effective peer learning. We perform extensive experiments on continuous control tasks from both dense states and pixels to demonstrate the strong effect of peer learning and its interaction with resets.
The Primacy Bias in Deep Reinforcement Learning
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.
Long-Term Credit Assignment via Model-based Temporal Shortcuts