Pierluca D'Oro

Affiliate Member

Research Scientist, Meta

Research Topics

AGI (Artificial General Intelligence)

Human-Centered AI

Large Language Models (LLM)

Reinforcement Learning

Website

Google Scholar

Blog Posts

October 24, 2023

Motif: Intrinsic Motivation from Artificial Intelligence Feedback

Pierluca D'Oro

Martin Klissarov


July 13, 2022

The Primacy Bias in Deep Reinforcement Learning

Pierluca D'Oro

Evgenii Nikishin


Publications

Unleashing The Potential of Data Sharing in Ensemble Deep Reinforcement Learning

This work studies a crucial but often overlooked element of ensemble methods in deep reinforcement learning: data sharing between ensemble members. We show that data sharing enables peer learning, a powerful learning process in which individual agents learn from each other's experience to significantly improve their performance. When given access to the experience of other ensemble members, even the worst agent can match or outperform the previously best agent, triggering a virtuous circle. However, we show that peer learning can be unstable when the agents' ability to learn is impaired due to overtraining on early data. We thus employ the recently proposed solution of periodic resets and show that it ensures effective peer learning. We perform extensive experiments on continuous control tasks from both dense states and pixels to demonstrate the strong effect of peer learning and its interaction with resets.

2022-12-08

NeurIPS.cc/2022/Workshop/DeepRL (unknown)

openreview.net
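The abstract does not say how experience is shared between ensemble members. As a minimal sketch, assume data sharing means every member writes its transitions into one pooled replay buffer and every member samples its training batches from that pool. `EnsembleMember`, `run_ensemble`, `share_data` and `batch_size` are hypothetical names, the environment and gradient steps are stubbed out, and each member only records whose experience it trained on.

```python
import random


class EnsembleMember:
    """Hypothetical toy ensemble member; tracks whose experience it learned from."""

    def __init__(self, name):
        self.name = name
        self.seen = set()  # names of members whose transitions this agent trained on

    def act_and_collect(self, step):
        # Stand-in for one environment interaction by this member.
        return {"collector": self.name, "step": step}

    def update(self, batch):
        # A gradient step on `batch` would go here; we only record data provenance.
        self.seen.update(t["collector"] for t in batch)


def run_ensemble(num_members, num_steps, batch_size, share_data, seed=0):
    rng = random.Random(seed)
    members = [EnsembleMember(f"agent{i}") for i in range(num_members)]
    if share_data:
        pooled = []
        buffers = {m.name: pooled for m in members}  # all members share one buffer
    else:
        buffers = {m.name: [] for m in members}  # each member keeps only its own data
    for step in range(num_steps):
        for m in members:
            buffers[m.name].append(m.act_and_collect(step))
        for m in members:
            buf = buffers[m.name]
            m.update(rng.sample(buf, min(batch_size, len(buf))))
    return members
```

With `share_data=True`, every member ends up training on transitions collected by the others, which is the precondition for the peer learning described above; with `share_data=False`, each member only ever sees its own experience. The periodic resets mentioned in the abstract could be layered on top by re-initializing part of each member on a fixed schedule.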

The Primacy Bias in Deep Reinforcement Learning

This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.

2022-06-27

Proceedings of the 39th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press
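The abstract leaves "a part of the agent" unspecified. The following is a minimal sketch under the assumption that the reset re-initializes the last `k` layers of the network every `reset_interval` updates, while earlier layers and the replay buffer are kept. `ToyAgent`, `train`, `reset_interval` and `k` are hypothetical names, and the layers are plain Python lists rather than a real deep learning framework.

```python
import random


class ToyAgent:
    """Hypothetical stand-in for a deep RL agent: dense layers plus a replay buffer."""

    def __init__(self, sizes, seed=0):
        self.sizes = sizes  # layer widths, e.g. [4, 64, 64, 2]
        self.rng = random.Random(seed)
        self.layers = [self._init_layer(i) for i in range(len(sizes) - 1)]
        self.replay_buffer = []  # grows over training; never cleared by resets

    def _init_layer(self, i):
        # Weight matrix of shape (sizes[i + 1], sizes[i]) with small random entries.
        return [[self.rng.uniform(-0.1, 0.1) for _ in range(self.sizes[i])]
                for _ in range(self.sizes[i + 1])]

    def reset_last_layers(self, k):
        # Re-initialize only the last k layers; earlier layers keep their weights.
        n = len(self.layers)
        for i in range(max(0, n - k), n):
            self.layers[i] = self._init_layer(i)


def train(agent, num_updates, reset_interval, k):
    """Run toy updates, resetting the last k layers every reset_interval steps."""
    reset_steps = []
    for step in range(1, num_updates + 1):
        agent.replay_buffer.append(("transition", step))  # stand-in for environment data
        # A gradient step on a batch sampled from agent.replay_buffer would go here.
        if step % reset_interval == 0:
            agent.reset_last_layers(k)
            reset_steps.append(step)
    return reset_steps
```

For example, `train(ToyAgent([4, 64, 64, 2]), 1000, 250, k=1)` returns `[250, 500, 750, 1000]`. Because the replay buffer survives each reset, the re-initialized layers are retrained on all experience collected so far, including data gathered late in training, rather than starting from weights already fitted to early interactions.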

Long-Term Credit Assignment via Model-based Temporal Shortcuts

2021-12-12

NeurIPS.cc/2021/Workshop/DeepRL (unknown)

openreview.net