Pierluca D'Oro

Affiliate Member

Research Scientist, Meta

Research Topics

Reinforcement Learning

Large Language Models (LLM)

Human-Centered AI

AGI (Artificial General Intelligence)

Website

Google Scholar

Blog Posts

Motif: Intrinsic Motivation from Artificial Intelligence Feedback

October 24, 2023

by

Pierluca D'Oro

Martin Klissarov

Read the article

The Primacy Bias in Deep Reinforcement Learning

July 13, 2022

by

Pierluca D'Oro

Evgenii Nikishin

Read the article

Publications

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Nathan Rahn

Pierluca D'Oro

Harley Wiltzer

Pierre-Luc Bacon

Marc Gendron-Bellemare

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.

openreview.net

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier

Marc Gendron-Bellemare

Aaron Courville

Increasing the replay ratio, the number of updates of an agent's parameters per environment interaction, is an appealing strategy for improving the sample efficiency of deep reinforcement learning algorithms. In this work, we show that fully or partially resetting the parameters of deep reinforcement learning agents causes better replay ratio scaling capabilities to emerge. We push the limits of the sample efficiency of carefully-modified algorithms by training them using an order of magnitude more updates than usual, significantly improving their performance in the Atari 100k and DeepMind Control Suite benchmarks. We then provide an analysis of the design choices required for favorable replay ratio scaling to be possible and discuss inherent limits and tradeoffs.

2023-02-01

ICLR.cc/2023/Conference (notable)

openreview.net
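
As a rough illustration of the replay ratio defined in the abstract above (parameter updates per environment interaction), combined with periodic full resets, here is a minimal toy sketch. It is not the authors' implementation: the function name `train`, the fixed reset interval `reset_every_updates`, the choice to keep the replay buffer across resets, and the placeholder "transition" and "update" are all assumptions made for illustration only.

```python
import random


def train(num_env_steps, replay_ratio, reset_every_updates, seed=0):
    """Toy loop: perform `replay_ratio` parameter updates per environment step.

    Hypothetical sketch; no real environment or neural network is involved.
    """
    rng = random.Random(seed)

    def init_params():
        # Stand-in for freshly initialized network weights.
        return [rng.uniform(-1.0, 1.0) for _ in range(4)]

    params = init_params()
    buffer = []   # replay buffer; assumed here to survive parameter resets
    updates = 0
    resets = 0

    for _ in range(num_env_steps):
        # One environment interaction: store a placeholder transition.
        buffer.append(rng.random())

        # Replay ratio: number of parameter updates per interaction.
        for _ in range(replay_ratio):
            sample = rng.choice(buffer)
            # Placeholder update nudging the parameters toward the sample.
            params = [p - 0.01 * (p - sample) for p in params]
            updates += 1

            # Assumed schedule: fully reset parameters every N updates.
            if updates % reset_every_updates == 0:
                params = init_params()
                resets += 1

    return updates, resets, len(buffer)
```

For example, `train(100, 8, 200)` performs 8 updates for each of 100 interactions (800 updates in total) and resets the parameters 4 times, while the buffer still holds all 100 stored transitions.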