Rishabh Agarwal

Associate Industry Member

Adjunct Professor, McGill University, School of Computer Science

Google DeepMind

Research Topics

Reinforcement Learning

Deep Learning

Large Language Models (LLM)

Website

Google Scholar

Biography

I am a research scientist on the Google DeepMind team in Montréal, an adjunct professor at McGill University, and an associate industry member at Mila - Quebec Artificial Intelligence Institute. I completed my PhD at Mila under the supervision of Aaron Courville and Marc Bellemare. Before that, I had the opportunity to work for a year with Geoffrey Hinton's team at Google Brain in Toronto. I earned my degree in computer science and engineering at IIT Bombay.

My research focuses on language models and deep reinforcement learning (RL). I had the honor of receiving an outstanding paper award at NeurIPS.

Current Students

Morgane Moss

PhD - UdeM

Principal supervisor:

Aaron Courville

Blog Posts

Beyond Tabula Rasa: Reincarnating Reinforcement Learning

November 25, 2022

Beyond Tabula Rasa: Reincarnating Reinforcement Learning (RRL)

by

Max Schwarzer

Rishabh Agarwal

Read the article

Publications

Revisiting Fundamentals of Experience Replay

William Fedus

Prajit Ramachandran

Rishabh Agarwal

Yoshua Bengio

Hugo Larochelle

Mark Rowland

Will Dabney

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay -- greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. Counterintuitively we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio we contextualize previous observations in the literature and empirically measure its importance across a variety of deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

arxiv.org
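
The abstract above names three quantities: the replay capacity, the replay ratio (learning updates per unit of experience collected), and uncorrected n-step returns. The following is a minimal sketch of how these could be expressed in code, not the authors' implementation. The names `ReplayBuffer`, `count_updates`, and `uncorrected_n_step_target` are hypothetical, the transitions are placeholders, and "uncorrected" is taken here to mean that no off-policy correction is applied to the multi-step return.

```python
import random
from collections import deque


class ReplayBuffer:
    """FIFO buffer; `capacity` plays the role of the replay capacity."""

    def __init__(self, capacity):
        # deque(maxlen=...) evicts the oldest transition once full.
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement from the stored transitions.
        k = min(batch_size, len(self.storage))
        return rng.sample(list(self.storage), k)


def count_updates(num_env_steps, capacity, replay_ratio, batch_size=4, seed=0):
    """Interleave data collection and updates at a fixed replay ratio.

    Returns (number of learning updates, final buffer size).
    """
    rng = random.Random(seed)
    buffer = ReplayBuffer(capacity)
    updates = 0
    budget = 0.0  # fractional updates owed so far
    for _ in range(num_env_steps):
        # Placeholder transition (assumption: a real agent would store
        # state, action, reward, next_state from its environment).
        buffer.add(("state", "action", 0.0, "next_state"))
        budget += replay_ratio
        while budget >= 1.0:
            _batch = buffer.sample(batch_size, rng)  # a learner would train here
            updates += 1
            budget -= 1.0
    return updates, len(buffer.storage)


def uncorrected_n_step_target(rewards, bootstrap_value, gamma):
    """Sum of n discounted rewards plus a discounted bootstrap value.

    No importance-sampling or other off-policy correction is applied.
    """
    target = 0.0
    for k, reward in enumerate(rewards):
        target += (gamma ** k) * reward
    return target + (gamma ** len(rewards)) * bootstrap_value
```

Under these assumptions, a replay ratio of 0.25 corresponds to one update for every four collected transitions, and the buffer never holds more than `capacity` transitions regardless of how many are collected.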
