Akilesh Badrinaaraayanan

Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Mohammad Amin Amini

Janarthanan Rajendran

Aditya Mahajan

Sarath Chandar

2023-11-20

Proceedings of The 2nd Conference on Lifelong Learning Agents (publié)

doi.org

arxiv.org

Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Hadi Nekoei

Akilesh Badrinaaraayanan

Amit Sinha

Mohammad Amin Amini

Janarthanan Rajendran

Aditya Mahajan

Sarath Chandar

2023-02-06

ArXiv (prépublication)

doi.org

arxiv.org

Staged independent learning: Towards decentralized cooperative multi-agent Reinforcement Learning

Hadi Nekoei

Akilesh Badrinaaraayanan

Amit Sinha

Mohammad Amini

Janarthanan Rajendran

Aditya Mahajan

Sarath Chandar

We empirically show that classic ideas from two-time scale stochastic approximation \citep{borkar1997stochastic} can be combined with sequen… (voir plus)tial iterative best response (SIBR) to solve complex cooperative multi-agent reinforcement learning (MARL) problems. We first start with giving a multi-agent estimation problem as a motivating example where SIBR converges while parallel iterative best response (PIBR) does not. Then we present a general implementation of staged multi-agent RL algorithms based on SIBR and multi-time scale stochastic approximation, and show that our new methods which we call Staged Independent Proximal Policy Optimization (SIPPO) and Staged Independent Q-learning (SIQL) outperform state-of-the-art independent learning on almost all the tasks in the epymarl \citep{papoudakis2020benchmarking} benchmark. This can be seen as a first step towards more decentralized MARL methods based on SIBR and multi-time scale learning.

2022-04-25

ICLR.cc/2022/Workshop/GMS (publié)

openreview.net

Continuous Coordination As a Realistic Scenario for Lifelong Learning

Hadi Nekoei

Akilesh Badrinaaraayanan

Aaron Courville

Sarath Chandar

Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. L… (voir plus)ifelong learning (LLL), however, aims at solving multiple tasks sequentially by efficiently transferring and using knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL due to its inherent non-stationarity, since the agents' policies change over time. In this work, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi -- a partially-observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. We evaluate several recent MARL methods, and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides us with a pragmatic way of going beyond centralized training which is the most commonly used training protocol in MARL. We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without any additional assumptions made by previous works. The code and all pre-trained models are available at https://github.com/chandar-lab/Lifelong-Hanabi.

2021-07-01

Proceedings of the 38th International Conference on Machine Learning (publié)

proceedings.mlr.press

arxiv.org

Hackathon | Créer une IA plus sécuritaire pour la santé mentale des jeunes

Éclaireurs autochtones en IA

Avantage IA

Akilesh Badrinaaraayanan

Publications

Hackathon | Créer une IA plus sécuritaire pour la santé mentale des jeunes

Éclaireurs autochtones en IA

Avantage IA

Mots-clés populaires:

Akilesh Badrinaaraayanan

Publications