Mohammad Amini

Dealing with Non-Stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning, in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, while sequential learning, where agents update their policies one after another in a sequence, is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agents' policies are kept fixed, alleviating the challenge of non-stationarity due to simultaneous updates in other agents' policies. However, it can be slow because only one agent is learning at any time, so it might not always be practical. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn simultaneously, but at different learning rates. In our proposed method, when one agent updates its policy, other agents are allowed to update their policies as well, but at a slower rate. This speeds up sequential learning, while also minimizing non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the epymarl (Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning.
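The learning-rate idea in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the function name `multi_timescale_lrs`, the rotating "lead" agent, the fixed stage length, and the two rate values are all assumptions made for illustration. The abstract only states that one agent learns at a faster rate while the others keep learning at a slower one.

```python
def multi_timescale_lrs(step, n_agents, stage_len, fast_lr=1e-3, slow_lr=1e-4):
    """Return one learning rate per agent for the given training step.

    Hypothetical schedule: the "lead" agent rotates every `stage_len` steps
    and learns at `fast_lr`; every other agent keeps learning at `slow_lr`.
    """
    lead = (step // stage_len) % n_agents
    return [fast_lr if i == lead else slow_lr for i in range(n_agents)]


# With three agents and 100-step stages, step 250 falls in the third stage,
# so the agent with index 2 is the fast learner.
lrs = multi_timescale_lrs(step=250, n_agents=3, stage_len=100)
```

Under this assumed schedule, `slow_lr=0` reduces to sequential learning (only the lead agent updates), and `slow_lr=fast_lr` reduces to independent learning (all agents update at the same rate).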

2023-11-19

Conference on Lifelong Learning Agents (published)

doi.org

proceedings.mlr.press

PatchUp: A Feature-Space Block-Level Regularization Technique for Convolutional Neural Networks

Mojtaba Faramarzi

Mohammad Amini

Akilesh Badrinaaraayanan

Vikas Verma

Sarath Chandar

Large capacity deep learning models are often prone to a high generalization gap when trained with a limited amount of labeled training data. A recent class of methods to address this problem uses various ways to construct a new training sample by mixing a pair (or more) of training samples. We propose PatchUp, a hidden state block-level regularization technique for Convolutional Neural Networks (CNNs), that is applied on selected contiguous blocks of feature maps from a random pair of samples. Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches. Moreover, since we are mixing the contiguous block of features in the hidden space, which has more dimensions than the input space, we obtain more diverse samples for training towards different dimensions. Our experiments on CIFAR10/100, SVHN, Tiny-ImageNet, and ImageNet using ResNet architectures including PreActResnet18/34, WRN-28-10, ResNet101/152 models show that PatchUp improves upon, or equals, the performance of current state-of-the-art regularizers for CNNs. We also show that PatchUp can provide a better generalization to deformed samples and is more robust against adversarial attacks.
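As a rough illustration of block-level mixing in feature space, the toy sketch below copies one contiguous square block from one feature map into another. It is not the PatchUp procedure itself: the function name `swap_block`, the single-channel list-of-lists representation, and the choice of exactly one square block are assumptions, and the paper's block selection and any label handling are not reproduced here.

```python
import random


def swap_block(feat_a, feat_b, block_size, rng):
    """Return a copy of feat_a with one contiguous square block taken from feat_b.

    feat_a and feat_b are equally sized 2D feature maps (lists of rows).
    """
    h, w = len(feat_a), len(feat_a[0])
    # Pick the top-left corner so the block fits entirely inside the map.
    top = rng.randrange(h - block_size + 1)
    left = rng.randrange(w - block_size + 1)
    mixed = [row[:] for row in feat_a]  # copy so the inputs stay untouched
    for r in range(top, top + block_size):
        mixed[r][left:left + block_size] = feat_b[r][left:left + block_size]
    return mixed


feat_a = [[0.0] * 4 for _ in range(4)]
feat_b = [[1.0] * 4 for _ in range(4)]
mixed = swap_block(feat_a, feat_b, block_size=2, rng=random.Random(0))
```

Whatever position the random generator picks, exactly `block_size * block_size` entries of `mixed` come from `feat_b` and the rest from `feat_a`.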

2022-06-27

AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org

Staged independent learning: Towards decentralized cooperative multi-agent Reinforcement Learning

Hadi Nekoei

Akilesh Badrinaaraayanan

Amit Sinha

Mohammad Amini

Janarthanan Rajendran

Aditya Mahajan

A. Chandar

We empirically show that classic ideas from two-time scale stochastic approximation \citep{borkar1997stochastic} can be combined with sequential iterative best response (SIBR) to solve complex cooperative multi-agent reinforcement learning (MARL) problems. We first start with giving a multi-agent estimation problem as a motivating example where SIBR converges while parallel iterative best response (PIBR) does not. Then we present a general implementation of staged multi-agent RL algorithms based on SIBR and multi-time scale stochastic approximation, and show that our new methods which we call Staged Independent Proximal Policy Optimization (SIPPO) and Staged Independent Q-learning (SIQL) outperform state-of-the-art independent learning on almost all the tasks in the epymarl \citep{papoudakis2020benchmarking} benchmark. This can be seen as a first step towards more decentralized MARL methods based on SIBR and multi-time scale learning.
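The contrast between SIBR and PIBR can be seen in a toy example. The sketch below does not use the paper's multi-agent estimation problem; it swaps in a two-agent, two-action matching game (shared reward 1 when the actions match, 0 otherwise), and the function names are hypothetical.

```python
def best_response(other_action):
    # In the matching game, the best response is to copy the other agent's action.
    return other_action


def parallel_ibr(a1, a2, rounds):
    """Both agents best-respond simultaneously to each other's previous action."""
    history = [(a1, a2)]
    for _ in range(rounds):
        a1, a2 = best_response(a2), best_response(a1)
        history.append((a1, a2))
    return history


def sequential_ibr(a1, a2, rounds):
    """Agents best-respond one after another; the second sees the first's update."""
    history = [(a1, a2)]
    for _ in range(rounds):
        a1 = best_response(a2)
        a2 = best_response(a1)
        history.append((a1, a2))
    return history


print(parallel_ibr(0, 1, 3))    # [(0, 1), (1, 0), (0, 1), (1, 0)]
print(sequential_ibr(0, 1, 3))  # [(0, 1), (1, 1), (1, 1), (1, 1)]
```

Starting from mismatched actions, the parallel updates keep swapping the two actions and never coordinate, while the sequential updates match after the first round and then stay fixed.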

2022-04-24

ICLR.cc/2022/Workshop/GMS (accepted)

openreview.net

Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers

Amir Ardalan Kalantari

Mohammad Amini

Sarath Chandar

Doina Precup

Much of recent Deep Reinforcement Learning success is owed to the neural architecture's potential to learn and use effective internal representations of the world. While many current algorithms access a simulator to train with a large amount of data, in realistic settings, including while playing games that may be played against people, collecting experience can be quite costly. In this paper, we introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance. We design this architecture by incorporating advances achieved in recent years in the field of Natural Language Processing and Computer Vision. Specifically, we propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation, while simultaneously optimizing return. We demonstrate empirically that this architecture improves sample complexity for several Atari environments, while also achieving better performance in some of the games.

2022-01-31

arXiv (preprint)

doi.org

arxiv.org

Towered Actor Critic For Handling Multiple Action Types In Reinforcement Learning For Drug Discovery

Sai Krishna Gottipati

Yashaswi Pathak

Boris Sattarov

Sahir

Rohan Nuttall

Mohammad Amini

Matthew E. Taylor

Sarath Chandar

Reinforcement learning (RL) has made significant progress in both abstract and real-world domains, but the majority of state-of-the-art algorithms deal only with monotonic actions. However, some applications require agents to reason over different types of actions. Our application simulates reaction-based molecule generation, used as part of the drug discovery pipeline, and includes both uni-molecular and bi-molecular reactions. This paper introduces a novel framework, towered actor critic (TAC), to handle multiple action types. The TAC framework is general in that it is designed to be combined with any existing RL algorithms for continuous action space. We combine it with TD3 to empirically obtain significantly better results than existing methods in the drug discovery setting. TAC is also applied to RL benchmarks in OpenAI Gym and results show that our framework can improve, or at least does not hurt, performance relative to standard TD3.

2021-05-17

AAAI Conference on Artificial Intelligence (published)

doi.org
