Portrait de Amit Sinha n'est pas disponible

Amit Sinha

Doctorat - McGill University
Superviseur⋅e principal⋅e

Publications

Asymmetric Actor-Critic with Approximate Information State
Amit Sinha
Reinforcement learning (RL) for partially observable Markov decision processes (POMDPs) is a challenging problem because decisions need to b… (voir plus)e made based on the entire history of observations and actions. However, in several scenarios, state information is available during the training phase. We are interested in exploiting the availability of this state information during the training phase to efficiently learn a history-based policy using RL. Specifically, we consider actor-critic algorithms, where the actor uses only the history information but the critic uses both history and state. Such algorithms are called asymmetric actor-critic, to highlight the fact that the actor and critic have asymmetric information. Motivated by the recent success of using representation losses in RL for POMDPs [1], we derive similar theoretical results for the asymmetric actor-critic case and evaluate the effectiveness of adding such auxiliary losses in experiments. In particular, we learn a history representation-called an approximate information state (AIS)-and bound the performance loss when acting using AIS.
Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning
Hadi Nekoei
Akilesh Badrinaaraayanan
Amit Sinha
Mohammad Amin Amini
Janarthanan Rajendran
Approximate information state based convergence analysis of recurrent Q-learning
Erfan SeyedSalehi
Nima Akbarzadeh
Amit Sinha
In spite of the large literature on reinforcement learning (RL) algorithms for partially observable Markov decision processes (POMDPs), a co… (voir plus)mplete theoretical understanding is still lacking. In a partially observable setting, the history of data available to the agent increases over time so most practical algorithms either truncate the history to a finite window or compress it using a recurrent neural network leading to an agent state that is non-Markovian. In this paper, it is shown that in spite of the lack of the Markov property, recurrent Q-learning (RQL) converges in the tabular setting. Moreover, it is shown that the quality of the converged limit depends on the quality of the representation which is quantified in terms of what is known as an approximate information state (AIS). Based on this characterization of the approximation error, a variant of RQL with AIS losses is presented. This variant performs better than a strong baseline for RQL that does not use AIS losses. It is demonstrated that there is a strong correlation between the performance of RQL over time and the loss associated with the AIS representation.
Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning
Hadi Nekoei
Akilesh Badrinaaraayanan
Amit Sinha
Mohammad Amin Amini
Janarthanan Rajendran
Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games
Jayakumar Subramanian
Amit Sinha
Staged independent learning: Towards decentralized cooperative multi-agent Reinforcement Learning
Hadi Nekoei
Akilesh Badrinaaraayanan
Amit Sinha
Mohammad Amini
Janarthanan Rajendran
We empirically show that classic ideas from two-time scale stochastic approximation \citep{borkar1997stochastic} can be combined with sequen… (voir plus)tial iterative best response (SIBR) to solve complex cooperative multi-agent reinforcement learning (MARL) problems. We first start with giving a multi-agent estimation problem as a motivating example where SIBR converges while parallel iterative best response (PIBR) does not. Then we present a general implementation of staged multi-agent RL algorithms based on SIBR and multi-time scale stochastic approximation, and show that our new methods which we call Staged Independent Proximal Policy Optimization (SIPPO) and Staged Independent Q-learning (SIQL) outperform state-of-the-art independent learning on almost all the tasks in the epymarl \citep{papoudakis2020benchmarking} benchmark. This can be seen as a first step towards more decentralized MARL methods based on SIBR and multi-time scale learning.
Approximate information state for approximate planning and reinforcement learning in partially observed systems
Jayakumar Subramanian
Amit Sinha
Raihan Seraj
We propose a theoretical framework for approximate planning and learning in partially observed systems. Our framework is based on the fundam… (voir plus)ental notion of information state. We provide two equivalent definitions of information state---i) a function of history which is sufficient to compute the expected reward and predict its next value; ii) equivalently, a function of the history which can be recursively updated and is sufficient to compute the expected reward and predict the next observation. An information state always leads to a dynamic programming decomposition. Our key result is to show that if a function of the history (called approximate information state (AIS)) approximately satisfies the properties of the information state, then there is a corresponding approximate dynamic program. We show that the policy computed using this is approximately optimal with bounded loss of optimality. We show that several approximations in state, observation and action spaces in literature can be viewed as instances of AIS. In some of these cases, we obtain tighter bounds. A salient feature of AIS is that it can be learnt from data. We present AIS based multi-time scale policy gradient algorithms. and detailed numerical experiments with low, moderate and high dimensional environments.
Robustness of Whittle Index Policy to Model Approximation
Amit Sinha
Robustness of Markov perfect equilibrium to model approximations in general-sum dynamic games
Jayakumar Subramanian
Amit Sinha
Dynamic games (also called stochastic games or Markov games) are an important class of games for modeling multi-agent interactions. In many … (voir plus)situations, the dynamics and reward functions of the game are learnt from past data and are therefore approximate. In this paper, we study the robustness of Markov perfect equilibrium to approximations in reward and transition functions. Using approximation results from Markov decision processes, we show that the Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game. We provide explicit bounds on the approximation error in terms of three quantities: (i) the error in approximating the reward functions, (ii) the error in approximating the transition function, and (iii) a property of the value function of the MPE of the approximate game. The second and third quantities depend on the choice of metric on probability spaces. We also present coarser upper bounds which do not depend on the value function but only depend on the properties of the reward and transition functions of the approximate game. We illustrate the results via a numerical example.