
Nima Akbarzadeh

Postdoctorate - HEC Montréal
Supervisor
Research Topics
Reinforcement Learning

Publications

Fair Resource Allocation in Weakly Coupled Markov Decision Processes
We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where resource constraints couple the action spaces of…
Planning and Learning in Risk-Aware Restless Multi-Arm Bandits
Yossiri Adulyasak
In restless multi-arm bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with each arm being a Markov decision process. In this work, we generalize the traditional restless multi-arm bandit problem with a risk-neutral objective by incorporating risk-awareness. We establish indexability conditions for the case of a risk-aware objective and provide a solution based on Whittle index. In addition, we address the learning problem when the true transition probabilities are unknown by proposing a Thompson sampling approach and show that it achieves bounded regret that scales sublinearly with the number of episodes and quadratically with the number of arms. The efficacy of our method in reducing risk exposure in restless multi-arm bandits is illustrated through a set of numerical experiments in the contexts of machine replacement and patient scheduling applications under both planning and learning setups.
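As a rough illustration of the two ingredients described above, the sketch below activates the arms with the largest current Whittle indices and, for the learning setting, resamples each arm's transition model from a Dirichlet posterior at the start of an episode. The function names, array shapes, and the uniform prior are assumptions made for the example, not the paper's implementation; the index values themselves would come from the risk-aware index computation developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def whittle_index_policy(indices, states, budget):
    """Activate the `budget` arms whose current (risk-aware) Whittle index is largest.

    indices : (n_arms, n_states) array of precomputed per-state indices (assumed given).
    states  : integer array with the current state of each arm.
    budget  : number of arms that may be made active this step.
    """
    current = indices[np.arange(len(states)), states]
    action = np.zeros(len(states), dtype=int)
    action[np.argsort(current)[-budget:]] = 1      # top-`budget` arms are activated
    return action

def sample_transition_models(counts):
    """Thompson-sampling step: at the start of each episode, draw one transition
    matrix per (arm, action) from a Dirichlet posterior over observed counts."""
    n_arms, n_actions, n_states, _ = counts.shape
    models = np.zeros(counts.shape)
    for a in range(n_arms):
        for u in range(n_actions):
            for s in range(n_states):
                models[a, u, s] = rng.dirichlet(counts[a, u, s] + 1.0)  # +1.0 = uniform prior
    return models
```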
On learning Whittle index policy for restless bandits with scalable regret
Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies based on data when the system model is unknown. However, the cumulative regret of most RL algorithms scales as $\tilde{\mathcal{O}}(S$…
Two Families of Indexable Partially Observable Restless Bandits and Whittle Index Computation
Approximate information state based convergence analysis of recurrent Q-learning
In spite of the large literature on reinforcement learning (RL) algorithms for partially observable Markov decision processes (POMDPs), a complete theoretical understanding is still lacking. In a partially observable setting, the history of data available to the agent increases over time so most practical algorithms either truncate the history to a finite window or compress it using a recurrent neural network leading to an agent state that is non-Markovian. In this paper, it is shown that in spite of the lack of the Markov property, recurrent Q-learning (RQL) converges in the tabular setting. Moreover, it is shown that the quality of the converged limit depends on the quality of the representation which is quantified in terms of what is known as an approximate information state (AIS). Based on this characterization of the approximation error, a variant of RQL with AIS losses is presented. This variant performs better than a strong baseline for RQL that does not use AIS losses. It is demonstrated that there is a strong correlation between the performance of RQL over time and the loss associated with the AIS representation.
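A minimal sketch of the agent-state construction analysed here, assuming a Gymnasium-style environment with discrete observations and actions: tabular Q-learning is run on a window of the last few observations, an agent state that is non-Markovian but still yields a convergent iteration whose limit depends on the quality of the underlying approximate information state. All names and hyper-parameters are illustrative, not the paper's experimental setup.

```python
import random
from collections import defaultdict, deque

def windowed_q_learning(env, window=4, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning on an agent state made of the last `window` observations
    of a POMDP (a simple stand-in for a recurrent state; assumes the Gymnasium API)."""
    Q = defaultdict(lambda: [0.0] * env.action_space.n)
    for _ in range(episodes):
        obs, _ = env.reset()
        hist = deque([obs] * window, maxlen=window)   # truncated observation history
        s = tuple(hist)
        done = False
        while not done:
            # epsilon-greedy action on the (non-Markovian) agent state
            if random.random() < eps:
                a = env.action_space.sample()
            else:
                a = max(range(env.action_space.n), key=lambda i: Q[s][i])
            obs, r, terminated, truncated, _ = env.step(a)
            hist.append(obs)
            s_next = tuple(hist)
            done = terminated or truncated
            target = r + (0.0 if done else gamma * max(Q[s_next]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```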
Conditions for indexability of restless bandits and an algorithm to compute Whittle index – CORRIGENDUM
Partially observable restless bandits with restarts: indexability and computation of Whittle index
We consider restless bandits with restarts, where the state of the active arms resets according to a known probability distribution while the state of the passive arms evolves in a Markovian manner. We assume that the state of the arm is observed after it is reset but not observed otherwise. We show that the model is indexable and propose an efficient algorithm to compute the Whittle index by exploiting the qualitative properties of the optimal policy. A detailed numerical study of machine repair models shows that the Whittle index policy outperforms the myopic policy and is close to the optimal policy.
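Under the observation model just described, an arm's information state collapses to the last observed (post-reset) state and the number of passive steps since then. The sketch below spells out the induced belief, assuming the passive transition matrix is known; it illustrates only the model structure, not the paper's index algorithm.

```python
import numpy as np

def restart_belief(last_observed_state, passive_steps, P_passive):
    """Belief over an arm's hidden state in the restart model: the arm was
    observed right after its last reset, then evolved passively (unobserved)
    for `passive_steps` steps under the transition matrix `P_passive`."""
    n_states = P_passive.shape[0]
    b = np.zeros(n_states)
    b[last_observed_state] = 1.0                      # observed after the reset
    return b @ np.linalg.matrix_power(P_passive, passive_steps)
```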
Conditions for indexability of restless bandits and an $\mathcal{O}\!\left(K^3\right)$ algorithm to compute Whittle index
Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes where the evolution of the processes depends on the resources allocated to them. Such models capture the fundamental trade-offs between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems which has emerged as a popular solution approach because of its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler-to-verify refinements of these conditions. We then revisit a previously proposed algorithm called the adaptive greedy algorithm which is known to compute the Whittle index for a sub-class of restless bandits. We show that a generalization of the adaptive greedy algorithm computes the Whittle index for all indexable restless bandits. We present an efficient implementation of this algorithm which can compute the Whittle index of a restless bandit with K states in $\mathcal{O}\!\left(K^3\right)$ computations.
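The adaptive greedy algorithm itself is developed in the paper; as a self-contained baseline, the sketch below recovers the Whittle index of a single state from its defining property, by bisecting on the passivity subsidy $\lambda$ of a discounted single-arm problem. This is the generic characterisation (and assumes the arm is indexable), not the $\mathcal{O}\!\left(K^3\right)$ algorithm of the paper.

```python
import numpy as np

def whittle_index_bisection(P_active, P_passive, r_active, r_passive, state,
                            gamma=0.95, lo=-10.0, hi=10.0, iters=60):
    """Whittle index of `state` for one indexable restless arm (discounted
    formulation), found by bisecting on the subsidy paid for being passive.
    P_* are (K, K) transition matrices, r_* length-K reward vectors; the
    search interval [lo, hi] is assumed to bracket the index."""
    K = len(r_active)

    def passive_is_optimal(lam):
        # Solve the lambda-subsidised single-arm MDP by value iteration.
        V = np.zeros(K)
        for _ in range(5000):
            q_active = r_active + gamma * P_active @ V
            q_passive = r_passive + lam + gamma * P_passive @ V
            V_new = np.maximum(q_active, q_passive)
            if np.max(np.abs(V_new - V)) < 1e-9:
                V = V_new
                break
            V = V_new
        q_active = r_active + gamma * P_active @ V
        q_passive = r_passive + lam + gamma * P_passive @ V
        return q_passive[state] >= q_active[state]

    for _ in range(iters):                            # monotone in lam under indexability
        mid = 0.5 * (lo + hi)
        if passive_is_optimal(mid):
            hi = mid                                  # subsidy already large enough
        else:
            lo = mid
    return 0.5 * (lo + hi)
```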
Scalable Operator Allocation for Multirobot Assistance: A Restless Bandit Approach
Abhinav Dahiya
Stephen L. Smith
In this article, we consider the problem of allocating human operators in a system with multiple semiautonomous robots. Each robot is required to perform an independent sequence of tasks, subject to a chance of failing and getting stuck in a fault state at every task. If and when required, a human operator can assist or teleoperate a robot. Conventional dynamic programming-based techniques used to solve such problems face scalability issues due to an exponential growth of state and action spaces with the number of robots and operators. In this article, we derive conditions under which the operator allocation problem satisfies a technical condition called indexability, thereby enabling the use of the Whittle index heuristic. The conditions are easy to check, and we show that they hold for a wide range of problems of interest. Our key insight is to leverage the structure of the value function of individual robots, resulting in conditions that can be verified separately for each state of each robot. We apply these conditions to two types of transitions commonly seen in remote robot supervision systems. Through numerical simulations, we demonstrate the efficacy of Whittle index policy as a near-optimal and scalable approach that outperforms existing scalable methods.