
Anirudha Jitani

Alumni

Publications

Structure-Aware Reinforcement Learning for Node-Overload Protection in Mobile Edge Computing
Zhongwen Zhu
Hatem Abou-Zeid
Emmanuel Thepie Fapi
Hakimeh Purmehdi
Mobile Edge Computing (MEC) involves placing computational capability and applications at the edge of the network, providing benefits such as reduced latency, reduced network congestion, and improved application performance. The performance and reliability of MEC degrade significantly when the edge server(s) in the cluster are overloaded. In this work, an adaptive admission control policy to prevent an edge node from becoming overloaded is presented. This approach is based on a recently proposed low-complexity RL (Reinforcement Learning) algorithm called SALMUT (Structure-Aware Learning for Multiple Thresholds), which exploits the structure of the optimal admission control policy in multi-class queues for an average-cost setting. We extend the framework to the node overload-protection problem in a discounted-cost setting. The proposed solution is validated using several scenarios mimicking real-world deployments in two different settings: computer simulations and a Docker testbed. Our empirical evaluations show that the total discounted cost incurred by SALMUT is similar to that of state-of-the-art deep RL algorithms such as PPO (Proximal Policy Optimization) and A2C (Advantage Actor Critic), but SALMUT requires an order of magnitude less time to train, outputs an easily interpretable policy, and can be deployed in an online manner.
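To make the threshold structure concrete, the following is a minimal, self-contained Python sketch of a SALMUT-style learner. The toy environment, cost function, and one-step update rule are illustrative stand-ins chosen for brevity, not the paper's exact formulation; only the core idea of learning one admission threshold per request class through a smoothed, differentiable threshold policy is taken from the abstract above.

```python
import numpy as np

# Hypothetical sketch of a SALMUT-style threshold policy for admission
# control at a single edge node. Constants and dynamics are illustrative.

rng = np.random.default_rng(0)

NUM_CLASSES = 3          # request classes with different priorities
MAX_LOAD = 20            # server load ceiling (overload beyond ~80%)
LR = 0.01                # step size for threshold updates

# One learnable threshold per request class (the "multiple thresholds"
# in SALMUT's name). Admission probability is a sigmoid around the
# threshold so the policy is differentiable in the thresholds.
thresholds = np.full(NUM_CLASSES, MAX_LOAD / 2.0)

def admit_prob(load, k):
    """Probability of admitting a class-k request at the current load."""
    return 1.0 / (1.0 + np.exp(load - thresholds[k]))

def step_cost(load, admitted, k):
    """Toy cost: heavy penalty near overload, small penalty for rejection
    (higher-priority classes, lower k, cost more to reject)."""
    overload_cost = max(0, load - 0.8 * MAX_LOAD) ** 2
    rejection_cost = 0.0 if admitted else (NUM_CLASSES - k)
    return overload_cost + rejection_cost

load = 0
for t in range(50_000):
    k = int(rng.integers(NUM_CLASSES))        # arriving request's class
    p = admit_prob(load, k)
    admitted = rng.random() < p
    cost = step_cost(load, admitted, k)

    # REINFORCE-style stochastic-approximation update on the immediate
    # cost: a one-step simplification of the discounted-cost objective.
    # d/d(threshold) of log-prob of the taken action for the sigmoid:
    grad_logp = (1.0 - p) if admitted else -p
    thresholds[k] -= LR * cost * grad_logp

    # Toy dynamics: admitted work raises load, service drains it.
    if admitted:
        load = min(MAX_LOAD, load + 1)
    load = max(0, load - int(rng.binomial(1, 0.5)))

print("learned per-class thresholds:", np.round(thresholds, 2))
```

Because the policy is just a vector of per-class thresholds, the learned behavior can be read off directly ("admit class k while load is below thresholds[k]"), which is what makes this family of policies easy to interpret and to deploy online.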
Reward Redistribution Mechanisms in Multi-agent Reinforcement Learning
Aly Ibrahim
Piracha
Daoud
In typical Multi-Agent Reinforcement Learning (MARL) settings, each agent acts to maximize its individual reward objective. However, for collective social welfare maximization, some agents may need to act non-selfishly. We propose a reward-shaping mechanism using extrinsic motivation for achieving modularity and increased cooperation among agents in Sequential Social Dilemma (SSD) problems. Our mechanism, inspired by capitalism, provides extrinsic motivation to agents by redistributing a portion of collected rewards based on each agent's individual contribution towards team rewards. We demonstrate empirically that this mechanism leads to higher collective welfare relative to existing baselines, reduces free-rider issues, and produces more diverse policies. We evaluate our proposed mechanism for already-specialised agents that are pre-trained for specific roles. We show that our mechanism, in the most challenging CleanUp environment, significantly outperforms two baselines (based roughly on socialism and anarchy) and accumulates 2-3 times higher rewards in an easier setting of the environment.
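The redistribution rule can be illustrated with a short sketch. The function below is hypothetical: the tax rate, the contribution signal (e.g. a count of river-cleaning actions in CleanUp, which earn no direct reward themselves), and the proportional payout are assumptions for illustration, since the abstract does not pin down the exact formula.

```python
import numpy as np

def redistribute(collected, contribution, tax_rate=0.3):
    """Illustrative reward-shaping rule: tax a fraction of each agent's
    collected reward into a common pool, then pay the pool out in
    proportion to each agent's contribution to the team objective.
    `tax_rate` and the contribution signal are assumed, not from the paper.
    """
    collected = np.asarray(collected, dtype=float)
    contribution = np.asarray(contribution, dtype=float)

    pool = tax_rate * collected.sum()          # taxed portion of all rewards
    kept = (1.0 - tax_rate) * collected        # what each agent keeps outright

    total = contribution.sum()
    if total <= 0:                             # nobody contributed: refund evenly
        shares = np.full(len(collected), 1.0 / len(collected))
    else:
        shares = contribution / total          # contribution-proportional shares

    return kept + pool * shares

# Agent 0 harvests a lot but never cleans; agent 2 cleans but harvests little.
shaped = redistribute(collected=[10.0, 4.0, 1.0], contribution=[0.0, 3.0, 7.0])
print(shaped)  # [7.   4.15 3.85]: reward shifts from the free rider to cleaners
```

Decoupling the payout basis (contribution) from the taxed quantity (collected reward) is what penalizes free riders: an agent that only harvests keeps just the untaxed share of its reward, while agents doing unrewarded team work recover part of the pool.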