Portrait of Aditya Mahajan

Aditya Mahajan

Associate Academic Member
Associate Professor, McGill University, Department of Electrical and Computer Engineering
Research Topics
Reinforcement Learning

Biography

Aditya Mahajan is a professor in the Department of Electrical and Computer Engineering at McGill University and an associate academic member of Mila – Quebec Artificial Intelligence Institute.

He is also a member of the McGill Centre for Intelligent Machines (CIM), the International Laboratory for Learning Systems (ILLS), and the Group for Research in Decision Analysis (GERAD). Mahajan received his BTech degree in electrical engineering from the Indian Institute of Technology Kanpur, and his MSc and PhD degrees in electrical engineering and computer science from the University of Michigan at Ann Arbor.

He is a senior member of the U.S. Institute of Electrical and Electronics Engineers (IEEE), as well as a member of Professional Engineers Ontario. He currently serves as associate editor for IEEE Transactions on Automatic Control, IEEE Control Systems Letters, and Mathematics of Control, Signals, and Systems (Springer). He served as associate editor for the conference editorial board of the IEEE Control Systems Society from 2014 to 2017.

Mahajan’s numerous awards include the 2015 George Axelby Outstanding Paper Award, 2016 NSERC Discovery Accelerator Award, 2014 CDC Best Student Paper Award (as supervisor), and 2016 NecSys Best Student Paper Award (as supervisor). Mahajan’s principal research interests are stochastic control and reinforcement learning.

Current Students

Master's Research - McGill University
Master's Research - McGill University
Research Intern - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University

Publications

Mean-field games among teams
Jayakumar Subramanian
Akshat Kumar
Decentralized Linear Quadratic Systems With Major and Minor Agents and Non-Gaussian Noise
Mohammad Afshari
A decentralized linear quadratic system with a major agent and a collection of minor agents is considered. The major agent affects the minor… (see more) agents, but not vice versa. The state of the major agent is observed by all agents. In addition, the minor agents have a noisy observation of their local state. The noise process is not assumed to be Gaussian. The structures of the optimal strategy and the best linear strategy are characterized. It is shown that the major agent's optimal control action is a linear function of the major agent's minimum mean-squared error (MMSE) estimate of the system state while the minor agent's optimal control action is a linear function of the major agent's MMSE estimate of the system state and a “correction term” that depends on the difference of the minor agent's MMSE estimate of its local state and the major agent's MMSE estimate of the minor agent's local state. Since the noise is non-Gaussian, the minor agent's MMSE estimate is a nonlinear function of its observation. It is shown that replacing the minor agent's MMSE estimate with its linear least mean square estimate gives the best linear control strategy. The results are proved using a direct method based on conditional independence, common-information-based splitting of state and control actions, and simplifying the per-step cost based on conditional independence, orthogonality principle, and completion of squares.
Approximate information state based convergence analysis of recurrent Q-learning
Erfan SeyedSalehi
Nima Akbarzadeh
Amit Sinha
In spite of the large literature on reinforcement learning (RL) algorithms for partially observable Markov decision processes (POMDPs), a co… (see more)mplete theoretical understanding is still lacking. In a partially observable setting, the history of data available to the agent increases over time so most practical algorithms either truncate the history to a finite window or compress it using a recurrent neural network leading to an agent state that is non-Markovian. In this paper, it is shown that in spite of the lack of the Markov property, recurrent Q-learning (RQL) converges in the tabular setting. Moreover, it is shown that the quality of the converged limit depends on the quality of the representation which is quantified in terms of what is known as an approximate information state (AIS). Based on this characterization of the approximation error, a variant of RQL with AIS losses is presented. This variant performs better than a strong baseline for RQL that does not use AIS losses. It is demonstrated that there is a strong correlation between the performance of RQL over time and the loss associated with the AIS representation.
Conditions for indexability of restless bandits and an algorithm to compute whittle index – CORRIGENDUM
Nima Akbarzadeh
Scalable Regret for Learning to Control Network-Coupled Subsystems With Unknown Dynamics
Sagar Sudhakara
Ashutosh Nayyar
Yi Ouyang
In this article, we consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems … (see more)connected over a network. Our goal is to minimize and quantify the regret (i.e., loss in performance) of our learning and control strategy with respect to an oracle who knows the system model. Upfront viewing the interconnected subsystems globally and directly using existing LQG learning algorithms for the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson sampling-based learning algorithm which exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by
Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning
Hadi Nekoei
Akilesh Badrinaaraayanan
Amit Sinha
Mohammad Amin Amini
Janarthanan Rajendran
Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games
Jayakumar Subramanian
Amit Sinha
Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Markov Jump Linear Systems
Borna Sayedana
Mohammad Afshari
Peter E. Caines
In this paper, we investigate the problem of system identification for autonomous Markov jump linear systems (MJS) with complete state obser… (see more)vations. We propose switched least squares method for identification of MJS, show that this method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-dependent rate of convergence shows that, almost surely, the system identification error is
A modified Thompson sampling-based learning algorithm for unknown linear systems
Mukul Gagrani
Sagar Sudhakara
Rahul Jain
Ashutosh Nayyar
Yi Ouyang
We revisit the Thompson sampling-based learning algorithm for controlling an unknown linear system with quadratic cost proposed in [1]. This… (see more) algorithm operates in episodes of dynamic length and it is shown to have a regret bound of
Partially observable restless bandits with restarts: indexability and computation of Whittle index
Nima Akbarzadeh
We consider restless bandits with restarts, where the state of the active arms resets according to a known probability distribution while th… (see more)e state of the passive arms evolves in a Markovian manner. We assume that the state of the arm is observed after it is reset but not observed otherwise. We show that the model is indexable and propose an efficient algorithm to compute the Whittle index by exploiting the qualitative properties of the optimal policy. A detailed numerical study of machine repair models shows that Whittle index policy outperforms myopic policy and is close to optimal policy.
Thompson-Sampling Based Reinforcement Learning for Networked Control of Unknown Linear Systems
Borna Sayedana
Mohammad Afshari
Peter E. Caines
In recent years, there has been considerable interest in reinforcement learning for linear quadratic Gaussian (LQG) systems. In this paper, … (see more)we consider a generalization of such systems where the controller and the plant are connected over an unreliable packet drop channel. Packet drops cause the system dynamics to switch between controlled and uncontrolled modes. This switching phenomena introduces new challenges in designing learning algorithms. We identify a sufficient condition under which the regret of Thompson sampling-based reinforcement learning algorithm with dynamic episodes (TSDE) at horizon T is bounded by
Structure-Aware Reinforcement Learning for Node-Overload Protection in Mobile Edge Computing
Anirudha Jitani
Zhongwen Zhu
Hatem Abou-Zeid
Emmanuel Thepie Fapi
Hakimeh Purmehdi
Mobile Edge Computing (MEC) involves placing computational capability and applications at the edge of the network, providing benefits such a… (see more)s reduced latency, reduced network congestion, and improved performance of applications. The performance and reliability of MEC degrades significantly when the edge server(s) in the cluster are overloaded. In this work, an adaptive admission control policy to prevent edge node from getting overloaded is presented. This approach is based on a recently-proposed low complexity RL (Reinforcement Learning) algorithm called SALMUT (Structure-Aware Learning for Multiple Thresholds), which exploits the structure of the optimal admission control policy in multi-class queues for an average-cost setting. We extend the framework to work for node overload-protection problem in a discounted-cost setting. The proposed solution is validated using several scenarios mimicking real-world deployments in two different settings — computer simulations and a docker testbed. Our empirical evaluations show that the total discounted cost incurred by SALMUT is similar to state-of-the-art deep RL algorithms such as PPO (Proximal Policy Optimization) and A2C (Advantage Actor Critic) but requires an order of magnitude less time to train, outputs easily interpretable policy, and can be deployed in an online manner.