
Aditya Mahajan

Associate Academic Member
Associate Professor, McGill University, Department of Electrical and Computer Engineering
Research Topics
Reinforcement Learning

Biography

Aditya Mahajan is an associate professor in the Department of Electrical and Computer Engineering at McGill University and an associate academic member of Mila – Quebec Artificial Intelligence Institute.

He is also a member of the McGill Centre for Intelligent Machines (CIM), the International Laboratory for Learning Systems (ILLS), and the Group for Research in Decision Analysis (GERAD). Mahajan received his BTech degree in electrical engineering from the Indian Institute of Technology Kanpur, and his MSc and PhD degrees in electrical engineering and computer science from the University of Michigan at Ann Arbor.

He is a senior member of the U.S. Institute of Electrical and Electronics Engineers (IEEE), as well as a member of Professional Engineers Ontario. He currently serves as associate editor for IEEE Transactions on Automatic Control, IEEE Control Systems Letters, and Mathematics of Control, Signals, and Systems (Springer). He served as associate editor for the conference editorial board of the IEEE Control Systems Society from 2014 to 2017.

Mahajan’s numerous awards include the 2015 George Axelby Outstanding Paper Award, 2016 NSERC Discovery Accelerator Award, 2014 CDC Best Student Paper Award (as supervisor), and 2016 NecSys Best Student Paper Award (as supervisor). Mahajan’s principal research interests are stochastic control and reinforcement learning.

Current Students

Master's Research - McGill University
Master's Research - McGill University
Research Intern - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University

Publications

Structure-Aware Reinforcement Learning for Node-Overload Protection in Mobile Edge Computing
Anirudha Jitani
Zhongwen Zhu
Hatem Abou-Zeid
Emmanuel Thepie Fapi
Hakimeh Purmehdi
Mobile Edge Computing (MEC) involves placing computational capability and applications at the edge of the network, providing benefits such as reduced latency, reduced network congestion, and improved performance of applications. The performance and reliability of MEC degrade significantly when the edge server(s) in the cluster are overloaded. In this work, an adaptive admission control policy to prevent an edge node from becoming overloaded is presented. The approach is based on a recently proposed low-complexity reinforcement learning (RL) algorithm called SALMUT (Structure-Aware Learning for Multiple Thresholds), which exploits the structure of the optimal admission control policy in multi-class queues for an average-cost setting. We extend the framework to the node overload-protection problem in a discounted-cost setting. The proposed solution is validated using several scenarios mimicking real-world deployments in two different settings: computer simulations and a Docker testbed. Our empirical evaluations show that the total discounted cost incurred by SALMUT is similar to that of state-of-the-art deep RL algorithms such as PPO (Proximal Policy Optimization) and A2C (Advantage Actor Critic), but SALMUT requires an order of magnitude less time to train, outputs an easily interpretable policy, and can be deployed in an online manner.
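For readers unfamiliar with structure-aware methods, the sketch below illustrates the kind of threshold-structured admission policy that SALMUT exploits: admission is governed by a learnable per-class threshold on the server load, and a policy-gradient-style update nudges the thresholds. This is a minimal illustration, not the authors' implementation; the class name, the sigmoid smoothing, and the update rule are assumptions.

```python
# Hypothetical sketch of a threshold-structured admission policy in the spirit
# of SALMUT; names, smoothing, and the update rule are illustrative assumptions.
import numpy as np

class ThresholdAdmissionPolicy:
    def __init__(self, num_classes, max_load, temperature=2.0, lr=0.05):
        # One learnable load threshold per request class.
        self.thresholds = np.full(num_classes, max_load / 2.0)
        self.temperature = temperature  # softness of the threshold
        self.lr = lr                    # step size for the threshold update

    def admit_probability(self, request_class, current_load):
        # Smooth (sigmoid) version of "admit iff load < threshold",
        # which keeps the policy differentiable in the thresholds.
        z = (self.thresholds[request_class] - current_load) / self.temperature
        return 1.0 / (1.0 + np.exp(-z))

    def act(self, request_class, current_load, rng=np.random):
        return rng.random() < self.admit_probability(request_class, current_load)

    def update(self, request_class, current_load, advantage):
        # Policy-gradient-style adjustment: raise the threshold (admit more)
        # when admitting had positive advantage, lower it otherwise.
        p = self.admit_probability(request_class, current_load)
        grad_log_p = (1.0 - p) / self.temperature  # d/d(threshold) of log p(admit)
        self.thresholds[request_class] += self.lr * advantage * grad_log_p

# Usage: decide on an incoming class-0 request when the node load is 7.
policy = ThresholdAdmissionPolicy(num_classes=2, max_load=10)
print(policy.act(request_class=0, current_load=7.0))
```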
Maintenance of a collection of machines under partial observability: Indexability and computation of Whittle index
Nima Akbarzadeh
We consider the problem of scheduling maintenance for a collection of machines under partial observations when the state of each machine deteriorates stochastically in a Markovian manner. We consider two observational models: first, the state of each machine is not observable at all; second, the state of each machine is observable only if a service-person visits it. A maintenance action, e.g., machine replacement, is taken on a machine when it is chosen for a visit. We model both problems as restless multi-armed bandit problems and propose the Whittle index policy for scheduling the visits. We show that both models are indexable. For the first model, we derive a closed-form expression for the Whittle index. For the second model, we propose an efficient algorithm to compute the Whittle index by exploiting the qualitative properties of the optimal policy. We present detailed numerical experiments which show that, for multiple instances of the model, the Whittle index policy outperforms the myopic policy and can be close to optimal in different setups.
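As a concrete illustration of how a Whittle index policy is used at decision time once the indices are available, here is a small sketch: visit the machines whose current indices are largest. The index functions and beliefs below are made up for illustration and are not the paper's closed-form expressions.

```python
# Illustrative sketch of a Whittle index scheduling rule; the index values are
# assumed to be precomputed (e.g. by the methods in the paper) and are toy here.
import numpy as np

def whittle_index_schedule(indices, belief_states, num_visits):
    """Pick the `num_visits` machines with the largest Whittle index.

    indices       : dict mapping machine id -> callable(belief) -> float
    belief_states : dict mapping machine id -> current belief over its state
    num_visits    : number of machines the service-person can visit
    """
    scores = {m: indices[m](b) for m, b in belief_states.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:num_visits]

# Toy example: the index grows with the believed probability of the "bad" state.
indices = {m: (lambda b: b[1]) for m in range(4)}
beliefs = {0: np.array([0.9, 0.1]),
           1: np.array([0.4, 0.6]),
           2: np.array([0.7, 0.3]),
           3: np.array([0.2, 0.8])}
print(whittle_index_schedule(indices, beliefs, num_visits=2))  # -> [3, 1]
```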
Intention estimation and controllable behaviour models for traffic merges
Takayasu Kumano
Yuji Yasui
This work focuses on decision making for automated driving vehicles in interaction-rich scenarios such as traffic merges, in a flexibly assertive yet safe manner. We propose a Q-learning based approach that takes active intention inferences as additional inputs alongside the directly observed state inputs. The outputs of the Q-function are processed by a modulation function to select a decision; the modulation function controls how assertively or defensively the agent behaves.
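A minimal sketch of such a decision layer is shown below, under the illustrative assumption that the Q-function has a progress head and a safety head and that the modulation is a convex blend controlled by an assertiveness parameter. The network, the intention-inference module, and all numbers are assumptions, not the paper's design.

```python
# Minimal sketch of a modulation function applied to Q-values; the two-headed
# Q-function and the blending rule are illustrative assumptions.
import numpy as np

def select_action(q_values, assertiveness):
    """Modulate Q-values before choosing an action.

    q_values      : array of shape (num_actions, 2) with columns
                    [expected progress/utility, expected safety margin]
    assertiveness : scalar in [0, 1]; 1 = fully assertive, 0 = fully defensive
    """
    # A defensive agent weights the safety component more heavily,
    # an assertive agent weights the progress component.
    blended = assertiveness * q_values[:, 0] + (1.0 - assertiveness) * q_values[:, 1]
    return int(np.argmax(blended))

# Example: 3 candidate merge actions (yield, keep speed, accelerate).
q = np.array([[0.2, 0.9],
              [0.5, 0.6],
              [0.8, 0.1]])
print(select_action(q, assertiveness=0.3))  # behaves defensively -> 0 (yield)
```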
Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Switched Linear Systems
Borna Sayedana
Mohammad Afshari
Peter E. Caines
In this paper, we investigate the problem of system identification for autonomous switched linear systems with complete state observations. We propose a switched least squares method for the identification of switched linear systems, show that this method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-dependent rate of convergence shows that, almost surely, the system identification error is O(√(log(T)/T)), where T is the time horizon. These results show that our method for switched linear systems has the same rate of convergence as the least squares method for non-switched linear systems. We compare our results with those in the literature. We present numerical examples to illustrate the performance of the proposed system identification method.
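The estimator itself is easy to state: regress next states on current states separately for each mode. The sketch below is an illustrative implementation on a simulated two-mode system with known mode sequence; all numerical values are made up and this is not the paper's code.

```python
# Sketch of switched least squares for an autonomous switched linear system
# x_{t+1} = A_{s_t} x_t + w_t with observed states x_t and modes s_t.
# The data-generation part is only there to make the example self-contained.
import numpy as np

def switched_least_squares(states, modes, num_modes):
    """Per-mode least-squares estimates of the system matrices A_i."""
    estimates = []
    for mode in range(num_modes):
        idx = np.where(modes == mode)[0]       # time steps when this mode was active
        X, Y = states[idx], states[idx + 1]    # regressors x_t and targets x_{t+1}
        # Least-squares solution of Y ≈ X A^T, hence A = (solution)^T.
        A_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
        estimates.append(A_hat.T)
    return estimates

rng = np.random.default_rng(0)
A = [np.array([[0.9, 0.1], [0.0, 0.8]]),
     np.array([[0.5, -0.2], [0.3, 0.7]])]
T = 2000
x = np.zeros((T + 1, 2))
x[0] = [1.0, -1.0]
modes = rng.integers(0, 2, size=T)
for t in range(T):
    x[t + 1] = A[modes[t]] @ x[t] + 0.05 * rng.standard_normal(2)

A_hat = switched_least_squares(x, modes, num_modes=2)
print(np.round(A_hat[0], 2))   # should be close to A[0]
```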
Multi-Agent Estimation and Filtering for Minimizing Team Mean-Squared Error
Mohammad Afshari
Motivated by estimation problems arising in autonomous vehicles and decentralized control of unmanned aerial vehicles, we consider multi-agent estimation and filtering problems in which multiple agents generate state estimates based on decentralized information and the objective is to minimize a coupled mean-squared error, which we call the team mean-squared error. We call the resulting estimates minimum team mean-squared error (MTMSE) estimates. We show that MTMSE estimates are different from minimum mean-squared error (MMSE) estimates. We derive closed-form expressions for MTMSE estimates, which are linear functions of the observations where the corresponding gain depends on the weight matrix that couples the estimation errors. We then consider a filtering problem where a linear stochastic process is monitored by multiple agents which can share their observations (with delay) over a communication graph. We derive expressions to recursively compute the MTMSE estimates. To illustrate the effectiveness of the proposed scheme, we consider an example of estimating the distances between vehicles in a platoon and show that MTMSE estimates significantly outperform MMSE estimates and consensus Kalman filtering estimates.
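To make the "coupled" objective concrete, the sketch below evaluates the team mean-squared-error criterion, i.e. the empirical value of E[eᵀSe] for a weight matrix S acting on the stacked agent errors. The matrix S and the error samples are illustrative; the point is that the off-diagonal entries of S are what couple the agents' errors.

```python
# Small sketch of the coupled ("team") mean-squared-error criterion; the weight
# matrix S and the error samples are illustrative assumptions.
import numpy as np

def team_mse(errors, S):
    """errors : (num_samples, num_agents) matrix of estimation errors e
       S      : (num_agents, num_agents) positive semidefinite weight matrix
       Returns the empirical value of E[e' S e]."""
    return float(np.mean(np.einsum('ti,ij,tj->t', errors, S, errors)))

# With a diagonal S the criterion is just a weighted sum of per-agent MSEs;
# the off-diagonal weights are what couple the agents' errors.
rng = np.random.default_rng(1)
e = rng.standard_normal((1000, 2))
S = np.array([[1.0, 0.6],
              [0.6, 1.0]])
print(team_mse(e, S))
```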
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems
Mukul Gagrani
Sagar Sudhakara
Ashutosh Nayyar
Yi Ouyang
We revisit the Thompson sampling algorithm to control an unknown linear quadratic (LQ) system recently proposed by Ouyang et al. [1]. The regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed-loop system. In this technical note, we show that by making a minor modification to the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed-loop system. The modified algorithm has the same Bayesian regret of Õ(√T), where T is the time horizon and the Õ(·) notation hides logarithmic terms in T.
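Below is a heavily simplified, runnable sketch of the episode structure for a scalar LQ system, included only to show where a minimum episode length ("an episode does not end too soon") enters the schedule. The prior, the stopping test, the gain clipping, and all numerical values are illustrative assumptions, not the algorithm analyzed in the note or in Ouyang et al.

```python
# Simplified sketch of posterior-sampling (Thompson sampling) control of a
# scalar LQ system x_{t+1} = a x_t + b u_t + w_t, with a minimum episode length.
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true, noise_std = 0.9, 0.5, 0.1
q, r = 1.0, 1.0                               # stage cost q*x^2 + r*u^2

def lq_gain(a, b):
    # Scalar discrete-time Riccati equation solved by fixed-point iteration.
    p = q
    for _ in range(200):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return (a * b * p) / (r + b * b * p)      # optimal control is u = -k x

# Gaussian posterior over theta = (a, b); noise variance assumed known.
mean, cov = np.zeros(2), np.eye(2)
x, t, T = 1.0, 0, 300
min_episode_length = 10

while t < T:
    # New episode: sample parameters from the posterior and fix the gain.
    a_s, b_s = rng.multivariate_normal(mean, cov)
    # Clipping keeps this toy example well behaved; it is an illustrative
    # safeguard, not part of the algorithm discussed above.
    k = float(np.clip(lq_gain(a_s, b_s), 0.0, 1.8))
    ep_start, det_start = t, np.linalg.det(cov)
    while t < T:
        u = -k * x
        x_next = a_true * x + b_true * u + noise_std * rng.standard_normal()
        # Bayesian linear-regression update of the posterior over (a, b).
        z = np.array([x, u])
        prec_old = np.linalg.inv(cov)
        cov = np.linalg.inv(prec_old + np.outer(z, z) / noise_std**2)
        mean = cov @ (prec_old @ mean + z * x_next / noise_std**2)
        x, t = x_next, t + 1
        # End an episode only after the guaranteed minimum length, e.g. once
        # the posterior has shrunk enough since the start of the episode.
        if (t - ep_start) >= min_episode_length and np.linalg.det(cov) < 0.5 * det_start:
            break

print("posterior mean of (a, b):", np.round(mean, 2))
```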
Team Optimal Control of Coupled Major-Minor Subsystems with Mean-Field Sharing
Jalal Arabneydi
Approximate Planning and Learning for Partially Observed Systems
Optimal Control of Network-Coupled Subsystems: Spectral Decomposition and Low-Dimensional Solutions
Shuang Gao
In this article, we investigate the optimal control of network-coupled subsystems with coupled dynamics and costs. The dynamics coupling may be represented by the adjacency matrix, the Laplacian matrix, or any other symmetric matrix corresponding to an underlying weighted undirected graph. Cost couplings are represented by two coupling matrices which have the same eigenvectors as the coupling matrix in the dynamics. We use the spectral decomposition of these three coupling matrices to decompose the overall system into
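The sketch below illustrates the underlying linear-algebra fact on a toy example: when the coupling matrix is symmetric, projecting the subsystem states onto its eigenvectors turns one coupled update into independent updates, one per eigenvalue. The dynamics, coupling strength, and dimensions are made up for illustration and are not the paper's model.

```python
# Toy illustration of decoupling network-coupled linear dynamics via the
# spectral decomposition of a symmetric coupling matrix (here a Laplacian).
import numpy as np

rng = np.random.default_rng(0)

# Coupled dynamics: x_i(t+1) = A x_i(t) + D * sum_j L[i, j] x_j(t)
L = np.array([[ 2, -1,  0, -1],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [-1,  0, -1,  2]], dtype=float)   # symmetric coupling matrix
A = np.array([[0.9, 0.1], [0.0, 0.8]])          # local dynamics
D = 0.05 * np.eye(2)                            # coupling gain

eigvals, V = np.linalg.eigh(L)                  # L = V diag(eigvals) V^T

# States of the 4 subsystems stacked as a (4, 2) array.
X = rng.standard_normal((4, 2))

# Coupled update of all subsystems at once ...
X_next_coupled = X @ A.T + (L @ X) @ D.T

# ... equals running 4 decoupled systems in eigen-coordinates, each with
# dynamics (A + lambda_k D), then mapping back.
X_eig = V.T @ X
X_eig_next = np.stack([(A + lam * D) @ x for lam, x in zip(eigvals, X_eig)])
X_next_spectral = V @ X_eig_next

print(np.allclose(X_next_coupled, X_next_spectral))   # True
```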
Optimal Local and Remote Controllers With Unreliable Uplink Channels: An Elementary Proof
Mohammad Afshari
Recently, a model of a decentralized control system with local and remote controllers connected over unreliable channels was presented in [1]. The model has a nonclassical information structure that is not partially nested. Nonetheless, it is shown in [1] that the optimal control strategies are linear functions of the state estimate (which is a nonlinear function of the observations). Their proof is based on a fairly sophisticated dynamic programming argument. In this article, we present an alternative and elementary proof of the result which uses common-information-based conditional independence and completion of squares.
Renewal Monte Carlo: Renewal Theory-Based Reinforcement Learning
Jayakumar Subramanian
An online reinforcement learning algorithm called renewal Monte Carlo (RMC) is presented. RMC works for infinite-horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm that retains the key advantages of Monte Carlo (simplicity, ease of implementation, and low bias) while circumventing its main drawbacks (high variance and delayed updates). Given a parameterized policy
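As a small illustration of the renewal idea that RMC builds on, the sketch below splits one long trajectory into cycles at every return to the designated start state and estimates long-run performance as the ratio of mean cycle reward to mean cycle length (the renewal-reward theorem). The toy MDP, the policy parameterization, and the average-reward flavour are illustrative assumptions, not the RMC algorithm itself.

```python
# Minimal sketch of renewal-based performance estimation: break a trajectory
# into cycles at every return to the designated start state, then estimate the
# average reward as (mean cycle reward) / (mean cycle length).
# The toy Markov decision process and policy are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
START = 0

def step(state, action):
    """Two-state toy MDP: action 1 is riskier but pays more in state 1."""
    reward = float(state == 1) * (2.0 if action == 1 else 1.0)
    next_state = rng.integers(0, 2) if action == 1 else 1 - state
    return next_state, reward

def renewal_estimate(policy_probs, num_cycles=2000):
    cycle_rewards, cycle_lengths = [], []
    state, total_r, length = START, 0.0, 0
    while len(cycle_rewards) < num_cycles:
        action = int(rng.random() < policy_probs[state])   # P(action = 1 | state)
        state, r = step(state, action)
        total_r, length = total_r + r, length + 1
        if state == START:                                  # renewal: cycle ends
            cycle_rewards.append(total_r)
            cycle_lengths.append(length)
            total_r, length = 0.0, 0
    # Renewal-reward theorem: long-run average reward = E[cycle reward] / E[cycle length].
    return np.mean(cycle_rewards) / np.mean(cycle_lengths)

print(renewal_estimate(policy_probs=[0.1, 0.9]))
```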
Counterexamples on the Monotonicity of Delay Optimal Strategies for Energy Harvesting Transmitters
Borna Sayedana
We consider cross-layer design of delay-optimal transmission strategies for energy harvesting transmitters where the data and energy arrival processes are stochastic. Using Markov decision theory, we show that the value function is weakly increasing in the queue state and weakly decreasing in the battery state. It is natural to expect that the delay-optimal policy should be weakly increasing in the queue and battery states. We show via counterexamples that this is not the case. In fact, we show that for some sample scenarios the delay-optimal policy may perform 5–13% better than the best monotone policy.