Portrait de Aditya Mahajan

Aditya Mahajan

Membre académique associé
Professeur agrégé, McGill University, Département de génie électrique et informatique
Sujets de recherche
Apprentissage par renforcement

Biographie

Aditya Mahajan est professeur de génie électrique et informatique à l'Université McGill. Il est membre du Centre sur les machines intelligentes (CIM) de McGill, de Mila – Institut québécois d’intelligence artificielle, du Laboratoire international des systèmes d'apprentissage (ILLS) et du Groupe d'études et de recherche en analyse des décisions (GERAD). Il est titulaire d'une licence en génie électrique de l'Indian Institute of Technology de Kanpur (Inde), ainsi que d'une maîtrise et d'un doctorat en génie électrique et en informatique de l'Université du Michigan à Ann Arbor (États-Unis).

Aditya Mahajan est membre senior de l'Institute of Electrical and Electronics Engineers (IEEE) et membre de Professional Engineers Ontario. Il est actuellement rédacteur en chef adjoint des IEEE Transactions on Automatic Control, des IEEE Control Systems Letters et de Mathematics of Control, Signals, and Systems (Springer). Il a été rédacteur associé au comité de rédaction de la conférence de l'IEEE Control Systems Society de 2014 à 2017.

Il a reçu le prix George Axelby 2015 récompensant un article exceptionnel, un supplément d’accélération à la découverte du Conseil de recherches en sciences naturelles et en génie du Canada (CRSNG) en 2016, le prix CDC du meilleur article étudiant 2014 (en tant que superviseur) et le prix NecSys du meilleur article étudiant 2016 (en tant que superviseur). Ses principaux domaines de recherche sont le contrôle stochastique et l'apprentissage par renforcement.

Étudiants actuels

Maîtrise recherche - McGill
Maîtrise recherche - McGill
Stagiaire de recherche - McGill
Maîtrise recherche - McGill
Doctorat - McGill
Doctorat - McGill
Doctorat - McGill
Doctorat - McGill
Doctorat - McGill

Publications

Structure-Aware Reinforcement Learning for Node-Overload Protection in Mobile Edge Computing
Anirudha Jitani
Zhongwen Zhu
Hatem Abou-Zeid
Emmanuel Thepie Fapi
Hakimeh Purmehdi
Mobile Edge Computing (MEC) involves placing computational capability and applications at the edge of the network, providing benefits such a… (voir plus)s reduced latency, reduced network congestion, and improved performance of applications. The performance and reliability of MEC degrades significantly when the edge server(s) in the cluster are overloaded. In this work, an adaptive admission control policy to prevent edge node from getting overloaded is presented. This approach is based on a recently-proposed low complexity RL (Reinforcement Learning) algorithm called SALMUT (Structure-Aware Learning for Multiple Thresholds), which exploits the structure of the optimal admission control policy in multi-class queues for an average-cost setting. We extend the framework to work for node overload-protection problem in a discounted-cost setting. The proposed solution is validated using several scenarios mimicking real-world deployments in two different settings — computer simulations and a docker testbed. Our empirical evaluations show that the total discounted cost incurred by SALMUT is similar to state-of-the-art deep RL algorithms such as PPO (Proximal Policy Optimization) and A2C (Advantage Actor Critic) but requires an order of magnitude less time to train, outputs easily interpretable policy, and can be deployed in an online manner.
Maintenance of a collection of machines under partial observability: Indexability and computation of Whittle index
Nima Akbarzadeh
We consider the problem of scheduling maintenance for a collection of machines under partial observations when the state of each machine det… (voir plus)eriorates stochastically in a Markovian manner. We consider two observational models: first, the state of each machine is not observable at all, and second, the state of each machine is observable only if a service-person visits them. The agent takes a maintenance action, e.g., machine replacement, if he is chosen for the task. We model both problems as restless multi-armed bandit problem and propose the Whittle index policy for scheduling the visits. We show that both models are indexable. For the first model, we derive a closed-form expression for the Whittle index. For the second model, we propose an efficient algorithm to compute the Whittle index by exploiting the qualitative properties of the optimal policy. We present detailed numerical experiments which show that for multiple instances of the model, the Whittle index policy outperforms myopic policy and can be close-to-optimal in different setups.
Intention estimation and controllable behaviour models for traffic merges
Takayasu Kumano
Yuji Yasui
This work focuses on decision making for automated driving vehicles in interaction rich scenarios like traffic merges in a flexibly assertiv… (voir plus)e yet safe manner. We propose a Q-learning based approach, that takes in active intention inferences as additional inputs besides the directly observed state inputs. The outputs of Q-function are processed to select a decision by a modulation function, which can control how assertively or defensively the agent behaves.
Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Switched Linear Systems
Borna Sayedana
Mohammad Afshari
Peter E. Caines
In this paper, we investigate the problem of system identification for autonomous switched linear systems with complete state observations.… (voir plus) We propose switched least squares method for the identification for switched linear systems, show that this method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-dependent rate of convergence shows that, almost surely, the system identification error is O (cid:0)(cid:112) log( T ) /T (cid:1) where T is the time horizon. These results show that our method for switched linear systems has the same rate of convergence as least squares method for non-switched linear systems. We compare our results with those in the literature. We present numerical examples to illustrate the performance of the proposed system identification method.
Multi-Agent Estimation and Filtering for Minimizing Team Mean-Squared Error
Mohammad Afshari
Motivated by estimation problems arising in autonomous vehicles and decentralized control of unmanned aerial vehicles, we consider multi-age… (voir plus)nt estimation and filtering problems in which multiple agents generate state estimates based on decentralized information and the objective is to minimize a coupled mean-squared error which we call team mean-square error. We call the resulting estimates as minimum team mean-squared error (MTMSE) estimates. We show that MTMSE estimates are different from minimum mean-squared error (MMSE) estimates. We derive closed-form expressions for MTMSE estimates, which are linear function of the observations where the corresponding gain depends on the weight matrix that couples the estimation error. We then consider a filtering problem where a linear stochastic process is monitored by multiple agents which can share their observations (with delay) over a communication graph. We derive expressions to recursively compute the MTMSE estimates. To illustrate the effectiveness of the proposed scheme we consider an example of estimating the distances between vehicles in a platoon and show that MTMSE estimates significantly outperform MMSE estimates and consensus Kalman filtering estimates.
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems
Mukul Gagrani
Sagar Sudhakara
Ashutosh Nayyar
Yi Ouyang
—We revisit the Thompson sampling algorithm to control an unknown linear quadratic (LQ) system recently proposed by Ouyang et al. [1]. The… (voir plus) regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed loop system. In this technical note, we show that by making a minor modification in the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed loop system. The modified algorithm has the same Bayesian regret of ˜ O ( √ T ) , where T is the time-horizon and the ˜ O ( · ) notation hides logarithmic terms in T .
Team Optimal Control of Coupled Major-Minor Subsystems with Mean-Field Sharing
Jalal Arabneydi
Approximate Planning and Learning for Partially Observed Systems
Optimal Control of Network-Coupled Subsystems: Spectral Decomposition and Low-Dimensional Solutions
Shuang Gao
In this article, we investigate the optimal control of network-coupled subsystems with coupled dynamics and costs. The dynamics coupling may… (voir plus) be represented by the adjacency matrix, the Laplacian matrix, or any other symmetric matrix corresponding to an underlying weighted undirected graph. Cost couplings are represented by two coupling matrices which have the same eigenvectors as the coupling matrix in the dynamics. We use the spectral decomposition of these three coupling matrices to decompose the overall system into
Optimal Local and Remote Controllers With Unreliable Uplink Channels: An Elementary Proof
Mohammad Afshari
Recently, a model of a decentralized control system with local and remote controllers connected over unreliable channels was presented in [… (voir plus)1]. The model has a nonclassical information structure that is not partially nested. Nonetheless, it is shown in [1] that the optimal control strategies are linear functions of the state estimate (which is a nonlinear function of the observations). Their proof is based on a fairly sophisticated dynamic programming argument. In this article, we present an alternative and elementary proof of the result which uses common information-based conditional independence and completion of squares.
Renewal Monte Carlo: Renewal Theory-Based Reinforcement Learning
Jayakumar Subramanian
An online reinforcement learning algorithm called renewal Monte Carlo (RMC) is presented. RMC works for infinite horizon Markov decision pro… (voir plus)cesses with a designated start state. RMC is a Monte Carlo algorithm that retains the key advantages of Monte Carlo—viz., simplicity, ease of implementation, and low bias—while circumventing the main drawbacks of Monte Carlo—viz., high variance and delayed updates. Given a parameterized policy
Counterexamples on the Monotonicity of Delay Optimal Strategies for Energy Harvesting Transmitters
Borna Sayedana
We consider cross-layer design of delay optimal transmission strategies for energy harvesting transmitters where the data and energy arrival… (voir plus) processes are stochastic. Using Markov decision theory, we show that the value function is weakly increasing in the queue state and weakly decreasing in the battery state. It is natural to expect that the delay optimal policy should be weakly increasing in the queue and battery states. We show via counterexamples that this is not the case. In fact, we show that for some sample scenarios the delay optimal policy may perform 5–13% better than the best monotone policy.