
Aditya Mahajan

Associate Academic Member
Associate Professor, McGill University, Department of Electrical and Computer Engineering
Research Topics
Reinforcement Learning

Biography

Aditya Mahajan is an associate professor in the Department of Electrical and Computer Engineering at McGill University and an associate academic member of Mila – Quebec Artificial Intelligence Institute.

He is also a member of the McGill Centre for Intelligent Machines (CIM), the International Laboratory for Learning Systems (ILLS), and the Group for Research in Decision Analysis (GERAD). Mahajan received his BTech degree in electrical engineering from the Indian Institute of Technology Kanpur, and his MSc and PhD degrees in electrical engineering and computer science from the University of Michigan at Ann Arbor.

He is a senior member of the Institute of Electrical and Electronics Engineers (IEEE), as well as a member of Professional Engineers Ontario. He currently serves as associate editor for IEEE Transactions on Automatic Control, IEEE Control Systems Letters, and Mathematics of Control, Signals, and Systems (Springer). He served as associate editor for the conference editorial board of the IEEE Control Systems Society from 2014 to 2017.

Mahajan’s numerous awards include the 2015 George S. Axelby Outstanding Paper Award, the 2016 NSERC Discovery Accelerator Award, the 2014 CDC Best Student Paper Award (as supervisor), and the 2016 NecSys Best Student Paper Award (as supervisor). His principal research interests are stochastic control and reinforcement learning.

Current Students

Collaborating Alumni - McGill University
Master's Research - McGill University
Research Intern - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University

Publications

Restless bandits: indexability and computation of Whittle index
Nima Akbarzadeh
Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes, where the evolution of each process depends on the resource allocated to it. Such models capture the fundamental trade-off between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems which has emerged as a popular solution approach due to its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler-to-verify refinements of these conditions. We then present a general algorithm to compute the Whittle index for indexable restless bandits. Finally, we present a detailed numerical study which affirms the strong performance of the Whittle index heuristic.
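
The paper's algorithm is more refined, but the object being computed can be illustrated directly: under indexability, the Whittle index of a state is the passive-action subsidy at which being active and being passive become equally attractive in that state. The sketch below finds it by bisection for a made-up two-state arm, solving the subsidized single-arm problem by value iteration at each trial subsidy; all model parameters are hypothetical.

    import numpy as np

    # A made-up two-state restless arm: P[a] is the transition matrix and
    # r[a] the per-state reward under action a (0 = passive, 1 = active).
    P = {0: np.array([[0.9, 0.1], [0.4, 0.6]]),
         1: np.array([[0.5, 0.5], [0.1, 0.9]])}
    r = {0: np.array([0.0, 0.0]), 1: np.array([1.0, 2.0])}
    beta = 0.95  # discount factor

    def q_values(lam, n_iter=2000):
        """Value iteration for the single arm with passive subsidy lam."""
        V = np.zeros(2)
        for _ in range(n_iter):
            q0 = r[0] + lam + beta * P[0] @ V   # passive action earns the subsidy
            q1 = r[1] + beta * P[1] @ V
            V = np.maximum(q0, q1)
        return q0, q1

    def whittle_index(state, lo=-10.0, hi=10.0, tol=1e-6):
        """Bisect on the subsidy until both actions are optimal in `state`."""
        while hi - lo > tol:
            lam = 0.5 * (lo + hi)
            q0, q1 = q_values(lam)
            if q1[state] > q0[state]:   # active still preferred: raise the subsidy
                lo = lam
            else:
                hi = lam
        return 0.5 * (lo + hi)

    for s in range(2):
        print(f"Whittle index of state {s}: {whittle_index(s):.4f}")
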
Decentralized Linear Quadratic Systems With Major and Minor Agents and Non-Gaussian Noise
Mohammad Afshari
A decentralized linear quadratic system with a major agent and a collection of minor agents is considered. The major agent affects the minor agents, but not vice versa. The state of the major agent is observed by all agents. In addition, the minor agents have a noisy observation of their local state. The noise process is not assumed to be Gaussian. The structures of the optimal strategy and the best linear strategy are characterized. It is shown that the major agent's optimal control action is a linear function of the major agent's minimum mean-squared error (MMSE) estimate of the system state, while the minor agent's optimal control action is a linear function of the major agent's MMSE estimate of the system state and a “correction term” that depends on the difference of the minor agent's MMSE estimate of its local state and the major agent's MMSE estimate of the minor agent's local state. Since the noise is non-Gaussian, the minor agent's MMSE estimate is a nonlinear function of its observation. It is shown that replacing the minor agent's MMSE estimate with its linear least mean square estimate gives the best linear control strategy. The results are proved using a direct method based on conditional independence, common-information-based splitting of state and control actions, and simplifying the per-step cost based on conditional independence, orthogonality principle, and completion of squares.
Cross-layer communication over fading channels with adaptive decision feedback
Borna Sayedana
E. Yeh
In this paper, the cross-layer design of transmitting data packets over an AWGN fading channel with adaptive decision feedback is considered. The transmitter decides the number of packets to transmit and the threshold of the decision feedback based on the queue length and the channel state. The transmit power is chosen such that the probability of error is below a pre-specified threshold. We model the system as a Markov decision process and use ideas from lattice theory to establish qualitative properties of optimal transmission strategies. In particular, we show that: (i) if the channel state remains the same and the number of packets in the queue increases, then the optimal policy either transmits more packets or uses a smaller decision feedback threshold or both; and (ii) if the number of packets in the queue remains the same and the channel quality deteriorates, then the optimal policy either transmits fewer packets or uses a larger threshold for the decision feedback or both. We also show that, under rate constraints, if the channel gains for all channel states are above a threshold, then the “or” in the above characterization can be replaced by “and”. Finally, we present a numerical example showing that adaptive decision feedback significantly improves the power-delay trade-off as compared with the case of no feedback.
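
Monotonicity results of this kind can be probed numerically. The sketch below builds a toy queue-and-channel MDP (all parameters invented, and with the decision-feedback threshold omitted for brevity), solves it by value iteration, and prints the optimal number of packets to transmit so its monotonicity in the queue length can be inspected.

    import numpy as np
    from itertools import product

    # A toy queue-and-channel MDP (all parameters hypothetical): the state is
    # (queue length q, channel state h) and the action u is the number of
    # packets to transmit, with a power cost that is cheaper on good channels.
    Q, H, beta = 10, 2, 0.95
    g = np.array([0.5, 2.0])                   # channel gains; h = 1 is better
    p_arr = 0.4                                # Bernoulli packet arrivals
    P_h = np.array([[0.8, 0.2], [0.3, 0.7]])   # channel Markov chain

    V = np.zeros((Q + 1, H))
    policy = np.zeros((Q + 1, H), dtype=int)
    for _ in range(1000):
        V_new = np.full_like(V, np.inf)
        for q, h in product(range(Q + 1), range(H)):
            for u in range(q + 1):
                # expected future cost over arrivals and channel transitions
                EV = sum(P_h[h, h2] * ((1 - p_arr) * V[q - u, h2]
                                       + p_arr * V[min(q - u + 1, Q), h2])
                         for h2 in range(H))
                c = q + u**2 / g[h] + beta * EV    # holding plus power cost
                if c < V_new[q, h]:
                    V_new[q, h], policy[q, h] = c, u
        V = V_new

    # The structural result suggests u*(q, h) should be nondecreasing in q.
    for h in range(H):
        print(f"h={h}: u*(q) =", policy[:, h])
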
Approximate information state for partially observed systems
Jayakumar Subramanian
The standard approach for modeling partially observed systems is to model them as partially observable Markov decision processes (POMDPs) and obtain a dynamic program in terms of a belief state. The belief state formulation works well for planning but is not ideal for online reinforcement learning because the belief state depends on the model and, as such, is not observable when the model is unknown. In this paper, we present an alternative notion of an information state for obtaining a dynamic program in partially observed models. In particular, an information state is a sufficient statistic for the current reward which evolves in a controlled Markov manner. We show that such an information state leads to a dynamic programming decomposition. Then we present a notion of an approximate information state and present an approximate dynamic program based on it. The approximate information state is defined in terms of properties that can be estimated using sampled trajectories; it therefore provides a constructive method for reinforcement learning in partially observed systems. We present one such construction and show that it performs better than the state of the art for three benchmark models.
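
As a loose illustration of the two defining properties (not the construction used in the paper), the sketch below compresses histories of a made-up two-state POMDP into fixed-window features and fits linear least-squares predictors on sampled trajectories: one for the current reward and one for the next information state. The paper matches the distribution of the next information state; the mean-prediction residual used here is only a crude proxy.

    import numpy as np

    rng = np.random.default_rng(0)

    # A toy two-state POMDP (all parameters hypothetical).
    T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),   # state transitions per action
         1: np.array([[0.5, 0.5], [0.4, 0.6]])}
    O = np.array([[0.8, 0.2], [0.3, 0.7]])        # P(observation | state)
    R = np.array([[0.0, 1.0], [1.0, 0.0]])        # reward(state, action)

    def rollout(n_steps, k=3):
        """Sample a trajectory; z_t is a window of the last k (obs, action) pairs."""
        s, window = 0, [(0, 0)] * k               # zero-padded initial window
        Z, A, Rs, Znext = [], [], [], []
        for _ in range(n_steps):
            z = np.array([v for pair in window for v in pair], dtype=float)
            a = rng.integers(2)
            r = R[s, a]
            s = rng.choice(2, p=T[a][s])
            y = rng.choice(2, p=O[s])
            window = window[1:] + [(y, a)]
            z_next = np.array([v for pair in window for v in pair], dtype=float)
            Z.append(z); A.append(a); Rs.append(r); Znext.append(z_next)
        return map(np.array, (Z, A, Rs, Znext))

    Z, A, Rs, Zn = rollout(20000)
    X = np.column_stack([Z, A, np.ones(len(A))])

    # Property (i): the current reward should be predictable from (z_t, a_t).
    w, *_ = np.linalg.lstsq(X, Rs, rcond=None)
    print("reward-prediction RMSE:", np.sqrt(np.mean((X @ w - Rs) ** 2)))

    # Property (ii): z_{t+1} should be predictable from (z_t, a_t).
    # (Mean prediction only; the paper works with the distribution of z_{t+1}.)
    Wz, *_ = np.linalg.lstsq(X, Zn, rcond=None)
    print("next-state-prediction RMSE:", np.sqrt(np.mean((X @ Wz - Zn) ** 2)))

Smaller residuals indicate that the chosen compression is closer to a true information state; in practice the encoder and predictors would be learned jointly rather than fixed in advance.
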
Networked control of coupled subsystems: Spectral decomposition and low-dimensional solutions
Shuang Gao
In this paper, we investigate optimal networked control of coupled subsystems where the dynamics and the cost couplings depend on an underlying weighted graph. We use the spectral decomposition of the graph adjacency matrix to decompose the overall system into (L+1) systems with decoupled dynamics and cost, where L is the rank of the adjacency matrix. Consequently, the optimal control input at each subsystem can be computed by solving (L+1) decoupled Riccati equations. A salient feature of the result is that the solution complexity depends on the rank of the adjacency matrix rather than the size of the network (i.e., the number of nodes). Therefore, the proposed solution framework provides a scalable method for synthesizing and implementing optimal control laws for large-scale systems.
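
A minimal sketch of the decoupling, under the simplifying assumption that both the dynamics and the cost have Kronecker structure in a symmetric adjacency matrix M: diagonalizing M turns the network Riccati equation into one low-dimensional Riccati equation per distinct eigenvalue of M. All model matrices below are hypothetical.

    import numpy as np
    from scipy.linalg import solve_discrete_are

    # Hypothetical per-node model (A, B), with dynamics and cost coupled
    # through the weighted adjacency matrix M:
    #   x_{t+1} = (I kron A + M kron D) x_t + (I kron B) u_t,
    #   cost    = x' (I kron Q + M kron S) x + u' (I kron R) u.
    A = np.array([[1.0, 0.1], [0.0, 0.9]])
    B = np.array([[0.0], [1.0]])
    D = 0.1 * np.eye(2)
    Q, S, R = np.eye(2), 0.2 * np.eye(2), np.array([[1.0]])

    # A rank-1 adjacency matrix on 4 nodes (L = 1).
    v = np.array([1.0, 1.0, 1.0, 1.0]) / 2.0
    M = np.outer(v, v)

    # Diagonalizing M reduces the network problem to one decoupled Riccati
    # equation per distinct eigenvalue: here lam = 0 and lam = 1, i.e.,
    # L + 1 = 2 small equations instead of one of dimension 4 x 2.
    eigvals = np.linalg.eigvalsh(M)
    for lam in np.unique(np.round(eigvals, 8)):
        A_l, Q_l = A + lam * D, Q + lam * S
        P = solve_discrete_are(A_l, B, Q_l, R)
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A_l)
        print(f"eigenvalue {lam:+.2f}: gain {K.ravel()}")
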
Restless bandits with controlled restarts: Indexability and computation of Whittle index
Nima Akbarzadeh
Motivated by applications in machine repair, queueing, surveillance, and clinical care, we consider a scheduling problem where a decision maker can reset m out of n Markov processes at each time. Processes that are reset restart according to a known probability distribution, while processes that are not reset evolve in a Markovian manner. Due to the high complexity of finding an optimal policy, such scheduling problems are often modeled as restless bandits. We show that the model satisfies a technical condition known as indexability. For indexable restless bandits, the Whittle index policy, which computes a function known as the Whittle index for each process and resets the m processes with the lowest indices, is known to be a good heuristic. The Whittle index is computed by solving an auxiliary Markov decision problem for each arm. When the optimal policy for this auxiliary problem is threshold based, we use ideas from renewal theory to derive a closed-form expression for the Whittle index. We present detailed numerical experiments which suggest that the Whittle index policy performs close to the optimal policy and significantly better than the myopic policy, a commonly used heuristic.
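
Indexability itself can be probed numerically. The sketch below builds a toy deterioration process with a restart action (hypothetical parameters), attaches a charge lam to restarting, and checks that the set of states where not restarting is optimal grows monotonically as lam increases, which is the defining property of indexability.

    import numpy as np

    # A toy restart bandit (hypothetical parameters): action 1 resets the
    # process according to a restart distribution; action 0 lets it evolve.
    n, beta = 4, 0.95
    P0 = np.array([[0.7, 0.3, 0.0, 0.0],
                   [0.0, 0.6, 0.4, 0.0],
                   [0.0, 0.0, 0.5, 0.5],
                   [0.0, 0.0, 0.0, 1.0]])          # deterioration chain
    restart = np.array([1.0, 0.0, 0.0, 0.0])       # reset to the best state
    P1 = np.tile(restart, (n, 1))
    c = np.array([0.0, 1.0, 2.0, 5.0])             # per-state holding cost

    def passive_set(lam, n_iter=2000):
        """States where not resetting is optimal when resets cost lam extra."""
        V = np.zeros(n)
        for _ in range(n_iter):
            q0 = c + beta * P0 @ V            # let the process evolve
            q1 = c + lam + beta * P1 @ V      # reset, paying the charge lam
            V = np.minimum(q0, q1)
        return set(np.where(q0 <= q1)[0])

    # Indexability check: the passive sets should be nested as lam grows.
    prev = set()
    for lam in np.linspace(-1.0, 10.0, 45):
        cur = passive_set(lam)
        if not prev <= cur:
            print(f"nesting violated at lam = {lam:.2f}")
        prev = cur
    print("final passive set on the grid:", sorted(prev))
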
Dynamic spectrum access under partial observations: A restless bandit approach
Nima Akbarzadeh
We consider a communication system where multiple unknown channels are available for transmission. Each channel is a channel with state that evolves in a Markovian manner. The transmitter has to select L channels to use and also decide the resources (e.g., power, rate, etc.) to use for each of the selected channels. It observes the state of the channels it uses and receives no feedback on the state of the other channels. We model this problem as a partially observable Markov decision process and obtain a simplified belief state. We show that the optimal resource allocation policy can be identified in closed form. Once the optimal resource allocation policy is fixed, choosing the channel scheduling policy may be viewed as a restless bandit problem. We present an efficient algorithm to check indexability and compute the Whittle index for each channel. When the model is indexable, the Whittle index policy, which transmits over the L channels with the smallest Whittle indices, is an attractive heuristic.
Multi-Agent Estimation and Filtering for Minimizing Team Mean-Squared Error
Mohammad Afshari
Motivated by estimation problems arising in autonomous vehicles and decentralized control of unmanned aerial vehicles, we consider multi-agent estimation and filtering problems in which multiple agents generate state estimates based on decentralized information, and the objective is to minimize a coupled mean-squared error which we call the team mean-square error. We call the resulting estimates minimum team mean-squared error (MTMSE) estimates. We show that MTMSE estimates are different from minimum mean-squared error (MMSE) estimates. We derive closed-form expressions for MTMSE estimates, which are linear functions of the observations where the corresponding gain depends on the weight matrix that couples the estimation errors. We then consider a filtering problem where a linear stochastic process is monitored by multiple agents which can share their observations (with delay) over a communication graph. We derive expressions to recursively compute the MTMSE estimates. To illustrate the effectiveness of the proposed scheme, we consider an example of estimating the distances between vehicles in a platoon and show that MTMSE estimates significantly outperform MMSE estimates and consensus Kalman filtering estimates.
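
A static two-agent caricature of the gap between MTMSE and MMSE estimates (the paper treats the dynamic, decentralized filtering problem; everything below is a hypothetical toy): each agent forms a linear estimate of its own local state from its own observation, and the team-optimal gains follow from the first-order conditions of the coupled quadratic cost.

    import numpy as np

    # Hypothetical static team: x ~ N(0, Sx); agent i sees y_i = x_i + v_i
    # and forms a linear estimate xhat_i = k_i * y_i of its own component.
    Sx = np.array([[1.0, 0.8], [0.8, 1.0]])   # correlated local states
    sv = np.array([0.5, 0.5])                 # observation-noise variances
    W = np.array([[1.0, 0.9], [0.9, 1.0]])    # weight matrix coupling the errors

    Sy = Sx + np.diag(sv)       # Cov(y), since y = x + v with independent noise
    Sxy = Sx                    # E[x y'] = Cov(x)

    # Team-optimal gains: stationarity of E[(xhat - x)' W (xhat - x)] in k
    # gives the linear system  sum_j W_ij Sy_ij k_j = sum_j W_ij E[y_i x_j].
    M = W * Sy                  # elementwise: M_ij = W_ij * E[y_i y_j]
    b = (W * Sxy.T).sum(axis=1)
    k_team = np.linalg.solve(M, b)

    # Per-agent MMSE gains ignore the coupling: k_i = E[x_i y_i] / E[y_i^2].
    k_mmse = np.diag(Sxy) / np.diag(Sy)

    def team_cost(k):
        K = np.diag(k)
        E = K @ Sy @ K - K @ Sxy - Sxy.T @ K + Sx   # Cov of (xhat - x)
        return np.trace(W @ E)

    print("MTMSE gains:", k_team, " team cost:", team_cost(k_team))
    print("MMSE  gains:", k_mmse, " team cost:", team_cost(k_mmse))

Because the weight matrix couples the agents' errors, the team-optimal gains can differ from the per-agent MMSE gains and achieve a lower team cost, as the printed costs show.
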
Reinforcement Learning in Stationary Mean-field Games
Jayakumar Subramanian
Multi-agent reinforcement learning has made significant progress in recent years, but it remains a hard problem. Hence, one often resorts to developing learning algorithms for specific classes of multi-agent systems. In this paper we study reinforcement learning in a specific class of multi-agent systems called mean-field games. In particular, we consider learning in stationary mean-field games. We identify two different solution concepts for such games, depending on whether the agents are non-cooperative or cooperative: the stationary mean-field equilibrium and the stationary mean-field social-welfare optimal policy, respectively. We then generalize these solution concepts to their local variants using arguments based on bounded rationality. For these two local solution concepts, we present two reinforcement learning algorithms. We show that the algorithms converge to the right solution under mild technical conditions and demonstrate this using two numerical examples.
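
The paper's contribution is the learning algorithms; as a model-based caricature of the fixed point they target, the sketch below alternates between computing a best response to a frozen mean field by value iteration and nudging the mean field toward the stationary distribution of the induced chain (a damped update, with all model parameters hypothetical).

    import numpy as np

    rng = np.random.default_rng(1)
    nS, nA, beta = 5, 2, 0.9

    # Hypothetical model: random transitions P[a] and a congestion-style
    # reward that penalizes crowded states, r(s, a, m) = base[s, a] - m[s].
    P = {a: rng.dirichlet(np.ones(nS), size=nS) for a in range(nA)}
    base = rng.random((nS, nA))

    def best_response(m, n_iter=500):
        """Value iteration against a fixed mean field m."""
        V = np.zeros(nS)
        for _ in range(n_iter):
            Qsa = np.stack([base[:, a] - m + beta * P[a] @ V
                            for a in range(nA)], axis=1)
            V = Qsa.max(axis=1)
        return Qsa.argmax(axis=1)

    def stationary_dist(policy):
        """Stationary distribution of the chain induced by the policy."""
        Ppi = np.array([P[policy[s]][s] for s in range(nS)])
        evals, evecs = np.linalg.eig(Ppi.T)
        d = np.real(evecs[:, np.argmax(np.real(evals))])
        return d / d.sum()

    # Damped fixed-point iteration for a stationary mean-field equilibrium.
    m = np.ones(nS) / nS
    for it in range(200):
        pi = best_response(m)
        m_new = 0.9 * m + 0.1 * stationary_dist(pi)
        if np.max(np.abs(m_new - m)) < 1e-8:
            break
        m = m_new
    print("equilibrium mean field:", np.round(m, 3), "policy:", pi)
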