
Aditya Mahajan

Associate Academic Member
Associate Professor, McGill University, Department of Electrical and Computer Engineering
Research Topics
Reinforcement Learning

Biography

Aditya Mahajan is a professor of electrical and computer engineering at McGill University. He is a member of the McGill Centre for Intelligent Machines (CIM), Mila – Quebec Artificial Intelligence Institute, the International Laboratory on Learning Systems (ILLS), and the Group for Research in Decision Analysis (GERAD). He holds a bachelor's degree in electrical engineering from the Indian Institute of Technology Kanpur (India), and a master's degree and a PhD in electrical engineering and computer science from the University of Michigan, Ann Arbor (United States).

Aditya Mahajan is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE) and a member of Professional Engineers Ontario. He currently serves as an associate editor of the IEEE Transactions on Automatic Control, IEEE Control Systems Letters, and Mathematics of Control, Signals, and Systems (Springer). He served as an associate editor on the conference editorial board of the IEEE Control Systems Society from 2014 to 2017.

He received the 2015 George S. Axelby Outstanding Paper Award, a 2016 Discovery Accelerator Supplement from the Natural Sciences and Engineering Research Council of Canada (NSERC), the 2014 CDC Best Student Paper Award (as supervisor), and the 2016 NecSys Best Student Paper Award (as supervisor). His principal research areas are stochastic control and reinforcement learning.

Current Students

Research Master's - McGill
Research Master's - McGill
Alumni collaborator - McGill
Research Master's - McGill
Research Master's - UdeM
PhD - McGill
Research Master's - McGill
PhD - McGill
PhD - McGill

Publications

Networked control of coupled subsystems: Spectral decomposition and low-dimensional solutions
Shuang Gao
In this paper, we investigate optimal networked control of coupled subsystems where the dynamics and the cost couplings depend on an underlying weighted graph. We use the spectral decomposition of the graph adjacency matrix to decompose the overall system into (L+1) systems with decoupled dynamics and cost, where L is the rank of the adjacency matrix. Consequently, the optimal control input at each subsystem can be computed by solving (L+1) decoupled Riccati equations. A salient feature of the result is that the solution complexity depends on the rank of the adjacency matrix rather than the size of the network (i.e., the number of nodes). Therefore, the proposed solution framework provides a scalable method for synthesizing and implementing optimal control laws for large-scale systems.
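As a rough illustration of the decoupling idea (not the paper's exact model), the sketch below eigendecomposes a symmetric weighted adjacency matrix and solves one small discrete-time Riccati equation per distinct eigenvalue instead of one large equation for the whole network. The eigenvalue-shifted dynamics `A + lam * D`, the helper name `decoupled_riccati_gains`, and the toy ring network are assumptions made for illustration only.

```python
# Minimal sketch: spectral decoupling for networked LQ control.
# All matrices and the coupling form below are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_discrete_are

def decoupled_riccati_gains(M, A, B, D, Q, R):
    """Hypothetical helper: eigendecompose the symmetric weighted adjacency
    matrix M and solve one small Riccati equation per distinct eigenvalue."""
    eigvals = np.linalg.eigvalsh(M)
    gains = {}
    for lam in np.unique(np.round(eigvals, 10)):
        A_lam = A + lam * D                    # assumed eigenvalue-shifted dynamics
        P = solve_discrete_are(A_lam, B, Q, R)
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A_lam)
        gains[float(lam)] = K                  # LQR gain for this spectral mode
    return gains

# Toy usage: a 4-node ring network of scalar subsystems.
M = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
A = np.array([[1.0]]); B = np.array([[1.0]]); D = np.array([[0.1]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
print(decoupled_riccati_gains(M, A, B, D, Q, R))
```

The point of the sketch is the complexity claim in the abstract: the number of Riccati equations is tied to the spectrum (hence the rank) of the adjacency matrix, not to the number of nodes.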
Restless bandits with controlled restarts: Indexability and computation of Whittle index
Motivated by applications in machine repair, queueing, surveillance, and clinical care, we consider a scheduling problem where a decision maker can reset m out of n Markov processes at each time. Processes that are reset restart according to a known probability distribution, and processes that are not reset evolve in a Markovian manner. Due to the high complexity of finding an optimal policy, such scheduling problems are often modeled as restless bandits. We show that the model satisfies a technical condition known as indexability. For indexable restless bandits, the Whittle index policy, which computes a function known as the Whittle index for each process and resets the m processes with the lowest index, is known to be a good heuristic. The Whittle index is computed by solving an auxiliary Markov decision problem for each arm. When the optimal policy for this auxiliary problem is threshold-based, we use ideas from renewal theory to derive a closed-form expression for the Whittle index. We present detailed numerical experiments which suggest that the Whittle index policy performs close to the optimal policy and significantly better than the myopic policy, a commonly used heuristic.
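The closed-form renewal-theory expression is specific to the paper, but the sketch below shows a standard numerical route to a Whittle index for a single restart-type arm: subsidize the passive (no-reset) action and binary-search for the subsidy at which the two actions become indifferent at a given state. The transition matrices, rewards, and function names are illustrative assumptions, not the paper's model.

```python
# Generic numerical Whittle-index sketch for one arm with a reset action.
import numpy as np

def arm_value_iteration(P_passive, P_active, r_passive, r_active, w,
                        discount=0.95, iters=2000):
    """Q-values of one arm when the passive (no-reset) action earns subsidy w."""
    V = np.zeros(len(r_passive))
    for _ in range(iters):
        q_passive = r_passive + w + discount * P_passive @ V   # let the arm evolve
        q_active = r_active + discount * P_active @ V          # reset the arm
        V = np.maximum(q_passive, q_active)
    return q_passive, q_active

def whittle_index(state, P_passive, P_active, r_passive, r_active,
                  lo=-10.0, hi=10.0, tol=1e-4):
    """Binary-search the subsidy that makes the two actions indifferent at `state`."""
    while hi - lo > tol:
        w = 0.5 * (lo + hi)
        q_passive, q_active = arm_value_iteration(
            P_passive, P_active, r_passive, r_active, w)
        if q_active[state] > q_passive[state]:
            lo = w          # resetting still preferred: raise the subsidy
        else:
            hi = w
    return 0.5 * (lo + hi)

# Toy 3-state arm: the state degrades if left alone, restarts to state 0 when
# reset, and rewards decrease with degradation.
P_passive = np.array([[0.7, 0.3, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]])
P_active = np.array([[1.0, 0.0, 0.0]] * 3)
r_passive = np.array([1.0, 0.5, 0.0])
r_active = np.array([0.0, 0.0, 0.0])
print([round(whittle_index(s, P_passive, P_active, r_passive, r_active), 3)
       for s in range(3)])
```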
Dynamic spectrum access under partial observations: A restless bandit approach
We consider a communication system where multiple unknown channels are available for transmission. Each channel is a channel with state, and the state evolves in a Markovian manner. The transmitter has to select L channels to use and also decide the resources (e.g., power, rate, etc.) to use for each of the selected channels. It observes the state of the channels it uses and receives no feedback on the state of the other channels. We model this problem as a partially observable Markov decision process and obtain a simplified belief state. We show that the optimal resource allocation policy can be identified in closed form. Once the optimal resource allocation policy is fixed, choosing the channel scheduling policy may be viewed as a restless bandit. We present an efficient algorithm to check indexability and compute the Whittle index for each channel. When the model is indexable, the Whittle index policy, which transmits over the L channels with the smallest Whittle indices, is an attractive heuristic policy.
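A minimal sketch of the belief-state bookkeeping such a scheme relies on, assuming two-state (Gilbert-Elliott-style) channels: a channel that is used collapses to its observed state, while an unused channel's belief is propagated through the Markov transition probabilities. The parameter names (`p_gg`, `p_bg`) and the toy example are assumptions, not the paper's channel model.

```python
# Belief update for partially observed two-state Markov channels (illustrative).
import numpy as np

def update_beliefs(beliefs, selected, observations, p_gg, p_bg):
    """beliefs[i] = P(channel i is in the 'good' state).
    p_gg: P(good -> good), p_bg: P(bad -> good)."""
    new_beliefs = np.empty_like(beliefs)
    for i, b in enumerate(beliefs):
        if i in selected:
            # Observed channel: belief collapses to the observed state,
            # then propagates one Markov step like the others.
            b = 1.0 if observations[i] == "good" else 0.0
        new_beliefs[i] = b * p_gg + (1.0 - b) * p_bg
    return new_beliefs

# Toy usage: 4 channels, transmit on channels {0, 2} and observe their states.
beliefs = np.array([0.5, 0.5, 0.5, 0.5])
print(update_beliefs(beliefs, selected={0, 2},
                     observations={0: "good", 2: "bad"}, p_gg=0.9, p_bg=0.2))
```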
Multi-Agent Estimation and Filtering for Minimizing Team Mean-Squared Error
Mohammad Afshari
Motivated by estimation problems arising in autonomous vehicles and decentralized control of unmanned aerial vehicles, we consider multi-agent estimation and filtering problems in which multiple agents generate state estimates based on decentralized information and the objective is to minimize a coupled mean-squared error, which we call the team mean-squared error. We call the resulting estimates minimum team mean-squared error (MTMSE) estimates. We show that MTMSE estimates are different from minimum mean-squared error (MMSE) estimates. We derive closed-form expressions for MTMSE estimates, which are linear functions of the observations, where the corresponding gains depend on the weight matrix that couples the estimation errors. We then consider a filtering problem where a linear stochastic process is monitored by multiple agents which can share their observations (with delay) over a communication graph. We derive expressions to recursively compute the MTMSE estimates. To illustrate the effectiveness of the proposed scheme, we consider an example of estimating the distances between vehicles in a platoon and show that MTMSE estimates significantly outperform MMSE estimates and consensus Kalman filtering estimates.
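The sketch below illustrates only the cost criterion, not the paper's closed-form estimator: the team mean-squared error couples the agents' estimation errors through a weight matrix and reduces to the ordinary summed MSE when that matrix is the identity. The data, estimator gains, and weight matrix are illustrative assumptions.

```python
# Evaluating a team mean-squared error e' S e versus ordinary summed MSE.
import numpy as np

rng = np.random.default_rng(0)

def team_mse(x, estimates, S):
    """Average of e' S e, where e stacks the agents' estimation errors."""
    e = estimates - x                       # shape: (samples, num_agents)
    return np.mean(np.einsum("ni,ij,nj->n", e, S, e))

# Two agents estimating the same scalar state from noisy private observations.
x = rng.normal(size=(10_000, 1))
obs1 = x + 0.5 * rng.normal(size=x.shape)
obs2 = x + 1.0 * rng.normal(size=x.shape)
estimates = np.hstack([0.8 * obs1, 0.5 * obs2])   # some linear estimators

S_identity = np.eye(2)                            # ordinary summed MSE
S_coupled = np.array([[1.0, 0.8], [0.8, 1.0]])    # errors penalized jointly
print(team_mse(x, estimates, S_identity), team_mse(x, estimates, S_coupled))
```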
Reinforcement Learning in Stationary Mean-field Games
Jayakumar Subramanian
Multi-agent reinforcement learning has made significant progress in recent years, but it remains a hard problem. Hence, one often resorts to developing learning algorithms for specific classes of multi-agent systems. In this paper we study reinforcement learning in a specific class of multi-agent systems called mean-field games. In particular, we consider learning in stationary mean-field games. We identify two different solution concepts---stationary mean-field equilibrium and stationary mean-field social-welfare optimal policy---for such games, based on whether the agents are non-cooperative or cooperative, respectively. We then generalize these solution concepts to their local variants using arguments based on bounded rationality. For these two local solution concepts, we present two reinforcement learning algorithms. We show that the algorithms converge to the right solution under mild technical conditions and demonstrate this using two numerical examples.
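As a planning-style sketch of the fixed point behind a stationary mean-field equilibrium (the paper's contribution is learning it from samples), the code below alternates between a best response to the current mean field, computed here by value iteration, and the stationary state distribution induced by that policy. The toy dynamics and rewards, and the convergence of this plain best-response loop, are assumptions for illustration.

```python
# Best-response / mean-field fixed-point iteration on a toy 3-state game.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9

def transition(a):
    """P(s'|s, a): action 1 drifts the agent toward state 0, action 0 is lazy."""
    if a == 1:
        return np.array([[0.9, 0.1, 0.0], [0.8, 0.1, 0.1], [0.7, 0.2, 0.1]])
    return np.full((n_states, n_states), 1.0 / n_states)

def reward(mean_field):
    """r(s, a): agents dislike crowded states; action 1 carries a small cost."""
    r = np.zeros((n_states, n_actions))
    r[:, 0] = -mean_field
    r[:, 1] = -mean_field - 0.05
    return r

mean_field = np.ones(n_states) / n_states
for _ in range(200):
    r = reward(mean_field)
    V = np.zeros(n_states)
    for _ in range(500):                              # best response via value iteration
        Q = np.stack([r[:, a] + gamma * transition(a) @ V
                      for a in range(n_actions)], axis=1)
        V = Q.max(axis=1)
    policy = Q.argmax(axis=1)
    P_pi = np.stack([transition(policy[s])[s] for s in range(n_states)])
    for _ in range(500):                              # induced stationary distribution
        mean_field = mean_field @ P_pi
print("mean field:", mean_field, "policy:", policy)
```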