Aditya Mahajan

Constant step-size stochastic approximation with delayed updates

Silviu-Iulian Niculescu

Mathukumalli Vidyasagar

In this paper, we consider constant step-size stochastic approximation with delayed updates. For the non-delayed case, it is well known that… (see more) under appropriate conditions, the discrete-time iterates of stochastic approximation track the trajectory of a continuous-time ordinary differential equation (ODE). For the delayed case, we show in this paper that, under appropriate conditions, the discrete-time iterates track the trajectory of a delay-differential equation (DDE) rather than an ODE. Thus, delayed updates lead to a qualitative change in the behavior of constant step-size stochastic approximation. We present multiple examples to illustrate the qualitative affect of delay and show that increasing the delay is generally destabilizing but, for some systems, it can be stabilizing as well.

2024-12-16

2024 IEEE 63rd Conference on Decision and Control (CDC) (published)

doi.org

A vector almost-supermartingale convergence theorem and its applications

Aditya Mahajan

Silviu-Iulian Niculescu

Mathukumalli Vidyasagar

The almost-supermartingale convergence theorem of Robbins and Siegmund (1971) is a fundamental tool for establishing the convergence of vari… (see more)ous stochastic iterative algorithms including system identification, adaptive control, and reinforcement learning. The theorem is stated for non-negative scalar valued stochastic processes. In this paper, we generalize the theorem to non-negative vector valued stochastic processes and provide two set of sufficient conditions for such processes to converge almost surely. We present several applications of vector almost-supermartingale convergence theorem, including convergence of autoregressive supermartingales, delayed supermartingales, and stochastic approximation with delayed updates.

2024-12-16

2024 IEEE 63rd Conference on Decision and Control (CDC) (published)

doi.org

Periodic agent-state based Q-learning for POMDPs

Amit Sinha

Matthieu Geist

Aditya Mahajan

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Periodic agent-state based Q-learning for POMDPs

Amit Sinha

Matthieu Geist

Aditya Mahajan

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. H… (see more)owever, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy. Our main thesis that we illustrate via examples is that because the agent state does not satisfy the Markov property, non-stationary agent-state based policies can outperform stationary ones. To leverage this feature, we propose PASQL (periodic agent-state based Q-learning), which is a variant of agent-state-based Q-learning that learns periodic policies. By combining ideas from periodic Markov chains and stochastic approximation, we rigorously establish that PASQL converges to a cyclic limit and characterize the approximation error of the converged periodic policy. Finally, we present a numerical experiment to highlight the salient features of PASQL and demonstrate the benefit of learning periodic policies over stationary policies.

2024-08-01

EWRL/2024/Workshop (accepted)

openreview.net

On learning history-based policies for controlling Markov decision processes

Gandharv Patil

Aditya Mahajan

Doina Precup

2024-04-18

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

doi.org

openreview.net

Model approximation in MDPs with unbounded per-step cost

Berk Bozkurt

Aditya Mahajan

Ashutosh Nayyar

Yi Ouyang

We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov decision process …

2024-02-13

ArXiv (preprint)

doi.org

arxiv.org

Bridging State and History Representations: Understanding Self-Predictive RL

Tianwei Ni

Benjamin Eysenbach

Erfan SeyedSalehi

Michel Ma

Clement Gehring

Aditya Mahajan

Pierre-Luc Bacon

Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially obse… (see more)rvable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. Furthermore, we provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations. These findings together yield a minimalist algorithm to learn self-predictive representations for states and histories. We validate our theories by applying our algorithm to standard MDPs, MDPs with distractors, and POMDPs with sparse rewards. These findings culminate in a set of preliminary guidelines for RL practitioners.

2024-01-17

ArXiv (preprint)

doi.org

arxiv.org

Bridging State and History Representations: Understanding Self-Predictive RL

Tianwei Ni

Benjamin Eysenbach

Erfan SeyedSalehi

Michel Ma

Clement Gehring

Aditya Mahajan

Pierre-Luc Bacon

Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially obse… (see more)rvable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. Furthermore, we provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations. These findings together yield a minimalist algorithm to learn self-predictive representations for states and histories. We validate our theories by applying our algorithm to standard MDPs, MDPs with distractors, and POMDPs with sparse rewards. These findings culminate in a set of preliminary guidelines for RL practitioners.

2024-01-17

ArXiv (preprint)

doi.org

arxiv.org

Bridging State and History Representations: Understanding Self-Predictive RL

Tianwei Ni

Benjamin Eysenbach

Erfan SeyedSalehi

Michel Ma

Clement Gehring

Aditya Mahajan

Pierre-Luc Bacon

Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially obse… (see more)rvable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. Furthermore, we provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations. These findings together yield a minimalist algorithm to learn self-predictive representations for states and histories. We validate our theories by applying our algorithm to standard MDPs, MDPs with distractors, and POMDPs with sparse rewards. These findings culminate in a set of preliminary guidelines for RL practitioners.

2024-01-16

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

On learning Whittle index policy for restless bandits with scalable regret

Nima Akbarzadeh

Aditya Mahajan

Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies based on data when the system mod… (see more)el is unknown. However, the cumulative regret of most RL algorithms scales as ˜ O(S

2024-01-01

IEEE Transactions on Control of Network Systems (published)

doi.org

arxiv.org

Strong Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Markov Jump Linear Systems

Borna Sayedana

Mohammad Afshari

Peter E. Caines

Aditya Mahajan

In this paper, we investigate the problem of system identification for autonomous Markov jump linear systems (MJS) with complete state obser… (see more)vations. We propose switched least squares method for identification of MJS, show that this method is strongly consistent, and derive data-dependent and data-independent rates of convergence. In particular, our data-independent rate of convergence shows that, almost surely, the system identification error is

2024-01-01

IEEE Transactions on Automatic Control (published)

doi.org

arxiv.org

Two Families of Indexable Partially Observable Restless Bandits and Whittle Index Computation

Nima Akbarzadeh

Aditya Mahajan

2024-01-01

Performance Evaluation (published)

doi.org

arxiv.org

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Aditya Mahajan

Biography

Current Students

Publications

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Popular keywords:

Aditya Mahajan

Biography

Current Students

Publications