
Aditya Mahajan

Associate Academic Member
Associate Professor, McGill University, Department of Electrical and Computer Engineering
Research Topics
Reinforcement Learning

Biography

Aditya Mahajan is a professor of electrical and computer engineering at McGill University. He is a member of the McGill Centre for Intelligent Machines (CIM), Mila – Quebec Artificial Intelligence Institute, the International Laboratory on Learning Systems (ILLS), and the Group for Research in Decision Analysis (GERAD). He holds a bachelor's degree in electrical engineering from the Indian Institute of Technology Kanpur (India), and a master's degree and a PhD in electrical engineering and computer science from the University of Michigan, Ann Arbor (United States).

Aditya Mahajan is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE) and a member of Professional Engineers Ontario. He is currently an associate editor of IEEE Transactions on Automatic Control, IEEE Control Systems Letters, and Mathematics of Control, Signals, and Systems (Springer). He served as an associate editor on the IEEE Control Systems Society Conference Editorial Board from 2014 to 2017.

He received the 2015 George S. Axelby Outstanding Paper Award, a Discovery Accelerator Supplement from the Natural Sciences and Engineering Research Council of Canada (NSERC) in 2016, the 2014 CDC Best Student Paper Award (as supervisor), and the 2016 NecSys Best Student Paper Award (as supervisor). His principal research interests are stochastic control and reinforcement learning.

Current Students

Master's (Research) - McGill
Master's (Research) - McGill
Research Intern - McGill
Master's (Research) - McGill
PhD - McGill
PhD - McGill
PhD - McGill
PhD - McGill
PhD - McGill

Publications

Restless bandits: indexability and computation of Whittle index
Nima Akbarzadeh
Restless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes, where the evolution of each process depends on the resource allocated to it. Such models capture the fundamental trade-off between exploration and exploitation. In 1988, Whittle developed an index heuristic for restless bandit problems which has emerged as a popular solution approach due to its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satisfies a technical condition known as indexability. In this paper, we present two general sufficient conditions for indexability and identify simpler-to-verify refinements of these conditions. We then present a general algorithm to compute the Whittle index for indexable restless bandits. Finally, we present a detailed numerical study which affirms the strong performance of the Whittle index heuristic.
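As an illustration of the underlying idea (not the paper's algorithm), the sketch below computes the Whittle index of a hypothetical two-state arm by bisecting on the passive subsidy: the index of a state is the smallest subsidy at which the passive action becomes optimal there, and each candidate subsidy is evaluated by value iteration on the single-arm MDP. All model parameters are invented.

```python
import numpy as np

# Hypothetical two-state restless arm: P[a] is the transition matrix and
# r[a] the reward vector under action a (0 = passive, 1 = active).
P = [np.array([[0.9, 0.1], [0.3, 0.7]]),   # passive dynamics
     np.array([[0.4, 0.6], [0.2, 0.8]])]   # active dynamics
r = [np.array([0.0, 0.0]),                 # passive rewards
     np.array([0.5, 1.0])]                 # active rewards
beta = 0.95                                # discount factor

def q_values(lam, n_iter=500):
    """Value iteration for the single-arm MDP with passive subsidy lam."""
    V = np.zeros(2)
    for _ in range(n_iter):
        Q0 = r[0] + lam + beta * P[0] @ V  # passive action also earns lam
        Q1 = r[1] + beta * P[1] @ V
        V = np.maximum(Q0, Q1)
    return Q0, Q1

def whittle_index(state, lo=-5.0, hi=5.0, tol=1e-4):
    """Smallest subsidy making the passive action optimal in `state`."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        Q0, Q1 = q_values(mid)
        if Q1[state] > Q0[state]:          # active still preferred,
            lo = mid                       # so the subsidy must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

print([round(whittle_index(s), 3) for s in (0, 1)])
```

Bisection is valid here precisely because of indexability: the set of states where the passive action is optimal grows monotonically with the subsidy.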
Decentralized Linear Quadratic Systems With Major and Minor Agents and Non-Gaussian Noise
Mohammad Afshari
A decentralized linear quadratic system with a major agent and a collection of minor agents is considered. The major agent affects the minor agents, but not vice versa. The state of the major agent is observed by all agents. In addition, the minor agents have a noisy observation of their local state. The noise process is not assumed to be Gaussian. The structures of the optimal strategy and the best linear strategy are characterized. It is shown that the major agent's optimal control action is a linear function of the major agent's minimum mean-squared error (MMSE) estimate of the system state while the minor agent's optimal control action is a linear function of the major agent's MMSE estimate of the system state and a “correction term” that depends on the difference of the minor agent's MMSE estimate of its local state and the major agent's MMSE estimate of the minor agent's local state. Since the noise is non-Gaussian, the minor agent's MMSE estimate is a nonlinear function of its observation. It is shown that replacing the minor agent's MMSE estimate with its linear least mean square estimate gives the best linear control strategy. The results are proved using a direct method based on conditional independence, common-information-based splitting of state and control actions, and simplifying the per-step cost based on conditional independence, orthogonality principle, and completion of squares.
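Stated compactly in our own notation (not necessarily the paper's), with \hat{z}_t the major agent's MMSE estimate of the system state and \hat{x}^i_t minor agent i's MMSE estimate of its local state, the structural result reads

$$ u^0_t = K^0_t \hat{z}_t, \qquad u^i_t = K^i_t \hat{z}_t + L^i_t \bigl(\hat{x}^i_t - \hat{z}^i_t\bigr), $$

where \hat{z}^i_t denotes the component of \hat{z}_t corresponding to minor agent i and the gains K^0_t, K^i_t, L^i_t solve Riccati-type equations; substituting the linear least mean square estimate for \hat{x}^i_t yields the best linear strategy.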
Cross-layer communication over fading channels with adaptive decision feedback
Borna Sayedana
E. Yeh
In this paper, the cross-layer design of transmitting data packets over an AWGN fading channel with adaptive decision feedback is considered. The transmitter decides the number of packets to transmit and the threshold of the decision feedback based on the queue length and the channel state. The transmit power is chosen such that the probability of error is below a pre-specified threshold. We model the system as a Markov decision process and use ideas from lattice theory to establish qualitative properties of optimal transmission strategies. In particular, we show that: (i) if the channel state remains the same and the number of packets in the queue increases, then the optimal policy either transmits more packets or uses a smaller decision feedback threshold or both; and (ii) if the number of packets in the queue remains the same and the channel quality deteriorates, then the optimal policy either transmits fewer packets or uses a larger threshold for the decision feedback or both. We also show that, under rate constraints, if the channel gains for all channel states are above a threshold, then the “or” in the above characterization can be replaced by “and”. Finally, we present a numerical example showing that adaptive decision feedback significantly improves the power-delay trade-off as compared with the case of no feedback.
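The flavour of property (i) can be reproduced on a toy queue-plus-channel MDP of our own: no decision-feedback threshold, Bernoulli arrivals, and an invented convex power function, so this is much simpler than the paper's model. Value iteration recovers a transmission policy that is monotone in the queue length:

```python
import numpy as np
from itertools import product

# Toy cross-layer MDP: state = (queue length, channel state), action = number
# of packets to transmit. Power function, channel gains, and arrival process
# are all invented; there is no decision-feedback threshold here.
Qmax, H, beta = 10, 3, 0.95
gains = np.array([0.5, 1.0, 2.0])        # hypothetical channel gains
P_h = np.full((H, H), 1.0 / H)           # i.i.d. channel for simplicity
p_arr = 0.6                              # Bernoulli packet arrivals

def power(u, h):
    return (2.0 ** u - 1.0) / gains[h]   # convex in u, cheaper on good channels

V = np.zeros((Qmax + 1, H))
policy = np.zeros((Qmax + 1, H), dtype=int)
for _ in range(1000):
    V_new = np.empty_like(V)
    for q, h in product(range(Qmax + 1), range(H)):
        best = np.inf
        for u in range(q + 1):           # cannot send more than queued
            qn = q - u                   # queue after transmission
            EV = sum(P_h[h, h2] * ((1 - p_arr) * V[qn, h2]
                     + p_arr * V[min(qn + 1, Qmax), h2]) for h2 in range(H))
            cost = power(u, h) + (q - u) + beta * EV  # power + holding cost
            if cost < best:
                best, policy[q, h] = cost, u
        V_new[q, h] = best
    V = V_new

# For each channel state, transmissions should not decrease with queue length
print(policy.T)
print(all(np.all(np.diff(policy[:, h]) >= 0) for h in range(H)))
```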
Approximate information state for partially observed systems
Jayakumar Subramanian
The standard approach for modeling partially observed systems is to model them as partially observable Markov decision processes (POMDPs) and obtain a dynamic program in terms of a belief state. The belief state formulation works well for planning, but is not ideal for online reinforcement learning because the belief state depends on the model and, as such, is not observable when the model is unknown. In this paper, we present an alternative notion of an information state for obtaining a dynamic program in partially observed models. In particular, an information state is a sufficient statistic for the current reward which evolves in a controlled Markov manner. We show that such an information state leads to a dynamic programming decomposition. Then we present a notion of an approximate information state and an approximate dynamic program based on it. The approximate information state is defined in terms of properties that can be estimated using sampled trajectories; it therefore provides a constructive method for reinforcement learning in partially observed systems. We present one such construction and show that it performs better than the state of the art for three benchmark models.
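A minimal PyTorch sketch of what such a construction could look like (ours, not the paper's architecture or benchmarks): a recurrent network compresses the observation-action history into a candidate information state trained on the two defining properties, predicting the current reward and predicting its own evolution (proxied here by next-observation prediction). All dimensions and data are placeholders.

```python
import torch
import torch.nn as nn

# Approximate information state (AIS) sketch: a GRU summarizes the history of
# (observation, action) pairs into z_t; two heads enforce (i) reward
# prediction and (ii) prediction of the next observation.
obs_dim, act_dim, z_dim = 4, 2, 16

rnn = nn.GRU(obs_dim + act_dim, z_dim, batch_first=True)
reward_head = nn.Linear(z_dim + act_dim, 1)          # (i) current reward
next_obs_head = nn.Linear(z_dim + act_dim, obs_dim)  # (ii) next observation

opt = torch.optim.Adam([*rnn.parameters(), *reward_head.parameters(),
                        *next_obs_head.parameters()], lr=1e-3)

def ais_loss(obs, act, rew, next_obs):
    """obs: (B,T,obs_dim), act: (B,T,act_dim), rew: (B,T,1)."""
    z, _ = rnn(torch.cat([obs, act], dim=-1))        # z_t summarizes history
    za = torch.cat([z, act], dim=-1)
    loss_r = ((reward_head(za) - rew) ** 2).mean()       # sufficiency for reward
    loss_p = ((next_obs_head(za) - next_obs) ** 2).mean()  # predictable evolution
    return loss_r + loss_p

# One gradient step on random placeholder data:
B, T = 8, 20
loss = ais_loss(torch.randn(B, T, obs_dim), torch.randn(B, T, act_dim),
                torch.randn(B, T, 1), torch.randn(B, T, obs_dim))
opt.zero_grad(); loss.backward(); opt.step()
```

Both losses are estimated purely from sampled trajectories, which is what makes the notion constructive for reinforcement learning.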
Networked control of coupled subsystems: Spectral decomposition and low-dimensional solutions
Shuang Gao
In this paper, we investigate optimal networked control of coupled subsystems where the dynamics and the cost couplings depend on an underlying weighted graph. We use the spectral decomposition of the graph adjacency matrix to decompose the overall system into (L+1) systems with decoupled dynamics and cost, where L is the rank of the adjacency matrix. Consequently, the optimal control input at each subsystem can be computed by solving (L+1) decoupled Riccati equations. A salient feature of the result is that the solution complexity depends on the rank of the adjacency matrix rather than the size of the network (i.e., the number of nodes). Therefore, the proposed solution framework provides a scalable method for synthesizing and implementing optimal control laws for large-scale systems.
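The computational payoff can be seen on a toy problem of our own construction: scalar subsystems coupled through a rank-one symmetric adjacency matrix M, with dynamics x⁺ = a·x + d·(Mx) + b·u and identical per-node quadratic costs. Since both cost and dynamics are diagonalized by M's eigenbasis, one scalar Riccati equation per distinct eigenvalue suffices, independent of the network size:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# n scalar subsystems coupled through a rank-one symmetric adjacency matrix.
n = 50
rng = np.random.default_rng(0)
w = rng.normal(size=(n, 1))
M = w @ w.T                              # rank-1 adjacency matrix (L = 1)
a, b, d, q, r = 0.9, 1.0, 0.2, 1.0, 1.0

eigvals = np.linalg.eigvalsh(M)          # 0 with multiplicity n-1, plus ||w||^2
gains = {}
for lam in (0.0, eigvals[-1]):           # L + 1 = 2 decoupled scalar systems
    A, B = np.array([[a + d * lam]]), np.array([[b]])
    Pr = solve_discrete_are(A, B, np.array([[q]]), np.array([[r]]))
    gains[lam] = ((B.T @ Pr @ A) / (r + B.T @ Pr @ B)).item()

print(gains)                             # 2 gains instead of an n-dim Riccati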
Restless bandits with controlled restarts: Indexability and computation of Whittle index
Nima Akbarzadeh
Motivated by applications in machine repair, queueing, surveillance, and clinical care, we consider a scheduling problem where a decision maker can reset m out of n Markov processes at each time. Processes that are reset restart according to a known probability distribution, while processes that are not reset evolve in a Markovian manner. Due to the high complexity of finding an optimal policy, such scheduling problems are often modeled as restless bandits. We show that the model satisfies a technical condition known as indexability. For indexable restless bandits, the Whittle index policy, which computes a function known as the Whittle index for each process and resets the m processes with the lowest index, is known to be a good heuristic. The Whittle index is computed by solving an auxiliary Markov decision problem for each arm. When the optimal policy for this auxiliary problem is threshold-based, we use ideas from renewal theory to derive a closed-form expression for the Whittle index. We present detailed numerical experiments which suggest that the Whittle index policy performs close to the optimal policy and significantly better than the myopic policy, a commonly used heuristic.
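Once per-arm indices are available (the paper obtains them in closed form for threshold-optimal arms), the scheduling step itself is a one-liner. The sketch below is our illustration with an invented index table:

```python
import numpy as np

# Scheduling step of the Whittle index policy with controlled restarts:
# reset the m arms whose current states carry the lowest indices.
def whittle_restart_policy(states, index_table, m):
    indices = np.array([index_table[k, s] for k, s in enumerate(states)])
    return np.argsort(indices)[:m]       # arms selected for a restart

rng = np.random.default_rng(1)
table = rng.normal(size=(5, 4))          # 5 hypothetical arms, 4 states each
print(whittle_restart_policy([0, 3, 1, 2, 2], table, m=2))
```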
Dynamic spectrum access under partial observations: A restless bandit approach
Nima Akbarzadeh
We consider a communication system where multiple unknown channels are available for transmission. Each channel has a state which evolves in a Markovian manner. The transmitter has to select L channels to use and also decide the resources (e.g., power, rate, etc.) to use on each of the selected channels. It observes the state of the channels it uses and receives no feedback on the state of the other channels. We model this problem as a partially observable Markov decision process and obtain a simplified belief state. We show that the optimal resource allocation policy can be identified in closed form. Once the optimal resource allocation policy is fixed, choosing the channel scheduling policy may be viewed as a restless bandit. We present an efficient algorithm to check indexability and compute the Whittle index for each channel. When the model is indexable, the Whittle index policy, which transmits over the L channels with the smallest Whittle indices, is an attractive heuristic policy.
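The partial-observation structure admits a simple per-channel belief: used channels are observed exactly, while unused ones are only predicted through the chain. A sketch for one hypothetical two-state channel (our transition matrix, not from the paper):

```python
import numpy as np

# Per-channel belief update for a two-state Markov channel. A used channel is
# observed exactly, so its belief resets before the one-step prediction; an
# unused channel's belief is simply propagated through the chain.
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def update_belief(b, used, observed_state=None):
    """b = P(state = 1); returns the next-step belief."""
    if used:
        b = float(observed_state)        # exact observation after use
    return (1 - b) * P[0, 1] + b * P[1, 1]

b = 0.5
b = update_belief(b, used=True, observed_state=1)  # channel used, state seen
b = update_belief(b, used=False)                   # channel idle, predict only
print(round(b, 4))
```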
Multi-Agent Estimation and Filtering for Minimizing Team Mean-Squared Error
Mohammad Afshari
Motivated by estimation problems arising in autonomous vehicles and decentralized control of unmanned aerial vehicles, we consider multi-agent estimation and filtering problems in which multiple agents generate state estimates based on decentralized information, and the objective is to minimize a coupled mean-squared error which we call the team mean-squared error. We call the resulting estimates minimum team mean-squared error (MTMSE) estimates. We show that MTMSE estimates are different from minimum mean-squared error (MMSE) estimates. We derive closed-form expressions for MTMSE estimates, which are linear functions of the observations where the corresponding gain depends on the weight matrix that couples the estimation errors. We then consider a filtering problem where a linear stochastic process is monitored by multiple agents which can share their observations (with delay) over a communication graph. We derive expressions to recursively compute the MTMSE estimates. To illustrate the effectiveness of the proposed scheme, we consider an example of estimating the distances between vehicles in a platoon and show that MTMSE estimates significantly outperform MMSE estimates and consensus Kalman filtering estimates.
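The gap between MTMSE and MMSE estimates shows up already in a scalar two-agent toy problem of our own: agent i estimates x_i from y_i = x_i + v_i with a linear gain, and the team cost E[eᵀSe] couples the errors through the off-diagonal weight. Solving the first-order conditions gives team gains that differ from the per-agent MMSE gains whenever the coupling weight is nonzero:

```python
import numpy as np

# Toy two-agent team estimation: x = (x1, x2) zero-mean with unit variances
# and correlation rho; y_i = x_i + v_i with noise variance sig2. Each agent
# uses a linear estimate g_i * y_i. Model and numbers are purely illustrative.
rho, sig2 = 0.8, 0.5
S = np.array([[1.0, 0.9],       # team weight matrix; s12 != 0 couples errors
              [0.9, 1.0]])

# First-order conditions for (g1, g2) of E[e' S e], using the moments
# E[x_i y_i] = 1, E[y_i^2] = 1 + sig2, E[x_i y_j] = E[y_1 y_2] = rho:
A = np.array([[S[0, 0] * (1 + sig2), S[0, 1] * rho],
              [S[0, 1] * rho, S[1, 1] * (1 + sig2)]])
b = np.array([S[0, 0] + S[0, 1] * rho, S[1, 1] + S[0, 1] * rho])
g_team = np.linalg.solve(A, b)

g_mmse = np.array([1 / (1 + sig2)] * 2)   # each agent's own MMSE gain
print(g_team, g_mmse)                      # the two differ when s12 != 0
```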
Reinforcement Learning in Stationary Mean-field Games
Jayakumar Subramanian
Multi-agent reinforcement learning has made significant progress in recent years, but it remains a hard problem. Hence, one often resorts to developing learning algorithms for specific classes of multi-agent systems. In this paper we study reinforcement learning in a specific class of multi-agent systems called mean-field games. In particular, we consider learning in stationary mean-field games. We identify two different solution concepts---stationary mean-field equilibrium and stationary mean-field social-welfare optimal policy---for such games, based on whether the agents are non-cooperative or cooperative, respectively. We then generalize these solution concepts to their local variants using arguments based on bounded rationality. For these two local solution concepts, we present two reinforcement learning algorithms. We show that the algorithms converge to the right solution under mild technical conditions and demonstrate this using two numerical examples.
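The fixed-point structure behind a stationary mean-field equilibrium can be sketched as alternating between a best response to the current mean field and the stationary distribution that best response induces. The toy model below (dynamics, congestion-style reward, damping) is entirely ours; this naive iteration is not the paper's learning algorithm and need not converge in general:

```python
import numpy as np

# Stationary mean-field equilibrium as a fixed point: mu is an equilibrium
# mean field when it equals the stationary distribution of the best response
# to mu itself. We alternate the two maps with damping on a toy model.
nS, nA, beta = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a] = next-state dist.

def reward(mu):
    # congestion-style reward: crowded states penalized, action 1 costs extra
    return -np.outer(mu, np.ones(nA)) - 0.1 * np.arange(nA)

def best_response(mu, iters=500):
    R, V = reward(mu), np.zeros(nS)
    for _ in range(iters):
        Q = R + beta * P @ V                     # Q[s, a]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                      # deterministic policy

def stationary_dist(pi, iters=500):
    mu = np.full(nS, 1 / nS)
    for _ in range(iters):
        mu = sum(mu[s] * P[s, pi[s]] for s in range(nS))
    return mu

mu = np.full(nS, 1 / nS)
for _ in range(50):                              # damped fixed-point updates
    mu = 0.9 * mu + 0.1 * stationary_dist(best_response(mu))
print(mu)
```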