Berk Bozkurt

Alumni Collaborator - McGill

Principal Supervisor

Aditya Mahajan

Research Topics

Reinforcement Learning

Stochastic Control

Website

Google Scholar

Publications

Sub-optimality bounds for certainty equivalent policies in partially observed systems

Berk Bozkurt

Aditya Mahajan

Ashutosh Nayyar

Yi Ouyang

In this paper, we present a generalization of the certainty equivalence principle of stochastic control. One interpretation of the classical certainty equivalence principle for linear systems with output feedback and quadratic costs is as follows: the optimal action at each time is obtained by evaluating the optimal state-feedback policy of the stochastic linear system at the minimum mean square error (MMSE) estimate of the state. Motivated by this interpretation, we consider certainty equivalent policies for general (non-linear) partially observed stochastic systems that allow for any state estimate rather than restricting to MMSE estimates. In such settings, the certainty equivalent policy is not optimal. For models where the cost and the dynamics are smooth in an appropriate sense, we derive upper bounds on the sub-optimality of certainty equivalent policies. We present several examples to illustrate the results.

2026-02-01

arXiv (preprint)

arxiv.org

Generalized certainty equivalence based policies in partially observable systems

Berk Bozkurt

Aditya Mahajan

Ashutosh Nayyar

Yi Ouyang

In this paper, we present a generalization of the certainty equivalence principle of stochastic control. One interpretation of the classical certainty equivalence principle for linear systems with output feedback and quadratic costs is as follows: the optimal action at each time is obtained by evaluating the optimal state-feedback policy of the stochastic linear system at the minimum mean square error (MMSE) estimate of the state. Motivated by this interpretation, we consider certainty equivalent policies for general (non-linear) partially observed stochastic systems and allow for any state estimate rather than restricting to MMSE estimates. In such settings, the certainty equivalent policy is not optimal. For models with Lipschitz cost and dynamics, we derive upper bounds on the sub-optimality of certainty equivalent policies in terms of the expected error of the proposed estimator. We present several examples to illustrate the results.

2025-12-08

IEEE Conference on Decision and Control (published)

doi.org

Model approximation in MDPs with unbounded per-step cost

Berk Bozkurt

Aditya Mahajan

Ashutosh Nayyar

Yi Ouyang

We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov decision process …

2025-06-30

IEEE Transactions on Automatic Control (published)

doi.org