Berk Bozkurt

Alumni collaborator - McGill
Principal supervisor
Research topics
Reinforcement learning
Stochastic control

Publications

Sub-optimality bounds for certainty equivalent policies in partially observed systems
Ashutosh Nayyar
Yi Ouyang
In this paper, we present a generalization of the certainty equivalence principle of stochastic control. One interpretation of the classical certainty equivalence principle for linear systems with output feedback and quadratic costs is as follows: the optimal action at each time is obtained by evaluating the optimal state-feedback policy of the stochastic linear system at the minimum mean square error (MMSE) estimate of the state. Motivated by this interpretation, we consider certainty equivalent policies for general (non-linear) partially observed stochastic systems that allow for any state estimate rather than restricting to MMSE estimates. In such settings, the certainty equivalent policy is not optimal. For models where the cost and the dynamics are smooth in an appropriate sense, we derive upper bounds on the sub-optimality of certainty equivalent policies. We present several examples to illustrate the results.
Generalized certainty equivalence based policies in partially observable systems
Ashutosh Nayyar
Yi Ouyang
In this paper, we present a generalization of the certainty equivalence principle of stochastic control. One interpretation of the classical certainty equivalence principle for linear systems with output feedback and quadratic costs is as follows: the optimal action at each time is obtained by evaluating the optimal state-feedback policy of the stochastic linear system at the minimum mean square error (MMSE) estimate of the state. Motivated by this interpretation, we consider certainty equivalent policies for general (non-linear) partially observed stochastic systems and allow for any state estimate rather than restricting to MMSE estimates. In such settings, the certainty equivalent policy is not optimal. For models with Lipschitz cost and dynamics, we derive upper bounds on the sub-optimality of certainty equivalent policies in terms of expected error of the proposed estimator. We present several examples to illustrate the results.
Model approximation in MDPs with unbounded per-step cost
Ashutosh Nayyar
Yi Ouyang
We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov decision process …
Weighted-Norm Bounds on Model Approximation in MDPs with Unbounded Per-Step Cost
Ashutosh Nayyar
Yi Ouyang
We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov Decision Process (MDP) …