
New discount and average optimality conditions for continuous-time Markov decision processes

Published online by Cambridge University Press: 01 July 2016

Xianping Guo*
Affiliation:
Sun Yat-Sen University
Liuer Ye*
Affiliation:
Sun Yat-Sen University
* Postal address: School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, 510275, P. R. China.

Abstract


This paper deals with continuous-time Markov decision processes in Polish spaces under the discounted and average cost criteria. All underlying Markov processes are determined by given transition rates, which are allowed to be unbounded, and the costs are assumed to be bounded below. By introducing the occupation measure of a randomized Markov policy and analyzing the properties of occupation measures, we first show that the family of all randomized stationary policies is ‘sufficient’ within the class of all randomized Markov policies. Then, under semicontinuity and compactness conditions, we prove the existence of a discounted cost optimal stationary policy via a value iteration technique. Moreover, by developing a new minimum nonnegative solution method for the average cost criterion, we prove the existence of an average cost optimal stationary policy under reasonably mild conditions. Finally, we use examples to illustrate applications of our results. Apart from the requirement that the costs be bounded below, the conditions for the existence of discounted cost (or average cost) optimal policies are much weaker than those in the previous literature, and the minimum nonnegative solution approach is new.
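To make the discounted-cost value iteration concrete, here is a minimal Python sketch for a toy finite-state model with bounded transition rates, where uniformization reduces the continuous-time problem to a discrete-time discounted one. The two-state model, rates, and costs are invented for illustration; the paper's actual setting (Polish state space, unbounded rates, costs bounded below) is far more general and is not captured by this reduction.

```python
import numpy as np

# Hypothetical two-state, two-action model (invented for illustration).
# q[a][x][y] is the transition rate from x to y under action a,
# with the diagonal q[a][x][x] = -(sum of off-diagonal rates).
q = np.array([
    [[-1.0, 1.0], [2.0, -2.0]],   # rates under action 0
    [[-3.0, 3.0], [1.0, -1.0]],   # rates under action 1
])
c = np.array([[1.0, 2.0], [4.0, 0.5]])   # c[a][x]: running cost
alpha = 0.1                               # discount rate

# Uniformization: pick Lam >= the maximal total exit rate and turn
# the rate matrices into transition kernels of a discrete-time MDP.
Lam = max(-q[a][x][x] for a in range(2) for x in range(2))
P = q / Lam + np.eye(2)          # P[a][x][y]; each row sums to 1
beta = Lam / (alpha + Lam)       # effective discount factor, < 1

# Value iteration on the uniformized Bellman operator:
# V(x) = min_a [ c(a,x)/(alpha+Lam) + beta * sum_y P(a,x,y) V(y) ].
V = np.zeros(2)
for _ in range(1000):
    Q = c / (alpha + Lam) + beta * (P @ V)   # Q[a][x]
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmin(axis=0)   # a stationary discounted-cost optimal policy
print("V* =", V, "optimal stationary policy =", policy)
```

Since beta < 1 whenever alpha > 0, the uniformized Bellman operator is a contraction, which is why this fixed-point iteration converges; the point of the paper is to obtain existence results of this kind without the bounded-rate assumption that uniformization requires.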

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 2010 

Footnotes

Research supported by the NSFC and GDUPS (2010).
