Average optimality for Markov decision processes in Borel spaces: a new condition and approach

Xianping Guo; Quanxin Zhu

doi:10.1239/jap/1152413725


Part of: Stochastic systems and control

Published online by Cambridge University Press: 14 July 2016


Xianping Guo*, Zhongshan University
Quanxin Zhu**, South China Normal University

* Postal address: School of Mathematics and Computational Science, Zhongshan University, Guangzhou, 510275, PR China. Email address: mcsgxp@mail.sysu.edu.cn
** Postal address: Department of Mathematics, South China Normal University, Guangzhou, 510631, PR China. Email address: zqx1975@sina.com.cn


Abstract


In this paper we study discrete-time Markov decision processes with Borel state and action spaces. The criterion is to minimize average expected costs, and the costs may have neither upper nor lower bounds. We first provide two average optimality inequalities of opposing directions and give conditions for the existence of solutions to them. Then, using the two inequalities, we ensure the existence of an average optimal (deterministic) stationary policy under additional continuity-compactness assumptions. Our conditions are slightly weaker than those in the previous literature. Also, some new sufficient conditions for the existence of an average optimal stationary policy are imposed on the primitive data of the model. Moreover, our approach is slightly different from the well-known ‘optimality inequality approach’ widely used in Markov decision processes. Finally, we illustrate our results in two examples.

Keywords

Discrete-time Markov decision process; average expected criterion; average optimality inequality; optimal stationary policy

MSC classification

Primary: 90C40 (Markov and semi-Markov decision processes); 93E20 (Optimal stochastic control)

Information

Type: Research Papers
Information: Journal of Applied Probability, Volume 43, Issue 2, June 2006, pp. 318–334

DOI: https://doi.org/10.1239/jap/1152413725
Copyright: © Applied Probability Trust 2006

Footnotes

Partially supported by the NSFC, the NCET, and the RFDP.

