
First Passage Optimality and Variance Minimisation of Markov Decision Processes with Varying Discount Factors

Published online by Cambridge University Press: 30 January 2018

Xiao Wu*, Sun Yat-Sen University
Xianping Guo*, Sun Yat-Sen University
*Postal address: School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, P. R. China.

Abstract


This paper deals with the first passage optimality and variance minimisation problems of discrete-time Markov decision processes (MDPs) with varying discount factors and unbounded rewards/costs. First, under suitable conditions slightly weaker than those in the previous literature on the standard (infinite horizon) discounted MDPs, we establish the existence and characterisation of the first passage expected-optimal stationary policies. Second, to further distinguish the expected-optimal stationary policies, we introduce the variance minimisation problem, prove that it is equivalent to a new first passage optimality problem of MDPs, and, thus, show the existence of a variance-optimal policy that minimises the variance over the set of all first passage expected-optimal stationary policies. Finally, we use a computable example to illustrate our main results and also to show the difference between the first passage optimality here and the standard discount optimality of MDPs in the previous literature.
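The abstract describes the two-stage procedure only in words. The following is a minimal Python sketch of that procedure on a hypothetical finite model. It is not the authors' construction, since the paper works with unbounded rewards/costs under its own conditions. The sketch makes several assumptions. It uses a target set B and state-action-dependent discount factors alpha(x, a), which is one reading of "varying discount factors". It takes the first passage criterion to be the expected discounted reward accumulated before the first entry into B, so that V*(x) = max_a [r(x, a) + alpha(x, a) * sum_{y not in B} p(y|x, a) V*(y)], with V* = 0 on B. Plain value iteration is swapped in as the solver. For the variance step, it assumes a Sobel-type second-moment recursion: restricted to the actions attaining that maximum, the second moment M solves another first passage problem with reward r^2 + 2 r alpha sum_{y not in B} p V* and discount alpha^2, and the variance is M - V*^2. All names (`solve_first_passage`, `variance_minimal_policy`, states `s0`, `s1`, `s3`, target `B`) are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical finite model: model[x][a] = (r(x, a), alpha(x, a), {next_state: probability}).
# Target states (the set B) are not listed as keys; their value is taken to be 0.
Model = Dict[str, Dict[str, Tuple[float, float, Dict[str, float]]]]
Fn = Callable[[str, str], float]


def solve_first_passage(model: Model, target: Set[str], reward: Fn, discount: Fn,
                        actions: Dict[str, List[str]], maximise: bool = True,
                        iters: int = 10_000, tol: float = 1e-12) -> Dict[str, float]:
    """Value iteration for W(x) = opt_a [reward(x,a) + discount(x,a) * sum_{y not in B} p(y|x,a) W(y)]."""
    pick = max if maximise else min
    w = {x: 0.0 for x in model}
    for _ in range(iters):
        new = {
            x: pick(
                reward(x, a) + discount(x, a)
                * sum(p * w[y] for y, p in model[x][a][2].items() if y not in target)
                for a in actions[x]
            )
            for x in model
        }
        done = max(abs(new[x] - w[x]) for x in model) < tol
        w = new
        if done:
            break
    return w


def variance_minimal_policy(model: Model, target: Set[str], tol: float = 1e-9):
    r: Fn = lambda x, a: model[x][a][0]
    alpha: Fn = lambda x, a: model[x][a][1]
    all_actions = {x: list(model[x]) for x in model}

    # Step 1: expected-optimal values V* (maximise expected discounted reward until hitting B).
    v = solve_first_passage(model, target, r, alpha, all_actions, maximise=True)

    def cont(x: str, a: str, w: Dict[str, float]) -> float:
        # Sum over successors y outside B of p(y|x,a) * w(y).
        return sum(p * w[y] for y, p in model[x][a][2].items() if y not in target)

    # Step 2: actions attaining the maximum in the optimality equation (up to tol).
    opt = {x: [a for a in all_actions[x] if r(x, a) + alpha(x, a) * cont(x, a, v) >= v[x] - tol]
           for x in model}

    # Step 3: minimise the second moment over optimal actions, itself a first passage
    # problem with reward r^2 + 2 r alpha cont(V*) and discount alpha^2.
    r2: Fn = lambda x, a: r(x, a) ** 2 + 2 * r(x, a) * alpha(x, a) * cont(x, a, v)
    a2: Fn = lambda x, a: alpha(x, a) ** 2
    m = solve_first_passage(model, target, r2, a2, opt, maximise=False)

    policy = {x: min(opt[x], key=lambda a: r2(x, a) + a2(x, a) * cont(x, a, m)) for x in model}
    variance = {x: m[x] - v[x] ** 2 for x in model}
    return v, policy, variance


# Hypothetical toy model with two expected-optimal actions at s0.
model: Model = {
    "s0": {"safe": (2.0, 0.5, {"B": 1.0}), "risky": (0.0, 0.5, {"s1": 1.0})},
    "s1": {"go": (0.0, 0.5, {"s3": 0.5, "B": 0.5})},
    "s3": {"go": (16.0, 0.5, {"B": 1.0})},
}
v, policy, var = variance_minimal_policy(model, {"B"})
print(v, policy, var)
```

In this toy model both actions at `s0` have expected value 2, so both are expected-optimal. By hand, `risky` has second moment 0.25 * 32 = 8 and hence variance 8 - 4 = 4, while `safe` has variance 0, which is why the sketch selects `safe`. The toy model is acyclic, so value iteration stops after a few sweeps. With cycles, convergence would need conditions on the discount factors and transitions.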

Type: Research Article
Copyright © Applied Probability Trust

References

Bäuerle, N. and Rieder, U. (2011). Markov Decision Processes with Applications in Finance. Springer, Heidelberg.
Derman, C. (1970). Finite State Markovian Decision Processes (Math. Sci. Eng. 67). Academic Press, New York.
Feinberg, E. A. and Shwartz, A. (1994). Markov decision models with weighted discounted criteria. Math. Operat. Res. 19, 152–168.
González-Hernández, J., López-Martínez, R. R. and Minjárez-Sosa, J. A. (2008). Adaptive policies for stochastic systems under a randomized discounted cost criterion. Bol. Soc. Mat. Mexicana (3) 14, 149–163.
Guo, X. and Hernández-Lerma, O. (2009). Continuous-Time Markov Decision Processes. Springer, Berlin.
Guo, X. and Song, X. (2009). Mean-variance criteria for finite continuous-time Markov decision processes. IEEE Trans. Automatic Control 54, 2151–2157.
Guo, X., Hernández-del-Valle, A. and Hernández-Lerma, O. (2012). First passage problems for nonstationary discrete-time stochastic control systems. Europ. J. Control 18, 528–538.
Guo, X., Ye, L. and Yin, G. (2012). A mean-variance optimization problem for discounted Markov decision processes. Europ. J. Operat. Res. 220, 423–429.
Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-Time Markov Control Processes. Springer, New York.
Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-Time Markov Control Processes. Springer, New York.
Hernández-Lerma, O., Vega-Amaya, O. and Carrasco, G. (1999). Sample-path optimality and variance-minimization of average cost Markov control processes. SIAM J. Control Optimization 38, 79–93.
Hordijk, A. and Yushkevich, A. A. (1999). Blackwell optimality in the class of all policies in Markov decision chains with a Borel state space and unbounded rewards. Math. Meth. Operat. Res. 50, 421–448.
Huang, Y. and Guo, X. (2009). Optimal risk probability for first passage models in semi-Markov decision processes. J. Math. Anal. Appl. 359, 404–420.
Huang, Y.-H. and Guo, X.-P. (2011). First passage models for denumerable semi-Markov decision processes with nonnegative discounted costs. Acta Math. Appl. Sinica (English Ser.) 27, 177–190.
Kurano, M. (1987). Markov decision processes with a minimum-variance criterion. J. Math. Anal. Appl. 123, 572–583.
Liu, J. and Huang, S. (2001). Markov decision processes with distribution function criterion of first-passage time. Appl. Math. Optimization 43, 187–201.
Liu, J. Y. and Liu, K. (1992). Markov decision programming—the first passage model with denumerable state space. Systems Sci. Math. Sci. 5, 340–351.
Mamabolo, R. M. and Beichelt, F. E. (2004). Maintenance policies with minimal repair. Econ. Qual. Control 19, 143–166.
Prieto-Rumeau, T. and Hernández-Lerma, O. (2009). Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains. Math. Meth. Operat. Res. 70, 527–540.
Puterman, M. L. (1994). Markov Decision Processes. John Wiley, New York.
Schäl, M. (2005). Control of ruin probabilities by discrete-time investments. Math. Meth. Operat. Res. 62, 141–158.
Sobel, M. J. (1982). The variance of discounted Markov decision processes. J. Appl. Prob. 19, 794–802.
Wei, Q. and Guo, X. (2011). Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Operat. Res. Lett. 39, 369–374.
Yu, S. X., Lin, Y. and Yan, P. (1998). Optimization models for the first arrival target distribution function in discrete time. J. Math. Anal. Appl. 225, 193–223.