
Finite-horizon optimality for continuous-time Markov decision processes with unbounded transition rates

Published online by Cambridge University Press: 21 March 2016

Xianping Guo*, Xiangxiang Huang* and Yonghui Huang*
Affiliation: Sun Yat-Sen University
* Postal address: School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, 510275, P. R. China.

Abstract


In this paper we focus on finite-horizon optimality for denumerable continuous-time Markov decision processes, in which the transition and reward/cost rates are allowed to be unbounded and optimality is taken over the class of all randomized history-dependent policies. Under mild conditions, we first establish the existence of a solution to the finite-horizon optimality equation via an approximation technique that passes from bounded transition rates to unbounded ones. We then prove the existence of ε (≥ 0)-optimal Markov policies and, by establishing an analog of the Itô–Dynkin formula, verify that the value function is the unique solution to the optimality equation. Finally, we provide an example in which both the transition rates and the value function are unbounded, thereby solving some of the problems left open by Yushkevich (1978).
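For orientation, the finite-horizon optimality equation referred to above is commonly written as the following terminal-value problem. This display is a sketch under assumed notation (state space S, admissible action sets A(i), reward rate r(i, a), transition rates q(j | i, a), horizon [0, T], and terminal reward g); the paper's exact formulation and conditions may differ.

% Sketch of the finite-horizon optimality equation for a denumerable
% CTMDP; all symbols below are assumed notation, not the paper's own.
\[
  \frac{\partial u(t,i)}{\partial t}
    + \sup_{a \in A(i)} \Bigl\{ r(i,a) + \sum_{j \in S} q(j \mid i,a)\, u(t,j) \Bigr\} = 0,
  \qquad (t,i) \in [0,T) \times S,
\]
\[
  u(T,i) = g(i), \qquad i \in S.
\]

When the transition rates q(j | i, a) are unbounded, the sum above need not define a bounded operator on the space of value functions, which is what makes the approximation from bounded to unbounded rates necessary.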

Type: General Applied Probability
Copyright: © Applied Probability Trust 2015

References

Bäuerle, N. and Rieder, U. (2011). Markov Decision Processes with Applications to Finance. Springer, Heidelberg.
Bertsekas, D. P. and Shreve, S. E. (1978). Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York.
Feinberg, E. A. (2004). Continuous time discounted jump Markov decision processes: a discrete-event approach. Math. Operat. Res. 29, 492–524.
Feinberg, E. A. (2012). Reduction of discounted continuous-time MDPs with unbounded jump and reward rates to discrete-time total-reward MDPs. In Optimization, Control, and Applications of Stochastic Systems. Birkhäuser, New York, pp. 77–97.
Feinberg, E. A., Mandava, M. and Shiryaev, A. N. (2014). On solutions of Kolmogorov's equations for nonhomogeneous jump Markov processes. J. Math. Anal. Appl. 411, 261–270.
Ghosh, M. K. and Saha, S. (2012). Continuous-time controlled jump Markov processes on the finite horizon. In Optimization, Control, and Applications of Stochastic Systems. Birkhäuser, New York, pp. 99–109.
Guo, X. (2007). Continuous-time Markov decision processes with discounted rewards: the case of Polish spaces. Math. Operat. Res. 32, 73–87.
Guo, X. and Hernández-Lerma, O. (2009). Continuous-Time Markov Decision Processes. Springer, Berlin.
Guo, X. and Piunovskiy, A. (2011). Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math. Operat. Res. 36, 105–132.
Guo, X. and Ye, L. (2010). New discount and average optimality conditions for continuous-time Markov decision processes. Adv. Appl. Prob. 42, 953–985.
Guo, X., Hernández-Lerma, O. and Prieto-Rumeau, T. (2006). A survey of recent results on continuous-time Markov decision processes. Top 14, 177–261.
Guo, X., Huang, Y. and Song, X. (2012). Linear programming and constrained average optimality for general continuous-time Markov decision processes in history-dependent policies. SIAM J. Control Optimization 50, 23–47.
Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York.
Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-Time Markov Control Processes. Springer, New York.
Jacod, J. (1975). Multivariate point processes: predictable projection, Radon–Nikodým derivatives, representation of martingales. Z. Wahrscheinlichkeitsth. 31, 235–253.
Kakumanu, P. (1971). Continuously discounted Markov decision model with countable state and action space. Ann. Math. Statist. 42, 919–926.
Kakumanu, P. (1975). Continuous time Markovian decision processes average return criterion. J. Math. Anal. Appl. 52, 173–188.
Kitaev, M. Y. and Rykov, V. V. (1995). Controlled Queueing Systems. CRC Press, Boca Raton, FL.
Miller, B. L. (1968). Finite state continuous time Markov decision processes with a finite planning horizon. SIAM J. Control 6, 266–280.
Piunovskiy, A. and Zhang, Y. (2011). Accuracy of fluid approximations to controlled birth-and-death processes: absorbing case. Math. Meth. Operat. Res. 73, 159–187.
Piunovskiy, A. and Zhang, Y. (2011). Discounted continuous-time Markov decision processes with unbounded rates: the convex analytic approach. SIAM J. Control Optimization 49, 2032–2061.
Pliska, S. R. (1975). Controlled jump processes. Stoch. Process. Appl. 3, 259–282.
Prieto-Rumeau, T. and Hernández-Lerma, O. (2012). Discounted continuous-time controlled Markov chains: convergence of control models. J. Appl. Prob. 49, 1072–1090.
Prieto-Rumeau, T. and Hernández-Lerma, O. (2012). Selected Topics on Continuous-Time Controlled Markov Chains and Markov Games. Imperial College Press, London.
Prieto-Rumeau, T. and Lorenzo, J. M. (2010). Approximating ergodic average reward continuous-time controlled Markov chains. IEEE Trans. Automatic Control 55, 201–207.
Ye, L. and Guo, X. (2012). Continuous-time Markov decision processes with state-dependent discount factors. Acta Appl. Math. 121, 5–27.
Yushkevich, A. A. (1978). Controlled Markov models with countable state space and continuous time. Theory Prob. Appl. 22, 215–235.