
Denumerable-state continuous-time Markov decision processes with unbounded transition and reward rates under the discounted criterion

Published online by Cambridge University Press: 14 July 2016

Xianping Guo*
Affiliation: Zhongshan University
Weiping Zhu**
Affiliation: University of New South Wales
* Postal address: Department of Mathematics, Zhongshan University, Guangzhou 510275, P. R. China.
** Postal address: School of Computer Science, ADFA, University of New South Wales, ACT 2600, Australia. Email address: weiping@cs.adfa.edu.au

Abstract

In this paper, we consider denumerable-state continuous-time Markov decision processes with (possibly unbounded) transition and reward rates and a general action space under the discounted criterion. We provide a set of conditions weaker than those previously known and then prove the existence of optimal stationary policies within the class of all possibly randomized Markov policies. Moreover, we illustrate the results with birth-and-death processes with controlled immigration, for which the conditions in this paper are satisfied whereas the earlier conditions fail to hold.
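
As a rough illustration of the kind of model the abstract mentions, the following Python sketch solves a discounted birth-and-death process with controlled immigration by value iteration. Everything specific here is an assumption made for illustration and is not taken from the paper: the state space is truncated at a hypothetical `n_max` (which makes the rates bounded, so this sidesteps exactly the unbounded case the paper treats), the rates are `birth * i + a` upward and `death * i` downward, the reward rate is `gain * i - cost * a`, and the function names are invented. The iteration uses the standard uniformization trick for bounded rates.

```python
def transition_rates(i, a, n_max, birth, death):
    """Upward and downward jump rates out of state i (hypothetical model)."""
    up = birth * i + a if i < n_max else 0.0  # truncation: no jumps above n_max
    down = death * i
    return up, down


def reward_rate(i, a, gain, cost):
    """Assumed reward rate: income per individual minus immigration cost."""
    return gain * i - cost * a


def generator_term(i, a, v, n_max, birth, death):
    """Compute sum_j q(j | i, a) * v[j] for the birth-and-death generator."""
    up, down = transition_rates(i, a, n_max, birth, death)
    total = 0.0
    if up > 0:
        total += up * (v[i + 1] - v[i])
    if down > 0:
        total += down * (v[i - 1] - v[i])
    return total


def solve_discounted(n_max=20, birth=1.0, death=1.5, actions=(0.0, 1.0, 2.0),
                     alpha=0.5, gain=1.0, cost=0.8, tol=1e-10, max_iter=100000):
    """Value iteration for the truncated model via uniformization."""
    states = range(n_max + 1)
    # Uniformization constant: an upper bound on the total exit rate.
    lam = max(sum(transition_rates(i, a, n_max, birth, death))
              for i in states for a in actions)

    def backup(i, a, v):
        # Rearranged form of: alpha * v(i) = r(i, a) + sum_j q(j | i, a) v(j)
        r = reward_rate(i, a, gain, cost)
        drift = generator_term(i, a, v, n_max, birth, death)
        return (r + drift + lam * v[i]) / (alpha + lam)

    v = [0.0] * (n_max + 1)
    for _ in range(max_iter):
        new_v = [max(backup(i, a, v) for a in actions) for i in states]
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:
            break
    # Greedy stationary policy with respect to the converged values.
    policy = [max(actions, key=lambda a: backup(i, a, v)) for i in states]
    return v, policy


if __name__ == "__main__":
    values, policy = solve_discounted()
    for state in (0, 5, 10, 20):
        print(state, round(values[state], 4), policy[state])
```

The returned `policy` is a deterministic stationary policy for the truncated model only. Whether such a truncation approximates the untruncated process well is a separate question that this sketch does not address.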

Type: Research Papers
Copyright © Applied Probability Trust 2002

