Impulsive Control for Continuous-Time Markov Decision Processes

Published online by Cambridge University Press: 04 January 2016

François Dufour*
Affiliation:
Université Bordeaux, IMB and INRIA Bordeaux Sud-Ouest
Alexei B. Piunovskiy**
Affiliation:
University of Liverpool
* Postal address: INRIA Bordeaux Sud-Ouest, 200 Avenue de la Vieille Tour, 33405 Talence cedex, France. Email address: dufour@math.u-bordeaux1.fr
** Postal address: Department of Mathematical Sciences, University of Liverpool, Liverpool, L69 7ZL, UK. Email address: piunov@liverpool.ac.uk

Abstract

In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite time horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure on the one hand the existence of an optimal control strategy, and on the other hand the existence of an ε-optimal control strategy. The decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual action or an impulsive action, respectively, to obtain an optimal or ε-optimal strategy. An interesting consequence of these results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.
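
As a rough illustration of the kind of optimality equation and state-space decomposition described above, the following sketch solves a small finite-state toy problem by successive approximation. It is not the paper's model, which is posed on a general Borel state space: the three-state example, the single gradual action, the single "reset to state 0" impulse, and every name (`solve_toy_bellman`, `cost_rate`, `jump_rate`, `impulse_cost`, `alpha`) are assumptions made for illustration only. The equation form used, V(x) = min{ impulse cost + V(0), (c(x) + Σ_y q(x,y) V(y)) / (α + q_x) }, is a standard discounted form assumed here, not one quoted from the paper.

```python
from typing import List, Tuple


def solve_toy_bellman(
    cost_rate: List[float],
    jump_rate: List[List[float]],
    impulse_cost: float,
    alpha: float,
    tol: float = 1e-12,
    max_iter: int = 100_000,
) -> Tuple[List[float], List[int]]:
    """Successive approximation for a hypothetical finite-state toy model.

    cost_rate[x]    : running cost rate in state x (gradual control)
    jump_rate[x][y] : rate of natural jumps from x to y (y != x)
    impulse_cost    : positive cost of an impulse that moves the state to 0
    alpha           : discount rate (> 0)
    """
    n = len(cost_rate)

    def gradual_value(x: int, v: List[float]) -> float:
        # Expected discounted cost of waiting for the next natural jump.
        total_rate = sum(jump_rate[x][y] for y in range(n) if y != x)
        inflow = sum(jump_rate[x][y] * v[y] for y in range(n) if y != x)
        return (cost_rate[x] + inflow) / (alpha + total_rate)

    def impulse_value(v: List[float]) -> float:
        # Cost of intervening immediately and continuing from state 0.
        return impulse_cost + v[0]

    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [min(gradual_value(x, v), impulse_value(v)) for x in range(n)]
        diff = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if diff < tol:
            break

    # States where the impulse is strictly cheaper than gradual control.
    impulse_states = [x for x in range(n) if impulse_value(v) < gradual_value(x, v)]
    return v, impulse_states


# Toy data (assumed): wear levels 0 -> 1 -> 2 at rate 1, state 2 is absorbing.
values, impulse_states = solve_toy_bellman(
    cost_rate=[0.0, 1.0, 10.0],
    jump_rate=[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
    impulse_cost=2.0,
    alpha=0.5,
)
print(values, impulse_states)
```

In this toy instance the impulse is only worthwhile in the most worn state, so the state space splits into a "gradual" region and an "impulsive" region, which loosely mirrors the decomposition mentioned in the abstract.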

Type
General Applied Probability
Copyright
© Applied Probability Trust 
