
Gradient approach for recursive estimation and control in finite Markov chains

Published online by Cambridge University Press: 01 July 2016

Yousri M. El-Fattah
Affiliation: Faculté des Sciences, Rabat
Postal address: Laboratoire d'Electronique et d'Étude des Systèmes Automatiques, Faculté des Sciences, B.P. 1014, Rabat, Morocco.

Abstract

The problem studied is that of controlling a finite Markov chain so as to maximize the long-run expected reward per unit time. The chain's transition probabilities depend on an unknown parameter taking values in a subset [a, b] of R^n. A control policy is defined as the probability of selecting a control action for each state of the chain. A Taylor-like expansion formula is derived for the expected reward in terms of policy variations. Based on that result, a recursive stochastic gradient algorithm is presented for adapting the control policy at consecutive times. The gradient depends on the estimated transition parameter, which is itself updated recursively using the gradient of the likelihood function. Convergence with probability 1 is proved for both the control and estimation algorithms.
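To make the scheme concrete, the following is a minimal Python sketch of a coupled estimation-and-control loop of the kind the abstract describes. It is not the paper's algorithm: the two-state chain, the sigmoid transition model, the rewards, and the step sizes are all invented for illustration; the parameter step follows the recursive likelihood-gradient idea (projected onto [a, b]), while the policy step substitutes a simple REINFORCE-style gradient estimate for the paper's Taylor-expansion-based policy gradient.

    # Illustrative sketch only -- not El-Fattah's algorithm. It couples
    # (i) a recursive likelihood-gradient update of an unknown transition
    # parameter with (ii) a REINFORCE-style stochastic-gradient policy update.
    # The chain, the sigmoid model and the rewards are invented for the example.
    import numpy as np

    rng = np.random.default_rng(0)

    THETA_TRUE = 1.5                 # unknown transition parameter (to be estimated)
    THETA_LO, THETA_HI = 0.0, 3.0    # the interval [a, b] of the abstract
    R = np.array([[1.0, 0.0],        # reward r(s, a): rows = states, cols = actions
                  [0.0, 2.0]])

    def p_next_is_1(theta, s, a):
        # Toy parametric model: P(s' = 1 | s, a) = sigmoid(theta * (s + a - 1)).
        return 1.0 / (1.0 + np.exp(-theta * (s + a - 1)))

    def loglik_grad(theta, s, a, s_next):
        # d/dtheta of log P_theta(s_next | s, a) for the toy sigmoid model.
        p1 = p_next_is_1(theta, s, a)
        x = s + a - 1
        return x * (1.0 - p1) if s_next == 1 else -x * p1

    h = np.zeros((2, 2))   # action preferences; softmax of h[s] gives a valid policy
    theta_hat = 1.0        # initial parameter estimate
    baseline = 0.0         # running average reward, used for variance reduction
    s = 0

    for k in range(1, 50001):
        pi_s = np.exp(h[s] - h[s].max())
        pi_s /= pi_s.sum()
        a = rng.choice(2, p=pi_s)
        s_next = int(rng.random() < p_next_is_1(THETA_TRUE, s, a))
        r = R[s, a]

        gamma = 1.0 / k    # decreasing step size, as in stochastic approximation
        # (i) recursive likelihood-gradient estimation, projected onto [a, b]
        theta_hat = float(np.clip(theta_hat + gamma * loglik_grad(theta_hat, s, a, s_next),
                                  THETA_LO, THETA_HI))
        # (ii) stochastic-gradient step on the long-run average reward
        baseline += gamma * (r - baseline)
        grad_log_pi = -pi_s          # gradient of log softmax: indicator minus pi
        grad_log_pi[a] += 1.0
        h[s] += gamma * (r - baseline) * grad_log_pi

        s = s_next

    print(f"theta_hat = {theta_hat:.3f}  (true value {THETA_TRUE})")

The decreasing step sizes gamma_k = 1/k are the standard stochastic-approximation choice; the convergence-with-probability-1 claim in the abstract concerns the paper's own updates, not this toy example.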

Type: Research Article
Copyright: © Applied Probability Trust 1981

