
Open bandit processes and optimal scheduling of queueing networks

Published online by Cambridge University Press:  01 July 2016

Tze Leung Lai*
Affiliation: Stanford University
Zhiliang Ying**
Affiliation: Columbia University
* Postal address: Department of Statistics, Stanford University, Stanford, CA 94305, USA.
** Postal address: Department of Statistics, Box 10 Mathematics, Columbia University, New York, NY 10027, USA.

Abstract

Asymptotic approximations are developed herein for the optimal policies in discounted multi-armed bandit problems in which new projects are continually appearing, commonly known as ‘open bandit problems’ or ‘arm-acquiring bandits’. It is shown that under certain stability assumptions the open bandit problem is asymptotically equivalent to a closed bandit problem in which there is no arrival of new projects, as the discount factor approaches 1. Applications of these results to optimal scheduling of queueing networks are given. In particular, Klimov’s priority indices for scheduling queueing networks are shown to be limits of the Gittins indices for the associated closed bandit problem, and extensions of Klimov’s results to preemptive policies and to unstable queueing systems are given.
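
For readers unfamiliar with the Gittins indices mentioned above, the following is a minimal sketch of how they might be computed for a single finite-state project in a closed bandit problem. It is not taken from the paper: it uses the restart-in-state characterization (a swapped-in technique usually attributed to Katehakis and Veinott), solved by plain value iteration. The function name `gittins_indices`, the two-state transition matrix `P`, the reward vector `r`, and the discount factor `beta` are all hypothetical choices made for illustration.

```python
def gittins_indices(P, r, beta, tol=1e-12, max_iter=100_000):
    """Gittins index of every state of one finite-state Markov project.

    Assumed (hypothetical) inputs: P[j][k] is the transition probability
    from state j to state k, r[j] is the reward earned in state j, and
    0 < beta < 1 is the discount factor.

    For each state i we solve the "restart in i" problem by value iteration:
        V(j) = max( r[j] + beta * sum_k P[j][k] V(k),   # continue from j
                    r[i] + beta * sum_k P[i][k] V(k) )  # restart from i
    The index of state i is then (1 - beta) * V(i), normalised so that a
    project paying a constant reward c has index c.
    """
    n = len(r)
    indices = []
    for i in range(n):
        V = [0.0] * n
        for _ in range(max_iter):
            # Value of continuing from each state j under the current V.
            cont = [
                r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                for j in range(n)
            ]
            restart = cont[i]  # value of jumping back to state i
            V_new = [max(c, restart) for c in cont]
            diff = max(abs(a - b) for a, b in zip(V_new, V))
            V = V_new
            if diff < tol:
                break
        indices.append((1.0 - beta) * V[i])
    return indices


# Toy project: state 0 pays 1 forever; state 1 pays 0 and then moves to state 0.
P = [[1.0, 0.0],
     [1.0, 0.0]]
r = [1.0, 0.0]
beta = 0.9
idx = gittins_indices(P, r, beta)
print(idx)
```

In this toy project the index of state 0 is 1 and the index of state 1 is `beta`, which can be checked by hand from the ratio of expected discounted reward to expected discounted time. In a closed bandit problem the index policy works on the project whose current state has the largest index; the abstract's result concerns how such indices behave as the discount factor approaches 1.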

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1988 

Footnotes

Research supported by the National Science Foundation and the Army Research Office.

References

[1] Cox, D. R. and Smith, W. L. (1961) Queues. Methuen, London.
[2] Derman, C. (1970) Finite State Markovian Decision Processes. Academic Press, New York.
[3] Gittins, J. C. (1979) Bandit processes and dynamic allocation indices. J. R. Statist. Soc. B 41, 148–177.
[4] Gittins, J. C. and Glazebrook, K. D. (1977) On Bayesian models in stochastic scheduling. J. Appl. Prob. 14, 556–565.
[5] Gittins, J. C. and Jones, D. M. (1972) A dynamic allocation index for the sequential design of experiments. Paper read at the European Meeting of Statisticians, Budapest. In Progress in Statistics (ed. Gani, J. et al., North-Holland, Amsterdam, 1974) 241–266.
[6] Klimov, G. P. (1974) Time-sharing service systems I. Theory Prob. Appl. 19, 532–551.
[7] Klimov, G. P. (1978) Time-sharing service systems II. Theory Prob. Appl. 23, 314–321.
[8] Mandelbaum, A. (1986) Discrete multi-armed bandits and multi-parameter processes. Prob. Theory Rel. Fields 71, 129–147.
[9] Nash, P. (1973) Optimal Allocation of Resources between Research Projects. Ph.D. Thesis, Cambridge University.
[10] Rao, C. R. (1973) Linear Statistical Inference and Its Applications. Wiley, New York.
[11] Tcha, D. and Pliska, S. R. (1977) Optimal control of single-server queueing networks and multi-class M/G/1 queues with feedback. Operat. Res. 25, 248–258.
[12] Varaiya, P., Walrand, J. C. and Buyukkoc, C. (1985) Extensions of the multiarmed bandit problem: the discounted case. IEEE Trans. Autom. Contr. 30, 426–439.
[13] Whittle, P. (1980) Multi-armed bandits and the Gittins index. J. R. Statist. Soc. B 42, 143–149.
[14] Whittle, P. (1981) Arm-acquiring bandits. Ann. Prob. 9, 284–292.
[15] Whittle, P. (1982) Optimization Over Time: Dynamic Programming and Stochastic Control, Vol. 1. Wiley, New York.