Some reward–penalty rules for the multi-armed bandit problem which are asymptotically optimal
Published online by Cambridge University Press: 01 July 2016
Abstract
In the mathematical learning literature, reward–penalty rules have been studied in various decision-theoretic and game-theoretic contexts, including the multi-armed bandit problem. Here we propose an elaboration of Bather's randomised allocation indices which yields reward–penalty rules for the multi-armed bandit that are asymptotically optimal.
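The sketch below is a minimal illustration of a randomised allocation index in the spirit of Bather (1980), (1981): each arm's empirical success rate receives a random bonus whose scale decays with the number of times that arm has been pulled, so exploration never stops entirely but increasingly concentrates on the apparently best arm. The Bernoulli reward model, the exponential perturbation with constant c, and the name bather_style_index_rule are illustrative assumptions, not the specific rule or the reward–penalty elaboration proposed in this letter.

```python
import random


def bather_style_index_rule(n_arms, horizon, true_probs, c=1.0, seed=0):
    """Illustrative randomised allocation index for Bernoulli arms.

    At each step, pull the arm maximising
        p_hat_i + (c / (n_i + 1)) * E_i,
    where p_hat_i is the observed success rate of arm i, n_i is the
    number of times arm i has been pulled, and E_i is a fresh
    Exponential(1) draw.  The random bonus shrinks as an arm is
    sampled more, so the rule keeps sampling every arm while the
    proportion of pulls of inferior arms tends to zero.

    The exponential perturbation and the constant c are assumptions
    made for this sketch, not the form used in the letter.
    """
    rng = random.Random(seed)
    pulls = [0] * n_arms
    successes = [0] * n_arms
    for _ in range(horizon):
        # Randomised index: empirical mean plus a decaying random bonus.
        indices = [
            (successes[i] / pulls[i] if pulls[i] else 0.0)
            + (c / (pulls[i] + 1)) * rng.expovariate(1.0)
            for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda i: indices[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward
    return pulls, successes


if __name__ == "__main__":
    pulls, successes = bather_style_index_rule(3, 5000, [0.3, 0.5, 0.7])
    print("pulls:", pulls)        # most pulls should go to the best arm
    print("successes:", successes)
```

Running the example, the arm with the highest success probability should receive the large majority of pulls as the horizon grows, which is the sense of asymptotic optimality at issue here.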
- Type: Letters to the Editor
- Copyright © Applied Probability Trust 1983
References
Bather, J. (1980) Randomised allocation of treatments in sequential trials. Adv. Appl. Prob. 12, 174–182.
Bather, J. (1981) Randomised allocation of treatments in sequential experiments (with discussion). J. R. Statist. Soc. B43, 265–292.
Glazebrook, K. D. (1980) On randomized dynamic allocation indices for the sequential design of experiments. J. R. Statist. Soc. B42, 342–346.
Meybodi, M. R. and Lakshmivarahan, S. (1982) ε-optimality of a general class of learning algorithms. In Proc. Conf. Mathematical Learning Models–Theory and Applications. To appear.