2 results
Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- Journal: Advances in Applied Probability / Volume 27 / Issue 4 / December 1995
- Published online by Cambridge University Press: 01 July 2016, pp. 1054-1078
- Print publication: December 1995
- Article
Minimizing the learning loss in adaptive control of Markov chains under the weak accessibility condition
- Journal: Journal of Applied Probability / Volume 28 / Issue 4 / December 1991
- Published online by Cambridge University Press: 14 July 2016, pp. 779-790
- Print publication: December 1991
- Article