BANDIT STRATEGIES EVALUATED IN THE CONTEXT OF CLINICAL TRIALS IN RARE LIFE-THREATENING DISEASES

Published online by Cambridge University Press:  07 June 2017

Sofía S. Villar*
Affiliation:
MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK. E-mail: sofia.villar@mrc-bsu.cam.ac.uk

Abstract

In a rare life-threatening disease setting, the number of patients in the trial is a high proportion of all patients with the condition (if not all of them). Further, this number is usually not enough to guarantee the required statistical power to detect a treatment effect of a meaningful size. In such a context, prioritizing patient benefit over hypothesis testing as the goal of the trial can lead to a trial design that produces useful information to guide treatment, even if it does not do so with the standard levels of statistical confidence. The idealized model for the optimal design of such a clinical trial is the classic multi-armed bandit problem with a finite patient horizon and a patient benefit objective function. Such a design maximizes patient benefit by balancing the learning and earning goals as data accumulate, given the patient horizon. However, solving such a model optimally has a very high (often prohibitive) computational cost and, more importantly, a cumbersome implementation, even for populations as small as a hundred patients. Several computationally feasible heuristic rules to address this problem have been proposed in the literature over the last 40 years. In this paper, we study a novel heuristic approach based on reformulating the problem as a restless bandit problem and deriving its corresponding Whittle index (WI) rule. Such a rule was recently proposed in the context of a clinical trial in Villar, Bowden, and Wason [16]. We perform extensive computational studies to compare, through both exact value calculations and simulated values, the performance of this rule with that of other index rules and simpler heuristics previously proposed in the literature.
Our results suggest that for the two- and three-armed cases and a patient horizon of at most a hundred patients, all index rules are a priori practically identical in terms of the expected proportion of successes attained when all arms start with a uniform prior. However, we find that a posteriori, for specific values of the parameters of interest, the index policies outperform the simpler rules in every instance, especially in the case of many arms and a larger, though still relatively small, total number of patients with the disease. The very good performance of bandit rules in terms of patient benefit (i.e., expected number of successes and mean number of patients allocated to the best arm, if it exists) makes them very appealing in the context of the challenge posed by drug development and treatment for rare life-threatening diseases.
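
The finite-horizon bandit model described in the abstract can be illustrated with a minimal sketch. It is not the Whittle index rule studied in the paper. It assumes binary (success/failure) patient outcomes and independent uniform Beta(1, 1) priors on each arm, and it computes by backward induction the exact Bayes-expected number of successes under (a) the optimal allocation and (b) a simple myopic rule that always treats with the arm of highest posterior mean. The function names and the state encoding are hypothetical choices made for illustration only.

```python
from fractions import Fraction
from functools import lru_cache


def posterior_mean(s, f):
    # Mean of Beta(1 + s, 1 + f): uniform prior updated with s successes, f failures.
    return Fraction(s + 1, s + f + 2)


def arm_values(state, remaining, continuation):
    # Expected total successes from now on if each arm in turn treats the next
    # patient and `continuation` governs all later patients.
    values = []
    for k, (s, f) in enumerate(state):
        p = posterior_mean(s, f)
        win = state[:k] + ((s + 1, f),) + state[k + 1:]
        lose = state[:k] + ((s, f + 1),) + state[k + 1:]
        values.append(p * (1 + continuation(win, remaining - 1))
                      + (1 - p) * continuation(lose, remaining - 1))
    return values


@lru_cache(maxsize=None)
def optimal(state, remaining):
    # Bellman recursion: pick the arm with the largest expected total.
    if remaining == 0:
        return Fraction(0)
    return max(arm_values(state, remaining, optimal))


@lru_cache(maxsize=None)
def myopic(state, remaining):
    # Simple heuristic: always use the arm with the highest posterior mean
    # (ties go to the lowest-numbered arm).
    if remaining == 0:
        return Fraction(0)
    means = [posterior_mean(s, f) for s, f in state]
    k = means.index(max(means))
    return arm_values(state, remaining, myopic)[k]


if __name__ == "__main__":
    start = ((0, 0), (0, 0))  # two arms, no patients treated yet
    for horizon in (2, 10, 20):
        print(horizon, float(optimal(start, horizon)), float(myopic(start, horizon)))
```

The number of states grows polynomially in the horizon but with a degree that increases with the number of arms, which gives a rough sense of why exact solutions become expensive and why index rules and simpler heuristics are of interest.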

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © Cambridge University Press 2017

References

1. Bellman, R. (1956). A problem in the sequential design of experiments. Sankhyā: The Indian Journal of Statistics (1933–1960) 16(3/4): 221–229.
2. Berry, D.A. (1978). Modified two-armed bandit strategies for certain clinical trials. Journal of the American Statistical Association 73(362): 339–345.
3. Berry, D.A. & Fristedt, B. (1985). Bandit problems: sequential allocation of experiments. Monographs on Statistics and Applied Probability Series. London, UK: Chapman & Hall.
4. Cheng, Y. & Berry, D.A. (2007). Optimal adaptive randomized designs for clinical trials. Biometrika 94(3): 673–689.
5. Cheng, Y., Su, F. & Berry, D.A. (2003). Choosing sample size for a clinical trial using decision analysis. Biometrika 90(4): 923–936.
6. Feldman, D. (1962). Contributions to the “two-armed bandit” problem. The Annals of Mathematical Statistics 33(3): 847–856.
7. Gittins, J., Glazebrook, K. & Weber, R. (2011). Multi-armed bandit allocation indices. Chichester, UK: Wiley.
8. Gittins, J.C. (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society Series B 41(2): 148–177 (with discussion).
9. Gittins, J.C. & Jones, D.M. (1974). A dynamic allocation index for the sequential design of experiments. In Gani, J., Sarkadi, K., & Vincze, I. (eds.), Progress in statistics (European meeting of statisticians, Budapest, 1972). Amsterdam, The Netherlands: North-Holland, pp. 241–266.
10. Gittins, J.C. & Jones, D.M. (1979). A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika 66(3): 561–565.
11. Katehakis, M. & Veinott, A. Jr. (1985). The multi-armed bandit problem: decomposition and computation. Technical Report, Department of Operations Research, Stanford University.
12. Katehakis, M.N. & Derman, C. (1986). Computing optimal sequential allocation rules in clinical trials. Lecture Notes-Monograph Series, pp. 29–39.
13. Niño-Mora, J. (2001). Restless bandits, partial conservation laws and indexability. Advances in Applied Probability 33(1): 76–98.
14. Niño-Mora, J. (2005). A marginal productivity index policy for the finite-horizon multiarmed bandit problem. In 44th IEEE Conference on Decision and Control, 2005 and 2005 European Control Conference. CDC-ECC’05, pp. 1718–1722. IEEE.
15. Niño-Mora, J. (2011). Computing a classic index for finite-horizon bandits. INFORMS Journal on Computing 23(2): 254–267.
16. Villar, S., Bowden, J. & Wason, J. (2015). Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statistical Science 30(2): 199–215.
17. Villar, S., Wason, J. & Bowden, J. (2015). The forward looking Gittins index: A novel bandit approach to adaptive randomization in multi-arm clinical trials. Biometrics 71(4): 969–978.
18. Wang, L. & Arnold, K. (2002). Press release: Cancer specialists in disagreement about purpose of clinical trials. Journal of the National Cancer Institute 94(24): 1819. http://jnci.oxfordjournals.org/content/94/24/1819.2.short
19. Weber, R.R. & Weiss, G. (1990). On an index policy for restless bandits. Journal of Applied Probability 27: 637–648.
20. Whittle, P. (1988). Restless bandits: Activity allocation in a changing world. Journal of Applied Probability 25: 287–298.