Book contents
- Frontmatter
- Contents
- Preface
- 1 An Introduction to Computer-intensive Methods
- 2 Maximum Likelihood
- 3 The Jackknife
- 4 The Bootstrap
- 5 Randomization and Monte Carlo Methods
- 6 Regression Methods
- 7 Bayesian Methods
- References
- Appendix A An Overview of S-PLUS Methods Used in this Book
- Appendix B Brief Description of S-PLUS Subroutines Used in this Book
- Appendix C S-PLUS Codes Cited in Text
- Appendix D Solutions to Exercises
- Index
4 - The Bootstrap
Published online by Cambridge University Press: 09 December 2009
Summary
Introduction
In Chapter 3, I introduced the idea of using an observed distribution as a descriptor of a hypothetical distribution in order to test the efficacy of a statistical method. The bootstrap method takes a similar approach, in that it attempts to generate point estimates and confidence limits by taking random samples, with replacement, from the observed distribution. The underlying rationale behind the method is that the observed distribution is itself an adequate descriptor of the true distribution. The term bootstrap was coined by Efron (1979) and derives from the saying “to pull oneself up by one's bootstraps,” which refers to accomplishing something seemingly impossible by one's own efforts (supposedly from the book Adventures of Baron Munchausen by Rudolph Erich Raspe, though I have not been able to find the incident in my copy).
Suppose that our observed data consisted of a huge number of observations: in that case, sampling from the observed distribution is clearly equivalent to sampling from the original distribution. Herein lies the rub: if the sample is not huge, then the observed distribution may be a poor descriptor. This is particularly true when the statistic to be estimated is very sensitive to outliers and the underlying distribution is skewed. The hope, held by many, that a sample as small as 20 observations is an adequate representation of the underlying distribution is probably folly in the extreme (I have seen bootstrapping on samples as small as five, which is getting somewhat absurd).
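The resampling idea described above can be sketched briefly in code. The book's own examples use S-PLUS (Appendix C), which is not reproduced here; the following is a minimal illustrative sketch in Python of the percentile bootstrap for the mean: resample the data with replacement, recompute the statistic on each resample, and take empirical quantiles of the replicates as confidence limits. The function name `bootstrap_ci` and all parameter choices are this sketch's own, not the book's.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap (illustrative sketch, not the book's code).

    Draw n_boot resamples of the same size as `data`, with replacement,
    recompute `stat` on each, and return the point estimate together with
    the empirical alpha/2 and 1 - alpha/2 quantiles of the replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    # Each replicate: n draws from the observed data, with replacement
    reps = sorted(stat([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return stat(data), (lo, hi)

# Example on a hypothetical sample of 10 observations
sample = [3.2, 4.1, 2.8, 5.0, 3.7, 4.4, 3.9, 2.5, 4.8, 3.3]
est, (lo, hi) = bootstrap_ci(sample)
```

Note that the interval here comes entirely from the observed sample, which is exactly why the method's reliability degrades when the sample is small or the statistic is outlier-sensitive, as discussed above.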
Chapter information: Publisher: Cambridge University Press. Print publication year: 2006.