Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Chernick, Michael R.
An introduction to bootstrap methods with applications to R / Michael R. Chernick, Robert A. LaBudde.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-46704-6 (hardback)
1. Bootstrap (Statistics) 2. R (Computer program language) I. LaBudde, Robert A., 1947– II. Title.
QA276.8.C478 2011
519.5’4–dc22
2011010972
CONTENTS
PREFACE
ACKNOWLEDGMENTS
LIST OF TABLES
1 INTRODUCTION
1.1 HISTORICAL BACKGROUND
1.2 DEFINITION AND RELATIONSHIP TO THE DELTA METHOD AND OTHER RESAMPLING METHODS
1.3 WIDE RANGE OF APPLICATIONS
1.4 THE BOOTSTRAP AND THE R LANGUAGE SYSTEM
1.5 HISTORICAL NOTES
1.6 EXERCISES
REFERENCES
2 ESTIMATION
2.1 ESTIMATING BIAS
2.2 ESTIMATING LOCATION
2.3 ESTIMATING DISPERSION
2.4 LINEAR REGRESSION
2.5 NONLINEAR REGRESSION
2.6 NONPARAMETRIC REGRESSION
2.7 HISTORICAL NOTES
2.8 EXERCISES
REFERENCES
3 CONFIDENCE INTERVALS
3.1 SUBSAMPLING, TYPICAL VALUE THEOREM, AND EFRON’S PERCENTILE METHOD
3.2 BOOTSTRAP-T
3.3 ITERATED BOOTSTRAP
3.4 BIAS-CORRECTED (BC) BOOTSTRAP
3.5 BCA AND ABC
3.6 TILTED BOOTSTRAP
3.7 VARIANCE ESTIMATION WITH SMALL SAMPLE SIZES
3.8 HISTORICAL NOTES
3.9 EXERCISES
REFERENCES
4 HYPOTHESIS TESTING
4.1 RELATIONSHIP TO CONFIDENCE INTERVALS
4.2 WHY TEST HYPOTHESES DIFFERENTLY?
4.3 TENDRIL DX EXAMPLE
4.4 KLINGENBERG EXAMPLE: BINARY DOSE–RESPONSE
4.5 HISTORICAL NOTES
4.6 EXERCISES
REFERENCES
5 TIME SERIES
5.1 FORECASTING METHODS
5.2 TIME DOMAIN MODELS
5.3 CAN BOOTSTRAPPING IMPROVE PREDICTION INTERVALS?
5.4 MODEL-BASED METHODS
5.5 BLOCK BOOTSTRAPPING FOR STATIONARY TIME SERIES
5.6 DEPENDENT WILD BOOTSTRAP (DWB)
5.7 FREQUENCY-BASED APPROACHES FOR STATIONARY TIME SERIES
5.8 SIEVE BOOTSTRAP
5.9 HISTORICAL NOTES
5.10 EXERCISES
REFERENCES
6 BOOTSTRAP VARIANTS
6.1 BAYESIAN BOOTSTRAP
6.2 SMOOTHED BOOTSTRAP
6.3 PARAMETRIC BOOTSTRAP
6.4 DOUBLE BOOTSTRAP
6.5 THE M-OUT-OF-N BOOTSTRAP
6.6 THE WILD BOOTSTRAP
6.7 HISTORICAL NOTES
6.8 EXERCISES
REFERENCES
7 SPECIAL TOPICS
7.1 SPATIAL DATA
7.2 SUBSET SELECTION IN REGRESSION
7.3 DETERMINING THE NUMBER OF DISTRIBUTIONS IN A MIXTURE
7.4 CENSORED DATA
7.5 P-VALUE ADJUSTMENT
7.6 BIOEQUIVALENCE
7.7 PROCESS CAPABILITY INDICES
7.8 MISSING DATA
7.9 POINT PROCESSES
7.10 BOOTSTRAP TO DETECT OUTLIERS
7.11 LATTICE VARIABLES
7.12 COVARIATE ADJUSTMENT OF AREA UNDER THE CURVE ESTIMATES FOR RECEIVER OPERATING CHARACTERISTIC (ROC) CURVES
7.13 BOOTSTRAPPING IN SAS
7.14 HISTORICAL NOTES
7.15 EXERCISES
REFERENCES
8 WHEN THE BOOTSTRAP IS INCONSISTENT AND HOW TO REMEDY IT
8.1 TOO SMALL OF A SAMPLE SIZE
8.2 DISTRIBUTIONS WITH INFINITE SECOND MOMENTS
8.3 ESTIMATING EXTREME VALUES
8.4 SURVEY SAMPLING
8.5 M-DEPENDENT SEQUENCES
8.6 UNSTABLE AUTOREGRESSIVE PROCESSES
8.7 LONG-RANGE DEPENDENCE
8.8 BOOTSTRAP DIAGNOSTICS
8.9 HISTORICAL NOTES
8.10 EXERCISES
REFERENCES
AUTHOR INDEX
SUBJECT INDEX
PREFACE
The term “bootstrapping” refers to the concept of “pulling oneself up by one's bootstraps,” a phrase apparently first used in The Singular Travels, Campaigns and Adventures of Baron Munchausen by Rudolph Erich Raspe in 1786. A derivative of the same term describes the process of “booting” a computer by a sequence of software increments loaded into memory at power-up.
In statistics, “bootstrapping” refers to making inferences about a sampling distribution of a statistic by “resampling” the sample itself with replacement, as if it were a finite population. To the degree that the resampling distribution mimics the original sampling distribution, the inferences are accurate. The accuracy improves as the size of the original sample increases, if the central limit theorem applies.
“Resampling” as a concept was first used by R. A. Fisher (1935) in his famous randomization test, and by E. J. G. Pitman (1937, 1938), although in these cases the sampling was done without replacement.
The “bootstrap” as sampling with replacement, together with its Monte Carlo approximate form, was first presented in a Stanford University technical report by Brad Efron in 1977. This report led to his famous paper in the Annals of Statistics in 1979. However, the Monte Carlo approximation may be much older. In fact, Julian Simon at the University of Maryland had proposed the Monte Carlo approximation as an educational tool for teaching probability and statistics. In the 1980s, Simon and Peter Bruce started a company called Resampling Stats that produced a software product to do bootstrap and permutation sampling for both educational and inference purposes.
But it was not until Efron's paper relating the bootstrap to the jackknife and other resampling plans that the statistical community got involved. Over the next 20 years, the theory and applications of the bootstrap blossomed, and the Monte Carlo approximation to the bootstrap became a widely practiced approach to making statistical inferences without strong parametric assumptions.
Michael Chernick was a graduate student in statistics at the time of Efron's early research and saw the development of bootstrap methods from its very beginning. However, Chernick did not get seriously involved in bootstrap research until 1984, when he started to find practical applications in nonlinear regression models and classification problems while employed at the Aerospace Corporation.
After meeting Philip Good in the mid-1980s, Chernick and Good set out to accumulate an extensive bibliography on resampling methods and planned a two-volume text, with Chernick writing the bootstrap methods volume and Good the volume on permutation tests. The project, contracted by Oxford University Press, was eventually abandoned. Good eventually published his work with Springer-Verlag, and Chernick later published his separately with Wiley. Chernick and Good later taught short courses together on resampling methods, first at UC Irvine and later at the Joint Statistical Meetings in Indianapolis in 2000. Since that time, Chernick has taught resampling methods with Peter Bruce and later bootstrap methods for statistics.com.
Robert LaBudde wrote his PhD dissertation in theoretical chemistry on the application of Monte Carlo methods to simulating elementary chemical reactions. Many years later, he took courses in resampling methods and bootstrap methods from statistics.com, and it was in Chernick's bootstrap methods course that the two met. In later sessions, LaBudde was Chernick's teaching assistant and collaborator in the course and provided exercises in the R programming language. The course was taught using the second edition of Chernick's bootstrap text. However, that text had several deficiencies for course use, including a lack of homework problems and of software for the applications, and the level of knowledge it assumed was uneven. The present text is intended for an elementary course in bootstrap methods, including Chernick's statistics.com course.
This book is organized in a similar way as Bootstrap Methods: A Guide for Practitioners and Researchers. Chapter 1 provides an introduction with some historical background, a formal description of the bootstrap and its relationship to other resampling methods, and an overview of the wide variety of applications of the approach. An introduction to R programming is also included to prepare the student for the exercises and applications that require programming using this software system. Chapter 2 covers point estimation, Chapter 3 confidence intervals, and Chapter 4 hypothesis testing. More advanced topics begin with time series in Chapter 5. Chapter 6 covers some of the more important variants of the bootstrap. Chapter 7 covers special topics including spatial data analysis, P-value adjustment in multiple testing, censored data, subset selection in regression models, process capability indices, and some new material on bioequivalence and covariate adjustment of area under the curve estimates for receiver operating characteristic (ROC) curves for diagnostic tests. The final chapter, Chapter 8, covers various examples where the bootstrap was found not to work as expected (it fails asymptotic consistency requirements). But in every case, modifications have been found that are consistent.
This text is suitable for a one-semester or one-quarter introductory course in bootstrap methods. It is designed for users of statistics more than statisticians. So, students with an interest in engineering, biology, genetics, geology, physics, and even psychology and other social sciences may be interested in this course because of the various applications in their field. Of course, statisticians needing a basic understanding of the bootstrap and the surrounding literature may find the course useful. But it is not intended as the basis for a graduate course in statistics such as one taught from Hall (1992), Shao and Tu (1995), or Davison and Hinkley (1997). A shorter introductory course could be taught using just Chapters 1–4. Chapters 1–4 could also be used to incorporate bootstrap methods into a first course on statistical inference. References to the literature are covered in the historical notes sections in each chapter. At the end of each chapter is a set of homework exercises that the instructor may select from for homework assignments.
Initially, it was our goal to create a text similar to Chernick (2007) but more suitable for a full course on bootstrapping with a large number of exercises and examples illustrated in R. Also, the intent was to make the course elementary with technical details left for the interested student to read the original articles or other books. Our belief was that there were few new developments to go beyond the coverage of Chernick (2007). However, we found that with the introduction of “bagging” and “boosting,” a new role developed for the bootstrap, particularly when estimating error rates for classification problems. As a result, we felt that it was appropriate to cover the new topics and applications in more detail. So parts of this text are not at the elementary level.
MICHAEL R. CHERNICK
ROBERT A. LABUDDE
ACKNOWLEDGMENTS
The authors would like to thank our acquisitions editor Steve Quigley, who has always been enthusiastic about our proposals and always provides good advice. We also thank Jackie Palmieri at Wiley for politely reminding us when our manuscript was expected and for cheerfully accepting changes when delays became necessary; it is that gentle push that gets us moving to completion. We also thank Dr. Jiajing Sun for her review and editing of the manuscript, as well as her help putting the equations into LaTeX. We especially would like to thank Professor Dimitris Politis for a kind and timely review of the entire manuscript. He provided several suggestions for improving the text and some additions to the literature to take account of important ideas we had omitted. He also provided a number of references that preceded Bickel, Götze, and van Zwet (1996) on the value of taking bootstrap samples of size m less than n, as well as some other historical details for the sieve bootstrap and subsampling.
M.R.C.
R.A.L.
2.1 | Summary of Estimators Using Root Mean Square Error (Shows the Number of Simulations for Which the Estimator Attained One of the Top Three Ranks among the Set of Estimators) |
2.2 | Square Root of MSRE for QDA Normal (0, 1) versus Normal (Δ, 1) |
2.3 | Square Root of MSRE for Five Nearest Neighbor Classifier Chi-Square 1 Degree Of Freedom (df) versus Chi-Square 3 df and Chi-Square 5 df (Equal Sample Sizes of n/2) |
2.4 | Square Root of MSRE for Three Nearest Neighbor Classifier Normal Mean = Δ, 5 dim. and 10 dim. (Equal Sample Sizes of n/2) |
2.5 | Square Root of MSRE for Three and Five Nearest Neighbor Classifiers with Real Microarray Data (Equal Sample Sizes of n/2) |
2.6 | Truth Tables for the Six Bootstrap Samples |
2.7 | Resubstitution Truth Table for Original Data |
2.8 | Linear Discriminant Function Coefficients for Original Data |
2.9 | Linear Discriminant Function Coefficients for Bootstrap Samples |
3.1 | Selling Price of 50 Homes Sold in Seattle in 2002 |
3.2 | Uniform(0, 1) Distribution: Results for Various Confidence Intervals |
3.3 | ln[Normal(0, 1)] Distribution: Results for Nominal Coverage Ranging 50 |
3.4 | ln[Normal(0, 1)] Distribution |
4.1 | Summary Statistics for Bootstrap Estimates of Capture Threshold Differences in Atrium Mean Difference Nonsteroid versus Steroid |
4.2 | Summary Statistics for Bootstrap Estimates of Capture Threshold Differences in Ventricle Mean Difference Nonsteroid versus Steroid |
4.3 | Dose–Response Models Used to Determine Efficacy of Compound for IBS |
4.4 | t-Statistics, P-Values, Target Dose Estimates, and Model Weights |
7.1 | Comparison of Treatment Failure Rates (Occurrences of Restenosis) |
7.2 | Comparison of P-Value Adjustments to Pairwise Fisher Exact Test Comparing Treatment Failure Rates (Occurrences of Restenosis) |
7.3 | Comparisons from Table 1 of Opdyke (2010) Run Times and CPU Times Relative to OPDY (Algorithm Time/OPDY Time) |
The “bootstrap” is one of a number of techniques that are now part of the broad umbrella of nonparametric statistics commonly called resampling methods. Some of these techniques are far older than the bootstrap: permutation methods go back to Fisher (1935) and Pitman (1937, 1938), and the jackknife started with Quenouille (1949). Bootstrapping was made practical through the use of the Monte Carlo approximation, which itself goes back to the beginning of computers in the early 1940s.
However, 1979 is a critical year for the bootstrap because that is when Brad Efron's paper in the Annals of Statistics was published (Efron, 1979). Efron had defined a resampling procedure that he named the bootstrap. He constructed it as a simple approximation to the jackknife (an earlier resampling method developed by John Tukey), and his original motivation was to derive properties of the bootstrap to better understand the jackknife. However, in many situations, the bootstrap is as good as or better than the jackknife as a resampling procedure. The jackknife is primarily useful for small samples; it becomes computationally inefficient for larger samples, although increasing computer speed has made it more feasible there as well. A clear description of the jackknife and its connection to the bootstrap can be found in the SIAM monograph Efron (1982). A description of the jackknife is also given in Section 1.2.1.
Although permutation tests were known in the 1930s, an impediment to their use was the large number (i.e., n!) of distinct permutations available for samples of size n. Since ordinary bootstrapping involves sampling with replacement n times for a sample of size n, there are nn possible distinct ordered bootstrap samples (though some are equivalent under the exchangeability assumption because they are permutations of each other). So, complete enumeration of all the bootstrap samples becomes infeasible except in very small sample sizes. Random sampling from the set of possible bootstrap samples becomes a viable way to approximate the distribution of bootstrap samples. The same problem exists for permutations and the same remedy is possible. The only difference is that n! does not grow as fast as nn, and complete enumeration of permutations is possible for larger n than for the bootstrap.
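These counts are easy to verify numerically. As a side note not stated above, the number of distinct unordered bootstrap samples is the binomial coefficient C(2n − 1, n), a standard counting result; a short R check:

```r
# Compare the number of permutations (n!) with the number of
# ordered bootstrap samples (n^n) for a few sample sizes.
for (n in c(5, 10, 15)) {
  cat(sprintf("n = %2d: n! = %.3e   n^n = %.3e\n",
              n, factorial(n), n^n))
}
# The count of *distinct* (unordered) bootstrap samples is
# choose(2n - 1, n), which grows far more slowly than n^n.
n <- 10
choose(2 * n - 1, n)   # 92378 distinct unordered samples, versus 10^10 ordered
```

This makes concrete why complete enumeration of permutations remains possible for larger n than complete enumeration of bootstrap samples.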
The idea of taking several Monte Carlo samples of size n with replacement from the original observations was certainly an important idea expressed by Efron but was clearly known and practiced prior to Efron (1979). Although it may not be the first time it was used, Julian Simon laid claim to priority for the bootstrap based on his use of the Monte Carlo approximation in Simon (1969). But Simon was only recommending the Monte Carlo approach as a way to teach probability and statistics in a more intuitive way that does not require the abstraction of a parametric probability model for the generation of the original sample. After Efron made the bootstrap popular, Simon and Bruce joined the campaign (see Simon and Bruce, 1991, 1995).
Efron, however, starting with Efron (1979), first connected bootstrapping to the jackknife, delta method, cross-validation, and permutation tests. He was the first to show it to be a real competitor to the jackknife and delta method for estimating the standard error of an estimator. Also, quite early on, Efron recognized the broad applicability of bootstrapping for confidence intervals, hypothesis testing, and more complex problems. These ideas were emphasized in Efron and Gong (1983), Diaconis and Efron (1983), Efron and Tibshirani (1986), and the SIAM monograph (Efron 1982). These influential articles along with the SIAM monograph led to a great deal of research during the 1980s and 1990s. The explosion of bootstrap papers grew at an exponential rate. Key probabilistic results appeared in Singh (1981), Bickel and Freedman (1981, 1984), Beran (1982), Martin (1990), Hall (1986, 1988), Hall and Martin (1988), and Navidi (1989).
In a very remarkable paper, Efron (1983) used simulation comparisons to show that bootstrap bias correction could provide better estimates of classification error rate than the very popular cross-validation approach (often called leave-one-out and originally proposed by Lachenbruch and Mickey, 1968). These results applied when the sample size was small, classification was restricted to only two or three classes, and the predicting features had multivariate Gaussian distributions. Efron compared several variants of the bootstrap with cross-validation and the resubstitution method. This led to several follow-up articles that widened the applicability and demonstrated the superiority of a version of the bootstrap called the 632. See Chatterjee and Chatterjee (1983), Chernick et al. (1985, 1986, 1988a, b), Jain et al. (1987), and Efron and Tibshirani (1997).
Chernick was a graduate student at Stanford in the late 1970s when the bootstrap activity began on the Stanford and Berkeley campuses. However, oddly the bootstrap did not catch on with many graduate students. Even Brad Efron’s graduate students chose other topics for their dissertation. Gail Gong was the first student of Efron to do a dissertation on the bootstrap. She did very useful applied work on using the bootstrap in model building (particularly for logistic regression subset selection). See Gong (1986). After Gail Gong, a number of graduate students wrote dissertations on the bootstrap under Efron, including Terry Therneau, Rob Tibshirani, and Tim Hesterberg. Michael Martin visited Stanford while working on his dissertation on bootstrap confidence intervals under Peter Hall. At Berkeley, William Navidi did his thesis on bootstrapping in regression and econometric models under David Freedman.
While exciting theoretical results developed for the bootstrap in the 1980s and 1990s, there were also negative results where it was shown that the bootstrap estimate is not “consistent” in the probabilistic sense (i.e., approaches the true parameter value as the sample size becomes infinite). Examples included the mean when the population distribution does not have a finite variance and when the maximum or minimum is taken from a sample. This is illustrated in Athreya (1987a, b), Knight (1989), Angus (1993), and Hall et al. (1993). The first published example of an inconsistent bootstrap estimate appeared in Bickel and Freedman (1981). Shao et al. (2000) showed that a particular approach to bootstrap estimation of individual bioequivalence is also inconsistent. They also provide a modification that is consistent. Generally, the bootstrap is consistent when the central limit theorem applies (a sufficient condition is Lyapunov's condition, which requires existence of the (2 + δ)th absolute moment of the population distribution). Consistency results in the literature are based on the existence of Edgeworth expansions; so, additional smoothness conditions for the expansion to exist have also been assumed (though it is not known whether they are necessary).
One extension of the bootstrap called m-out-of-n was suggested by Bickel and Ren (1996) in light of previous research on it, and it has been shown to be a method to overcome inconsistency of the bootstrap in several instances. In the m-out-of-n bootstrap, sampling is with replacement from the original sample but with a value of m that is smaller than n. See Bickel et al. (1997), Gine and Zinn (1989), Arcones and Gine (1989), Fukuchi (1994), and Politis et al. (1999).
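The resampling step of the m-out-of-n bootstrap is a one-line change from the ordinary bootstrap. In this R sketch, the simulated data, the choice m ≈ √n, and the use of the sample maximum as the statistic are all illustrative assumptions, not prescriptions from the references above:

```r
set.seed(1)
x <- rnorm(100)        # original sample, n = 100 (simulated for illustration)
n <- length(x)
m <- floor(sqrt(n))    # resample size m smaller than n; sqrt(n) is one common choice
# Draw B m-out-of-n bootstrap samples and compute the statistic on each.
# The maximum is used here because extremes are a case where the ordinary
# n-out-of-n bootstrap is inconsistent.
B <- 2000
boot_max <- replicate(B, max(sample(x, m, replace = TRUE)))
summary(boot_max)
```

The only difference from the ordinary bootstrap is `sample(x, m, ...)` in place of `sample(x, ...)`, which defaults to resample size n.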
Some bootstrap approaches in time series have been shown to be inconsistent. Lahiri (2003) covered the use of bootstrap in time series and other dependent cases. He showed that there are remedies for the m-dependent and moving block bootstrap cases (see Section 5.5 for some coverage of moving block bootstrap) that are consistent.
We will first provide an informal definition of bootstrap to provide intuition and understanding before a more formal mathematical definition. The objective of bootstrapping is to estimate a parameter based on the data, such as a mean, median, or standard deviation. We are also interested in the properties of the distribution for the parameter’s estimate and may want to construct confidence intervals. But we do not want to make overly restrictive assumptions about the form of the distribution that the observed data came from.
For the simple case of independent observations coming from the same population distribution, the basic element for bootstrapping is the empirical distribution. The empirical distribution is just the discrete distribution that gives equal weight to each data point (i.e., it assigns probability 1/n to each of the original n observations and shall be denoted Fn).
Most of the common parameters that we consider are functionals of the unknown population distribution. A functional is simply a mapping that takes a function F into a real number. In our case, we are only interested in functionals of cumulative probability distribution functions. So, for example, the mean and variance of a distribution can be represented as functionals in the following way. Let μ be the mean for a distribution function F; then μ = ∫ x dF(x). Let σ2 be the variance; then σ2 = ∫ (x − μ)2 dF(x). These integrals over the entire set of possible x values in the domain of F are particular examples of functionals. It is interesting that the sample estimates most commonly used for these parameters are the same functionals applied to Fn.
Now the idea of bootstrap is to use only what you know from the data and not introduce extraneous assumptions about the population distribution. The “bootstrap principle” says that when F is the population distribution and T(F) is the functional defining the parameter we wish to estimate based on a sample of size n, we let Fn play the role of F and let Fn*, the bootstrap distribution (soon to be defined), play the role of Fn in the resampling process. Note that the original sample is a sample of n independent identically distributed observations from the distribution F, and the sample estimate of the parameter is T(Fn). So, in bootstrapping, we let Fn play the role of F and take n independent and identically distributed observations from Fn. Since Fn is the empirical distribution, this is just sampling randomly with replacement from the original data.
Suppose we have n = 5 and the observations are X1 = 7, X2 = 5, X3 = 3, X4 = 9, and X5 = 6 and that we are estimating the mean. Then, the sample estimate of the population parameter is the sample mean, (7 + 5 + 3 + 9 + 6)/5 = 6.0. Then sampling from the data with replacement generates what we call a bootstrap sample.
The bootstrap sample is denoted X1*, X2*, X3*, X4*, and X5*. The distribution for sampling with replacement from Fn is called the bootstrap distribution, which we previously denoted by Fn*. The bootstrap estimate is then T(Fn*). So a bootstrap sample might be X1* = 5, X2* = 9, X3* = 7, X4* = 7, and X5* = 5, with estimate (5 + 9 + 7 + 7 + 5)/5 = 6.6.
Note that, although it is possible to get the original sample back, typically some values get repeated one or more times and consequently others get omitted. For this bootstrap sample, the bootstrap estimate of the mean is (5 + 9 + 7 + 7 + 5)/5 = 6.6, which differs from the original sample estimate, 6.0. If we take another bootstrap sample, we may get yet another estimate that may differ from both the previous one and the original sample. Assume for the second bootstrap sample we get the observation equal to 9 repeated once: X1* = 9, X2* = 9, X3* = 6, X4* = 7, and X5* = 5. Then the bootstrap estimate for the mean is (9 + 9 + 6 + 7 + 5)/5 = 7.2.
If we repeat this many times, we get a histogram of values for the mean, which we will call the Monte Carlo approximation to the bootstrap distribution. The average of all these values will be very close to 6.0 since the theoretical mean of the bootstrap distribution is the sample mean. But from the histogram (i.e., resampling distribution), we can also see the variability of these estimates and can use the histogram to estimate skewness, kurtosis, standard deviation, and confidence intervals.
In theory, the exact bootstrap estimate of the parameter could be calculated by averaging appropriately over all possible bootstrap samples, and in this example for the mean, that value would be 6.0. As noted before, there can be nn distinct bootstrap samples (taking account of the ordering of the observations), and so even for n = 10, this becomes very large (i.e., 10 billion). So, in practice, a Monte Carlo approximation is used.
If you randomly generate M = 10,000 or 100,000 bootstrap samples, the distribution of bootstrap estimates will approximate the bootstrap distribution for the estimate. The larger M is, the closer the histogram approaches the true bootstrap distribution. The Monte Carlo approximation works as follows:
1. Draw a sample of size n with replacement from the original n observations.
2. Compute the estimate of the parameter from this bootstrap sample.
3. Repeat steps 1 and 2 a total of M times, and use the M resulting estimates to form the Monte Carlo approximation to the bootstrap distribution.
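Using the five observations from the example above, the Monte Carlo approximation can be carried out in a few lines of R (the choice M = 10,000 and the random seed are arbitrary):

```r
set.seed(123)
x <- c(7, 5, 3, 9, 6)               # the original sample, n = 5
M <- 10000                           # number of Monte Carlo bootstrap samples
# Each replicate draws n observations with replacement and computes the mean.
boot_means <- replicate(M, mean(sample(x, replace = TRUE)))
mean(boot_means)                     # close to the sample mean, 6.0
sd(boot_means)                       # bootstrap estimate of the standard error
hist(boot_means,
     main = "Monte Carlo approximation to the bootstrap distribution")
```

The histogram is the resampling distribution described in the text; skewness, kurtosis, and confidence intervals can all be read off the same vector `boot_means`.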
Now a very important thing to remember is that with the Monte Carlo approximation to the bootstrap, there are two sources of error: (1) the sampling error that comes from using the empirical distribution Fn in place of the true population distribution F, and (2) the Monte Carlo error that comes from using only M bootstrap samples rather than all nn possible bootstrap samples. The second source can be made as small as desired by increasing M.
If T(Fn*) converges to T(F) as n → ∞, then bootstrapping works. It is nice that this works out often, but it is not guaranteed. We know by a theorem called the Glivenko–Cantelli theorem that Fn converges to F uniformly. Often, we also know that the sample estimate is consistent (as is the case for the sample mean). So, (1) T(Fn) converges to T(F) as n → ∞; but this depends on smoothness conditions on the functional T. We also need (2) T(Fn*) − T(Fn) to tend to 0 as n → ∞. In proving that bootstrapping works (i.e., that the bootstrap estimate is consistent for the population parameter), probability theorists needed to verify (1) and (2). One approach commonly used is to verify that smoothness conditions are satisfied for expansions like the Edgeworth and Cornish–Fisher expansions. These expansions are then used to prove the limit theorems.
The probability theory associated with the bootstrap is beyond the scope of this text and can be found in books such as Hall (1992). What is important is that we know that consistency of bootstrap estimates has been demonstrated in many cases and examples where certain bootstrap estimates fail to be consistent are also known. There is a middle ground, which are cases where consistency has been neither proved nor disproved. In those cases, simulation studies can be used to confirm or deny the usefulness of the bootstrap estimate. Also, simulation studies can be used when the sample size is too small to count on asymptotic theory, and its use in small to moderate sample sizes needs to be evaluated.
The jackknife was introduced by Quenouille (1949). Quenouille’s aim was to improve an estimate by correcting for its bias. Later on, Tukey (1958) popularized the method and found that a more important use of the jackknife was to estimate standard errors of an estimate. It was Tukey who coined the name jackknife because it was a statistical tool with many purposes. While bootstrapping uses the bootstrap samples to estimate variability, the jackknife uses what are called pseudovalues.
First, consider an estimate based on a sample of size n of observations independently drawn from a common distribution F. Here, just as with the bootstrap, we again let Fn be the empirical distribution for this data set. Let the parameter be θ = T(F), a functional; let the estimate be θ̂ = T(Fn); and let θ̂(i) = T(Fn(i)), where Fn(i) is the empirical distribution function for the n − 1 observations obtained by leaving the ith observation out. The jackknife estimate of the variance of θ̂ is obtained as follows:
Varjack = [(n − 1)/n] Σi (θ̂(i) − θ̂(·))2,
where θ̂(·) = (1/n) Σi θ̂(i). The jackknife estimate of the standard error for θ̂ is just the square root of Varjack. Tukey defined the pseudovalue θ̃i = nθ̂ − (n − 1)θ̂(i). Then the jackknife estimate of the parameter θ is θ̂J = (1/n) Σi θ̃i. So the name pseudovalue comes about because the estimate is the average of the pseudovalues. Expressing the estimate of the variance of the estimate in terms of the pseudovalues, we get
Varjack = Σi (θ̃i − θ̂J)2 / [n(n − 1)].
In this form, we see that the variance is the usual estimate for the variance of a sample mean; in this case, the sample mean of the pseudovalues. Like the bootstrap, the jackknife has been a very useful tool in estimating variances for more complicated estimators such as trimmed or Winsorized means.
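The pseudovalue computations can be written out directly; here is a small R sketch for the sample mean, reusing the five observations from the earlier bootstrap example:

```r
x <- c(7, 5, 3, 9, 6)                 # the sample from the text
n <- length(x)
theta_hat <- mean(x)                  # full-sample estimate, 6.0
# Leave-one-out estimates theta_hat_(i)
theta_i <- sapply(1:n, function(i) mean(x[-i]))
# Tukey's pseudovalues: n * theta_hat - (n - 1) * theta_hat_(i)
pseudo <- n * theta_hat - (n - 1) * theta_i
mean(pseudo)       # jackknife estimate of the parameter
var(pseudo) / n    # jackknife estimate of the variance of the estimate
```

For the sample mean, the pseudovalues reduce to the original observations themselves, so the jackknife estimate is again 6.0 and the variance estimate is the familiar s2/n.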
One of the great surprises about the bootstrap is that in cases like the trimmed mean, the bootstrap does better than the jackknife (Efron, 1982, pp. 28–29). For the sample median, the bootstrap provides a consistent estimate of the variance but the jackknife does not! See Efron (1982, p. 16 and chapter 6). In that monograph, Efron also showed, using theorem 6.1, that the jackknife estimate of standard error is essentially the bootstrap estimate with the parameter estimate replaced by a linear approximation of it. In this way, there is a close similarity between the two methods, and if the linear approximation is a good approximation, the jackknife and the bootstrap will both be consistent. However, there are complex estimators where this is not the case.
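The contrast for the median can be illustrated numerically; the simulated data, the even sample size n = 30, and the number of bootstrap replicates below are arbitrary choices for illustration, not a reproduction of Efron's calculations:

```r
set.seed(42)
x <- rnorm(30)                        # even sample size n = 30
n <- length(x)
# Bootstrap standard error of the median
boot_med <- replicate(5000, median(sample(x, replace = TRUE)))
se_boot <- sd(boot_med)
# Jackknife standard error of the median
theta_i <- sapply(1:n, function(i) median(x[-i]))
se_jack <- sqrt((n - 1) / n * sum((theta_i - mean(theta_i))^2))
# For even n the leave-one-out medians take only two distinct values
# (the two middle order statistics), which underlies the jackknife's
# failure for the median.
length(unique(theta_i))               # 2
c(bootstrap = se_boot, jackknife = se_jack)
```

The degeneracy of the leave-one-out medians is exactly the kind of non-smoothness that breaks the linear approximation mentioned above.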
It is often the case that we are interested in the moments of an estimator. In particular, for these various methods, the variance is the moment we are most interested in. To illustrate the delta method, let us define φ = f(α), where the parameters φ and α are both one-dimensional and f is a function differentiable with respect to α. So there exists a Taylor series expansion for f at a point, say α0. Carrying it out only to first order, we get

    f(α) = f(α0) + f′(α0)(α − α0) + remainder terms,

and dropping the remainder terms leaves

    f(α) ≈ f(α0) + f′(α0)(α − α0),

or

    f(α) − f(α0) ≈ f′(α0)(α − α0).

Squaring both sides of the last equation gives us [f(α) − f(α0)]² ≈ [f′(α0)]²(α − α0)². Now we want to think of φ = f(α) as a random variable, and upon taking expectations of the random variables on each side of the equation, we get

    E[f(α) − f(α0)]² ≈ [f′(α0)]² E[(α − α0)²].  (1.1)
Here, α and f(α) are random variables, and α0, f(α0), and f′(α0) are all constants. Equation 1.1 provides the delta method approximation to the variance of φ = f(α) since the left-hand side is approximately the variance of φ and the right-hand side is the variance of α multiplied by the constant [f′(α0)]2 if we choose α0 to be the mean of α.
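A quick numerical check makes Equation 1.1 concrete. The sketch below (illustrative names, ours) takes f(α) = α² with α normal around α0 = 2, and compares the delta method approximation [f′(α0)]² Var(α) against a Monte Carlo estimate of Var(f(α)):

```r
# Delta method sketch: Var(f(alpha)) ~ [f'(alpha0)]^2 * Var(alpha),
# with alpha0 chosen as the mean of alpha. Names are illustrative.
set.seed(2)
alpha <- rnorm(1e5, mean = 2, sd = 0.1)   # alpha with small spread about 2
f <- function(a) a^2                      # phi = f(alpha) = alpha^2
fprime <- function(a) 2 * a               # f'(alpha)

delta_var <- fprime(2)^2 * 0.1^2          # [f'(alpha0)]^2 * Var(alpha) = 0.16
mc_var <- var(f(alpha))                   # Monte Carlo variance of f(alpha)
c(delta = delta_var, monte_carlo = mc_var)
```

The two numbers agree closely because the spread of α is small, so the first-order Taylor approximation is accurate over the range where α actually falls.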
Cross-validation is a general procedure used in statistical modeling. It can be used to determine the best model among alternative choices: the order of an autoregressive time series model, which variables to include in a logistic or multiple linear regression, the number of components in a mixture model, the choice of a parametric classification model, or the pruning of classification trees.
The basic idea of cross-validation is to randomly split the data into two subsets. One is used to fit the model, and the other is used to test the model. The extreme case would be to fit the model to all the data except a single observation and see how well that model predicts the value of the observation left out. But a sample of size 1 is not very good for assessment. So, in the case of classification error rate estimation, Lachenbruch and Mickey (1968) proposed the leave-one-out method of assessment. In this case, a model is fit to the n − 1 observations that are included and is tested on the one left out. The fitting and prediction are then repeated for each of the n observations: the model is fit with observation i left out and used to predict the class of case i. Results are obtained for each i and then averaged. Efron (1983) included a simulation study that showed, for bivariate normal distributions, the “632” variant of the bootstrap does better than leave-one-out. For pruning classification trees, see Breiman et al. (1984).
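The leave-one-out idea can be sketched in a few lines of R. The code below uses a simple nearest-class-mean rule as a stand-in classifier (our illustration, not the discriminant function of Lachenbruch and Mickey); the structure — fit without case i, predict case i, average over all i — is what matters:

```r
# Leave-one-out error rate sketch with a toy nearest-class-mean classifier.
# loo_error and the classification rule are illustrative, not from a package.
loo_error <- function(x, y) {
  n <- length(y)
  wrong <- 0
  for (i in seq_len(n)) {
    # "fit" on all cases except i: here, just the two class means
    m0 <- mean(x[-i][y[-i] == 0])
    m1 <- mean(x[-i][y[-i] == 1])
    # predict the left-out case by the nearer class mean
    pred <- ifelse(abs(x[i] - m1) < abs(x[i] - m0), 1, 0)
    wrong <- wrong + (pred != y[i])
  }
  wrong / n   # leave-one-out estimate of the error rate
}

set.seed(3)
x <- c(rnorm(20, 0), rnorm(20, 3))   # two well-separated classes
y <- rep(0:1, each = 20)
loo_error(x, y)
```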
The idea of subsampling goes back to Hartigan (1969), who developed a theory of confidence intervals for random subsampling. He proved a theorem called the typical value theorem when M-estimators are used to estimate parameters. We shall see in the chapter on confidence intervals that Hartigan’s results were motivating factors for Efron to introduce the percentile method bootstrap confidence intervals.
More recently, the theory of subsampling has been further developed and related to the bootstrap. It has been applied when the data are independent observations and also when there are dependencies among the data. A good summary of the current literature, along with connections to the bootstrap, can be found in Politis et al. (1999), and consistency under very minimal assumptions can be found in Politis and Romano (1994). Politis, Romano, and Wolf included applications when the observations are independent and also for dependent situations such as stationary and nonstationary time series, random fields, and marked point processes. The dependent situations are also well covered in section 2.8 of Lahiri (2003).
We shall now define random subsampling. Let S1, S2, …, SB−1 be B − 1 of the 2^n − 1 nonempty subsets of the integers 1, 2, …, n, selected at random without replacement. So one subset of size 3 that might be drawn is {1, 3, 5}; another is {2, 4, n}. Subsets of other sizes could also be drawn; for example, a subset of size 5 is {1, 7, 9, 12, 13}. There are many subsets to select from. There is only one subset of size n, and it contains all the integers from 1 to n. There are n subsets of size n − 1, each excluding one and only one of the integers from 1 to n. For more details on this, on M-estimators, and on the typical value theorem, see sections 3.1.1 and 3.1.2 of Chernick (2007).
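A rough sketch of random subsampling in R follows. For simplicity it draws each subset independently (so, unlike the definition above, the same subset could in principle recur); all names are illustrative:

```r
# Random-subsampling sketch: draw B - 1 random nonempty subsets of
# {1, ..., n} and compute the estimator (here the mean) on each one.
# Simplified illustration, not Hartigan's exact scheme.
set.seed(4)
x <- rexp(15)
n <- length(x)
B <- 100
sub_means <- replicate(B - 1, {
  size <- sample(1:n, 1)        # random subset size between 1 and n
  idx <- sample(1:n, size)      # the subset itself, drawn without replacement
  mean(x[idx])                  # estimator evaluated on the subset
})
quantile(sub_means, c(0.05, 0.95))   # crude subsampling-based interval
```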
There is a great deal of temptation to apply the bootstrap in a wide variety of settings. But as we have seen, the bootstrap does not always work. So how do we know when it will work? We either have to prove a consistency theorem under a set of assumptions or we have to verify that it is well behaved through simulations.
In regression problems, there are at least two approaches to bootstrapping. One is called “bootstrapping residuals,” and the other is called “bootstrapping vectors or cases.” In the first approach, we fit a model to the data and compute the residuals from the model. Then we generate a bootstrap sample by resampling with replacement from the model residuals. In the second approach, we resample with replacement from the n vectors of dimension k + 1,

    (yi, xi1, xi2, …, xik), i = 1, 2, …, n,

where yi is the response for case i and xi1, …, xik are the corresponding values of the k predictor variables.
In the first approach, the model is fixed. In the second, it is redetermined each time. Both methods can be applied when a parametric regression model is assumed. But in practice, we might not be sure that the parametric form is correct. In such cases, it is better to use the bootstrapping vectors approach.
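The two approaches can be sketched side by side for a simple linear model. The code below is our illustration in base R (variable names are ours); note how the residuals approach keeps the fitted model and the x's fixed, while the cases approach resamples whole (y, x) pairs:

```r
# Two regression bootstraps for a simple linear model, sketched in base R.
set.seed(5)
n <- 30
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)

# (1) Bootstrapping residuals: the model and the x's stay fixed;
#     only the residuals are resampled and added back to the fitted values.
res <- residuals(fit)
yhat <- fitted(fit)
slope_resid <- replicate(500, {
  y_star <- yhat + sample(res, n, replace = TRUE)
  coef(lm(y_star ~ x))[2]
})

# (2) Bootstrapping vectors/cases: resample whole (y_i, x_i) pairs,
#     so the design is redetermined in each bootstrap sample.
slope_cases <- replicate(500, {
  idx <- sample(1:n, n, replace = TRUE)
  coef(lm(y[idx] ~ x[idx]))[2]
})

c(se_residuals = sd(slope_resid), se_cases = sd(slope_cases))
```

Here the two standard errors are similar because the linear model is correct; when the assumed parametric form is doubtful, the cases approach is the safer choice, as noted above.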
The bootstrap has also been successfully applied to the estimation of error rates for discriminant functions using bias adjustment as we will see in Chapter 2. The bootstrap and another resampling procedure called “permutation tests,” as described in Good (1994), are attractive because they free the scientists from restrictive parametric assumptions that may not apply in their particular situation.
Sometimes the data can have highly skewed or heavy-tailed distributions or multiple modes. There is no need to simplify the model by, say, a linear approximation when the appropriate model is nonlinear. The estimator can be defined through an algorithm and there does not need to be an analytic expression for the parameters to be estimated.
Another feature of the bootstrap is its simplicity. For almost any problem you can think of, there is a way to construct bootstrap samples. Using the Monte Carlo approximation to the bootstrap estimate, all the work can be done by the computer. Even though it is a computer-intensive method, with the speed of the modern computer, most problems are feasible, and in many cases, up to 100,000 bootstrap samples can be generated without consuming hours of CPU time. But care must be taken. It is not always apparent when the bootstrap will fail, and failure may not be easy to diagnose.
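As an example of that simplicity, the entire Monte Carlo approximation for the standard error of a sample median fits in a few lines of base R (our illustration):

```r
# Minimal Monte Carlo bootstrap: standard error of the sample median.
set.seed(6)
x <- rexp(50)          # a skewed sample of size 50
B <- 2000              # number of bootstrap samples
boot_medians <- replicate(B, median(sample(x, replace = TRUE)))
sd(boot_medians)       # bootstrap estimate of the SE of the median
```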
In recent years, we are finding that there are ways to modify the bootstrap so that it will work for problems where the simple (or naive) bootstrap is known to fail. The “m-out-of-n” bootstrap is one such example.
In many situations, the bootstrap can alert the practitioner to variability in his procedures that he otherwise would not be aware of. One example in spatial statistics is the development of pollution level contours based on a smoothing method called “kriging.” By generating bootstrap samples, multiple kriging contour maps can be generated, and the differences in the contours can be determined visually.
Also, the stepwise logistic regression problem that is described in Gong (1986) shows that variable selection can be somewhat of a chance outcome when there are many competing variables. She showed this by bootstrapping the entire stepwise selection procedure and seeing that the number of variables and the choice of variables selected can vary from one bootstrap sample to the next.
Babu and Feigelson (1996) applied the bootstrap to astronomy problems. In clinical trials, the bootstrap is used to estimate individual bioequivalence, for P-value adjustment with multiple end points, and even to estimate mean differences when the sample size is not large enough for asymptotic theory to take hold, or when the data are very nonnormal and statistics other than the mean are important.
In subsequent chapters of this text, we will illustrate examples with calculations and short programs using the R language system and its associated packages.
R is an integrated suite of an object-oriented programming language and software facilities for data manipulation, calculation, and graphical display. Over the last decade, R has become the statistical environment of choice for academics, and probably is now the most used such software system in the world. The number of specialized packages available in R has increased exponentially, and continues to do so. Perhaps the best thing about R (besides its power and breadth) is this: It is completely free to use. You can obtain your own copy of the R system at http://www.cran.r-project.org/.
From this website, you can get not only the executable version of R for Linux, Macs, or Windows, but also the source programs and free books containing documentation. We have found The R Book by Michael J. Crawley a good way to learn how to use R, and have found it to be an invaluable reference afterward.
There are so many good books and courses from which you can learn R, including courses that are Internet based, such as at http://statistics.com. We will not attempt to teach even the basics of R here. What we will do is show those features of direct applicability, and give program snippets to illustrate examples and the use of currently available R packages for bootstrapping. These snippets will be presented in the Courier typeface to distinguish them from regular text and to maintain spacing in output generated.
At the current time, using R version 2.10.1, the R query (">" denotes the R command line prompt)
> ??bootstrap
or
> help.search("bootstrap")
results in
agce::resamp.std Compute the standard deviation by bootstrap.
alr3::boot.case Case bootstrap for regression models
analogue::RMSEP Root mean square error of prediction
analogue::bootstrap Bootstrap estimation and errors
analogue::bootstrap.wa Bootstrap estimation and errors for WA models
analogue::bootstrapObject Bootstrap object description
analogue::getK Extract and set the number of analogues
analogue::performance Transfer function model performance statistics
analogue::screeplot.mat Screeplots of model results
analogue::summary.bootstrap.mat Summarise bootstrap resampling for MAT models
animation::boot.iid Bootstrapping the i.i.d data
ape::boot.phylo Tree Bipartition and Bootstrapping Phylogenies
aplpack::slider.bootstrap.lm.plot interactive bootstrapping for lm
bnlearn::bn.boot Parametric and nonparametric bootstrap of Bayesian networks
bnlearn::boot.strength Bootstrap arc strength and direction
boot::nested.corr Functions for Bootstrap Practicals
boot::boot Bootstrap Resampling
boot::boot.array Bootstrap Resampling Arrays
boot::boot.ci Nonparametric Bootstrap Confidence Intervals
boot::cd4.nested Nested Bootstrap of cd4 data
boot::censboot Bootstrap for Censored Data
boot::freq.array Bootstrap Frequency Arrays
boot::jack.after.boot Jackknife-after-Bootstrap Plots
boot::linear.approx Linear Approximation of Bootstrap Replicates
boot::plot.boot Plots of the Output of a Bootstrap Simulation
boot::print.boot Print a Summary of a Bootstrap Object
boot::print.bootci Print Bootstrap Confidence Intervals
boot::saddle Saddlepoint Approximations for Bootstrap Statistics
boot::saddle.distn Saddlepoint Distribution Approximations for Bootstrap Statistics
boot::tilt.boot Non-parametric Tilted Bootstrap
boot::tsboot Bootstrapping of Time Series
BootCL::BootCL.distribution Find the Bootstrap distribution
BootCL::BootCL.plot Display the bootstrap distribution and p-value
BootPR::BootAfterBootPI Bootstrap-after-Bootstrap Prediction
BootPR::BootBC Bootstrap bias-corrected estimation and forecasting for AR models
BootPR::BootPI Bootstrap prediction intervals and point forecasts with no bias-correction
BootPR::BootPR-package Bootstrap Prediction Intervals and Bias-Corrected Forecasting
BootPR::ShamanStine.PI Bootstrap prediction interval using Shaman and Stine bias formula
BootRes::bootRes-package The bootRes Package for Bootstrapped Response and Correlation Functions
BootRes::dendroclim Calculation of Bootstrapped response and correlation functions.
Bootspecdens::specdens Bootstrap for testing equality of spectral densities
BootStepAIC::boot.stepAIC Bootstraps the Stepwise Algorithm of stepAIC() for Choosing a Model by AIC
Bootstrap::bootpred Bootstrap Estimates of Prediction Error
Bootstrap::bootstrap Non-Parametric Bootstrapping
Bootstrap::boott Bootstrap-t Confidence Limits
Bootstrap::ctsub Internal functions of package bootstrap
Bootstrap::lutenhorm Luteinizing Hormone
Bootstrap::scor Open/Closed Book Examination Data
Bootstrap::spatial Spatial Test Data
BSagri::BOOTSimpsonD Simultaneous confidence intervals for Simpson indices
cfa::bcfa Bootstrap-CFA
ChainLadder::BootChainLadder Bootstrap-Chain-Ladder Model
CircStats::vm.bootstrap.ci Bootstrap-Confidence Intervals
circular::mle.vonmises.bootstrap.ci Bootstrap Confidence Intervals
clue::cl_boot Bootstrap Resampling of Clustering Algorithms
CORREP::cor.bootci Bootstrap Confidence Interval for Multivariate Correlation
Daim::Daim.data1 Data set: Artificial Bootstrap data for use with Daim
DCluster::achisq.boot Bootstrap replicates of Pearson’s Chi-square statistic
DCluster::besagnewell.boot Generate bootstrap replicates of Besag and Newell’s statistic
DCluster::gearyc.boot Generate bootstrap replicates of Geary’s c autocorrelation statistic
DCluster::kullnagar.boot Generate bootstrap replicates of Kulldorff and Nagarwalla’s statistic
DCluster::moranI.boot Generate bootstrap replicates of Moran‘s I autocorrelation statistic
DCluster::pottwhitt.boot Bootstrap replicates of Potthoff-Whittinghill‘s statistic
DCluster::stone.boot Generate bootstrap replicates of Stone’s statistic
DCluster::tango.boot Generate bootstrap replicates of Tango’s statistic
DCluster::whittermore.boot Generate bootstrap replicates of Whittermore’s statistic
degreenet::rplnmle Rounded Poisson Lognormal Modeling of Discrete Data
degreenet::bsdp Calculate Bootstrap Estimates and Confidence Intervals for the Discrete Pareto Distribution
degreenet::bsnb Calculate Bootstrap Estimates and Confidence Intervals for the Negative Binomial Distribution
degreenet::bspln Calculate Bootstrap Estimates and Confidence Intervals for the Poisson Lognormal Distribution
degreenet::bswar Calculate Bootstrap Estimates and Confidence Intervals for the Waring Distribution
degreenet::bsyule Calculate Bootstrap Estimates and Confidence Intervals for the Yule Distribution
degreenet::degreenet-internal Internal degreenet Objects
delt::eval.bagg Returns a bootstrap aggregation of adaptive histograms
delt::lstseq.bagg Calculates a scale of bootstrap aggregated histograms
depmix::depmix Fitting Dependent Mixture Models
Design::anova.Design Analysis of Variance (Wald and F Statistics)
Design::bootcov Bootstrap Covariance and Distribution for Regression Coefficients
Design::calibrate Resampling Model Calibration
Design::predab.resample Predictive Ability using Resampling
Design::rm.impute Imputation of Repeated Measures
Design::validate Resampling Validation of a Fitted Model’s Indexes of Fit
Design::validate.cph Validation of a Fitted Cox or Parametric Survival Model’s Indexes of Fit
Design::validate.lrm Resampling Validation of a Logistic Model
Design::validate.ols Validation of an Ordinary Linear Model
dynCorr::bootstrapCI Bootstrap Confidence Interval
dynCorr::dynCorrData An example dataset for use in the example calls in the help files for the dynamicCorrelation and bootstrapCI functions
e1071::bootstrap.lca Bootstrap Samples of LCA Results
eba::boot Bootstrap for Elimination-By-Aspects (EBA) Models
EffectiveDose::Boot.CI Bootstrap confidence intervals for ED levels
EffectiveDose::EffectiveDose-package Estimation of the Effective Dose including Bootstrap confindence intervals
el.convex::samp sample from bootstrap
equate::se.boot Bootstrap Standard Errors of Equating
equivalence::equiv.boot Regression-based TOST using bootstrap
extRemes::boot.sequence Bootstrap a sequence.
FactoMineR::simule Simulate by bootstrap
FGN::Boot Generic Bootstrap Function
FitAR::Boot Generic Bootstrap Function
FitAR::Boot.ts Parametric Time Series Bootstrap
fitdistrplus::bootdist Bootstrap simulation of uncertainty for non-censored data
fitdistrplus::bootdistcens Bootstrap simulation of uncertainty for censored data
flexclust::bootFlexclust Bootstrap Flexclust Algorithms