The continuing revolution in computing is having a dramatic influence on statistics. Exploratory analysis of data is becoming easier, with more and more graphs and calculations automated. The statistical study of very large and very complex data sets (so-called big data) is also now feasible. Companies that have long collected information on their customers now use business analytics to discover meaningful patterns in these data, and those patterns in turn drive decision making.
Fast, inexpensive computing has also led to the development of new methods of analysis, methods that apply previously unthinkable amounts of computation. In this chapter, we focus on bootstrap confidence intervals and permutation tests. These methods use the power of the computer to relax some of the conditions needed for traditional inference and to do inference in new settings. They can be used to answer questions such as:
As with all the other methods discussed in this textbook, the most important requirement for trustworthy conclusions about a population is still that our data can be regarded as random samples from a population or populations; not even the computer can rescue voluntary response samples or confounded experiments. But these new methods set us free from the need for Normal data or samples large enough to rely on the central limit theorem. They also work the same way for many different statistics in many different settings. They can, with sufficient computing power, give results that are more accurate than those from traditional methods.
Bootstrap confidence intervals (Sections 16.2 and 16.4) and permutation tests (Section 16.5) are conceptually simple because they appeal directly to the basis of all inference: the sampling distribution, which shows what would happen if we took very many samples under the same conditions. The new methods do have limitations, some of which we will illustrate. Even so, they are so effective and so broadly applicable that they are now used in a wide variety of settings.
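To make the idea concrete, here is a minimal sketch of a bootstrap distribution and a percentile confidence interval. It is written in Python with only the standard library, purely for illustration; it is not the R code used in this chapter. The function names, the sample values, and the choice of 2000 resamples are hypothetical. The percentile interval is only one of several bootstrap interval methods.

```python
import random
import statistics


def bootstrap_distribution(data, statistic, n_resamples=2000, seed=None):
    """Return `statistic` computed on each of n_resamples bootstrap resamples.

    Each resample draws len(data) observations from data WITH replacement,
    standing in for "taking very many samples" from the population.
    """
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


def bootstrap_percentile_ci(data, statistic, level=0.95, n_resamples=2000, seed=None):
    """Percentile interval: the middle `level` fraction of the bootstrap distribution."""
    boot = sorted(bootstrap_distribution(data, statistic, n_resamples, seed))
    tail = round((1 - level) / 2 * n_resamples)  # resamples trimmed from each tail
    return boot[tail], boot[n_resamples - 1 - tail]


# Hypothetical sample, for illustration only.
sample = [12.1, 9.8, 15.3, 11.0, 10.4, 13.7, 8.9, 14.2, 12.8, 10.9]

# The same resampling code serves the mean, the median, or any other statistic.
for name, stat in [("mean", statistics.mean), ("median", statistics.median)]:
    boot = bootstrap_distribution(sample, stat, seed=1)
    se = statistics.stdev(boot)  # bootstrap standard error
    low, high = bootstrap_percentile_ci(sample, stat, seed=1)
    print(f"{name}: estimate={stat(sample):.2f}, bootstrap SE={se:.2f}, "
          f"95% percentile CI=({low:.2f}, {high:.2f})")
```

The resampling step never inspects the statistic, so the mean and the median pass through the same two functions unchanged. The bootstrap distribution plays the role of the sampling distribution, and no Normality assumption enters anywhere.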
Bootstrapping and permutation tests are feasible in practice only with software that automates the heavy computation that these methods require. If you are sufficiently expert, you can program at least the basic methods yourself. It is easier to use software that offers bootstrap intervals and permutation tests preprogrammed, just as most software offers the various t intervals and tests. You can expect the new methods to become more common in standard statistical software.
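As an illustration of programming a basic method yourself, the following sketch runs a two-sided permutation test for the difference between two group means. It uses Python's standard library rather than R. The function name, the two groups of data, and the 5000 rearrangements are hypothetical choices made for this example.

```python
import random
import statistics


def permutation_test_mean_diff(group1, group2, n_permutations=5000, seed=None):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    Under the null hypothesis that group labels do not matter, any rearrangement
    of the pooled observations into groups of the original sizes is equally
    likely. Reshuffling many times approximates the permutation distribution.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group1) - statistics.mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = statistics.mean(pooled[:n1]) - statistics.mean(pooled[n1:])
        if abs(diff) >= abs(observed):  # at least as extreme as observed
            count += 1
    # Counting the observed arrangement itself keeps the P-value above zero.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical data, for illustration only.
treatment = [24, 29, 31, 27, 33, 30]
control = [21, 25, 22, 26, 23, 20]

obs, p = permutation_test_mean_diff(treatment, control, seed=2)
print(f"observed difference = {obs:.2f}, permutation P-value = {p:.4f}")
```

Shuffling the pooled data, rather than resampling with replacement, is what distinguishes a permutation test from the bootstrap. The test simulates the null hypothesis directly instead of mimicking repeated sampling from the population.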
This chapter primarily uses R, the software choice of many statisticians doing research on resampling methods.¹ Several R packages provide functions for resampling. We focus on the boot package, which offers the most capabilities. Unlike menu-driven software such as Minitab and SPSS, R requires typed commands to load data and access its functions. All commands used in this chapter are available on the text website.
JMP, Minitab, SPSS, and SAS also offer preprogrammed bootstrap and permutation methods. JMP offers single-click bootstrapping capabilities with many statistical reports (see page 403 for an example). Minitab offers its methods under the Calc > Resampling menu item. SPSS has an auxiliary bootstrap module that contains most of the methods described in this chapter. In SAS, the SURVEYSELECT procedure can be used to do the necessary resampling. The bootstrap macro contains most of the confidence interval methods offered by R. You can find links for downloading these modules or macros on the text website.