
Bootstrapping Regression Models

Appendix to An R and S-PLUS Companion to Applied Regression

John Fox
January 2002

1 Basic Ideas

Bootstrapping is a general approach to statistical inference based on building a sampling distribution for
a statistic by resampling from the data at hand. The term bootstrapping, due to Efron (1979), is an
allusion to the expression "pulling oneself up by one's bootstraps", in this case using the sample data as
a population from which repeated samples are drawn. At first blush, the approach seems circular, but it has
been shown to be sound.
Two S libraries for bootstrapping are associated with extensive treatments of the subject: Efron and
Tibshirani's (1993) bootstrap library, and Davison and Hinkley's (1997) boot library. Of the two, boot,
programmed by A. J. Canty, is somewhat more capable, and will be used for the examples in this appendix.
There are several forms of the bootstrap, and, additionally, several other resampling methods that are
related to it, such as jackknifing, cross-validation, randomization tests, and permutation tests. I will stress
the nonparametric bootstrap.
Suppose that we draw a sample S = {X1 , X2 , ..., Xn } from a population P = {x1 , x2 , ..., xN }; imagine
further, at least for the time being, that N is very much larger than n, and that S is either a simple random
sample or an independent random sample from P;1 I will briefly consider other sampling schemes at the
end of the appendix. It will also help initially to think of the elements of the population (and, hence, of the
sample) as scalar values, but they could just as easily be vectors (i.e., multivariate).
Now suppose that we are interested in some statistic T = t(S) as an estimate of the corresponding
population parameter θ = t(P). Again, θ could be a vector of parameters and T the corresponding vector
of estimates, but for simplicity assume that θ is a scalar. A traditional approach to statistical inference is
to make assumptions about the structure of the population (e.g., an assumption of normality), and, along
with the stipulation of random sampling, to use these assumptions to derive the sampling distribution of T,
on which classical inference is based. In certain instances, the exact distribution of T may be intractable,
and so we instead derive its asymptotic distribution. This familiar approach has two potentially important
deficiencies:
1. If the assumptions about the population are wrong, then the corresponding sampling distribution of
the statistic may be seriously inaccurate. On the other hand, if asymptotic results are relied upon,
these may not hold to the required level of accuracy in a relatively small sample.
2. The approach requires sufficient mathematical prowess to derive the sampling distribution of the statistic of interest. In some cases, such a derivation may be prohibitively difficult.
In contrast, the nonparametric bootstrap allows us to estimate the sampling distribution of a statistic
empirically without making assumptions about the form of the population, and without deriving the sampling
distribution explicitly. The essential idea of the nonparametric bootstrap is as follows: We proceed to draw a
sample of size n from among the elements of S, sampling with replacement. Call the resulting bootstrap sample
S*_1 = {X*_11, X*_12, ..., X*_1n}. It is necessary to sample with replacement, because we would otherwise simply
reproduce the original sample S. In effect, we are treating the sample S as an estimate of the population
P; that is, each element X_i of S is selected for the bootstrap sample with probability 1/n, mimicking the
original selection of the sample S from the population P. We repeat this procedure a large number of times,
R, selecting many bootstrap samples; the bth such bootstrap sample is denoted S*_b = {X*_b1, X*_b2, ..., X*_bn}.

1 Alternatively, P could be an infinite population, specified, for example, by a probability distribution function.
The key bootstrap analogy is therefore as follows:

The population is to the sample
as
the sample is to the bootstrap samples.

Next, we compute the statistic T for each of the bootstrap samples; that is, T*_b = t(S*_b). Then the
distribution of T*_b around the original estimate T is analogous to the sampling distribution of the estimator
T around the population parameter θ. For example, the average of the bootstrapped statistics,

    T̄* = Ê*(T*) = Σ_{b=1}^{R} T*_b / R

estimates the expectation of the bootstrapped statistics; then B̂* = T̄* − T is an estimate of the bias of T,
that is, T − θ. Similarly, the estimated bootstrap variance of T*,

    V̂*(T*) = Σ_{b=1}^{R} (T*_b − T̄*)² / (R − 1)

estimates the sampling variance of T.


The random selection of bootstrap samples is not an essential aspect of the nonparametric bootstrap:
At least in principle, we could enumerate all bootstrap samples of size n. Then we could calculate E*(T*)
and V*(T*) exactly, rather than having to estimate them. The number of bootstrap samples, however, is
astronomically large unless n is tiny.2 There are, therefore, two sources of error in bootstrap inference: (1)
the error induced by using a particular sample S to represent the population; and (2) the sampling error
produced by failing to enumerate all bootstrap samples. The latter source of error can be controlled by
making the number of bootstrap replications R sufficiently large.

2 If we distinguish the order of elements in the bootstrap samples and treat all of the elements of the population as distinct
(even when some have the same values), then there are n^n bootstrap samples, each occurring with probability 1/n^n.
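To make these definitions concrete, here is a minimal sketch of the procedure for a very simple statistic, the mean of a small simulated sample; the data and the choice of R = 1999 are arbitrary and serve only as an illustration:

> set.seed(123)                          # hypothetical data, for illustration only
> x <- rexp(50)                          # the "original sample" S
> T <- mean(x)                           # the statistic T = t(S)
> R <- 1999
> Tboot <- numeric(R)
> for (b in 1:R) {
+     Sb <- sample(x, length(x), replace=TRUE)    # bootstrap sample S*_b
+     Tboot[b] <- mean(Sb)                        # T*_b = t(S*_b)
+     }
> Tbar <- mean(Tboot)                    # estimate of E*(T*)
> Bhat <- Tbar - T                       # bootstrap estimate of the bias of T
> Vhat <- sum((Tboot - Tbar)^2)/(R - 1)  # bootstrap estimate of the variance of T
> c(T=T, bias=Bhat, se=sqrt(Vhat))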

2 Bootstrap Confidence Intervals

There are several approaches to constructing bootstrap confidence intervals. The normal-theory interval
assumes that the statistic T is normally distributed (which is often approximately the case for statistics in
sufficiently large samples), and uses the bootstrap estimate of sampling variance, and perhaps of bias, to
construct a 100(1 − α)-percent confidence interval of the form

    θ = (T − B̂*) ± z_{1−α/2} ŜE*(T)

Here, ŜE*(T) = √V̂*(T*) is the bootstrap estimate of the standard error of T, and z_{1−α/2} is the 1 − α/2
quantile of the standard-normal distribution (e.g., 1.96 for a 95-percent confidence interval, where α = .05).
An alternative approach, called the bootstrap percentile interval, is to use the empirical quantiles of T*_b
to form a confidence interval for θ:

    T*_(lower) < θ < T*_(upper)

where T*_(1), T*_(2), ..., T*_(R) are the ordered bootstrap replicates of the statistic; lower = [(R + 1)α/2]; upper
= [(R + 1)(1 − α/2)]; and the square brackets indicate rounding to the nearest integer. For example, if
α = .05, corresponding to a 95-percent confidence interval, and R = 999, then lower = 25 and upper = 975.
Although they do not artificially assume normality, percentile confidence intervals often do not perform
well. So-called bias-corrected, accelerated (or BCa) percentile intervals are preferable. To find the BCa
interval for θ:

Calculate

    z = Φ⁻¹[ #_{b=1}^{R}(T*_b ≤ T) / (R + 1) ]

where Φ⁻¹(·) is the standard-normal quantile function, and #(T*_b ≤ T)/(R + 1) is the (adjusted)
proportion of bootstrap replicates at or below the original-sample estimate T of θ. If the bootstrap
sampling distribution is symmetric, and if T is unbiased, then this proportion will be close to .5, and
the correction factor z will be close to 0.

Let T_(−i) represent the value of T produced when the ith observation is deleted from the sample;3
there are n of these quantities. Let T̄ represent the average of the T_(−i); that is, T̄ = Σ_{i=1}^{n} T_(−i)/n.
Then calculate

    a = Σ_{i=1}^{n} (T̄ − T_(−i))³ / { 6 [ Σ_{i=1}^{n} (T̄ − T_(−i))² ]^{3/2} }

With the correction factors z and a in hand, compute

    a1 = Φ[ z + (z − z_{1−α/2}) / (1 − a(z − z_{1−α/2})) ]

    a2 = Φ[ z + (z + z_{1−α/2}) / (1 − a(z + z_{1−α/2})) ]

where Φ(·) is the standard-normal cumulative distribution function. The values a1 and a2 are used to
locate the endpoints of the corrected percentile confidence interval:

    T*_(lower*) < θ < T*_(upper*)

where lower* = [Ra1] and upper* = [Ra2]. When the correction factors a and z are both 0, a1 =
Φ(−z_{1−α/2}) = Φ(z_{α/2}) = α/2, and a2 = Φ(z_{1−α/2}) = 1 − α/2, which corresponds to the (uncorrected)
percentile interval.

3 The T_(−i) are called the jackknife values of the statistic T. Although I will not pursue the subject here, the jackknife values
can also be used as an alternative to the bootstrap to find a nonparametric confidence interval for θ.
To obtain sufficiently accurate 95-percent bootstrap percentile or BCa confidence intervals, the number
of bootstrap samples, R, should be on the order of 1000 or more; for normal-theory bootstrap intervals we
can get away with a smaller value of R, say, on the order of 100 or more, since all we need to do is estimate
the standard error of the statistic.
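Continuing the illustrative sketch from the end of the previous section (it assumes the objects x, T, Tboot, Bhat, Vhat, and R created there), the three kinds of intervals can be computed directly; the boot.ci function, used later in this appendix, automates these calculations:

> alpha <- .05
> z.crit <- qnorm(1 - alpha/2)
> (T - Bhat) + c(-1, 1)*z.crit*sqrt(Vhat)       # normal-theory interval
> T.ord <- sort(Tboot)                          # ordered bootstrap replicates
> lower <- round((R + 1)*alpha/2)
> upper <- round((R + 1)*(1 - alpha/2))
> c(T.ord[lower], T.ord[upper])                 # percentile interval
> z0 <- qnorm(sum(Tboot <= T)/(R + 1))          # bias-correction factor z
> T.jack <- sapply(1:length(x), function(i) mean(x[-i]))   # jackknife values
> Tbar.jack <- mean(T.jack)
> a <- sum((Tbar.jack - T.jack)^3)/(6*sum((Tbar.jack - T.jack)^2)^(3/2))
> a1 <- pnorm(z0 + (z0 - z.crit)/(1 - a*(z0 - z.crit)))
> a2 <- pnorm(z0 + (z0 + z.crit)/(1 - a*(z0 + z.crit)))
> c(T.ord[round(R*a1)], T.ord[round(R*a2)])     # BCa interval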

3 Bootstrapping Regressions

Recall (from Chapters 1 and 6 in the text) Duncan's regression of prestige on income and education for 45
occupations. In the Appendix on robust regression, I refit this regression using an M-estimator with the
Huber weight function (via the rlm function in the MASS library):

> library(car)    # for Duncan data set and (later) data.ellipse
. . .
> library(MASS)
> data(Duncan)
> mod.duncan.hub <- rlm(prestige ~ income + education, data=Duncan)
> summary(mod.duncan.hub)

Call: rlm.formula(formula = prestige ~ income + education, data = Duncan)

Residuals:
    Min      1Q  Median      3Q     Max
 -30.12   -6.89    1.29    4.59   38.60

Coefficients:
            Value   Std. Error  t value
(Intercept) -7.111   3.881      -1.832
income       0.701   0.109       6.452
education    0.485   0.089       5.438

Residual standard error: 9.89 on 42 degrees of freedom

Correlation of Coefficients:
          (Intercept)  income
income    -0.297
education -0.359       -0.725
The coefficient standard errors reported by rlm rely on asymptotic approximations, and may not be
trustworthy in a sample of size 45. Let us turn, therefore, to the bootstrap.
There are two general ways to bootstrap a regression like this: We can treat the predictors as random,
potentially changing from sample to sample, or as fixed. I will deal with each case in turn, and then compare
the two approaches. For reasons that should become clear in the subsequent sections, random-x resampling
is also called case resampling, and fixed-x resampling is also called model-based resampling.

3.1 Random-x Resampling

Broadening the scope of the discussion, assume that we want to fit a regression model with response variable
y and predictors x1, x2, ..., xk. We have a sample of n observations z_i = (y_i, x_i1, x_i2, ..., x_ik), i = 1, ..., n.
In random-x resampling, we simply select R bootstrap samples of the z_i, fitting the model and saving
the coefficients from each bootstrap sample. We can then construct confidence intervals for the regression
coefficients using the methods described in the preceding section.
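As a minimal sketch of the idea, case resampling can be programmed directly with a loop, here applied to an ordinary least-squares fit of Duncan's regression (the Huber M-estimator is used in the rest of this section):

> R <- 1999
> n <- nrow(Duncan)
> beta.boot <- matrix(0, R, 3)
> for (b in 1:R) {
+     rows <- sample(n, n, replace=TRUE)    # indices of the resampled cases
+     beta.boot[b,] <- coefficients(lm(prestige ~ income + education,
+         data=Duncan[rows,]))
+     }
> apply(beta.boot, 2, sd)    # bootstrap standard errors of the coefficients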
Although it is not hard to program bootstrap calculations directly in S, it is more convenient to use
the boot function in the boot library. This function takes several arguments, the first three of which are
required:

data: A data vector, matrix, or data frame to which bootstrap resampling is to be applied. Each
element of the data vector, or each row of the matrix or data frame, is treated as an observation.

statistic: A function that returns the (possibly vector-valued) statistic to be bootstrapped. The
first two arguments of this function must specify the original (sample) data set and an index vector,
giving the indices of the observations included in the current bootstrap sample.

R: The number of bootstrap replications.

Some of the other arguments of boot are described below. For complete information, examine the on-line
help for boot.

To use boot, we therefore have to write a function that returns the statistic to be bootstrapped. For
example, for the Huber M-estimator of Duncan's regression,

> boot.huber <- function(data, indices, maxit=20){
+     data <- data[indices,]    # select obs. in bootstrap sample
+     mod <- rlm(prestige ~ income + education, data=data, maxit=maxit)
+     coefficients(mod)    # return coefficient vector
+     }
>
Notice that the function boot.huber takes a third argument, maxit, which sets the maximum number
of iterations to perform in each M-estimation; I included this provision because I found that the default of
20 iterations in rlm is not always sufficient. The boot function is able to pass additional arguments through
to the function specified in its statistic argument:
> library(boot)
> system.time(duncan.boot <- boot(Duncan, boot.huber, 1999, maxit=100))
[1] 86.58  0.30 87.01    NA    NA
> duncan.boot

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Duncan, statistic = boot.huber, R = 1999, maxit = 100)

Bootstrap Statistics :
    original        bias    std. error
t1* -7.11070   0.0331471     3.07869
t2*  0.70145  -0.0104721     0.17373
t3*  0.48544   0.0074838     0.13495

I ran boot within a call to the function system.time to provide a sense of how long a bootstrapping
operation like this takes; in this case, it generates R = 1999 bootstrap replicates of the regression coefficients.
The first number returned by system.time is the CPU (processing) time for the operation, in seconds, while
the third number is the total elapsed time. Here, both CPU and elapsed time are about a minute and a
half.4 Although this is a small problem, the time spent depends more upon the number of bootstrap samples
than upon the sample size. There are two ways to think about waiting 90 seconds for the bootstrapping to
take place: On the one hand, we tend to be spoiled by the essentially instantaneous response that S usually
provides, and by this standard 90 seconds seems a long time; on the other hand, bootstrapping is not an
exploratory procedure, and 90 seconds is a trivial proportion of the time typically spent on a statistical
investigation.
The boot function returns an object, here duncan.boot, of class boot. Printing the object gives the
original sample value for each component of the bootstrapped statistic, along with the bootstrap estimate of
bias (i.e., the difference T̄* − T between the average bootstrapped value of the statistic and its original-sample
value) and the bootstrap estimate of standard error [ŜE*(T)]. As explained in the previous section, these
values may be used to construct normal-theory confidence intervals for the regression coefficients (see below).
Notice that the bootstrap standard errors of the income and education coefficients are substantially larger
than the asymptotic estimates reported by rlm.
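For instance, a normal-theory interval for the income coefficient can be assembled by hand from the components of the boot object (the boot.ci function, introduced below, automates this calculation):

> b.inc <- duncan.boot$t0[2]                   # original-sample estimate
> bias <- mean(duncan.boot$t[,2]) - b.inc      # bootstrap estimate of bias
> se <- sd(duncan.boot$t[,2])                  # bootstrap standard error
> (b.inc - bias) + c(-1, 1)*qnorm(.975)*se     # 95-percent normal-theory interval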
In addition to a print method, the boot library contains several useful functions for examining and
manipulating boot objects. The boot.array function, for example, returns an R × n matrix; the entry in
row b, column i indicates how many times the ith observation appears in the bth bootstrap sample:
4 These calculations were performed with R version 1.4.0 under Windows 2000, running on an 800 MHz Pentium 3 computer
with 512 MB of memory.

> duncan.array <- boot.array(duncan.boot)
> duncan.array[1:2,]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    2    1    1    1    1    0    0    0    0     1     1
[2,]    0    1    0    2    1    2    1    2    0     1     1
     [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
[1,]     0     2     3     0     0     0     1     2     1     0
[2,]     2     1     2     0     1     0     1     0     1     3
. . .
     [,42] [,43] [,44] [,45]
[1,]     2     1     0     1
[2,]     0     3     1     0
Thus, for example, observation 1 appears twice in the first bootstrap sample, but not at all in the second
sample.
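As a quick check on this interpretation, the entries in each row of the array sum to the size of a bootstrap sample, n = 45:

> rowSums(duncan.array[1:2,])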
Plotting a boot object draws a histogram and normal quantile-comparison plot of the bootstrap replications for one of the coefficients, selected via the index argument:
> plot(duncan.boot, index=2)    # income coef.
> plot(duncan.boot, index=3)    # education coef.
>
The resulting graphs are shown in Figure 1. The bootstrap distributions of the income and education
coefficients are both reasonably symmetric but somewhat heavy-tailed, and appear to have more than one
mode.
I next use the data.ellipse function from the car library to examine the joint distribution of the
bootstrapped income and education coefficients. The function draws a scatterplot of the pairs of coefficients,
with concentration ellipses superimposed (Figure 2):

> data.ellipse(duncan.boot$t[,2], duncan.boot$t[,3],
+     xlab="income coefficient", ylab="education coefficient",
+     cex=.3, levels=c(.5, .95, .99), robust=T)
>

Notice that the bootstrap replicates of the regression coefficients are stored in the t component of the
boot object; this component is a matrix containing one row for each bootstrap replication and one column
for each coefficient. The negative correlation of the income and education coefficients reflects the positive
correlation between the two predictors.
The boot.ci function calculates several kinds of bootstrap confidence intervals, including the bias-corrected normal-theory, percentile, and BCa intervals described in the previous section:
> # income coefficient
> boot.ci(duncan.boot, index=2, type=c("norm", "perc", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1999 bootstrap replicates

CALL :
boot.ci(boot.out = duncan.boot, type = c("norm", "perc", "bca"),
    index = 2)

Intervals :
Level      Normal              Percentile            BCa
95%   ( 0.3714,  1.0524 )   ( 0.3309,  1.0277 )   ( 0.2286,  0.9302 )
Calculations and Intervals on Original Scale

Figure 1: Histograms and normal quantile-comparison plots for the bootstrap replications of the income (a)
and education (b) coefficients in the Huber regression for Duncan's occupational-prestige data. The broken
vertical line in each histogram shows the location of the regression coefficient for the model fit to the original
sample.

Figure 2: Scatterplot of bootstrap replications of the income and education coefficients from the Huber
regression for Duncan's occupational-prestige data. The concentration ellipses are drawn at the 50, 95, and
99-percent levels using a robust estimate of the covariance matrix of the coefficients.
> # education coefficient
> boot.ci(duncan.boot, index=3, type=c("norm", "perc", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1999 bootstrap replicates

CALL :
boot.ci(boot.out = duncan.boot, type = c("norm", "perc", "bca"),
    index = 3)

Intervals :
Level      Normal              Percentile            BCa
95%   ( 0.2134,  0.7425 )   ( 0.2333,  0.7788 )   ( 0.2776,  0.8268 )
Calculations and Intervals on Original Scale

The index argument to boot.ci selects the coefficient for which confidence intervals are to be constructed, while type specifies the types of intervals; the confidence level or levels for the intervals is set via
the conf argument, which defaults to 0.95 (for 95-percent intervals). In this example, the normal-theory
and percentile intervals are reasonably similar to each other, but the more trustworthy BCa intervals are
somewhat different.
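For example, 90- and 95-percent BCa intervals for the income coefficient can be requested in a single call via the conf argument:

> boot.ci(duncan.boot, index=2, conf=c(.90, .95), type="bca")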
The (unfortunately named) jack.after.boot function displays a diagnostic jackknife-after-bootstrap
plot. This plot shows the sensitivity of the statistic and of the percentiles of its bootstrapped distribution to
deletion of individual observations. An illustration, for the coefficients of income and education, appears
in Figure 3, which is produced by the following commands:

> par(mfcol=c(2,1))
> jack.after.boot(duncan.boot, index=2, main="(a) income coefficient")
> jack.after.boot(duncan.boot, index=3, main="(b) education coefficient")
>
Figure 3: Jackknife-after-bootstrap plot for the income (a) and education (b) coefficients in the Huber
regression for Duncan's occupational-prestige data.

The horizontal axis of the graph, labelled "standardized jackknife value", is a measure of the influence
of each observation on the coefficient. The observation indices corresponding to the points in the graph are
shown near the bottom of the plot. Thus observations 6 and 16 serve to decrease the income coefficient
and increase the education coefficient. As is familiar from Chapters 1 and 6 of the text, these are the
occupations minister and railroad-conductor.
The horizontal broken lines on the plot are quantiles of the bootstrap distribution of each coefficient,
centered at the value of the coefficient for the original sample. By default the .05, .10, .16, .50, .84, .90,
and .95 quantiles are plotted. The points connected by solid lines show the quantiles estimated only from
bootstrap samples in which each observation in turn did not appear. Therefore, deleting the occupation
minister or conductor makes the bootstrap distributions for the coefficients slightly less dispersed. On the
other hand, removing occupation 27 (railroad-engineer) or 30 (plumber) makes the bootstrap distributions
somewhat more variable. From our earlier work on Duncan's data (see, in particular, Chapter 6), I recognize
railroad-engineer as a high-leverage but in-line occupation; it is unclear to me, however, why the occupation
plumber should stand out in this manner.

3.2 Fixed-x Resampling

The observations in Duncan's occupational-prestige study are meant to represent a larger population of
occupations, but they are not literally sampled from that population. It therefore makes some sense to think
of these occupations, and hence the pairs of income and education values used in the regression, as fixed
with respect to replication of the study. The response values, however, are random, because of the error
component of the model. There are other circumstances in which it is even more compelling to treat the
predictors in a study as fixed, for example in a designed experiment where the values of the predictors
are set by the experimenter.
How can we generate bootstrap replications when the model matrix X is fixed? One way to proceed
is to treat the fitted values ŷ_i from the model as giving the expectation of the response for the bootstrap
samples. Attaching a random error to each ŷ_i produces a fixed-x bootstrap sample, y*_b = {y*_bi}. The
errors could be generated parametrically from a normal distribution with mean 0 and variance s² (the
estimated error variance in the regression), if we are willing to assume that the errors are normally distributed,
or nonparametrically, by resampling residuals from the original regression.5 We would then regress the
bootstrapped values y*_b on the fixed X matrix to obtain bootstrap replications of the regression coefficients.
Applying this procedure to the Huber regression for Duncan's data proceeds as follows:
> fit <- fitted(mod.duncan.hub)
> e <- residuals(mod.duncan.hub)
> X <- model.matrix(mod.duncan.hub)
> boot.huber.fixed <- function(data, indices, maxit=20){
+     y <- fit + e[indices]
+     mod <- rlm(y ~ X - 1, maxit=maxit)
+     coefficients(mod)
+     }
> duncan.fix.boot <- boot(Duncan, boot.huber.fixed, 1999, maxit=100)
> duncan.fix.boot

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Duncan, statistic = boot.huber.fixed, R = 1999, maxit = 100)

Bootstrap Statistics :
    original         bias    std. error
t1* -7.11070   0.05575762     3.77567
t2*  0.70145  -0.00098526     0.10860
t3*  0.48544  -0.00219429     0.08861

5 A slightly better approach would be to take account of the leverages of the observations, since the residuals from a regression
have variances that depend upon the leverages, even when the errors are homoscedastic (see Chapter 6 of the text). In ordinary
least-squares regression, we can simply resample modified residuals e_i/√(1 − h_i), where e_i and h_i are, respectively, the residual
and hat-value for the ith observation. In robust regression a similar, if more complex, adjustment can be made. See Davison
and Hinkley (1997: Sec. 6.5).

Notice that I calculated fitted values, residuals, and the model matrix outside of the function
boot.huber.fixed so that these quantities are not unnecessarily recomputed for each bootstrap sample. In
this example, the bootstrap coefficient standard errors for fixed-x resampling are smaller than for random-x
resampling, and indeed the former are quite similar to the estimated asymptotic standard errors for the
Huber estimator.
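The leverage adjustment mentioned in footnote 5 can be sketched for an ordinary least-squares fit of the same model (rather than for the Huber M-estimator used here), resampling the centered modified residuals e_i/√(1 − h_i); this is an illustration of the idea, not the approach used above:

> mod.duncan.ls <- lm(prestige ~ income + education, data=Duncan)
> fit.ls <- fitted(mod.duncan.ls)
> e.mod <- residuals(mod.duncan.ls)/sqrt(1 - hatvalues(mod.duncan.ls))
> e.mod <- e.mod - mean(e.mod)    # center the modified residuals
> X.ls <- model.matrix(mod.duncan.ls)
> boot.ls.fixed <- function(data, indices){
+     y <- fit.ls + e.mod[indices]
+     coefficients(lm(y ~ X.ls - 1))
+     }
> duncan.ls.fix.boot <- boot(Duncan, boot.ls.fixed, 1999)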
Examining the jackknife-after-bootstrap plot for the fixed-x resampling results (Figure 4) provides some
insight into the properties of the method:

> par(mfcol=c(2,1))
> jack.after.boot(duncan.fix.boot, index=2, main="(a) income coefficient")
> jack.after.boot(duncan.fix.boot, index=3, main="(b) education coefficient")
>

Figure 4: Jackknife-after-bootstrap plot for the income (a) and education (b) coefficients in the Huber
regression for Duncan's occupational-prestige data, using fixed-x resampling.
The quantile traces in Figure 4 are much less variable than in Figure 3 for random-x resampling, because
in fixed-x resampling residuals are decoupled from the original observations. In effect, fixed-x resampling
enforces the assumption that the errors are identically distributed by resampling residuals from a common
distribution. Consequently, if the model is incorrectly specified (for example, if there is unmodelled
nonlinearity, non-constant error variance, or outliers), these characteristics will not carry over into the
resampled data sets. For this reason, it may be preferable to perform random-x resampling even when it
makes sense to think of the model matrix as fixed.

4 Bootstrap Hypothesis Tests

It is also possible to use the bootstrap to construct an empirical sampling distribution for a test statistic.
Imagine, for example, that in Duncan's regression, we want to use the robust-regression estimator to test
the hypothesis that the income and education coefficients are the same, H0: β1 = β2. This hypothesis
arguably makes some sense, since both predictors are scaled as percentages. We could test the hypothesis
with the Wald-like statistic

    z = (b1 − b2) / [ (0, 1, −1) V̂(b) (0, 1, −1)′ ]^{1/2}

where b is the vector of estimated regression coefficients; b1 and b2 are, respectively, the income and education
coefficients; and V̂(b) is the estimated asymptotic covariance matrix of the coefficients. Applying this formula
to the data:
> b <- coefficients(mod.duncan.hub)
> sumry <- summary(mod.duncan.hub)    # to obtain coef. cov. matrix
> V <- sumry$cov.unscaled * sumry$stddev^2    # coef. cov. matrix
> L <- c(0, 1, -1)    # hypothesis matrix
> b1 <- b[2] - b[3]
> t <- b1/sqrt(L %*% V %*% L)
> t    # test statistic
      [,1]
[1,] 1.174

If we can trust the asymptotic normality of b and its asymptotic covariance matrix, then z is distributed
as a standard normal variable under the null hypothesis. The test statistic z = 1.174 (two-tail p = .240)
therefore suggests that the difference between β1 and β2 is not statistically significant. In a small sample
such as this, however, we might be more comfortable relying on the bootstrap.
Using random-x resampling, I generated R = 999 resampled values of z:

> boot.test <- function(data, indices, maxit=20){
+     data <- data[indices,]
+     mod <- rlm(prestige ~ income + education, data=data, maxit=maxit)
+     sumry <- summary(mod)
+     V.star <- sumry$cov.unscaled * sumry$stddev^2
+     b.star <- coefficients(mod)
+     b1.star <- b.star[2] - b.star[3]
+     t <- (b1.star - b1)/sqrt(L %*% V.star %*% L)
+     t
+     }
> duncan.test.boot <- boot(Duncan, boot.test, 999, maxit=100)
> summary(duncan.test.boot$t)
     object
 Min.   :-6.075
 1st Qu.:-1.242
 Median : 0.159
 Mean   :-0.111
 3rd Qu.: 1.076
 Max.   : 5.795
Notice that the proper bootstrap analogy is to test the bootstrapped difference in regression coefficients,
b*_b1 − b*_b2 for the bth bootstrap sample, against the hypothesized value b1 − b2 (that is, the difference in the
original sample), not against a difference of 0. The summary of the distribution of the bootstrapped test
statistics z* suggests that the observed value z = 1.174 is not unusually large. Let us look a little more
closely by plotting the distribution of z* (Figure 5):
> par(mfrow=c(1,2))
> qqnorm(duncan.test.boot$t)
> qqline(duncan.test.boot$t)
> abline(0, 1, lty=2)
> hist(duncan.test.boot$t, breaks=50)
> box()
> abline(v=t, lwd=3)
>
The distribution of z* is not far from normal, but it is much more variable than the standard normal
distribution; the position of z = 1.174, indicated by the vertical line in the histogram, shows that this is
not a rare value under the null hypothesis. We may find a p-value for the bootstrap test by computing the
(adjusted) two-tail proportion of z*'s beyond |z|,

    p = 2 × [ 1 + #_{b=1}^{R}(z*_b > z) ] / (R + 1)

> # 2-sided p-value
> 2 * (1 + sum(duncan.test.boot$t > as.vector(t)))/1000
[1] 0.454

The nonparametric p-value of .45 is much larger than the value p = .24 obtained from the standard-normal
distribution.6

6 The slightly awkward construction as.vector(t) changes t from a 1 × 1 matrix into a one-element vector, allowing the
computation to proceed.


Figure 5: Distribution of the bootstrapped test statistic z* for the hypothesis that the coefficients of income
and education are equal. The broken line in the normal quantile-comparison plot is for a standard-normal
distribution; the solid line is fit to the data. The vertical line in the histogram is at z = 1.174, the observed
value of the test statistic in the original sample.

5 Concluding Remarks

Extending random-x resampling to other sorts of parametric regression models, such as generalized linear
models, is straightforward. In many instances, however, fixed-x resampling requires special treatment, as
does resampling for nonparametric regression.
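For example, random-x resampling for a generalized linear model uses a statistic function of exactly the same form as boot.huber; the following sketch bootstraps the coefficients of a logistic regression fit to small simulated data, purely for illustration:

> set.seed(321)    # hypothetical data for illustration
> dat <- data.frame(x=rnorm(100))
> dat$y <- rbinom(100, 1, plogis(-0.5 + dat$x))
> boot.logit <- function(data, indices){
+     coefficients(glm(y ~ x, family=binomial, data=data[indices,]))
+     }
> logit.boot <- boot(dat, boot.logit, R=999)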
The discussion in the preceding sections assumes independent random sampling, but bootstrap methods
can easily be adapted to other sampling schemes. For example, in stratified sampling, bootstrap resampling
is simply performed within strata, building up a bootstrap sample much as the original sample was composed
from subsamples for the strata. Likewise, in a cluster sample, we resample clusters rather than individual
observations. If the elements of the sample were selected with unequal probability, then so must be the elements
of each bootstrap sample.
The essential point is to preserve the analogy between the selection of the original sample from the
population and the selection of each bootstrap sample from the original sample. Indeed, one of the attractions
of the bootstrap is that it can provide correct statistical inference for complex sampling designs which often
are handled by ad-hoc methods. The software in the boot library can accommodate these complications;
see, in particular, the stype and strata arguments to the boot function.
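For example, had we wished to treat the Duncan occupations as stratified by their type (blue-collar, white-collar, professional), resampling within strata would require only passing the stratifying factor to boot; this sketch illustrates the mechanism rather than an analysis pursued in the text:

> duncan.strat.boot <- boot(Duncan, boot.huber, R=1999,
+     strata=Duncan$type, maxit=100)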

References
Davison, A. C. & D. V. Hinkley. 1997. Bootstrap Methods and their Application. Cambridge: Cambridge
University Press.
Efron, B. 1979. Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics 7:1-26.
Efron, B. & R. J. Tibshirani. 1993. An Introduction to the Bootstrap. New York: Chapman and Hall.

