
1 An introduction to the Bootstrap

The bootstrap is an important tool of modern statistical analysis. It establishes a general framework for simulation-based statistical inference. In simple situations the uncertainty of an estimate may be gauged by analytical calculations leading, for example, to the construction of confidence intervals based on an assumed probability model for the available data. The bootstrap replaces complicated and often inaccurate approximations to biases, variances and other measures of uncertainty by computer simulations.
The idea of the bootstrap:
- The random sample $Y_1,\dots,Y_n$ is generated by drawing observations independently and with replacement from the underlying population (with distribution function $F$). For each interval $[a,b]$ the probability of drawing an observation in $[a,b]$ is given by $P(Y\in[a,b]) = F(b)-F(a)$.
- $n$ large: The empirical distribution of the sample values is close to the distribution of $Y$ in the underlying population. The relative frequency $F_n(b)-F_n(a)$ of observations in $[a,b]$ converges to $P(Y\in[a,b]) = F(b)-F(a)$ as $n\to\infty$.
- The idea of the bootstrap consists in mimicking the data generating process. Random sampling from the true population is replaced by random sampling from the observed data.
- This is justified by the insight that the empirical distribution of the observed data is similar to the true distribution ($F_n\to F$ for $n\to\infty$).
Literature: Davison, A.C. and Hinkley, D.V. (2005): Bootstrap
Methods and their Applications; Cambridge University Press
Inference@LS-Kneip 11
Setup:
- Original data: i.i.d. random sample $Y_1,\dots,Y_n$; the distribution of $Y_i$ depends on an unknown parameter (vector) $\theta$.
- The data $Y_1,\dots,Y_n$ is used to estimate $\theta$ $\Rightarrow$ estimator $\hat\theta\equiv\hat\theta(Y_1,\dots,Y_n)$.
- We are interested in evaluating the distribution of $\hat\theta$ (resp. $\hat\theta-\theta$) in order to provide standard errors, to construct confidence intervals, or to perform tests of hypotheses.
The bootstrap approach:
1) Bootstrap samples: Random samples $Y_1^*,\dots,Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $Y_1,\dots,Y_n$.
2) Bootstrap estimates: $\hat\theta^*\equiv\hat\theta(Y_1^*,\dots,Y_n^*)$.
3) In practice: Steps 1) and 2) are repeated $m$ times (e.g. $m=2000$) $\Rightarrow$ $m$ values $\hat\theta_1^*,\hat\theta_2^*,\dots,\hat\theta_m^*$.
4) The (empirical) distribution of $\hat\theta^*$ (resp. $\hat\theta^*-\hat\theta$) is used to approximate the distribution of $\hat\theta$ (resp. $\hat\theta-\theta$).
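Steps 1) to 4) can be sketched in a few lines of code. The following is only a minimal illustration using the Python standard library; the function name `bootstrap_replicates`, the seed and the data values are assumptions made for the example and are not part of the original notes.

```python
import random
import statistics


def bootstrap_replicates(data, estimator, m=2000, seed=0):
    """Return m bootstrap estimates of `estimator` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(m):
        # Step 1: draw n observations independently and with replacement.
        resample = rng.choices(data, k=n)
        # Step 2: recompute the estimator on the bootstrap sample.
        replicates.append(estimator(resample))
    # Step 3: the loop above repeats steps 1 and 2 m times.
    return replicates


# Illustrative data only.
sample = [5.2, 4.8, 5.4, 4.6, 6.1, 5.4, 5.8, 5.5]
theta_star = bootstrap_replicates(sample, statistics.mean, m=2000)

# Step 4: the spread of the replicates approximates the standard error
# of the estimator (here: the sample mean).
boot_se = statistics.stdev(theta_star)
```

Any function of the sample (median, variance, a regression coefficient) could be passed as `estimator`; only the resampling step has to match the way the original data were generated.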

1.1 Why does the bootstrap work?

The theoretical justification of the bootstrap is based on asymptotic arguments. Usually the bootstrap does not provide very good approximations for extremely small sample sizes. It must, however, be emphasized that in some cases bootstrap confidence intervals can be more accurate for moderate sample sizes than confidence intervals based on standard asymptotic approximations.

Example 1: Estimating a proportion


- Data: i.i.d. random sample $Y_1,\dots,Y_n$; $Y_i\in\{0,1\}$ is dichotomous, $P(Y_i=1)=p$, $P(Y_i=0)=1-p$.
- The problem is to estimate $p$.
- Let $S$ denote the number of $Y_i$ which are equal to 1. The maximum likelihood estimate of $p$ is $\hat p=S/n$.
- Recall: $n\hat p=S\sim B(n,p)$.
- As $n\to\infty$ the central limit theorem implies that
\[ \frac{\sqrt{n}(\hat p-p)}{\sqrt{p(1-p)}}\to_L N(0,1) \]
- $n$ large: the distributions of $\sqrt{n}(\hat p-p)$ and $\hat p-p$ can be approximated by $N(0,p(1-p))$ and $N(0,p(1-p)/n)$, respectively. For simplicity we will write $\mathrm{distr}(\sqrt{n}(\hat p-p))\approx N(0,p(1-p))$ as well as $\mathrm{distr}(\hat p-p)\approx N(0,p(1-p)/n)$.
Bootstrap:
- Random sample $Y_1^*,\dots,Y_n^*$ generated by drawing observations independently and with replacement from $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$.
- Let $S^*$ denote the number of $Y_i^*$ which are equal to 1.
- Bootstrap estimate of $p$: $\hat p^*=S^*/n$.

The distribution of $\hat p^*$ depends on the observed sample $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$! A different sample will lead to a different distribution. The bootstrap now tries to approximate the true distribution of $\hat p-p$ by the conditional distribution of $\hat p^*-\hat p$ given the observed sample $\mathcal{Y}_n$. The bootstrap is called consistent if asymptotically ($n\to\infty$) the conditional distribution of $\hat p^*-\hat p$ coincides with the true distribution of $\hat p-p$ (note: a proper scaling is required!).
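We obtain
\[ P^*(Y_i^*=1)=P(Y_i^*=1\mid\mathcal{Y}_n)=\hat p,\qquad P^*(Y_i^*=0)=P(Y_i^*=0\mid\mathcal{Y}_n)=1-\hat p \]
and
\[ E^*(\hat p^*)=E(\hat p^*\mid\mathcal{Y}_n)=\hat p,\qquad Var^*(\hat p^*)=E[(\hat p^*-\hat p)^2\mid\mathcal{Y}_n]=\frac{\hat p(1-\hat p)}{n} \]
- The conditional distribution of $n\hat p^*=S^*$ given $\mathcal{Y}_n$ is equal to $B(n,\hat p)$. In a slight abuse of notation we will write
\[ (n\hat p^*\mid\mathcal{Y}_n)\sim B(n,\hat p)\qquad\text{or}\qquad \mathrm{distr}(n\hat p^*\mid\mathcal{Y}_n)=B(n,\hat p) \]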

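- As $n\to\infty$ the central limit theorem implies that the (conditional) distribution of $\left(\frac{\sqrt{n}(\hat p^*-\hat p)}{\sqrt{\hat p(1-\hat p)}}\,\Big|\,\mathcal{Y}_n\right)$ converges (stochastically) to a $N(0,1)$-distribution. Moreover, $\hat p$ is a consistent estimator of $p$ and therefore $\hat p(1-\hat p)\to_P p(1-p)$ as $n\to\infty$. This implies that asymptotically $\hat p(1-\hat p)$ may be replaced by $p(1-p)$, and the law of $\left(\frac{\sqrt{n}(\hat p^*-\hat p)}{\sqrt{p(1-p)}}\,\Big|\,\mathcal{Y}_n\right)$ converges stochastically to a $N(0,1)$-distribution.

  More precisely, as $n\to\infty$,
\[ \sup_\delta\left|P\left(\frac{\sqrt{n}(\hat p^*-\hat p)}{\sqrt{p(1-p)}}\le\delta\,\Big|\,\mathcal{Y}_n\right)-\Phi(\delta)\right|\to_P 0, \]
  where $\Phi$ denotes the distribution function of the standard normal distribution.
- We can conclude that for large $n$
\[ \mathrm{distr}(\sqrt{n}(\hat p^*-\hat p)\mid\mathcal{Y}_n)\approx\mathrm{distr}(\sqrt{n}(\hat p-p))\approx N(0,p(1-p)) \]
  as well as
\[ \mathrm{distr}(\hat p^*-\hat p\mid\mathcal{Y}_n)\approx\mathrm{distr}(\hat p-p)\approx N(0,p(1-p)/n) \]
$\Rightarrow$ Bootstrap consistent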
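The conditional moments $E^*(\hat p^*)=\hat p$ and $Var^*(\hat p^*)=\hat p(1-\hat p)/n$ can be checked numerically. The sketch below is only an illustration: the sample size, the value of $p$ and the number of replications are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(1)
n, p_true = 400, 0.3                      # illustrative values, not from the notes
y = [1 if rng.random() < p_true else 0 for _ in range(n)]
p_hat = sum(y) / n                        # maximum likelihood estimate S/n

m = 2000
# Each replicate is S*/n for one resample drawn with replacement from y.
p_star = [sum(rng.choices(y, k=n)) / n for _ in range(m)]

boot_mean = statistics.fmean(p_star)      # Monte Carlo version of E*(p*)
boot_var = statistics.pvariance(p_star)   # Monte Carlo version of Var*(p*)
theory_var = p_hat * (1 - p_hat) / n      # conditional variance from the B(n, p_hat) law
```

Up to Monte Carlo error, `boot_mean` should agree with `p_hat` and `boot_var` with `theory_var`.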

Example 2: Estimating a population mean
- Let $Y_1,\dots,Y_n$ denote an i.i.d. random sample with mean $\mu$ and variance $\sigma^2$. In the following $F$ will denote the corresponding distribution function.
- $\bar Y=\frac{1}{n}\sum_{i=1}^n Y_i$ is an unbiased estimator of $\mu$.
- Problem: Construct a confidence interval.

Traditional approach for constructing a $1-\alpha$ confidence interval:
- $\bar Y\sim N(\mu,\frac{\sigma^2}{n})$
- Estimation of $\sigma^2$: $S^2=\frac{1}{n-1}\sum_{i=1}^n(Y_i-\bar Y)^2$
- This implies: $\sqrt{n}\,\frac{\bar Y-\mu}{S}\sim t_{n-1}$, and hence
\[ P\left(-t_{n-1,1-\frac{\alpha}{2}}\frac{S}{\sqrt{n}}\le\bar Y-\mu\le t_{n-1,1-\frac{\alpha}{2}}\frac{S}{\sqrt{n}}\right)=1-\alpha \]
$\Rightarrow$ $1-\alpha$ confidence interval (e.g. 95% for $\alpha=0.05$): $\left[\bar Y-t_{n-1,1-\frac{\alpha}{2}}\frac{S}{\sqrt{n}},\ \bar Y+t_{n-1,1-\frac{\alpha}{2}}\frac{S}{\sqrt{n}}\right]$

Remark: The construction relies on the assumption that $\bar Y\sim N(\mu,\frac{\sigma^2}{n})$. This is necessarily true if $Y$ is normally distributed. If the underlying distribution is not normal, then this condition is approximately fulfilled if the sample size $n$ is sufficiently large (central limit theorem). In this case the constructed confidence interval must also be seen as an approximation.

The bootstrap offers an alternative method for constructing such confidence intervals.

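The bootstrap approach:
- Random samples $Y_1^*,\dots,Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$.
- $Y_1^*,\dots,Y_n^*$ $\Rightarrow$ estimator $\bar Y^*=\frac{1}{n}\sum_{i=1}^n Y_i^*$
- Means and variances of the conditional distributions of $Y_i^*$ and $\bar Y^*$ given $\mathcal{Y}_n$:
\[ E^*(Y_i^*)=E(Y_i^*\mid\mathcal{Y}_n)=\bar Y,\qquad Var^*(Y_i^*)=E[(Y_i^*-\bar Y)^2\mid\mathcal{Y}_n]=\hat S^2:=\frac{1}{n}\sum_{i=1}^n(Y_i-\bar Y)^2 \]
  Moreover,
\[ E^*(\bar Y^*)=\bar Y,\qquad Var^*(\bar Y^*)=\hat S^2/n \]
- As $n\to\infty$ the central limit theorem implies that the (conditional) distribution of $\left(\frac{\sqrt{n}(\bar Y^*-\bar Y)}{\hat S}\,\Big|\,\mathcal{Y}_n\right)$ converges (stochastically) to a $N(0,1)$-distribution. Moreover, $\hat S^2$ is a consistent estimator of $\sigma^2$ and therefore $\hat S^2\to_P\sigma^2$ as $n\to\infty$. This implies that asymptotically $\hat S$ may be replaced by $\sigma$, and the law of $\left(\frac{\sqrt{n}(\bar Y^*-\bar Y)}{\sigma}\,\Big|\,\mathcal{Y}_n\right)$ converges stochastically to a $N(0,1)$-distribution.

  More precisely, as $n\to\infty$,
\[ \sup_\delta\left|P\left(\frac{\sqrt{n}(\bar Y^*-\bar Y)}{\sigma}\le\delta\,\Big|\,\mathcal{Y}_n\right)-\Phi(\delta)\right|\to_P 0, \]
  where $\Phi$ denotes the distribution function of the standard normal distribution.
- We can conclude that for large $n$
\[ \mathrm{distr}(\sqrt{n}(\bar Y^*-\bar Y)\mid\mathcal{Y}_n)\approx\mathrm{distr}(\sqrt{n}(\bar Y-\mu))\approx N(0,\sigma^2) \]
  as well as
\[ \mathrm{distr}(\bar Y^*-\bar Y\mid\mathcal{Y}_n)\approx\mathrm{distr}(\bar Y-\mu)\approx N(0,\sigma^2/n) \]
$\Rightarrow$ Bootstrap consistent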

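Construction of a symmetric confidence interval of level $1-\alpha$:
- Determine the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\bar Y^*$ given $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$ (the bootstrap distribution):
\[ P^*(\bar Y^*\le\hat t_{\frac{\alpha}{2}})\approx\frac{\alpha}{2},\qquad P^*(\bar Y^*>\hat t_{\frac{\alpha}{2}})\approx 1-\frac{\alpha}{2}, \]
\[ P^*(\bar Y^*\le\hat t_{1-\frac{\alpha}{2}})\approx 1-\frac{\alpha}{2},\qquad P^*(\bar Y^*>\hat t_{1-\frac{\alpha}{2}})\approx\frac{\alpha}{2}. \]
  Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\bar Y^*$ given $\mathcal{Y}_n$.
- In practice:
  - Draw $m$ bootstrap samples (e.g. $m=2000$) and calculate the corresponding estimates $\bar Y_1^*,\bar Y_2^*,\dots,\bar Y_m^*$.
  - Order the resulting estimates $\Rightarrow$ $\bar Y_{(1)}^*\le\bar Y_{(2)}^*\le\dots\le\bar Y_{(m)}^*$.
  - Set $\hat t_{\frac{\alpha}{2}}:=\bar Y^*_{([m+1]\frac{\alpha}{2})}$ and $\hat t_{1-\frac{\alpha}{2}}:=\bar Y^*_{([m+1][1-\frac{\alpha}{2}])}$.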

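A basic bootstrap confidence interval:
- By construction of $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ we have
\[ P^*(\bar Y^*-\bar Y\le\hat t_{\frac{\alpha}{2}}-\bar Y)\approx\frac{\alpha}{2},\qquad P^*(\bar Y^*-\bar Y\le\hat t_{1-\frac{\alpha}{2}}-\bar Y)\approx 1-\frac{\alpha}{2}. \]
- We have seen that the bootstrap is consistent, and therefore $\mathrm{distr}(\bar Y^*-\bar Y\mid\mathcal{Y}_n)\approx\mathrm{distr}(\bar Y-\mu)$ asymptotically. This implies that for large $n$
\[ P(\bar Y-\mu\le\hat t_{\frac{\alpha}{2}}-\bar Y)\approx\frac{\alpha}{2},\qquad P(\bar Y-\mu\le\hat t_{1-\frac{\alpha}{2}}-\bar Y)\approx 1-\frac{\alpha}{2}, \]
  and therefore
\[ P\left(\bar Y-(\hat t_{1-\frac{\alpha}{2}}-\bar Y)\le\mu\le\bar Y-(\hat t_{\frac{\alpha}{2}}-\bar Y)\right)\approx 1-\alpha \]
$\Rightarrow$ Approximate $1-\alpha$ (symmetric) confidence interval:
\[ [2\bar Y-\hat t_{1-\frac{\alpha}{2}},\ 2\bar Y-\hat t_{\frac{\alpha}{2}}] \]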
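The construction of the basic interval can be sketched as follows. This is a minimal illustration, assuming the order-statistic rule $\hat t_q=\bar Y^*_{([m+1]q)}$ with $m=1999$ so that $(m+1)\alpha/2$ is an integer; the helper names and the simulated data are hypothetical.

```python
import math
import random
import statistics


def order_stat_quantile(sorted_values, prob):
    """Return the order statistic with index [(m+1)*prob] (1-based)."""
    m = len(sorted_values)
    k = math.floor((m + 1) * prob + 1e-9)   # small offset guards against rounding
    k = min(max(k, 1), m)
    return sorted_values[k - 1]


def basic_bootstrap_ci(data, estimator, alpha=0.05, m=1999, seed=0):
    """Basic bootstrap interval [2*est - t_{1-a/2}, 2*est - t_{a/2}]."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)
    reps = sorted(estimator(rng.choices(data, k=n)) for _ in range(m))
    t_lo = order_stat_quantile(reps, alpha / 2)
    t_hi = order_stat_quantile(reps, 1 - alpha / 2)
    return 2 * theta_hat - t_hi, 2 * theta_hat - t_lo


# Illustrative data: 50 draws from N(10, 2^2).
data_rng = random.Random(42)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(50)]
lower, upper = basic_bootstrap_ci(sample, statistics.fmean)
```

Note that the upper bootstrap quantile enters the lower end point and vice versa, which is what distinguishes the basic interval from the percentile interval.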

The percentile interval:
- In the older bootstrap literature the so-called percentile interval
\[ [\hat t_{\frac{\alpha}{2}},\ \hat t_{1-\frac{\alpha}{2}}] \]
  is usually recommended as a $1-\alpha$ confidence interval.
- The percentile interval can easily be justified if all underlying distributions are symmetric, $\mathrm{distr}(\bar Y^*-\bar Y\mid\mathcal{Y}_n)\approx\mathrm{distr}(\bar Y-\bar Y^*\mid\mathcal{Y}_n)$, $\mathrm{distr}(\bar Y-\mu)\approx\mathrm{distr}(\mu-\bar Y)$.
- In practice the percentile interval is usually less precise than the standard interval discussed above; there are, however, some bias-corrected modifications of the percentile interval which allow better approximations.
General Setup: The nonparametric (naive) bootstrap

- Data: Random sample $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$; the distribution of $Y_i$ depends on an unknown parameter (vector) $\theta$.
- The data $Y_1,\dots,Y_n$ is used to estimate $\theta$ $\Rightarrow$ estimator $\hat\theta\equiv\hat\theta(Y_1,\dots,Y_n)$.
- Bootstrap: Random samples $Y_1^*,\dots,Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $Y_1,\dots,Y_n$ $\Rightarrow$ Bootstrap estimates $\hat\theta^*\equiv\hat\theta(Y_1^*,\dots,Y_n^*)$.
- $\mathrm{distr}(\hat\theta^*-\hat\theta\mid\mathcal{Y}_n)$ is used to approximate $\mathrm{distr}(\hat\theta-\theta)$.

The bootstrap works for a large number of statistical and econometric problems. Indeed, it can be shown that under some mild regularity conditions the bootstrap is consistent, if
1) Generation of the bootstrap sample reflects appropriately the way in which the original sample has been generated (i.i.d. sampling!).
2) The distribution of the estimator $\hat\theta$ is asymptotically normal. More precisely,
   - single parameter ($\theta\in\mathbb{R}$): $\sqrt{n}(\hat\theta-\theta)\to_L N(0,v^2)$; $v$ - standard error of $\sqrt{n}(\hat\theta-\theta)$
   - multivariate parameter vector ($\theta\in\mathbb{R}^d$): $\sqrt{n}(\hat\theta-\theta)\to_L N_d(0,V)$; $V$ - covariance matrix of $\sqrt{n}(\hat\theta-\theta)$

Consistent bootstrap: $\mathrm{distr}(\sqrt{n}(\hat\theta^*-\hat\theta)\mid\mathcal{Y}_n)\approx\mathrm{distr}(\sqrt{n}(\hat\theta-\theta))$ [and $\mathrm{distr}(\hat\theta^*-\hat\theta\mid\mathcal{Y}_n)\approx\mathrm{distr}(\hat\theta-\theta)$] if $n$ is sufficiently large.
$\Rightarrow$ Bootstrap confidence intervals, tests, etc.
Note:
- Standard approaches to construct confidence intervals and tests are usually based on asymptotic normal approximations. For example, if $\theta\in\mathbb{R}$ and $\sqrt{n}(\hat\theta-\theta)\to_L N(0,v^2)$ one usually tries to determine an approximation $\hat v$ of $v$ from the data. An approximate $1-\alpha$ confidence interval is then given by
\[ \left[\hat\theta-z_{1-\frac{\alpha}{2}}\frac{\hat v}{\sqrt{n}},\ \hat\theta+z_{1-\frac{\alpha}{2}}\frac{\hat v}{\sqrt{n}}\right] \]
- In some cases it is very difficult to obtain approximations $\hat v$ of $v$. Statistical inference is then usually based on the bootstrap.
- In contemporary statistical analysis the bootstrap is frequently used even for standard problems, where estimates $\hat v$ of $v$ are easily constructed. The reason is that in many situations it can be shown that bootstrap confidence intervals or tests are more precise than those determined analytically based on asymptotic formulas.

It must be emphasized that the bootstrap does not always work. The bootstrap may fail if one of the above conditions 1) or 2) is violated. Examples are
- The naive bootstrap will not work if the i.i.d. re-sample $Y_1^*,\dots,Y_n^*$ from $Y_1,\dots,Y_n$ does not properly reflect the way in which $Y_1,\dots,Y_n$ are generated from the underlying population (e.g. dependent data; $Y_1,\dots,Y_n$ not i.i.d.).
- The distribution of the estimator $\hat\theta$ is not asymptotically normal (e.g. extreme value problems).

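General approach: Basic bootstrap $1-\alpha$ confidence interval
Random sample $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$; unknown parameter (vector) $\theta$.
- We will assume that the bootstrap is consistent: $\mathrm{distr}(\hat\theta^*-\hat\theta\mid\mathcal{Y}_n)\approx\mathrm{distr}(\hat\theta-\theta)$ if $n$ is sufficiently large.
- Determine the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$ (the bootstrap distribution):
\[ P^*(\hat\theta^*\le\hat t_{\frac{\alpha}{2}})\approx\frac{\alpha}{2},\qquad P^*(\hat\theta^*>\hat t_{\frac{\alpha}{2}})\approx 1-\frac{\alpha}{2}, \]
\[ P^*(\hat\theta^*\le\hat t_{1-\frac{\alpha}{2}})\approx 1-\frac{\alpha}{2},\qquad P^*(\hat\theta^*>\hat t_{1-\frac{\alpha}{2}})\approx\frac{\alpha}{2}. \]
  Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$.
- Consistency of the bootstrap implies that for large $n$
\[ P(\hat\theta-\theta\le\hat t_{\frac{\alpha}{2}}-\hat\theta)\approx\frac{\alpha}{2},\qquad P(\hat\theta-\theta\le\hat t_{1-\frac{\alpha}{2}}-\hat\theta)\approx 1-\frac{\alpha}{2}, \]
  and therefore
\[ P\left(\hat\theta-(\hat t_{1-\frac{\alpha}{2}}-\hat\theta)\le\theta\le\hat\theta-(\hat t_{\frac{\alpha}{2}}-\hat\theta)\right)\approx 1-\alpha \]
$\Rightarrow$ Approximate $1-\alpha$ (symmetric) confidence interval:
\[ [2\hat\theta-\hat t_{1-\frac{\alpha}{2}},\ 2\hat\theta-\hat t_{\frac{\alpha}{2}}] \]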

Example: Bootstrap confidence interval for a median
Given: i.i.d. sample $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$; $Y_i$ possesses a continuous distribution with (unknown) density $f$.
We are now interested in estimating the median $\mu_{med}$ of the underlying distribution. Recall that the median is defined by
\[ P(Y_i\le\mu_{med})=P(Y_i\ge\mu_{med})=0.5 \]
$\mu_{med}$ is estimated by the sample median $\hat\mu_{med}$. Based on the ordered sample $Y_{(1)}\le Y_{(2)}\le\dots\le Y_{(n)}$, $\hat\mu_{med}$ is given by
\[ \hat\mu_{med}=\begin{cases} Y_{(\frac{n+1}{2})} & \text{if } n \text{ is an odd number}\\ (Y_{(\frac{n}{2})}+Y_{(\frac{n}{2}+1)})/2 & \text{if } n \text{ is an even number}\end{cases} \]
Construction of a confidence interval for $\mu_{med}$ is not an easy task. Asymptotically we obtain
\[ \sqrt{n}(\hat\mu_{med}-\mu_{med})\to_L N\left(0,\frac{1}{4f(\mu_{med})^2}\right) \]
The problem is that the density $f$ is unknown. In principle it may be estimated by nonparametric kernel density estimation and a corresponding plug-in estimate $\hat f(\hat\mu_{med})$ may be used to approximate the asymptotic variance. However, the bootstrap offers a simple alternative.
Construction of a bootstrap confidence interval:
- Draw i.i.d. random samples $Y_1^*,\dots,Y_n^*$ from $\mathcal{Y}_n$ and determine the corresponding medians $\hat\mu_{med}^*$.
- Determine the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\hat\mu_{med}^*$ given $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$.
- $\Rightarrow$ Approximate $1-\alpha$ (symmetric) confidence interval:
\[ [2\hat\mu_{med}-\hat t_{1-\frac{\alpha}{2}},\ 2\hat\mu_{med}-\hat t_{\frac{\alpha}{2}}] \]
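For the median no variance formula or density estimate is needed; the three steps above translate directly into code. The following sketch uses simulated, skewed data and $m=1999$ replications, both of which are illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(7)
sample = [rng.expovariate(1.0) for _ in range(101)]   # illustrative skewed data
med_hat = statistics.median(sample)

m, alpha = 1999, 0.05
# Medians of m resamples drawn with replacement, sorted for the quantile step.
med_star = sorted(
    statistics.median(rng.choices(sample, k=len(sample))) for _ in range(m)
)

# Order statistics with indices [(m+1)*alpha/2] and [(m+1)*(1-alpha/2)].
t_lo = med_star[math.floor((m + 1) * alpha / 2 + 1e-9) - 1]
t_hi = med_star[math.floor((m + 1) * (1 - alpha / 2) + 1e-9) - 1]

# Basic bootstrap interval for the median.
ci = (2 * med_hat - t_hi, 2 * med_hat - t_lo)
```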
1.2 Pivot statistics and the bootstrap-t method
In many situations it is possible to get more accurate bootstrap confidence intervals by using the bootstrap-t method (one also speaks of studentized bootstrap confidence intervals). The construction relies on so-called pivot statistics.
Let $Y_1,\dots,Y_n$ be an i.i.d. random sample and assume that the distribution of $Y$ depends on an unknown parameter (or parameter vector) $\theta$.
- A statistic $T_n\equiv T(Y_1,\dots,Y_n)$ is called a pivot statistic if the distribution of $T_n$ does not depend on any unknown parameter.
- A statistic $T_n\equiv T(Y_1,\dots,Y_n)$ is called an asymptotic pivot statistic if for suitable sequences $a_n$, $b_n$ of real numbers the transformed statistic $a_nT_n+b_n$ possesses a well-defined, non-degenerate asymptotic distribution which does not depend on the parameters of the unknown distribution of $Y$.

Example: Population mean: $Y_1,\dots,Y_n$ with mean $\mu$, variance $\sigma^2>0$, and $E|Y|^3<\infty$. If $Y$ is normally distributed we obtain
\[ T_n=\frac{\sqrt{n}(\bar Y-\mu)}{S}\sim t_{n-1} \]
with $S^2=\frac{1}{n-1}\sum_{i=1}^n(Y_i-\bar Y)^2$, where $t_{n-1}$ denotes Student's t-distribution with $n-1$ degrees of freedom. We can conclude that $T_n$ is a pivot statistic.
Even if $Y$ is not normally distributed, the central limit theorem implies that
\[ \frac{\sqrt{n}(\bar Y-\mu)}{S}\to_L N(0,1) \]
In this case $T_n$ is an asymptotic pivot statistic.

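Bootstrap:
i.i.d. re-sample $Y_1^*,\dots,Y_n^*$ from $\mathcal{Y}_n$ $\Rightarrow$ estimators $\bar Y^*=\frac{1}{n}\sum_{i=1}^n Y_i^*$ and $S^{*2}=\frac{1}{n-1}\sum_{i=1}^n(Y_i^*-\bar Y^*)^2$.
$n$ large $\Rightarrow$ approximately
\[ \mathrm{distr}\left(\frac{\sqrt{n}(\bar Y^*-\bar Y)}{S^*}\,\Big|\,\mathcal{Y}_n\right)\approx\mathrm{distr}\left(\frac{\sqrt{n}(\bar Y-\mu)}{S}\right)\approx N(0,1) \]
or
\[ \mathrm{distr}\left(\frac{\bar Y^*-\bar Y}{S^*}\,\Big|\,\mathcal{Y}_n\right)\approx\mathrm{distr}\left(\frac{\bar Y-\mu}{S}\right) \]
Therefore, the (conditional) distribution of $\frac{\bar Y^*-\bar Y}{S^*}$ (given $\mathcal{Y}_n$) can be used to approximate the distribution of $\frac{\bar Y-\mu}{S}$.

Construction of a bootstrap-t confidence interval of level $1-\alpha$:
- Determine the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\frac{\bar Y^*-\bar Y}{S^*}$ given $\mathcal{Y}_n$:
\[ P^*\left(\frac{\bar Y^*-\bar Y}{S^*}\le\hat\tau_{\frac{\alpha}{2}}\right)\approx\frac{\alpha}{2},\qquad P^*\left(\frac{\bar Y^*-\bar Y}{S^*}>\hat\tau_{\frac{\alpha}{2}}\right)\approx 1-\frac{\alpha}{2}, \]
\[ P^*\left(\frac{\bar Y^*-\bar Y}{S^*}\le\hat\tau_{1-\frac{\alpha}{2}}\right)\approx 1-\frac{\alpha}{2},\qquad P^*\left(\frac{\bar Y^*-\bar Y}{S^*}>\hat\tau_{1-\frac{\alpha}{2}}\right)\approx\frac{\alpha}{2}. \]
- In practice:
  - Draw $m$ bootstrap samples (e.g. $m=2000$) and calculate the corresponding estimates $Z_1^*:=\frac{\bar Y_1^*-\bar Y}{S_1^*}$, $Z_2^*:=\frac{\bar Y_2^*-\bar Y}{S_2^*}$, $\dots$, $Z_m^*:=\frac{\bar Y_m^*-\bar Y}{S_m^*}$.
  - Order the resulting estimates $\Rightarrow$ $Z_{(1)}^*\le Z_{(2)}^*\le\dots\le Z_{(m)}^*$.
  - Set $\hat\tau_{\frac{\alpha}{2}}:=Z^*_{([m+1]\frac{\alpha}{2})}$ and $\hat\tau_{1-\frac{\alpha}{2}}:=Z^*_{([m+1][1-\frac{\alpha}{2}])}$.
- Consistency of the bootstrap implies that asymptotically also
\[ P\left(\frac{\bar Y-\mu}{S}\le\hat\tau_{\frac{\alpha}{2}}\right)\approx\frac{\alpha}{2},\qquad P\left(\frac{\bar Y-\mu}{S}>\hat\tau_{\frac{\alpha}{2}}\right)\approx 1-\frac{\alpha}{2}, \]
\[ P\left(\frac{\bar Y-\mu}{S}\le\hat\tau_{1-\frac{\alpha}{2}}\right)\approx 1-\frac{\alpha}{2},\qquad P\left(\frac{\bar Y-\mu}{S}>\hat\tau_{1-\frac{\alpha}{2}}\right)\approx\frac{\alpha}{2}. \]
- This yields the $1-\alpha$ confidence interval
\[ [\bar Y-\hat\tau_{1-\frac{\alpha}{2}}S,\ \bar Y-\hat\tau_{\frac{\alpha}{2}}S] \]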
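A possible implementation of the bootstrap-t interval for a mean is sketched below. It studentizes each replicate with the standard deviation of its own resample; the function name, the simulated data and the handling of degenerate resamples are assumptions made for this illustration.

```python
import math
import random
import statistics


def bootstrap_t_ci(data, alpha=0.05, m=1999, seed=0):
    """Bootstrap-t interval [Ybar - tau_{1-a/2} * S, Ybar - tau_{a/2} * S]."""
    rng = random.Random(seed)
    n = len(data)
    y_bar = statistics.fmean(data)
    s = statistics.stdev(data)              # S with divisor n - 1
    z = []
    while len(z) < m:
        resample = rng.choices(data, k=n)
        s_star = statistics.stdev(resample)
        if s_star == 0.0:                   # all resampled values equal: redraw
            continue
        z.append((statistics.fmean(resample) - y_bar) / s_star)
    z.sort()
    tau_lo = z[math.floor((m + 1) * alpha / 2 + 1e-9) - 1]
    tau_hi = z[math.floor((m + 1) * (1 - alpha / 2) + 1e-9) - 1]
    return y_bar - tau_hi * s, y_bar - tau_lo * s


# Illustrative skewed data, where studentizing tends to matter most.
data_rng = random.Random(9)
sample = [data_rng.expovariate(1.0) for _ in range(30)]
lower, upper = bootstrap_t_ci(sample)
```

In contrast to the t-interval based on $t_{n-1}$ quantiles, the two quantiles used here need not be symmetric around zero, so the interval can be asymmetric around $\bar Y$.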

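General construction of a bootstrap-t interval (unknown real-valued parameter $\theta\in\mathbb{R}$):
Random sample $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$; unknown parameter (vector) $\theta$. Assume that the estimator $\hat\theta$ of $\theta$ is asymptotically normal,
\[ \sqrt{n}(\hat\theta-\theta)\to_L N(0,v^2)\quad\Rightarrow\quad\frac{\sqrt{n}(\hat\theta-\theta)}{v}\to_L N(0,1) \]
and that a consistent estimator $\hat v\equiv\hat v(Y_1,\dots,Y_n)$ of $v$ is available. One might then replace $v$ by $\hat v$ to obtain
\[ \frac{\sqrt{n}(\hat\theta-\theta)}{\hat v}\to_L N(0,1) \]
Obviously, $\frac{\sqrt{n}(\hat\theta-\theta)}{\hat v}$ and $\frac{\hat\theta-\theta}{\hat v}$ are asymptotic pivot statistics.
- Based on an i.i.d. re-sample $Y_1^*,\dots,Y_n^*$ from $\{Y_1,\dots,Y_n\}$, calculate bootstrap estimates $\hat\theta^*$ and $\hat v^*$.
- Determine the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\frac{\hat\theta^*-\hat\theta}{\hat v^*}$ given $\mathcal{Y}_n$.
- $\Rightarrow$ Bootstrap-t interval
\[ [\hat\theta-\hat\tau_{1-\frac{\alpha}{2}}\hat v,\ \hat\theta-\hat\tau_{\frac{\alpha}{2}}\hat v] \]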

1.3 The Parametric bootstrap

A further increase of accuracy can be obtained in applications where the distribution of $Y$ is known up to some parameter vectors $\theta$, $\psi$ (e.g. $Y$ is normal with mean $\mu$ and variance $\sigma^2$; $Y$ follows an exponential distribution with parameter $\theta$). The difference to the nonparametric bootstrap discussed above consists in the way in which a bootstrap re-sample $Y_1^*,\dots,Y_n^*$ is generated.
Let $\psi=(\psi_1,\dots,\psi_p)'$, and for some known $F$ let $F(y,\theta,\psi)$ denote the distribution function of $Y$ as a function of $\theta$, $\psi$. $F$ is assumed to be known. For simplicity, we will concentrate on constructing a confidence interval for $\theta$.
The parametric bootstrap now proceeds as follows:
- The unknown parameter vectors $\theta$, $\psi$ are estimated by the maximum likelihood method $\Rightarrow$ likelihood estimators $\hat\theta$, $\hat\psi$.
- An i.i.d. re-sample $Y_1^*,\dots,Y_n^*$ is generated by randomly drawing observations from a $F(\cdot,\hat\theta,\hat\psi)$ distribution (using a random number generator) $\Rightarrow$ $\hat\theta^*$, $\hat\psi^*$.
- The conditional distribution of $\hat\theta^*$ given $F(\cdot,\hat\theta,\hat\psi)$ is used to approximate the distribution of the estimator $\hat\theta$.
In almost all cases of practical interest confidence intervals based on the parametric bootstrap are more accurate than standard intervals based on first-order asymptotic approximations. The parametric bootstrap usually also provides more accurate approximations than its nonparametric counterpart discussed above. Of course, this requires that the underlying distributional assumption is satisfied (otherwise, the parametric bootstrap will lead to incorrect results).
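Basic parametric bootstrap confidence interval:
\[ [2\hat\theta-\hat t_{1-\frac{\alpha}{2}},\ 2\hat\theta-\hat t_{\frac{\alpha}{2}}], \]
where $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ now denote the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\hat\theta^*$ given $F(\cdot,\hat\theta,\hat\psi)$.
Bootstrap-t intervals:
- Assume that the standard error $v(\theta,\psi)$ of $\sqrt{n}(\hat\theta-\theta)$ can be determined in dependence of the parameter (vectors) $\theta$, $\psi$.
- i.i.d. re-sample $Y_1^*,\dots,Y_n^*$ generated by randomly drawing observations from a $F(\cdot,\hat\theta,\hat\psi)$ distribution
- $\Rightarrow$ Parameter estimates $\hat\theta^*$, $\hat\psi^*$ as well as bootstrap approximations $v(\hat\theta^*,\hat\psi^*)$ of the standard error.
- $\Rightarrow$ Bootstrap-t interval
\[ [\hat\theta-\hat\tau_{1-\frac{\alpha}{2}}v(\hat\theta,\hat\psi),\ \hat\theta-\hat\tau_{\frac{\alpha}{2}}v(\hat\theta,\hat\psi)], \]
  where $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ now denote the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\frac{\hat\theta^*-\hat\theta}{v(\hat\theta^*,\hat\psi^*)}$ given $F(\cdot,\hat\theta,\hat\psi)$.

Note: Sometimes the following modification leads to even more accurate intervals:
- Determine the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\frac{\hat\theta^*-\hat\theta}{v(\hat\theta,\hat\psi^*)}$ given $F(\cdot,\hat\theta,\hat\psi)$.
- Asymptotically we obtain
\[ P\left(\hat\tau_{\frac{\alpha}{2}}\le\frac{\hat\theta-\theta}{v(\theta,\hat\psi)}\le\hat\tau_{1-\frac{\alpha}{2}}\right)\approx 1-\alpha \]
- $1-\alpha$ confidence interval: Set of all $\theta$ with $\hat\tau_{\frac{\alpha}{2}}\le\frac{\hat\theta-\theta}{v(\theta,\hat\psi)}\le\hat\tau_{1-\frac{\alpha}{2}}$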

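Example: Exponential distribution
Assume that $Y$ follows an exponential distribution with parameter $\theta$. Density and distribution function are then given by
\[ f(y,\theta)=\frac{1}{\theta}e^{-y/\theta},\qquad F(y,\theta)=1-e^{-y/\theta} \]
We have $E(Y_i)=\theta$ and $Var(Y_i)=\theta^2$. The maximum likelihood estimator of $\theta$ is given by $\hat\theta=\frac{1}{n}\sum_{i=1}^n Y_i$, and $Var(\hat\theta)=\frac{\theta^2}{n}$.
The parametric bootstrap can then be used to construct confidence intervals. The following procedure is straightforward, but there also exist alternative approaches.
- An i.i.d. re-sample $Y_1^*,\dots,Y_n^*$ is generated by randomly drawing observations from an exponential distribution with parameter $\hat\theta$.
- $Y_1^*,\dots,Y_n^*$ $\Rightarrow$ Estimator $\hat\theta^*$
- Calculation of the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ with
\[ P^*\left(\frac{\hat\theta^*}{\hat\theta}\le\hat\tau_{\frac{\alpha}{2}}\right)=\frac{\alpha}{2},\qquad P^*\left(\frac{\hat\theta^*}{\hat\theta}\le\hat\tau_{1-\frac{\alpha}{2}}\right)=1-\frac{\alpha}{2}, \]
  where $P^*(\cdot)$ denotes probabilities calculated with respect to the exponential distribution with parameter $\hat\theta$.
- This yields
\[ P\left(\hat\tau_{\frac{\alpha}{2}}\le\frac{\hat\theta}{\theta}\le\hat\tau_{1-\frac{\alpha}{2}}\right)=1-\alpha \]
- $\Rightarrow$ Confidence interval:
\[ \left[\frac{\hat\theta}{\hat\tau_{1-\frac{\alpha}{2}}},\ \frac{\hat\theta}{\hat\tau_{\frac{\alpha}{2}}}\right] \]
It can be shown that for any finite sample of size $n$ the coverage probability of this interval is exactly equal to $1-\alpha$.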
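One way to carry out such a parametric bootstrap is sketched below. It assumes that the quantiles refer to the ratio $\hat\theta^*/\hat\theta$, whose distribution under the fitted exponential model does not depend on $\hat\theta$; this reading, the simulated data and all variable names are assumptions of the example, and the quantiles are approximated by Monte Carlo rather than computed exactly.

```python
import math
import random
import statistics

rng = random.Random(5)
n = 40
# Illustrative data from an exponential distribution with mean theta = 2.
sample = [rng.expovariate(1 / 2.0) for _ in range(n)]
theta_hat = statistics.fmean(sample)          # maximum likelihood estimate

m, alpha = 1999, 0.05
ratios = []
for _ in range(m):
    # Parametric re-sample: draw from the fitted model (mean theta_hat);
    # random.expovariate takes the rate 1/mean as its argument.
    resample = [rng.expovariate(1 / theta_hat) for _ in range(n)]
    ratios.append(statistics.fmean(resample) / theta_hat)
ratios.sort()

tau_lo = ratios[math.floor((m + 1) * alpha / 2 + 1e-9) - 1]
tau_hi = ratios[math.floor((m + 1) * (1 - alpha / 2) + 1e-9) - 1]

# Inverting tau_lo <= theta_hat / theta <= tau_hi gives the interval for theta.
ci = (theta_hat / tau_hi, theta_hat / tau_lo)
```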
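1.4 More on Bootstrap Confidence Intervals

Setup: i.i.d. random sample $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$; unknown parameter (vector) $\theta$.
We will assume that the bootstrap is consistent: $\mathrm{distr}(\hat\theta^*-\hat\theta\mid\mathcal{Y}_n)\approx\mathrm{distr}(\hat\theta-\theta)$ if $n$ is sufficiently large.
In the previous sections we have already defined basic bootstrap confidence intervals as well as bootstrap-t intervals.

1.4.1 Basic confidence interval

\[ [2\hat\theta-\hat t_{1-\frac{\alpha}{2}},\ 2\hat\theta-\hat t_{\frac{\alpha}{2}}], \]
where $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ are the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$.

1.4.2 Bootstrap-t Intervals

\[ [\hat\theta-\hat\tau_{1-\frac{\alpha}{2}}\hat v,\ \hat\theta-\hat\tau_{\frac{\alpha}{2}}\hat v], \]
where $\hat\tau_{\frac{\alpha}{2}}$ and $\hat\tau_{1-\frac{\alpha}{2}}$ are the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\frac{\hat\theta^*-\hat\theta}{\hat v^*}$ given $\mathcal{Y}_n$.

1.4.3 Percentile Intervals

The classical percentile confidence interval is given by
\[ [\hat t_{\frac{\alpha}{2}},\ \hat t_{1-\frac{\alpha}{2}}] \]
Generally, this interval does not work extremely well in practice.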

The so-called BCa method makes it possible to construct better confidence intervals. The term BCa stands for bias-corrected and accelerated. The BCa interval of intended coverage $1-\alpha$ is given by
\[ [\hat t_{\alpha_1},\ \hat t_{\alpha_2}], \]
where $\hat t_{\alpha_1}$ and $\hat t_{\alpha_2}$ are the $\alpha_1$ and $\alpha_2$ quantiles of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$, and
\[ \alpha_1=\Phi\left(\hat z_0+\frac{\hat z_0+z_{\frac{\alpha}{2}}}{1-\hat a(\hat z_0+z_{\frac{\alpha}{2}})}\right),\qquad \alpha_2=\Phi\left(\hat z_0+\frac{\hat z_0+z_{1-\frac{\alpha}{2}}}{1-\hat a(\hat z_0+z_{1-\frac{\alpha}{2}})}\right), \]
where $\Phi$ is the standard normal distribution function, and where $z_\alpha$ is the $\alpha$ quantile of a standard normal distribution.
Note that the BCa interval reduces to a standard percentile interval if $\hat z_0=\hat a=0$. However, a different choice of $\hat z_0$ and $\hat a$ leads to more accurate intervals:
The value of the bias-correction $\hat z_0$ can be obtained from the proportion of the bootstrap replications less than the original estimate $\hat\theta$:
\[ \hat z_0=\Phi^{-1}\left(P^*[\hat\theta^*<\hat\theta]\right) \]
Calculation of the acceleration $\hat a$ is slightly more complicated. It is based on jackknife values of the estimator $\hat\theta$: For any $i=1,\dots,n$ calculate the estimate $\hat\theta_{-i}$ from the sample $Y_1,\dots,Y_{i-1},Y_{i+1},\dots,Y_n$ with the $i$th observation deleted. Let $\bar\theta=\frac{1}{n}\sum_{i=1}^n\hat\theta_{-i}$ and determine
\[ \hat a=\frac{\sum_{i=1}^n(\bar\theta-\hat\theta_{-i})^3}{6\left[\sum_{i=1}^n(\bar\theta-\hat\theta_{-i})^2\right]^{3/2}} \]
The BCa interval is motivated by theoretical results which show that it is second-order accurate.
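The BCa recipe (bias correction from the share of replicates below the estimate, acceleration from jackknife values) can be sketched as follows. This is a minimal illustration with the standard library only; the function name, the clamping of the share away from 0 and 1, and the simulated data are assumptions of the example.

```python
import math
import random
import statistics
from statistics import NormalDist


def bca_interval(data, estimator, alpha=0.05, m=1999, seed=0):
    """BCa interval from bootstrap replicates and jackknife values (sketch)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)
    reps = sorted(estimator(rng.choices(data, k=n)) for _ in range(m))
    std_normal = NormalDist()

    # Bias correction: Phi^{-1} of the share of replicates below the estimate.
    share = sum(r < theta_hat for r in reps) / m
    share = min(max(share, 1 / (m + 1)), m / (m + 1))   # keeps inv_cdf finite
    z0 = std_normal.inv_cdf(share)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = statistics.fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted_level(prob):
        z = std_normal.inv_cdf(prob)
        return std_normal.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def quantile(prob):
        k = min(max(math.floor((m + 1) * prob + 1e-9), 1), m)
        return reps[k - 1]

    return quantile(adjusted_level(alpha / 2)), quantile(adjusted_level(1 - alpha / 2))


# Illustrative skewed data.
data_rng = random.Random(21)
sample = [data_rng.expovariate(1.0) for _ in range(40)]
lower, upper = bca_interval(sample, statistics.fmean)
```

With `z0 = a = 0` the adjusted levels equal $\alpha/2$ and $1-\alpha/2$, so the function then returns the plain percentile interval.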
Consider generally $1-\alpha$ confidence intervals of the form $[t_{low},t_{up}]$ of $\theta$. Upper and lower bounds of such intervals are determined from the data, $t_{low}\equiv t_{low}(Y_1,\dots,Y_n)$, $t_{up}\equiv t_{up}(Y_1,\dots,Y_n)$, and their accuracy depends on the particular procedure applied.
- (Symmetric) confidence intervals are said to be first-order accurate if there exist some constants $d_1,d_2<\infty$ such that for sufficiently large $n$
\[ \left|P(\theta<t_{low})-\frac{\alpha}{2}\right|\le\frac{d_1}{\sqrt{n}},\qquad\left|P(\theta>t_{up})-\frac{\alpha}{2}\right|\le\frac{d_2}{\sqrt{n}}. \]
- (Symmetric) confidence intervals are said to be second-order accurate if there exist some constants $d_3,d_4<\infty$ such that for sufficiently large $n$
\[ \left|P(\theta<t_{low})-\frac{\alpha}{2}\right|\le\frac{d_3}{n},\qquad\left|P(\theta>t_{up})-\frac{\alpha}{2}\right|\le\frac{d_4}{n}. \]
If the distribution of $\hat\theta$ is asymptotically normal, then under some additional regularity conditions it can usually be shown that
- Standard confidence intervals based on asymptotic approximations are first-order accurate. The same holds for the basic bootstrap intervals $[2\hat\theta-\hat t_{1-\frac{\alpha}{2}},\ 2\hat\theta-\hat t_{\frac{\alpha}{2}}]$ as well as for the classical percentile method.
- Bootstrap-t intervals as well as BCa intervals are second-order accurate.
The difference between first and second-order accuracy is not just a theoretical nicety. In many practically important situations second-order accurate intervals lead to much better approximations.
Another approach for constructing confidence intervals is the ABC method: ABC, standing for approximate bootstrap confidence intervals, makes it possible to approximate the BCa interval endpoints analytically, without using any Monte Carlo replications at all ($\Rightarrow$ reduced computational costs). The procedure works by approximating the bootstrap sampling results by Taylor expansions. It is then, however, required that $\hat\theta\equiv\hat\theta(Y_1,\dots,Y_n)$ is a smooth function of $Y_1,\dots,Y_n$. This is for example not true for the sample median.

1.5 Subsampling: Inference for a sample maximum
Data: i.i.d. random sample $\mathcal{Y}_n:=\{Y_1,\dots,Y_n\}$.
We now consider the situation that the $Y_i$ only take values in a compact interval $[0,\theta]$ such that
\[ P(Y_i\in[0,\theta])=1. \]
Furthermore, $Y_i$ possesses a density $f$ which is continuous on $[0,\theta]$ and satisfies $f(y)>0$ for $y\in(0,\theta]$, and $f(y)=0$ for $y\notin[0,\theta]$. The maximum $\theta$ of $Y_i$ is unknown and has to be estimated from the data.
Similar types of extreme value problems frequently arise in econometrics. An example is the analysis of production efficiencies of different firms. The above situation may arise if we consider production outputs $Y_i$ of a sample of firms with identical inputs. A firm then is efficient if its output equals the maximal possible value $\theta$. Note that in practice usually more complicated problems have to be considered, where production outputs depend on individually different values of input variables $\Rightarrow$ Frontier Analysis.
Consistent estimator $\hat\theta$ of $\theta$:
\[ \hat\theta:=\max_{i=1,\dots,n}Y_i \]
Constructing a confidence interval for $\theta$ is not an easy task. The distribution of $\hat\theta$ is not asymptotically normal. Indeed, it can be shown that $n(\theta-\hat\theta)$ follows asymptotically an exponential distribution with parameter $\lambda=\frac{1}{f(\theta)}$:
\[ n(\theta-\hat\theta)\to_L Exp\left(\frac{1}{f(\theta)}\right) \]
The naive bootstrap fails:
- i.i.d. re-sample $Y_1^*,\dots,Y_n^*$ from $\{Y_1,\dots,Y_n\}$ $\Rightarrow$ bootstrap estimator $\hat\theta^*:=\max_{i=1,\dots,n}Y_i^*$
- Unfortunately, the bootstrap is not consistent.
- The reason is as follows: $\hat\theta=Y_{(n)}$, and hence $\hat\theta^*=\hat\theta=Y_{(n)}$ whenever $Y_{(n)}\in\{Y_1^*,\dots,Y_n^*\}$. Some calculations then show that for large $n$
\[ P^*(\hat\theta-\hat\theta^*=0)=P(\hat\theta-\hat\theta^*=0\mid\mathcal{Y}_n)\approx 1-e^{-1}, \]
  while $P(\theta-\hat\theta=0)=0$!
- One can conclude that even for large sample sizes $\mathrm{distr}(n(\hat\theta-\hat\theta^*)\mid\mathcal{Y}_n)$ will be very different from $\mathrm{distr}(n(\theta-\hat\theta))$ $\Rightarrow$ Basic bootstrap confidence intervals are incorrect.
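The point mass of the naive bootstrap at zero is easy to see numerically. In a sample without ties, the probability that a resample of size $n$ contains the sample maximum is $1-(1-1/n)^n$, which tends to $1-e^{-1}\approx 0.632$. The sketch below compares this with a Monte Carlo estimate; the uniform data and the number of replications are illustrative assumptions.

```python
import math
import random

rng = random.Random(3)
n = 200
# Illustrative data: uniform on [0, 1], so the true upper bound is theta = 1.
sample = [rng.uniform(0.0, 1.0) for _ in range(n)]
theta_hat = max(sample)

m = 5000
# Count the resamples whose maximum coincides with the sample maximum.
hits = sum(max(rng.choices(sample, k=n)) == theta_hat for _ in range(m))
share_equal = hits / m                 # Monte Carlo estimate of P*(theta* = theta_hat)

exact = 1 - (1 - 1 / n) ** n           # exact conditional probability (no ties)
limit = 1 - math.exp(-1)               # limiting value for n -> infinity
```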
A possible remedy is to use subsampling. Similar to the ordinary bootstrap, subsampling relies on i.i.d. re-sampling from $\mathcal{Y}_n$, and the only difference consists in the fact that subsampling is based on drawing a smaller number $\kappa<n$ of observations.

Subsampling bootstrap:
- Choose some $\kappa<n$.
- Determine an i.i.d. re-sample $Y_1^*,\dots,Y_\kappa^*$ by drawing randomly $\kappa$ observations from $\{Y_1,\dots,Y_n\}$ $\Rightarrow$ bootstrap estimator $\hat\theta^*:=\max_{i=1,\dots,\kappa}Y_i^*$
For the above problem subsampling is consistent.
- If $\kappa=n^\gamma$ for some $0<\gamma<1$, then the law of $(\kappa(\hat\theta-\hat\theta^*)\mid\mathcal{Y}_n)$ converges stochastically to a $Exp\left(\frac{1}{f(\theta)}\right)$-distribution.

  More precisely, as $n\to\infty$, $\kappa=n^\gamma$ for some $0<\gamma<1$,
\[ \sup_\delta\left|P\left(\kappa(\hat\theta-\hat\theta^*)\le\delta\mid\mathcal{Y}_n\right)-F\left(\delta;\frac{1}{f(\theta)}\right)\right|\to_P 0, \]
  where $F(\cdot;\frac{1}{f(\theta)})$ denotes the distribution function of an exponential distribution with parameter $\lambda=\frac{1}{f(\theta)}$.
- Asymptotically: $\mathrm{distr}(\kappa(\hat\theta-\hat\theta^*)\mid\mathcal{Y}_n)\approx\mathrm{distr}(n(\theta-\hat\theta))$.

The subsampling bootstrap works under extremely general conditions, and it can often be applied in situations where the ordinary bootstrap fails. However, it usually does not make any sense to apply subsampling in regular cases, where the standard nonparametric bootstrap is consistent. Then subsampling is less efficient, and confidence intervals based on subsampling are less accurate.
In practice, a major problem is the choice of $\kappa$.

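Confidence interval based on subsampling:
- Calculation of the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\hat t_{\frac{\alpha}{2}}$ and $\hat t_{1-\frac{\alpha}{2}}$ with
\[ P^*(\kappa(\hat\theta-\hat\theta^*)\le\hat t_{\frac{\alpha}{2}})=\frac{\alpha}{2},\qquad P^*(\kappa(\hat\theta-\hat\theta^*)\le\hat t_{1-\frac{\alpha}{2}})=1-\frac{\alpha}{2}, \]
  where $P^*(\cdot)$ denotes probabilities calculated with respect to the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$.
- This yields
\[ P^*(\hat t_{\frac{\alpha}{2}}\le\kappa(\hat\theta-\hat\theta^*)\le\hat t_{1-\frac{\alpha}{2}})\approx 1-\alpha, \]
  and consistency of the bootstrap implies
\[ P(\hat t_{\frac{\alpha}{2}}\le n(\theta-\hat\theta)\le\hat t_{1-\frac{\alpha}{2}})\approx 1-\alpha. \]
- $\Rightarrow$ Confidence interval for $\theta$:
\[ \left[\hat\theta+\frac{\hat t_{\frac{\alpha}{2}}}{n},\ \hat\theta+\frac{\hat t_{1-\frac{\alpha}{2}}}{n}\right] \]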
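The subsampling interval for the upper bound can be sketched as follows. The subsample size $n^{0.5}$, the uniform data and the number of replications are arbitrary choices made for illustration; the notes stress that choosing the subsample size is a practical problem, and nothing here should be read as a recommendation.

```python
import math
import random

rng = random.Random(11)
n = 1000
# Illustrative data: uniform on [0, 1], so the true upper bound is theta = 1.
sample = [rng.uniform(0.0, 1.0) for _ in range(n)]
theta_hat = max(sample)

kappa = int(n ** 0.5)          # subsample size n^gamma with gamma = 0.5 (assumption)
m, alpha = 1999, 0.05

# Scaled distances kappa * (theta_hat - theta*) over m subsamples of size kappa,
# drawn with replacement from the observed sample.
scaled = sorted(
    kappa * (theta_hat - max(rng.choices(sample, k=kappa))) for _ in range(m)
)
t_lo = scaled[math.floor((m + 1) * alpha / 2 + 1e-9) - 1]
t_hi = scaled[math.floor((m + 1) * (1 - alpha / 2) + 1e-9) - 1]

# The quantiles are rescaled by n (not kappa) when forming the interval.
ci = (theta_hat + t_lo / n, theta_hat + t_hi / n)
```

Because $\theta\ge\hat\theta$ always holds, both end points lie at or above the sample maximum.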

1.6 Appendix

1.6.1 The empirical distribution function

Data: i.i.d. sample $X_1,\dots,X_n$; ordered sample $X_{(1)}\le\dots\le X_{(n)}$. The distribution of $X_i$ possesses a distribution function $F$ defined by
\[ F(x)=P(X_i\le x) \]
Let $H_n(x)$ denote the number of observations $X_i$ satisfying $X_i\le x$. The empirical distribution function is then defined by
\[ F_n(x)=H_n(x)/n=\text{proportion of observations } X_i \text{ with } X_i\le x \]
Properties:
- $0\le F_n(x)\le 1$
- $F_n(x)=0$ if $x<X_{(1)}$
- $F_n(x)=1$ if $x\ge X_{(n)}$
- $F_n$ is a monotonically increasing step function

Example:

  x1     x2     x3     x4     x5     x6     x7     x8
  5.20   4.80   5.40   4.60   6.10   5.40   5.80   5.50

Empirical distribution function:

[Figure: step function $F_n(x)$ for the eight observations; horizontal axis from 4.0 to 6.5, vertical axis from 0.0 to 1.0]
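The empirical distribution function of the eight observations can be evaluated directly from its definition. The helper name `ecdf` is an assumption made for this sketch.

```python
def ecdf(data):
    """Return F_n, the empirical distribution function of `data`."""
    n = len(data)

    def F_n(x):
        # Proportion of observations X_i with X_i <= x.
        return sum(1 for value in data if value <= x) / n

    return F_n


# The eight observations of the example.
x = [5.20, 4.80, 5.40, 4.60, 6.10, 5.40, 5.80, 5.50]
F_n = ecdf(x)

# The function jumps at each distinct observation; the tied value 5.40
# produces a jump of 2/8.
jump_points = sorted(set(x))
values_at_jumps = [F_n(point) for point in jump_points]
```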

Theoretical properties of $F_n$
Theorem: For every $x\in\mathbb{R}$ we obtain
\[ nF_n(x)\sim B(n,F(x)), \]
i.e. $nF_n(x)$ follows a binomial distribution with parameters $n$ and $F(x)$. The probability distribution of $F_n(x)$ is thus given by
\[ P\left(F_n(x)=\frac{m}{n}\right)=\binom{n}{m}F(x)^m(1-F(x))^{n-m},\qquad m=0,1,\dots,n \]
Consequences:
- $E(F_n(x))=F(x)$, i.e. $F_n(x)$ is an unbiased estimator of $F(x)$.
- $Var(F_n(x))=\frac{1}{n}F(x)(1-F(x))$ $\Rightarrow$ the standard error of $F_n(x)$ decreases as $n$ increases ($\Rightarrow$ $F_n(x)$ is a consistent estimator of $F(x)$).

Theorem of Glivenko-Cantelli:
\[ P\left(\lim_{n\to\infty}\sup_{x\in\mathbb{R}}|F_n(x)-F(x)|=0\right)=1 \]

1.6.2 Consistency of estimators

Any reasonable estimator $\hat\theta\equiv\hat\theta_n$ of a parameter $\theta$ must be consistent. Intuitively this means that the distribution of $\hat\theta_n$ must become more and more concentrated around the true value $\theta$ as $n\to\infty$. The mathematical formalization of consistency relies on general concepts quantifying convergence of random variables.

Convergence in probability:
Let $X_1,X_2,\dots$ and $X$ be random variables defined on a probability space $(\Omega,\mathcal{A},P)$. $X_n$ converges in probability to $X$ if
\[ \lim_{n\to\infty}P[|X_n-X|<\epsilon]=1 \]
for every $\epsilon>0$. One often uses the notation $X_n\to_P X$.

Weak consistency:
An estimator $\hat\theta_n$ is called weakly consistent if $\hat\theta_n\to_P\theta$.

Convergence in mean square:
Let $X_1,X_2,\dots$ and $X$ be random variables defined on a probability space $(\Omega,\mathcal{A},P)$. $X_n$ converges in mean square to $X$ if
\[ \lim_{n\to\infty}E\left(|X_n-X|^2\right)=0 \]
Notation: $X_n\to_{MSE}X$

Mean square consistency:
$\hat\theta_n$ is mean square consistent if $\hat\theta_n\to_{MSE}\theta$.

Strong Convergence (Convergence with probability 1):
Let $X_1,X_2,\dots$ and $X$ be random variables defined on a probability space $(\Omega,\mathcal{A},P)$. $X_n$ converges with probability 1 (or almost surely) to $X$ if
\[ P\left[\lim_{n\to\infty}X_n=X\right]=1 \]
Notation: $X_n\to_{a.s.}X$

Strong consistency (consistency with probability 1):
An estimator $\hat\theta_n$ is strongly consistent if $\hat\theta_n\to_{a.s.}\theta$.

- $X_n\to_{MSE}X$ implies $X_n\to_P X$
- $X_n\to_{a.s.}X$ implies $X_n\to_P X$

Application: Law of large numbers
We obtain $E(\bar X)=\mu$ as well as $Var(\bar X)=\frac{\sigma^2}{n}$
\[ \Rightarrow\ MSE(\bar X):=E((\bar X-\mu)^2)=Var(\bar X)=\frac{\sigma^2}{n}\to 0\quad\text{as } n\to\infty \]
\[ \Rightarrow\ \bar X\to_P\mu\quad\text{as } n\to\infty \]

Example: Consider a normally distributed random variable $X\sim N(\mu,0.18^2)$ with unknown mean $\mu$ but known standard deviation $\sigma=0.18$.
Random sample $X_1,\dots,X_n$ $\Rightarrow$ Estimator $\bar X$ of $\mu$.
Recall: $\bar X\sim N(\mu,\frac{\sigma^2}{n})=N(\mu,\frac{0.18^2}{n})$.

- $n=9$: standard error $=0.06$, $MSE(\bar X)=0.0036$
- $n=144$: standard error $=0.015$, $MSE(\bar X)=0.000225$
- $n=9$: $P[\mu-0.1176\le\bar X\le\mu+0.1176]=0.95$
- $n=144$: $P[\mu-0.0294\le\bar X\le\mu+0.0294]=0.95$

[Figure: two panels, n=9 and n=144, showing the density of $\bar X$; the area in each tail outside the 95% interval is 0.025]
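The numbers in this example follow from $\sigma/\sqrt{n}$ and the 0.975 quantile of the standard normal distribution. The short check below reproduces them; the dictionary layout is just a convenience of this sketch.

```python
import math
from statistics import NormalDist

sigma = 0.18
z = NormalDist().inv_cdf(0.975)        # 0.975 quantile of N(0, 1), about 1.96

results = {}
for n in (9, 144):
    se = sigma / math.sqrt(n)          # standard error of the sample mean
    results[n] = {
        "standard_error": se,
        "mse": se ** 2,                # MSE equals the variance (unbiased estimator)
        "half_width": z * se,          # half-width of the central 95% interval
    }
```

Increasing the sample size by a factor of 16 (from 9 to 144) reduces the standard error and the interval half-width by a factor of 4.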

1.6.3 Convergence in distribution

Let $Z_1,Z_2,\dots$ be a sequence of random variables with distribution functions $F_1,F_2,\dots$, and let $Z$ be a random variable with distribution function $F$. $Z_n$ converges in distribution to $Z$ if
\[ \lim_{n\to\infty}F_n(t)=F(t)\quad\text{at every continuity point } t \text{ of } F \]
Notation: $Z_n\to_L Z$

The central limit theorem

Theorem (Ljapunov): Let $X_1,X_2,\dots$ be a sequence of independent random variables with means $E(X_i)=\mu_i$ and variances $Var(X_i)=E((X_i-\mu_i)^2)=\sigma_i^2>0$. Furthermore assume that $E(|X_i-\mu_i|^3)=\beta_i<\infty$.
If
\[ \frac{(\sum_{i=1}^n\beta_i)^{1/3}}{(\sum_{i=1}^n\sigma_i^2)^{1/2}}\to 0\quad\text{as } n\to\infty, \]
then
\[ \frac{\sum_{i=1}^n(X_i-\mu_i)}{(\sum_{i=1}^n\sigma_i^2)^{1/2}}\to_L N(0,1) \]
Sometimes the notation $Z_n\sim AN(0,1)$ is used instead of $Z_n\to_L N(0,1)$.
Important information about the speed of convergence to a normal distribution is given by the Berry-Esseen theorem:

Theorem (Berry-Esseen): Let $X_1,X_2,\dots$ be a sequence of i.i.d. random variables with mean $E(X_i)=\mu$ and variance $Var(X_i)=E((X_i-\mu)^2)=\sigma^2>0$. Then, if $G_n$ denotes the distribution function of $\frac{\sqrt{n}(\bar X-\mu)}{\sigma}$,
\[ \sup_t|G_n(t)-\Phi(t)|\le\frac{33}{4}\,\frac{E(|X_i-\mu|^3)}{\sigma^3n^{1/2}} \]

1.6.4 Stochastic order symbols (rates of convergence)

In mathematical notation the symbols $O(\cdot)$ and $o(\cdot)$ are often used in order to quantify the speed (rate) of convergence of a sequence of numbers.
Let $\alpha_1,\alpha_2,\alpha_3,\dots$ and $\beta_1,\beta_2,\beta_3,\dots$ be (deterministic) sequences of numbers.
- The notation $\alpha_n=O(1)$ indicates that the sequence $\alpha_1,\alpha_2,\dots$ is bounded. More precisely, there exists an $M<\infty$ such that $|\alpha_n|\le M$ for all $n\in\mathbb{N}$.
- $\alpha_n=o(1)$ means that $\alpha_n\to 0$.
- $\alpha_n=O(\beta_n)$ means that $|\alpha_n|/|\beta_n|=O(1)$.
- $\alpha_n=o(\beta_n)$ means that $|\alpha_n|/|\beta_n|\to 0$.
Examples: $\sum_{i=1}^n i=O(n^2)$, $\sum_{i=1}^n i=o(n^3)$
Stochastic order symbols OP () and oP () are used to quantify
the speed (rate) of convergence of a sequence of random varia-
bles. Let Z1 , Z2 , Z3 , . . . be a sequence of random variables, and
let r1 , r2 , . . . be either a deterministic sequence of number or a
sequence of random variables.
Inference@LS-Kneip 136
We will write $Z_n = O_P(1)$ if for every $\epsilon > 0$ there exist an $M < \infty$ and an $n_\epsilon \in \mathbb{N}$ such that

$$P(|Z_n| > M) \le \epsilon \quad \text{for all } n \ge n_\epsilon$$

In other words, $Z_n = O_P(1)$ indicates that the random variables $Z_n$ are stochastically bounded.
We will write $Z_n = o_P(1)$ if and only if $Z_n \to_P 0$.
$Z_n = O_P(V_n)$ means that $|Z_n|/|V_n| = O_P(1)$.
$Z_n = o_P(V_n)$ means that $|Z_n|/|V_n| \to_P 0$.
Example: $\bar X - \mu = O_P(n^{-1/2})$

1.6.5 Important inequalities

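Inequality of Chebyshev (for a random variable $X$ with mean $\mu$ and standard deviation $\sigma$):
$$P[|X - \mu| > k\sigma] \le \frac{1}{k^2} \quad \text{for all } k > 0$$
$$\Rightarrow \ P[\mu - k\sigma \le X \le \mu + k\sigma] \ge 1 - \frac{1}{k^2}$$

k    lower bound for $P[\mu - k\sigma \le X \le \mu + k\sigma]$
2    $1 - \frac{1}{4} = 0.75$
3    $1 - \frac{1}{9} \approx 0.89$
4    $1 - \frac{1}{16} = 0.9375$

Generalization:
$$P[|X - \mu| > k] \le \frac{E(|X - \mu|^r)}{k^r} \quad \text{for all } k > 0, \ r = 1, 2, \dots$$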
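The table entries are easy to recompute, and it is instructive to set them against the exact coverage for one particular distribution. The sketch below uses the normal distribution for this comparison, which is an illustrative choice and not part of the inequality; the function names are hypothetical.

```python
import math

def chebyshev_lower_bound(k):
    """Chebyshev lower bound for P[mu - k*sigma <= X <= mu + k*sigma]."""
    return 1.0 - 1.0 / k ** 2

def normal_coverage(k):
    """Exact value of P[mu - k*sigma <= X <= mu + k*sigma] for normal X."""
    return math.erf(k / math.sqrt(2.0))

for k in (2, 3, 4):
    print(f"k={k}: Chebyshev bound={chebyshev_lower_bound(k):.4f}, "
          f"normal coverage={normal_coverage(k):.4f}")
```

The bound holds for every distribution with finite variance, so it is necessarily conservative for any specific one.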

Cauchy-Schwarz inequality:
Let $x_1, \dots, x_n$ and $y_1, \dots, y_n$ be arbitrary real numbers. Then
$$\left(\sum_{i=1}^n x_i y_i\right)^2 \le \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right)$$

Integrated version:
$$\left(\int_a^b f(x) g(x)\,dx\right)^2 \le \left(\int_a^b f(x)^2\,dx\right)\left(\int_a^b g(x)^2\,dx\right)$$

Application to random variables:
$$(E(XY))^2 \le E(X^2) \cdot E(Y^2)$$

Hölder inequality:
Let $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$.
Let $x_i, y_i \ge 0$, $i = 1, \dots, n$, be arbitrary numbers. Then
$$\sum_{i=1}^n x_i y_i \le \left(\sum_{i=1}^n x_i^p\right)^{1/p} \left(\sum_{i=1}^n y_i^q\right)^{1/q}$$

Integrated version ($f(x) \ge 0$, $g(x) \ge 0$):
$$\int_a^b f(x) g(x)\,dx \le \left(\int_a^b f(x)^p\,dx\right)^{1/p} \left(\int_a^b g(x)^q\,dx\right)^{1/q}$$

Application to random variables:
$$E(|X| \cdot |Y|) \le (E(|X|^p))^{1/p} \, (E(|Y|^q))^{1/q}$$
2 Bootstrap and Regression Models
Problem: Analyze the influence of some explanatory (independent) variables $X_1, X_2, \dots, X_p$ on a response variable (or dependent variable) $Y$.
Observations
$(Y_1, X_{11}, \dots, X_{1p}), (Y_2, X_{21}, \dots, X_{2p}), \dots, (Y_n, X_{n1}, \dots, X_{np})$
Model

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \varepsilon_i$$

$\varepsilon_1, \dots, \varepsilon_n$ i.i.d., $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$

$[\varepsilon_i \sim N(0, \sigma^2)]$

The linear structure of the regression function as postulated by the model,

$$\beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} = m(X_{i1}, \dots, X_{ip}) = E(Y \mid X_1 = X_{i1}, \dots, X_p = X_{ip}),$$

is necessarily fulfilled if $(Y_i, X_{i1}, X_{i2}, \dots, X_{ip})^T$ is a multivariate normal random vector.
Remark: Regression analysis is usually a conditional analysis.
The goal is to estimate the regression function m which is the con-
ditional expectation of Y given X1 , . . . , Xp . Standard inference
studies the behavior of estimators conditional on the observed
values.
However, different types of bootstrap may be used depending on how the data is generated.
1) Random design:
$(Y_1, X_{11}, \dots, X_{1p}), (Y_2, X_{21}, \dots, X_{2p}), \dots, (Y_n, X_{n1}, \dots, X_{np})$ is a sample of i.i.d. random vectors, i.e. observations are independent and identically distributed.
Example: $p + 1$ measurements from $n$ individuals randomly drawn from an underlying population.
2) $(X_{j1}, \dots, X_{jp})$, $j = 1, \dots, n$, are random vectors which are, however, not independent or not identically distributed (e.g. time series data, where the X-variables are observed in successive time periods).
3) Fixed design: Data are collected at pre-specified, non-random values $X_{jk}$ (corresponding for example to different experimental conditions).
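The model can be rewritten in matrix notation:

$$Y = X\beta + \varepsilon, \qquad E(\varepsilon) = 0, \quad \mathrm{Cov}(\varepsilon) = \sigma^2 I_n, \qquad [\varepsilon \sim N_n(0, \sigma^2 I_n)]$$

with
$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1p} \\ 1 & X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{np} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

The parameter vector $\beta = (\beta_0, \dots, \beta_p)^T$ is usually estimated by least squares:
Least squares method: Determine $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_p$ by minimizing

$$Q(\beta_0, \dots, \beta_p) = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_{i1} - \dots - \beta_p X_{ip})^2$$

Least squares estimator: $\hat\beta = [X^T X]^{-1} X^T Y$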
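The closed-form estimator translates directly into NumPy. The following is a minimal sketch on simulated data; the function name `ols_fit`, the simulated coefficients and the noise level are assumptions made for illustration. It also returns the usual unbiased variance estimate with divisor $n - p - 1$ and the resulting estimate of $\sigma^2 [X^T X]^{-1}$.

```python
import numpy as np

def ols_fit(X, y):
    """Least squares fit with intercept.
    X: (n, p) array of regressors without the column of ones.
    Returns (beta_hat, sigma2_hat, cov_hat)."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])       # design matrix with intercept
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta_hat = XtX_inv @ Xd.T @ y               # [X^T X]^{-1} X^T Y
    resid = y - Xd @ beta_hat
    sigma2_hat = resid @ resid / (n - p - 1)    # unbiased under homoscedasticity
    return beta_hat, sigma2_hat, sigma2_hat * XtX_inv

# Simulated example (hypothetical parameter values)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)
beta_hat, sigma2_hat, cov_hat = ols_fit(X, y)
print(beta_hat, sigma2_hat)
```

In practice `np.linalg.lstsq` is numerically preferable to forming the inverse explicitly; the explicit inverse is used here only because it mirrors the formula.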


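Let $E^*$ and $\mathrm{Cov}^*$ denote conditional expectation and covariances given the observed X-values.
Properties of $\hat\beta$
1. $\hat\beta$ is an unbiased estimator of $\beta$:

$$E^*(\hat\beta) = \begin{pmatrix} E^*(\hat\beta_0) \\ \vdots \\ E^*(\hat\beta_p) \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix} = \beta$$

2. Covariance matrix:

$$\mathrm{Cov}^*(\hat\beta) = \mathrm{Cov}^*([X^T X]^{-1} X^T Y) = [X^T X]^{-1} X^T \, \mathrm{Cov}^*(Y) \, X [X^T X]^{-1} = \sigma^2 [X^T X]^{-1} X^T X [X^T X]^{-1} = \sigma^2 [X^T X]^{-1}$$

3. Distribution under normality:
If $\varepsilon_i \sim N(0, \sigma^2)$ then $\varepsilon \sim N_n(0, \sigma^2 I_n)$, and consequently
$$\hat\beta \sim N_{p+1}\left(\beta, \sigma^2 [X^T X]^{-1}\right)$$

4. Asymptotic distribution: Assume that $\frac{1}{n} \sum_i X_{ij} X_{ik} \to c_{jk}$ as well as $\frac{1}{n} \sum_i X_{ij} \to c_{0j}$ as $n \to \infty$. Note that $c_{jk} = E(X_j X_k)$ and $c_{0j} = E(X_j)$ in the case of random design. Furthermore, let $C$ denote the $(p + 1) \times (p + 1)$ matrix with elements $c_{jk}$, $j, k = 0, \dots, p$, $c_{00} = 1$, $c_{j0} = c_{0j}$, and assume that $C$ is of full rank. Then
$$\sqrt{n}(\hat\beta - \beta) \to_L N_{p+1}\left(0, \sigma^2 C^{-1}\right)$$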
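Estimation of $\sigma^2$:

The residuals $\hat\varepsilon_i = Y_i - \hat Y_i = Y_i - \hat\beta_0 - \sum_{j=1}^p \hat\beta_j X_{ij}$ estimate the error terms $\varepsilon_i$.
Estimator $\hat\sigma^2$ of $\sigma^2$:
$$\hat\sigma^2 = \frac{1}{n - p - 1} \sum_{i=1}^n (Y_i - \hat Y_i)^2$$

$\hat\sigma^2$ is an unbiased estimator of $\sigma^2$.
If the true error terms $\varepsilon_i$ are normally distributed, then $(n - p - 1)\frac{\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p-1}$.
Let $\gamma_{ij}$, $i, j = 1, \dots, p + 1$, denote the elements of the matrix $\Gamma = [X^T X]^{-1}$. Then, for normal errors,

$$\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{\gamma_{jj}}} \sim t_{n-p-1}$$

$\Rightarrow$ Standard confidence intervals and tests for the parameter estimates.

Note: Under the normality assumption, $\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{\gamma_{jj}}}$ is a pivot statistic. In the general case (under some weak regularity conditions), this quantity is an asymptotic pivot statistic: $n \gamma_{jj}$ converges to the $j$-th diagonal element of the matrix $C^{-1}$, and therefore

$$\frac{\hat\beta_j - \beta_j}{\hat\sigma \sqrt{\gamma_{jj}}} \to_L N(0, 1) \quad \text{as } n \to \infty$$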
2.1 Bootstrapping Pairs

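The usual nonparametric bootstrap is applicable if the data is generated by a random design. Let $X_i = (X_{i1}, \dots, X_{ip})$. The construction of bootstrap confidence intervals then proceeds as follows:
Basic bootstrap confidence interval:
Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$
Random samples $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{(Y_1, X_1), \dots, (Y_n, X_n)\}$.
$(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ $\Rightarrow$ least squares estimators $\hat\beta_j^*$, $j = 1, \dots, p + 1$.
Determine the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2, j}$ and $\hat t_{1 - \alpha/2, j}$ of the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$:

$$P^*(\hat\beta_j^* \le \hat t_{\alpha/2, j}) \approx \frac{\alpha}{2}, \qquad P^*(\hat\beta_j^* > \hat t_{\alpha/2, j}) \approx 1 - \frac{\alpha}{2},$$
$$P^*(\hat\beta_j^* \le \hat t_{1 - \alpha/2, j}) \approx 1 - \frac{\alpha}{2}, \qquad P^*(\hat\beta_j^* > \hat t_{1 - \alpha/2, j}) \approx \frac{\alpha}{2}.$$

Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.
Approximate $1 - \alpha$ (symmetric) confidence interval:

$$[2\hat\beta_j - \hat t_{1 - \alpha/2, j}, \ 2\hat\beta_j - \hat t_{\alpha/2, j}]$$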
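The steps of the pairs bootstrap can be sketched in NumPy as follows. This is an illustration under assumptions rather than a reference implementation: the function names, the number of bootstrap replications `B`, the seed and the simulated heteroscedastic data are all hypothetical choices, and the conditional quantiles are approximated by empirical quantiles over the `B` replications.

```python
import numpy as np

def fit_ols(X, y):
    """Least squares coefficients (intercept first)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

def pairs_bootstrap_ci(X, y, alpha=0.05, B=2000, seed=0):
    """Basic bootstrap confidence intervals from resampling pairs (Y_i, X_i)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta_hat = fit_ols(X, y)
    boot = np.empty((B, beta_hat.size))
    for b in range(B):
        idx = rng.integers(0, n, size=n)        # draw n pairs with replacement
        boot[b] = fit_ols(X[idx], y[idx])       # bootstrap estimator beta*
    t_lo, t_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    # basic interval: [2*beta_hat - t_{1-alpha/2}, 2*beta_hat - t_{alpha/2}]
    return beta_hat, np.column_stack([2 * beta_hat - t_hi, 2 * beta_hat - t_lo])

# Simulated random design with heteroscedastic errors (hypothetical values)
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=300)
y = 1.0 + 2.0 * x + rng.normal(size=300) * (0.2 + 0.5 * x)
beta_hat, ci = pairs_bootstrap_ci(x, y)
print(beta_hat)
print(ci)
```

Each row of `ci` holds the lower and upper limit for one coefficient, intercept first.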

Remark: Under some weak regularity conditions the bootstrap is consistent whenever

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \varepsilon_i$$

for independent errors $\varepsilon_i$ with $E(\varepsilon_i) = 0$ and $\mathrm{var}(\varepsilon_i) = \sigma^2(X_i) < \infty$. In other words, the basic bootstrap confidence interval provides an asymptotically (first order) accurate confidence interval, even if the errors are heteroscedastic (unequal variances)! This is not true for the standard t-intervals.
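Modification: Bootstrap-t intervals:
Random samples $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{(Y_1, X_1), \dots, (Y_n, X_n)\}$.
Use $(Y_1^*, X_1^*), \dots, (Y_n^*, X_n^*)$ to determine least squares estimators $\hat\beta_j^*$, $j = 1, \dots, p + 1$, as well as estimators $(\hat\sigma^*)^2$ of the error variance $\sigma^2$.

With $\gamma_{jj}^*$ denoting the $j$-th diagonal element of the matrix $\Gamma^* = [(X^*)^T X^*]^{-1}$ compute

$$\frac{\hat\beta_j^* - \hat\beta_j}{\hat\sigma^* \sqrt{\gamma_{jj}^*}}$$

Determine the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat\tau_{\alpha/2, j}$ and $\hat\tau_{1 - \alpha/2, j}$ of the conditional distribution of $\frac{\hat\beta_j^* - \hat\beta_j}{\hat\sigma^* \sqrt{\gamma_{jj}^*}}$.

This yields the $1 - \alpha$ confidence interval

$$\left[\hat\beta_j - \hat\tau_{1 - \alpha/2, j} \, \hat\sigma \sqrt{\gamma_{jj}}, \ \hat\beta_j - \hat\tau_{\alpha/2, j} \, \hat\sigma \sqrt{\gamma_{jj}}\right]$$

Different from the basic bootstrap interval, this bootstrap-t interval will be incorrect for heteroscedastic errors.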
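In order to understand bootstrap behavior for random design let us analyze the simplest case with $p = 1$. Then $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$.
Consider the estimator

$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n} \sum_i (X_i - \bar X) \varepsilon_i}{\frac{1}{n} \sum_i (X_i - \bar X)^2}$$

of the slope $\beta_1$.
Random design implies that $(Y_i, X_i)$, and hence $(\varepsilon_i, X_i)$, $i = 1, \dots, n$, are independent and identically distributed. Under some regularity conditions (existence of moments) we have
$$\frac{1}{n} \sum_i (X_i - \bar X)^2 \to_P E(X_i - \mu_x)^2 = \sigma_X^2,$$

and the central limit theorem implies that

$$\frac{1}{\sqrt{n}} \sum_i (X_i - \bar X) \varepsilon_i \to_L N(0, v_{\varepsilon,X}^2),$$

where
$$v_{\varepsilon,X}^2 = E\left((X_i - \mu_x)^2 \varepsilon_i^2\right).$$
If $\varepsilon_i$ and $X_i$ are independent and $\sigma^2 = \mathrm{var}(\varepsilon_i)$ does not depend on $X_i$, then $v_{\varepsilon,X}^2 = \sigma^2 \sigma_X^2$. We then generally obtain for large $n$
$$\mathrm{distr}\left(\sqrt{n}(\hat\beta_1 - \beta_1)\right) \approx \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}} \sum_i (X_i - \bar X) \varepsilon_i}{\frac{1}{n} \sum_i (X_i - \bar X)^2}\right) \approx \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}} \sum_i (X_i - \bar X) \varepsilon_i}{\sigma_X^2}\right) \approx N\left(0, \frac{v_{\varepsilon,X}^2}{\sigma_X^4}\right)$$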
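Now consider the bootstrap estimator $\hat\beta_1^*$,

$$\hat\beta_1^* = \frac{\sum_i (X_i^* - \bar X^*) Y_i^*}{\sum_i (X_i^* - \bar X^*)^2} = \hat\beta_1 + \frac{\frac{1}{n} \sum_i (X_i^* - \bar X^*) \hat\varepsilon_i^*}{\frac{1}{n} \sum_i (X_i^* - \bar X^*)^2},$$

where $\hat\varepsilon_i^* = Y_i^* - \hat\beta_0 - \hat\beta_1 X_i^*$.
Recall that by definition, $(Y_i^*, X_i^*)$, and hence $(\hat\varepsilon_i^*, X_i^*)$, $i = 1, \dots, n$, are independent and identically distributed observations (conditional on $\mathcal{Y}_n$). With $\hat\sigma_X^2 := \frac{1}{n} \sum_i (X_i - \bar X)^2$ we obtain $E\left(\frac{1}{n} \sum_i (X_i^* - \bar X^*)^2 \mid \mathcal{Y}_n\right) = \frac{n-1}{n} \hat\sigma_X^2$, and
$$\left|\frac{1}{n} \sum_i (X_i^* - \bar X^*)^2 - \hat\sigma_X^2\right| \to_P 0$$

as $n \to \infty$. Moreover, $E\left(\frac{1}{\sqrt{n}} \sum_i (X_i^* - \bar X^*) \hat\varepsilon_i^* \mid \mathcal{Y}_n\right) = 0$ and, up to terms of smaller order,
$$\mathrm{var}\left(\frac{1}{\sqrt{n}} \sum_i (X_i^* - \bar X^*) \hat\varepsilon_i^* \,\Big|\, \mathcal{Y}_n\right) \approx \frac{1}{n} \sum_i (X_i - \bar X)^2 \hat\varepsilon_i^2$$

By the central limit theorem we obtain that for large $n$

$$\mathrm{distr}\left(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \mid \mathcal{Y}_n\right) \approx N\left(0, \frac{\frac{1}{n} \sum_i (X_i - \bar X)^2 \hat\varepsilon_i^2}{\hat\sigma_X^4}\right).$$

Since $\frac{1}{n} \sum_i (X_i - \bar X)^2 \hat\varepsilon_i^2 \to_P v_{\varepsilon,X}^2$ and $\hat\sigma_X^2 \to_P \sigma_X^2$, we can conclude that asymptotically

$$\mathrm{distr}\left(\sqrt{n}(\hat\beta_1 - \beta_1)\right) \approx \mathrm{distr}\left(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \mid \mathcal{Y}_n\right)$$

$\Rightarrow$ Bootstrap consistent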
2.2 Bootstrapping Residuals

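Bootstrapping residuals is applicable independent of the particular design of the regression model. The only crucial assumption is that the error terms $\varepsilon_i$ are i.i.d. with constant variance $\sigma^2$.
Residuals:

$$\hat\varepsilon_i = Y_i - \hat Y_i = Y_i - \hat\beta_0 - \sum_{j=1}^p \hat\beta_j X_{ij}$$

Matrix notation:

$$\hat\varepsilon = \begin{pmatrix} \hat\varepsilon_1 \\ \vdots \\ \hat\varepsilon_n \end{pmatrix} = (I - X[X^T X]^{-1} X^T) Y = (I - H)\varepsilon, \qquad H := X[X^T X]^{-1} X^T$$

$$\Rightarrow \ \mathrm{Cov}(\hat\varepsilon) = \sigma^2 (I - H)$$

With $h_{ii} > 0$ denoting the $i$-th diagonal element of $H$ we thus obtain
$$\mathrm{var}(\hat\varepsilon_i) = \sigma^2 (1 - h_{ii}) < \sigma^2$$

Standardized residuals:
$$r_i = \frac{\hat\varepsilon_i}{\sqrt{1 - h_{ii}}} \ \Rightarrow \ \mathrm{var}(r_i) = \sigma^2$$

We have $\sum_i \hat\varepsilon_i = 0$. For the standardized residuals it is, however, not guaranteed that $\bar r = \frac{1}{n} \sum_i r_i$ is equal to zero. The residual bootstrap thus relies on resampling centered standardized residuals $\tilde r_i := r_i - \bar r$.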
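The quantities $H$, $h_{ii}$, $r_i$ and the centered standardized residuals can be computed directly. The sketch below uses simulated data with hypothetical parameter values; the function name `standardized_residuals` is likewise an assumption. Forming the full $n \times n$ matrix $H$ is fine for small $n$ but wasteful for large samples.

```python
import numpy as np

def standardized_residuals(X, y):
    """Return residuals, leverages h_ii and centered standardized residuals.
    X: (n, p) regressors without the column of ones."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T    # hat matrix
    resid = (np.eye(n) - H) @ y                 # residuals (I - H) Y
    h = np.diag(H)
    r = resid / np.sqrt(1.0 - h)                # standardized residuals
    return resid, h, r - r.mean()               # centered version

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
y = 0.5 + X @ np.array([1.0, -1.0]) + rng.normal(size=50)
resid, h, r_tilde = standardized_residuals(X, y)
print(resid.sum(), h.sum(), r_tilde.mean())
```

The leverages sum to the number of estimated coefficients ($p + 1$), which gives a quick sanity check on the computation.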

Note: Residual plots play an important role in validating regression models.
a.) Nonlinear model:
[Figure: "Lack of model fit"; residuals plotted against fitted y.]

b.) Heteroscedasticity:
[Figure: "Heteroscedasticity"; residuals plotted against fitted y_i.]
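Bootstrapping Residuals
Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$ $\Rightarrow$ estimator $\hat\beta$
Calculate (centered) standardized residuals
$$r_i = \frac{\hat\varepsilon_i}{\sqrt{1 - h_{ii}}}, \qquad \tilde r_i = r_i - \bar r, \qquad i = 1, \dots, n$$
Generate random samples $\hat\varepsilon_1^*, \dots, \hat\varepsilon_n^*$ of residuals by drawing observations independently and with replacement from $\{\tilde r_1, \dots, \tilde r_n\}$.
Calculate

$$Y_i^* = \hat\beta_0 + \sum_{j=1}^p \hat\beta_j X_{ij} + \hat\varepsilon_i^*, \qquad i = 1, \dots, n$$

Bootstrap estimators $\hat\beta^*$ are determined by least squares estimation from the data $(Y_1^*, X_1), \dots, (Y_n^*, X_n)$.
Basic bootstrap confidence intervals:
Determine the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2, j}$ and $\hat t_{1 - \alpha/2, j}$ of the conditional distribution of $\hat\beta_j^*$:

$$P^*(\hat\beta_j^* \le \hat t_{\alpha/2, j}) \approx \frac{\alpha}{2}, \qquad P^*(\hat\beta_j^* > \hat t_{\alpha/2, j}) \approx 1 - \frac{\alpha}{2},$$
$$P^*(\hat\beta_j^* \le \hat t_{1 - \alpha/2, j}) \approx 1 - \frac{\alpha}{2}, \qquad P^*(\hat\beta_j^* > \hat t_{1 - \alpha/2, j}) \approx \frac{\alpha}{2}.$$

Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.
Approximate $1 - \alpha$ (symmetric) confidence interval:

$$[2\hat\beta_j - \hat t_{1 - \alpha/2, j}, \ 2\hat\beta_j - \hat t_{\alpha/2, j}]$$

Bootstrap-t intervals can be determined similarly.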
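A compact NumPy sketch of the residual bootstrap described above is given here. The function name, the number of replications `B`, the seed and the simulated homoscedastic data are assumptions made for illustration; the design matrix is held fixed across replications and only the errors are resampled.

```python
import numpy as np

def residual_bootstrap_ci(X, y, alpha=0.05, B=2000, seed=0):
    """Basic bootstrap intervals from resampling centered standardized residuals."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta_hat = XtX_inv @ Xd.T @ y
    fitted = Xd @ beta_hat
    resid = y - fitted
    h = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)   # diagonal of the hat matrix
    r = resid / np.sqrt(1.0 - h)                    # standardized residuals
    r_tilde = r - r.mean()                          # centered
    boot = np.empty((B, beta_hat.size))
    for b in range(B):
        eps_star = rng.choice(r_tilde, size=n, replace=True)
        boot[b] = XtX_inv @ Xd.T @ (fitted + eps_star)   # refit on Y* = fitted + eps*
    t_lo, t_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    return beta_hat, np.column_stack([2 * beta_hat - t_hi, 2 * beta_hat - t_lo])

# Simulated data with i.i.d. errors (hypothetical values)
rng = np.random.default_rng(4)
x = rng.uniform(0.0, 2.0, size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)
beta_hat, ci = residual_bootstrap_ci(x, y)
print(beta_hat)
print(ci)
```

Because the design is fixed, $[X^T X]^{-1} X^T$ is computed once and reused in every replication.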


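In order to understand the residual bootstrap let us again analyze the simplest case with $p = 1$, and recall that

$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n} \sum_i (X_i - \bar X) \varepsilon_i}{\frac{1}{n} \sum_i (X_i - \bar X)^2}$$

Let $\hat\sigma_X^2 := \frac{1}{n} \sum_i (X_i - \bar X)^2$. If the errors $\varepsilon_i$ are i.i.d. zero mean random variables with $\mathrm{var}(\varepsilon_i) = \sigma^2$, then (under some regularity conditions) the central limit theorem implies that conditional on the observed values $X_1, \dots, X_n$
$$\mathrm{distr}\left(\sqrt{n}(\hat\beta_1 - \beta_1)\right) = \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}} \sum_i (X_i - \bar X) \varepsilon_i}{\frac{1}{n} \sum_i (X_i - \bar X)^2}\right) \approx N\left(0, \frac{\sigma^2}{\hat\sigma_X^2}\right)$$

holds for large $n$.

By definition,

$$\hat\beta_1^* = \frac{\sum_i (X_i - \bar X) Y_i^*}{\sum_i (X_i - \bar X)^2} = \hat\beta_1 + \frac{\frac{1}{n} \sum_i (X_i - \bar X) \hat\varepsilon_i^*}{\frac{1}{n} \sum_i (X_i - \bar X)^2}.$$

We have
$$E(\hat\varepsilon_i^* \mid \mathcal{Y}_n) = 0, \qquad \mathrm{var}(\hat\varepsilon_i^* \mid \mathcal{Y}_n) = \frac{1}{n} \sum_i \tilde r_i^2 =: \tilde\sigma^2,$$

and therefore
$$\mathrm{var}\left(\frac{1}{\sqrt{n}} \sum_i (X_i - \bar X) \hat\varepsilon_i^* \,\Big|\, \mathcal{Y}_n\right) = \frac{1}{n} \sum_i (X_i - \bar X)^2 \, \tilde\sigma^2$$

The central limit theorem then leads to

$$\mathrm{distr}\left(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \mid \mathcal{Y}_n\right) \approx N\left(0, \frac{\tilde\sigma^2}{\hat\sigma_X^2}\right).$$

$\Rightarrow$ Bootstrap consistent, since $\tilde\sigma^2 \to_P \sigma^2$ as $n \to \infty$.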
2.3 Wild Bootstrap

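The residual bootstrap is not consistent if the errors $\varepsilon_i$ are heteroscedastic, i.e. $\mathrm{var}(\varepsilon_i) = \sigma_i^2$. In this case the wild bootstrap offers an alternative.
There are several versions of the wild bootstrap. In its simplest form this procedure works as follows: Conditional on $\mathcal{Y}_n$, a bootstrap sample $\hat\varepsilon_1^*, \dots, \hat\varepsilon_n^*$ of residuals is determined by generating $n$ independent random variables from the following binary distributions:
$$P\left(\hat\varepsilon_i^* = \hat\varepsilon_i \, \frac{1 - \sqrt{5}}{2}\right) = \gamma,$$
$$P\left(\hat\varepsilon_i^* = \hat\varepsilon_i \, \frac{1 + \sqrt{5}}{2}\right) = 1 - \gamma,$$

$i = 1, \dots, n$, where $\gamma = \frac{5 + \sqrt{5}}{10}$.

The constants are chosen in such a way that

$$E(\hat\varepsilon_i^* \mid \mathcal{Y}_n) = E^*(\hat\varepsilon_i^*) = 0$$
$$\mathrm{var}(\hat\varepsilon_i^* \mid \mathcal{Y}_n) = \mathrm{var}^*(\hat\varepsilon_i^*) = \hat\varepsilon_i^2$$
$$E((\hat\varepsilon_i^*)^3 \mid \mathcal{Y}_n) = E^*((\hat\varepsilon_i^*)^3) = \hat\varepsilon_i^3$$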
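Implementation of the wild bootstrap:
Original data: i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$ $\Rightarrow$ estimator $\hat\beta$
Calculate (centered) standardized residuals
$$r_i = \frac{\hat\varepsilon_i}{\sqrt{1 - h_{ii}}}, \qquad \tilde r_i = r_i - \bar r, \qquad i = 1, \dots, n$$
Generate $n$ independent random variables $\hat\varepsilon_i^*$ from binary distributions,
$$P\left(\hat\varepsilon_i^* = \hat\varepsilon_i \, \frac{1 - \sqrt{5}}{2}\right) = \gamma,$$
$$P\left(\hat\varepsilon_i^* = \hat\varepsilon_i \, \frac{1 + \sqrt{5}}{2}\right) = 1 - \gamma,$$

$i = 1, \dots, n$, where $\gamma = \frac{5 + \sqrt{5}}{10}$.

Calculate

$$Y_i^* = \hat\beta_0 + \sum_{j=1}^p \hat\beta_j X_{ij} + \hat\varepsilon_i^*, \qquad i = 1, \dots, n$$

Bootstrap estimators $\hat\beta^*$ are determined by least squares estimation from the data $(Y_1^*, X_1), \dots, (Y_n^*, X_n)$.
Basic bootstrap confidence intervals:
Determine the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2, j}$ and $\hat t_{1 - \alpha/2, j}$ of the conditional distribution of $\hat\beta_j^*$.
Approximate $1 - \alpha$ (symmetric) confidence interval:

$$[2\hat\beta_j - \hat t_{1 - \alpha/2, j}, \ 2\hat\beta_j - \hat t_{\alpha/2, j}]$$

Bootstrap-t intervals can be determined similarly.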
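The wild bootstrap with the two-point distribution above can be sketched as follows. The sketch multiplies the raw least squares residuals $\hat\varepsilon_i$ by independent draws from the two-point distribution, following the definition of the binary distributions; function names, `B`, the seed and the simulated heteroscedastic data are hypothetical.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)
GAMMA = (5.0 + SQRT5) / 10.0          # probability of the negative multiplier
V_LOW = (1.0 - SQRT5) / 2.0           # taken with probability GAMMA
V_HIGH = (1.0 + SQRT5) / 2.0          # taken with probability 1 - GAMMA

def wild_multipliers(n, rng):
    """n independent draws with mean 0, variance 1 and third moment 1."""
    return np.where(rng.random(n) < GAMMA, V_LOW, V_HIGH)

def wild_bootstrap_ci(X, y, alpha=0.05, B=2000, seed=0):
    """Basic bootstrap intervals based on the wild bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    proj = np.linalg.inv(Xd.T @ Xd) @ Xd.T
    beta_hat = proj @ y
    fitted = Xd @ beta_hat
    resid = y - fitted
    boot = np.empty((B, beta_hat.size))
    for b in range(B):
        eps_star = resid * wild_multipliers(n, rng)   # keeps residual i at position i
        boot[b] = proj @ (fitted + eps_star)
    t_lo, t_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    return beta_hat, np.column_stack([2 * beta_hat - t_hi, 2 * beta_hat - t_lo])

# Simulated data with heteroscedastic errors (hypothetical values)
rng = np.random.default_rng(5)
x = rng.uniform(0.0, 2.0, size=300)
y = 1.0 + 2.0 * x + rng.normal(size=300) * (0.2 + 0.5 * x)
beta_hat, ci = wild_bootstrap_ci(x, y)
print(beta_hat)
print(ci)
```

Unlike the residual bootstrap, each bootstrap error stays attached to its own observation, so the variance pattern across the $X_i$ is preserved.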


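In order to understand the basic intuition let us again analyze the simplest case with $p = 1$, and recall that

$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X) Y_i}{\sum_i (X_i - \bar X)^2} = \beta_1 + \frac{\frac{1}{n} \sum_i (X_i - \bar X) \varepsilon_i}{\frac{1}{n} \sum_i (X_i - \bar X)^2}$$

It is now assumed that the errors $\varepsilon_i$ are independent with $\mathrm{var}(\varepsilon_i) = \sigma_i^2$. Let $\hat\sigma_X^2 := \frac{1}{n} \sum_i (X_i - \bar X)^2$ and $v_{\varepsilon,X}^2 := \frac{1}{n} \sum_i (X_i - \bar X)^2 \sigma_i^2$. Under some regularity conditions the central limit theorem implies that conditional on the observed values $X_1, \dots, X_n$
$$\mathrm{distr}\left(\sqrt{n}(\hat\beta_1 - \beta_1)\right) = \mathrm{distr}\left(\frac{\frac{1}{\sqrt{n}} \sum_i (X_i - \bar X) \varepsilon_i}{\frac{1}{n} \sum_i (X_i - \bar X)^2}\right) \approx N\left(0, \frac{v_{\varepsilon,X}^2}{\hat\sigma_X^4}\right)$$

holds for large $n$.

As above,

$$\hat\beta_1^* = \frac{\sum_i (X_i - \bar X) Y_i^*}{\sum_i (X_i - \bar X)^2} = \hat\beta_1 + \frac{\frac{1}{n} \sum_i (X_i - \bar X) \hat\varepsilon_i^*}{\frac{1}{n} \sum_i (X_i - \bar X)^2},$$

and by construction
$$\mathrm{var}\left(\frac{1}{\sqrt{n}} \sum_i (X_i - \bar X) \hat\varepsilon_i^* \,\Big|\, \mathcal{Y}_n\right) = \frac{1}{n} \sum_i (X_i - \bar X)^2 \hat\varepsilon_i^2 =: w_{\varepsilon,X}^2.$$

For large $n$, the central limit theorem then leads to

$$\mathrm{distr}\left(\sqrt{n}(\hat\beta_1^* - \hat\beta_1) \mid \mathcal{Y}_n\right) \approx N\left(0, \frac{w_{\varepsilon,X}^2}{\hat\sigma_X^4}\right).$$

We have $E(\hat\varepsilon_i^2 \mid X_1, \dots, X_n) = \sigma_i^2 + O(\frac{1}{n})$, and thus for large $n$

$$E(w_{\varepsilon,X}^2 \mid X_1, \dots, X_n) = \frac{1}{n} \sum_i (X_i - \bar X)^2 E(\hat\varepsilon_i^2 \mid X_1, \dots, X_n) \approx v_{\varepsilon,X}^2$$

Under some regularity conditions the law of large numbers then implies that $|w_{\varepsilon,X}^2 - v_{\varepsilon,X}^2| \to_P 0$ as $n \to \infty$. $\Rightarrow$ Wild bootstrap consistent.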

2.4 Generalizations

The above types of bootstrap (bootstrapping pairs, bootstrapping residuals, wild bootstrap) can also be useful in more complex regression setups. An appropriate method then has to be selected depending on existing knowledge about the underlying design and the structure of the residuals.
1) Nonlinear regression:

$$Y_i = g(X_i, \beta) + \varepsilon_i,$$

where $g$ is a nonlinear function of $\beta$.

Example: Depreciation of a car (CV Citroen)

X - age of the car (in years)
Y - depreciation $= \frac{\text{selling price}}{\text{original price (new car)}}$

[Figure: "Depreciation of a car"; Y = relative depreciation plotted against X = age in years.]

Model: $Y_i = e^{-\beta X_i} + \varepsilon_i$
An estimator $\hat\beta$ is determined by (nonlinear) least squares; residuals: $\hat\varepsilon_i = Y_i - e^{-\hat\beta X_i}$
Bootstrap: Random design $\Rightarrow$ bootstrapping pairs; bootstrapping residuals for homoscedastic errors; wild bootstrap for heteroscedastic errors.
2) Median Regression:

Linear model: $Y_i = \beta_0 + \sum_j \beta_j X_{ij} + \varepsilon_i$
In some applications the errors possess heavy tails (outliers!). In such situations estimation of $\beta$ by least squares may not be appropriate, and statisticians tend to use more robust methods. A sensible procedure then is to determine estimates $\hat\beta$ by minimizing

$$\sum_{i=1}^n \Big|Y_i - \beta_0 - \sum_j \beta_j X_{ij}\Big|$$

over all possible $\beta$. Solutions can be determined by numerical optimization algorithms.
Inference is then usually based on the bootstrap. Random design $\Rightarrow$ bootstrapping pairs; bootstrapping residuals for homoscedastic errors; wild bootstrap for heteroscedastic errors.
3) Nonparametric regression:
Model:
$$Y_i = m(X_i) + \varepsilon_i$$
for some unknown function $m$. The function $m$ can be estimated by nonparametric smoothing procedures (kernel estimation; local linear estimation; spline estimation). Inference is often based on the bootstrap.
2.5 Time series
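The general idea of the residual bootstrap can be adapted to many different situations. For example, it can also be used in the context of time series models.
Example: AR(1)-process:
$$X_t = \rho X_{t-1} + \varepsilon_t, \qquad t = 1, \dots, n$$
for i.i.d. zero mean error terms with $\mathrm{var}(\varepsilon_t) = \sigma^2$. If $|\rho| < 1$ this defines a stationary stochastic process.
Standard estimator of $\rho$:
$$\hat\rho = \frac{\sum_{t=2}^n (X_t - \bar X)(X_{t-1} - \bar X)}{\sum_{t=1}^n (X_t - \bar X)^2}$$

Asymptotic distribution:

$$\sqrt{n}(\hat\rho - \rho) \to_L N(0, 1 - \rho^2)$$

Bootstrapping residuals
Calculate centered residuals
$$\hat\varepsilon_t = X_t - \hat\rho X_{t-1}, \qquad \tilde\varepsilon_t = \hat\varepsilon_t - \frac{1}{n - 1} \sum_t \hat\varepsilon_t, \qquad t = 2, \dots, n$$

For some $k > 0$ generate random samples

$$\varepsilon_{-k}^*, \varepsilon_{-k+1}^*, \dots, \varepsilon_0^*, \varepsilon_1^*, \dots, \varepsilon_n^*$$
of residuals by drawing $n + k + 1$ observations independently and with replacement from $\{\tilde\varepsilon_2, \dots, \tilde\varepsilon_n\}$.

Generate a bootstrap time series by $X_{-k}^* = \varepsilon_{-k}^*$ and
$$X_t^* = \hat\rho X_{t-1}^* + \varepsilon_t^*, \qquad t = -k + 1, \dots, n$$

Determine bootstrap estimators $\hat\rho^*$ from $X_1^*, \dots, X_n^*$.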
Under the standard assumptions of AR(1) models this bootstrap is consistent.
Basic bootstrap confidence intervals:
Determine the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles $\hat t_{\alpha/2}$ and $\hat t_{1 - \alpha/2}$ of the conditional distribution of $\hat\rho^*$.
Approximate $1 - \alpha$ (symmetric) confidence interval:

$$[2\hat\rho - \hat t_{1 - \alpha/2}, \ 2\hat\rho - \hat t_{\alpha/2}]$$

Bootstrap-t intervals can be determined similarly.
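The AR(1) residual bootstrap can be sketched in NumPy as follows. This is an illustration under assumptions: the function names, the burn-in length `k = 50`, the number of replications `B`, the seed and the simulated series (with a hypothetical true coefficient of 0.6) are not taken from the text.

```python
import numpy as np

def ar1_estimate(x):
    """Standard estimator of the AR(1) coefficient."""
    xc = x - x.mean()
    return np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2)

def ar1_residual_bootstrap_ci(x, alpha=0.05, B=1000, k=50, seed=0):
    """Basic bootstrap interval for the AR(1) coefficient via resampled residuals."""
    rng = np.random.default_rng(seed)
    n = len(x)
    rho_hat = ar1_estimate(x)
    eps_hat = x[1:] - rho_hat * x[:-1]          # residuals for t = 2, ..., n
    eps_tilde = eps_hat - eps_hat.mean()        # centered residuals
    boot = np.empty(B)
    for b in range(B):
        eps_star = rng.choice(eps_tilde, size=n + k + 1, replace=True)
        x_star = np.empty(n + k + 1)            # index 0 corresponds to time -k
        x_star[0] = eps_star[0]
        for t in range(1, n + k + 1):
            x_star[t] = rho_hat * x_star[t - 1] + eps_star[t]
        boot[b] = ar1_estimate(x_star[k + 1:])  # keep times 1, ..., n only
    t_lo, t_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return rho_hat, (2 * rho_hat - t_hi, 2 * rho_hat - t_lo)

# Simulated AR(1) series (hypothetical parameter values)
rng = np.random.default_rng(2)
n, rho_true = 250, 0.6
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = rho_true * x[t - 1] + rng.normal()
rho_hat, (lo, hi) = ar1_residual_bootstrap_ci(x)
print(rho_hat, lo, hi)
```

The first $k + 1$ simulated values serve as a burn-in so that the retained bootstrap series is approximately stationary.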
