# An Introduction to Bootstrap Methods

Dimitris Karlis

Department of Statistics, Athens University of Economics

17th Conference of the Greek Statistical Society, Lefkada, April 2004

http://stat-athens.aueb.gr/~karlis/lefkada/boot.pdf

## Outline

1. Introduction
2. Standard Errors and Bias
3. Confidence Intervals
4. Hypothesis Testing
5. Failure of Bootstrap
6. Other Resampling Plans
7. Applications

## Monte Carlo Approximation

Suppose that the cdf F of the population is known. We want to calculate

$$\mu(F) = \int \phi(y)\, dF(y)$$

We can approximate it by

$$\hat{\mu}(F) = \frac{1}{M} \sum_{i=1}^{M} \phi(y_i)$$

where $y_i$, $i = 1, \ldots, M$, are random variables simulated from F (or just a sample from F).

We know that if $M \to \infty$ then $\hat{\mu}(F) \to \mu(F)$.

What if F is not known?
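A minimal sketch of this approximation in Python (illustrative: F is taken to be the standard normal and $\phi(y) = y^2$, so the true value is 1):

```python
import numpy as np

rng = np.random.default_rng(2004)

def mc_estimate(phi, sampler, M=100_000):
    """Monte Carlo approximation of mu(F) = E[phi(Y)] for Y ~ F."""
    return phi(sampler(M)).mean()

# E[Y^2] under the standard normal; the estimate approaches 1 as M grows.
print(mc_estimate(lambda y: y**2, rng.standard_normal))
```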

## Motivating Bootstrap

Remedy: why not use an estimate of F based on the sample $(x_1, \ldots, x_n)$ at hand?

The best-known estimate of F is the empirical distribution function

$$\hat{F}_n(x) = \frac{\#\{\text{observations} \le x\}}{n}$$

or, more formally,

$$\hat{F}_n(x) = \frac{\sum_{i=1}^{n} I(x_i \le x)}{n}$$

where I(A) is the indicator function and the subscript n reminds us that it is based on a sample of size n.

## Example: Empirical Distribution Function

Figure 1: Empirical distribution function from a random sample of size n from a standard normal distribution.

## Bootstrap Idea

Use as an estimate the quantity $\mu(\hat{F}_n)$ instead of $\mu(F)$. Since $\hat{F}_n$ is a consistent estimate of F (i.e. $\hat{F}_n \to F$ as $n \to \infty$), then $\mu(\hat{F}_n) \to \mu(F)$.

Important: $\mu(\hat{F}_n)$ is an exact result. In practice it is not easy to find, so we use a Monte Carlo approximation of it.

So, the idea of the bootstrap as used in practice is the following: generate samples from $\hat{F}_n$ and use as an estimate of $\mu(F)$ the quantity

$$\hat{\mu}(F) = \frac{1}{M} \sum_{i=1}^{M} \phi(y_i^*)$$

where $y_i^*$, $i = 1, \ldots, M$, are random variables simulated from $\hat{F}_n$.

## Simulation from $\hat{F}_n$

Simulating from $\hat{F}_n$ is a relatively easy and straightforward task. The density function $\hat{f}_n$ associated with $\hat{F}_n$ gives probability 1/n to each observed point $x_i$, $i = 1, \ldots, n$, and 0 elsewhere.

Note: if some value occurs more than once, it is given probability larger than 1/n.

So we sample by selecting observations randomly, with replacement, from the original sample.

A value can occur more than once in the bootstrap sample! Some other values may not occur at all in the bootstrap sample.
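In code, sampling from $\hat{F}_n$ is a one-liner; a sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([1.2, -0.4, 0.7, 2.1, 0.0])   # hypothetical sample

# A bootstrap sample: draw n values from the data with replacement.
x_star = rng.choice(x, size=len(x), replace=True)
print(x_star)   # some values may repeat, others may be missing
```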

## A Quick View of Bootstrap (1)

• Appeared in 1979 in the seminal paper of Efron. Predecessors existed for a long time.

• Popularized in the 80's due to the introduction of computers into statistical practice.

• It has a strong mathematical background (though not treated here).

• In practice it is based on simulation, but for a few examples there are exact solutions without the need for simulation.

• While it is a method for improving estimators, it is best known as a method for estimating standard errors and bias and for constructing confidence intervals for parameters.

## A Quick View of Bootstrap (2)

• It has minimal assumptions. It is merely based on the assumption that the sample is a good representation of the unknown population.

• It is not a black-box method. It works for the majority of problems, but it may be problematic for some others.

• In practice it is computationally demanding, but the progress in computer speed makes it easily available in everyday practice.

## Types of Bootstrap

• Parametric bootstrap: we know that F belongs to a parametric family of distributions and we just estimate its parameters from the sample. We generate samples from F using the estimated parameters.

• Non-parametric bootstrap: we do not know the form of F and we estimate it by $\hat{F}$, the empirical distribution obtained from the data.

## The General Bootstrap Algorithm

1. Generate a sample $x^*$ of size n from $\hat{F}_n$.
2. Compute $\hat{\theta}^*$ for this bootstrap sample.
3. Repeat steps 1 and 2, B times.

By this procedure we end up with bootstrap values $\hat{\theta}^* = (\hat{\theta}^*_1, \hat{\theta}^*_2, \ldots, \hat{\theta}^*_B)$. We will use these bootstrap values for calculating all the quantities of interest.

Note that $\hat{\theta}^*$ is a sample from the unknown distribution of $\hat{\theta}$ and thus it contains all the information related to $\hat{\theta}$!
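A generic non-parametric version of this algorithm might look as follows (a sketch; the function name is ours, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap(x, statistic, B=1000):
    """Return the B bootstrap values of `statistic` for the sample x."""
    x = np.asarray(x)
    n = len(x)
    return np.array([statistic(rng.choice(x, size=n, replace=True))
                     for _ in range(B)])
```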

## An Example: The Median

Consider data $x = (x_1, \ldots, x_n)$. We want to find the standard error of the sample median. Asymptotic arguments exist, but they refer to huge sample sizes and are not applicable if n is small. We use the bootstrap.

• We generate a sample $x^*_1$ by sampling with replacement from x. This is our first bootstrap sample.

• For this sample we calculate the sample median; denote it $\hat{\theta}^*_1$.

• Repeat the two previous steps B times.

At the end we have B values $\hat{\theta}^* = (\hat{\theta}^*_1, \hat{\theta}^*_2, \ldots, \hat{\theta}^*_B)$. This is a random sample from the distribution of the sample median and hence we can use it to approximate every quantity of interest (e.g. mean, standard deviation, percentiles etc.).

Moreover, a histogram of these values is an estimate of the unknown density of the sample median. We can study skewness etc.
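Continuing the sketch above (reusing `bootstrap` and `rng`), the whole procedure for the median takes a few lines with illustrative data:

```python
x = rng.standard_normal(25)            # a small hypothetical sample
theta_star = bootstrap(x, np.median)   # B = 1000 bootstrap medians
print(theta_star.std(ddof=1))          # bootstrap standard error of the median
```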

## Bootstrap Standard Errors

Denote by $\hat{\theta}^*_i$ the bootstrap value from the i-th sample, $i = 1, \ldots, B$. The bootstrap estimate of the standard error of $\hat{\theta}$ is calculated as

$$se_B(\hat{\theta}) = \sqrt{\frac{1}{B} \sum_{i=1}^{B} \left( \hat{\theta}^*_i - \bar{\theta}^* \right)^2}$$

where

$$\bar{\theta}^* = \frac{1}{B} \sum_{i=1}^{B} \hat{\theta}^*_i$$

This is merely the standard deviation of the bootstrap values.

## Bootstrap Estimate of Bias

Similarly, an estimate of the bias of $\hat{\theta}$ is obtained as

$$\widehat{Bias}(\hat{\theta}) = \bar{\theta}^* - \hat{\theta}$$

Note that even if $\hat{\theta}$ is an unbiased estimator, since the above is merely an estimate it can be non-zero. So this estimate must be seen in connection with the standard error.

## Bootstrap Estimate of Covariance

In a similar manner to the standard errors for one parameter, we can obtain bootstrap estimates of the covariance of two parameters. Suppose that $\hat{\theta}_1$ and $\hat{\theta}_2$ are two estimates of interest (e.g. in the normal distribution they can be the mean and the variance; in a regression setting, two of the regression coefficients). Then the bootstrap estimate of covariance is given by

$$Cov_B(\hat{\theta}_1, \hat{\theta}_2) = \frac{1}{B} \sum_{i=1}^{B} \left( \hat{\theta}^*_{1i} - \bar{\theta}^*_1 \right) \left( \hat{\theta}^*_{2i} - \bar{\theta}^*_2 \right)$$

where $(\hat{\theta}^*_{1i}, \hat{\theta}^*_{2i})$ are the bootstrap values for the two parameters taken from the i-th bootstrap sample.
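A sketch of this computation in the non-parametric case (both statistics evaluated on the same resamples; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def bootstrap_cov(x, stat1, stat2, B=1000):
    """Bootstrap estimate of Cov(theta1_hat, theta2_hat) from common resamples."""
    x = np.asarray(x)
    t1, t2 = np.empty(B), np.empty(B)
    for i in range(B):
        xs = rng.choice(x, size=len(x), replace=True)
        t1[i], t2[i] = stat1(xs), stat2(xs)
    return np.mean((t1 - t1.mean()) * (t2 - t2.mean()))

x = rng.normal(1, 1, size=20)
print(bootstrap_cov(x, np.mean, lambda s: s.var(ddof=1)))  # near 0 for normal data
```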

## Example: Covariance Between Sample Mean and Variance

Parametric bootstrap. Samples of sizes n = 20, 200 were generated from N(1, 1) and Gamma(1, 1) densities. Both have $\mu = 1$ and $\sigma^2 = 1$. Suppose that $\hat{\theta}_1 = \bar{x}$ and $\hat{\theta}_2 = s^2$. Based on B = 1000 replications we estimated the covariance between $\hat{\theta}_1$ and $\hat{\theta}_2$.

|         | Normal  | Gamma  |
|---------|---------|--------|
| n = 20  | 0.00031 | 0.0998 |
| n = 200 | 0.0007  | 0.0104 |

Table 1: Estimated covariance of the sample mean and variance based on the parametric bootstrap (B = 1000). From theory, for the normal distribution the covariance is 0.

Figure 2: Scatterplots of sample mean versus variance based on B = 1000 replications (panels: Normal with n = 20 and n = 200; Gamma(1,1) with n = 20 and n = 200).

## Simple Bootstrap CI

Use the bootstrap estimate of the standard error and a normality assumption (arbitrary in many circumstances) to construct an interval of the form

$$\left( \hat{\theta} - Z_{1-a/2}\, se_B(\hat{\theta}),\; \hat{\theta} + Z_{1-a/2}\, se_B(\hat{\theta}) \right)$$

where $Z_a$ denotes the a-th quantile of the standard normal distribution. This is a $(1-a)$ confidence interval for $\theta$. It implies that we assume that $\hat{\theta}$ follows a normal distribution.

## Percentile CI

The confidence interval is given as

$$\left( \kappa_{a/2},\; \kappa_{1-a/2} \right)$$

where $\kappa_a$ denotes the a-th empirical quantile of the bootstrap values $\hat{\theta}^*_i$. This interval is clearly non-symmetric and takes into account the distributional form of the estimate.
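Sketches of the simple and percentile intervals from a vector `theta_star` of bootstrap values (hypothetical names; SciPy assumed for the normal quantile):

```python
import numpy as np
from scipy import stats

def simple_ci(theta_hat, theta_star, a=0.05):
    """Normal-theory interval using the bootstrap standard error."""
    z = stats.norm.ppf(1 - a / 2)
    se = theta_star.std(ddof=1)
    return theta_hat - z * se, theta_hat + z * se

def percentile_ci(theta_star, a=0.05):
    """Empirical quantiles of the bootstrap values."""
    return tuple(np.quantile(theta_star, [a / 2, 1 - a / 2]))
```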

## Percentile-t CI

Improves on the simple bootstrap CI in the sense that we do not need the normality assumption. The interval has the form

$$\left( \hat{\theta} - \zeta_{1-a/2}\, se_B(\hat{\theta}),\; \hat{\theta} - \zeta_{a/2}\, se_B(\hat{\theta}) \right)$$

where $\zeta_a$ is the a-th quantile of the values $\xi_i$, with

$$\xi_i = \frac{\hat{\theta}^*_i - \hat{\theta}}{se(\hat{\theta}^*_i)}$$

Note that for the $\xi_i$'s we need the quantities $se(\hat{\theta}^*_i)$. If we do not know them in closed form (e.g. asymptotic standard errors), we can use the bootstrap to estimate them; the computational burden then doubles!

Note that the $\xi_i$ are studentized values of the $\hat{\theta}^*_i$.

## Bias-Corrected CI

Improves on the percentile bootstrap CI in the sense that we take into account the bias.

The bootstrap bias-corrected percentile interval (BC) is

$$\left( \kappa_{p_1},\; \kappa_{p_2} \right)$$

where $p_1 = \Phi(z_{\alpha/2} + 2b_0)$ and $p_2 = \Phi(z_{1-\alpha/2} + 2b_0)$, with $\Phi(\cdot)$ the standard normal distribution function, $\kappa_a$ the a-th quantile of the distribution of the bootstrap values (as in the notation for the percentile CI), and

$$b_0 = \Phi^{-1}\left( \frac{1}{B} \sum_{i=1}^{B} I(\hat{\theta}^*_i \le \hat{\theta}) \right)$$

If the distribution of the $\hat{\theta}^*_i$ is symmetric, then $b_0 = 0$, $p_1 = a/2$ and $p_2 = 1 - a/2$, and the simple percentile CI is obtained.
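A sketch of the BC interval following the formulas above; `theta_star` are the bootstrap values and `theta_hat` the sample estimate (hypothetical names):

```python
import numpy as np
from scipy import stats

def bc_ci(theta_hat, theta_star, a=0.05):
    """Bias-corrected percentile interval."""
    b0 = stats.norm.ppf(np.mean(theta_star <= theta_hat))
    p1 = stats.norm.cdf(stats.norm.ppf(a / 2) + 2 * b0)
    p2 = stats.norm.cdf(stats.norm.ppf(1 - a / 2) + 2 * b0)
    return tuple(np.quantile(theta_star, [p1, p2]))
```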

## Comparison of CIs

Which interval to use? Things to take into account:

• Percentile CIs are easily applicable in many situations.

• Percentile-t intervals need a known $se(\hat{\theta})$.

• In order to estimate extreme percentiles consistently we need to increase B.

• The resulting CIs are not symmetric (except for the simple CI, which is the least intuitive).

• Note the connection between CIs and hypothesis testing!

## Example 1: Correlation Coefficient

Consider data $(x_i, y_i)$, $i = 1, \ldots, n$. Pearson's correlation coefficient is given by

$$\hat{\theta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Inference is not so easy; results exist only under the assumption of a bivariate normal population. Bootstrap might be a solution.

## American Elections

The data refer to n = 24 counties in America, related to the American presidential elections of 1844. The two variables are the participation proportion in the election for each county and the difference between the two candidates. The question is whether there is correlation between the two variables. The observed correlation coefficient is $\hat{\theta} = -0.369$.

Figure 3: The data for the American elections (participation versus difference).

## American Elections: Results

From B = 1000 replications we found $\bar{\theta}^* = -0.3656$ and $se_B(\hat{\theta}) = 0.1801$. The asymptotic standard error given by normal theory is

$$se_N(\hat{\theta}) = \frac{1 - \hat{\theta}^2}{\sqrt{n-3}} = 0.1881$$

and $\widehat{Bias}(\hat{\theta}) = 0.0042$.

| Method              | 95% CI             |
|---------------------|--------------------|
| Simple bootstrap CI | (-0.7228, -0.0167) |
| Percentile          | (-0.6901, 0.0019)  |
| Percentile-t        | (-0.6731, 0.1420)  |
| Bias corrected      | (-0.6806, 0.0175)  |

Table 2: 95% bootstrap confidence intervals for the correlation coefficient.

## More Details on the Percentile-t Intervals

In this case, since we have an estimate of the standard error of the correlation coefficient (though it is an asymptotic estimate), we can use percentile-t intervals without needing to iterate the bootstrap. To do so, from the bootstrap values $\hat{\theta}^*_i$, $i = 1, \ldots, B$, we calculate

$$\xi_i = \frac{\hat{\theta}^*_i - \hat{\theta}}{se(\hat{\theta}^*_i)}$$

where

$$se(\hat{\theta}^*_i) = \frac{1 - \hat{\theta}^{*2}_i}{\sqrt{n-3}}$$

Then we find the quantiles of the $\xi_i$. Note that the distribution of $\xi$ is skewed; this explains the different left limit of the percentile-t confidence interval.
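A sketch of this percentile-t computation for the correlation, resampling (x, y) pairs and using the closed-form standard error above:

```python
import numpy as np

rng = np.random.default_rng(1844)

def percentile_t_ci_corr(x, y, B=1000, a=0.05):
    """Percentile-t CI for a correlation, with se = (1 - r^2)/sqrt(n - 3)."""
    n = len(x)
    r_hat = np.corrcoef(x, y)[0, 1]
    idx = rng.integers(0, n, size=(B, n))          # resample pairs
    r_star = np.array([np.corrcoef(x[j], y[j])[0, 1] for j in idx])
    xi = (r_star - r_hat) * np.sqrt(n - 3) / (1 - r_star**2)
    z_lo, z_hi = np.quantile(xi, [a / 2, 1 - a / 2])
    se_hat = r_star.std(ddof=1)
    return r_hat - z_hi * se_hat, r_hat - z_lo * se_hat
```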

Figure 4: Histograms of the bootstrapped values and of the $\xi_i$'s. Skewness is evident in both plots, especially in the one for the $\xi_i$'s.

## Example 2: Index of Dispersion

For count data a quantity of interest is the index of dispersion $I = s^2/\bar{x}$. For data from a Poisson distribution this quantity is, theoretically, 1. We have accident counts for n = 20 crossroads over a one-year period, and we want to build confidence intervals for I. The data values are: (1, 2, 5, 0, 3, 1, 0, 1, 1, 2, 0, 1, 8, 0, 5, 0, 2, 1, 2, 3).

We use the bootstrap by resampling from the observed values. Using $\hat{\theta} = I$ we found (B = 1000): $\hat{\theta} = 2.2659$, $\bar{\theta}^* = 2.105$, $\widehat{Bias}(\hat{\theta}) = 0.206$, $se_B(\hat{\theta}) = 0.6929$.

| Method              | 95% CI           |
|---------------------|------------------|
| Simple bootstrap CI | (0.9077, 3.6241) |
| Percentile          | (0.8981, 3.4961) |
| Percentile-t        | (0.8762, 3.4742) |
| Bias corrected      | (1.0456, 3.7858) |

Table 3: 95% bootstrap confidence intervals for I.

Figure 5: Histogram and boxplot for the index of dispersion. A small skewness is apparent.

## Some Comments

• Asymptotic theory for I is not available; the bootstrap is an easy approach for estimating standard errors.

• The CIs are not the same. This is due to the small skewness of the bootstrap values.

• The sample estimate is biased.

• For the percentile-t intervals we used the bootstrap to estimate the standard error within each bootstrap sample.

## Hypothesis Tests

The parametric bootstrap is very suitable for hypothesis testing.

Example: we have independent data $(x_1, \ldots, x_n)$, and we know that their distribution is Gamma with some parameters. We wish to test a certain value of the population mean $\mu$: $H_0: \mu = 1$ versus $H_1: \mu \neq 1$.

The standard t-test is applicable only via the Central Limit Theorem, implying a large sample size. For smaller sizes the situation is not so easy. The idea is to construct the distribution of the test statistic using the (parametric) bootstrap.

So, the general algorithm is the following; a code sketch follows the list.

## Hypothesis Tests: The Algorithm

• Set the two hypotheses.

• Choose a test statistic T that can discriminate between the two hypotheses. Important: we do not care whether our statistic has a known distribution under the null hypothesis.

• Calculate the observed value $t_{obs}$ of the statistic for the sample.

• Generate B samples from the distribution implied by the null hypothesis.

• For each sample calculate the value $t_i$ of the statistic, $i = 1, \ldots, B$.

• Find the proportion of times the sampled values are more extreme than the observed one. "Extreme" depends on the form of the alternative.

• Accept or reject according to the significance level.
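A sketch of this algorithm for the Gamma example above. How $H_0$ is imposed is our own simple choice (method-of-moments shape, scale chosen so the mean is $\mu_0$), not prescribed by the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma_mean_test(x, mu0=1.0, B=999):
    """Parametric bootstrap test of H0: mu = mu0 for Gamma data (a sketch)."""
    x = np.asarray(x)
    t_obs = abs(x.mean() - mu0)
    shape = x.mean() ** 2 / x.var(ddof=1)      # method-of-moments shape
    # Simulate under H0: Gamma with the same shape and mean exactly mu0.
    t = np.array([abs(rng.gamma(shape, mu0 / shape, size=len(x)).mean() - mu0)
                  for _ in range(B)])
    return (np.sum(t >= t_obs) + 1) / (B + 1)
```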

## Hypothesis Tests (2)

More formally, let us assume that we reject the null hypothesis at the right tail of the distribution. Then an approximate p-value is given by

$$\hat{p} = \frac{\sum_{i=1}^{B} I(t_i \ge t_{obs}) + 1}{B + 1}$$

$\hat{p}$ is an estimate of the true p-value, and we can build confidence intervals for it. A good strategy can be to increase B if $\hat{p}$ is close to the significance level.

Note that some researchers advocate the use of the simpler estimate of the p-value, namely

$$\tilde{p} = \frac{\sum_{i=1}^{B} I(t_i \ge t_{obs})}{B}$$

$\hat{p}$ has better properties than $\tilde{p}$.

## Hypothesis Tests (Non-parametric Bootstrap)

The only difficulty is that we do not know the population density, so the samples must be taken from $\hat{F}_n$. According to standard hypothesis testing theory, we need the distribution of the test statistic under the null hypothesis. The data are not from the null hypothesis, thus $\hat{F}_n$ is not appropriate. A remedy can be to rescale $\hat{F}_n$ so as to fulfil the null hypothesis. Then we take the bootstrap samples from this distribution and we build the distribution of the selected test statistic.

## Example

Consider the following data: x = (−0.89, −0.47, 0.05, 0.155, 0.279, 0.775, 1.0016, 1.23, 1.89, 1.96).

We want to test $H_0: \mu = 1$ versus $H_1: \mu \neq 1$. We select as test statistic $T = |\bar{x} - 1|$; several other statistics could be used. Since $\bar{x} = 0.598$ we find $T_{obs} = 0.402$. In order for $\hat{F}_{10}$ to represent the null hypothesis we rescale our data so as to have a mean equal to 1; thus we add 0.402 to each observation. The new sample is $x_{null} = x + 0.402$. We resample with replacement from $\hat{F}(x_{null})$. Taking B = 100 samples we found $\hat{p} = 0.18$.

Important: rescaling is not obvious for certain hypotheses.
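A sketch of this one-sample test; the shift `mu0 - x.mean()` reproduces the 0.402 rescaling above:

```python
import numpy as np

rng = np.random.default_rng(10)

def boot_mean_test(x, mu0=1.0, B=999):
    """Non-parametric bootstrap test of H0: mu = mu0 via rescaled data."""
    x = np.asarray(x)
    t_obs = abs(x.mean() - mu0)
    x_null = x + (mu0 - x.mean())      # shift so the null holds for F_hat
    t = np.array([abs(rng.choice(x_null, len(x), replace=True).mean() - mu0)
                  for _ in range(B)])
    return (np.sum(t >= t_obs) + 1) / (B + 1)

x = np.array([-0.89, -0.47, 0.05, 0.155, 0.279,
              0.775, 1.0016, 1.23, 1.89, 1.96])
print(boot_mean_test(x))   # of the same order as the 0.18 reported above
```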

## Permutation Test

Hypothesis testing via the parametric bootstrap is also known as a Monte Carlo test. Alternative testing procedures are the so-called permutation or randomization tests. The idea is applicable when the null hypothesis implies that the data have no structure, so that every permutation of the sample data is equally probable under the null hypothesis. We then test whether the observed value is extreme relative to the totality of the permutations. In practice, since the number of permutations is huge, we take a random sample of them and build the distribution of the test statistic under the null hypothesis from them. The main difference between permutation tests and bootstrap tests is that in permutation tests we sample without replacement, in order to obtain a permutation of the data.

## Permutation Test: Example

Consider the data on the American elections of 1844. We want to test the hypothesis $H_0: \rho = 0$ versus $H_1: \rho < 0$. We use as test statistic r, the sample correlation coefficient. The observed value is −0.3698. We take B = 999 random permutations of the data by fixing one variable and permuting the other (sampling without replacement). We estimate the p-value from the left tail of the distribution, as implied by the alternative. An approximate p-value is given by

$$\hat{p} = \frac{\sum_{i=1}^{B} I(t_i \le t_{obs}) + 1}{B + 1} = \frac{\sum_{i=1}^{B} I(t_i \le -0.3698) + 1}{1000} = \frac{39 + 1}{1000} = 0.04$$

This is an estimate of the true p-value; we can build confidence intervals for it, either via asymptotic results or via the bootstrap. Note that in this case it is not so easy to construct a bootstrap test: sampling from the null hypothesis is not easy, because we would need to transform $\hat{F}_n$ to reflect the null hypothesis! Therefore, permutation tests are complementary to bootstrap tests.
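A sketch of this permutation test (fix x, permute y):

```python
import numpy as np

rng = np.random.default_rng(1844)

def perm_test_corr(x, y, B=999):
    """One-sided permutation test of H0: rho = 0 against rho < 0."""
    r_obs = np.corrcoef(x, y)[0, 1]
    r = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(B)])
    return (np.sum(r <= r_obs) + 1) / (B + 1)
```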

Figure 6: Density estimation for the resampled values of the correlation. The shaded area is the region where we reject the null hypothesis.

## Non-parametric Bootstrap Hypothesis Tests

Suppose we have two samples, $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_m)$. We wish to test the hypothesis that they have the same mean, i.e. $H_0: \mu_x = \mu_y$ versus $H_1: \mu_x \neq \mu_y$. Use as test statistic $T = |\bar{x} - \bar{y}|$. Under the null hypothesis a good estimate of the population distribution is the combined sample $z = (x_1, \ldots, x_n, y_1, \ldots, y_m)$. Thus, sample with replacement from z. For each of the B bootstrap samples calculate $T^*_i$, $i = 1, \ldots, B$. Estimate the p-value of the test as

$$\hat{p} = \frac{\sum_{i=1}^{B} I(T^*_i \ge t_{obs}) + 1}{B + 1}$$

Other test statistics are applicable, for example the well-known two-sample t-statistic. A general warning for selecting test statistics: we would like to select a "pivotal" test statistic, i.e. one whose distribution does not vary. For example, the t-statistic has this property, as it is standardized.
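A sketch of the two-sample test, resampling both groups from the pooled sample z:

```python
import numpy as np

rng = np.random.default_rng(2)

def boot_two_sample_test(x, y, B=999):
    """Bootstrap test of equal means using T = |mean(x*) - mean(y*)|."""
    t_obs = abs(np.mean(x) - np.mean(y))
    z = np.concatenate([x, y])
    n, m = len(x), len(y)
    t = np.array([abs(rng.choice(z, n, replace=True).mean()
                      - rng.choice(z, m, replace=True).mean())
                  for _ in range(B)])
    return (np.sum(t >= t_obs) + 1) / (B + 1)
```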

## Goodness-of-Fit Tests Using the Bootstrap

The parametric bootstrap is suitable for goodness-of-fit tests. Suppose we wish to test normality, i.e. $H_0: F = N(\mu, \sigma^2)$ versus $H_1: F \neq N(\mu, \sigma^2)$. A well-known test statistic is the Kolmogorov-Smirnov statistic $D = \max_x |\hat{F}_n(x) - F(x)|$. This has, asymptotically and under the null hypothesis, a known and tabulated distribution. A bootstrap-based test does not use the asymptotic arguments: we take samples from the normal distribution in $H_0$ (if the parameters are not known, we estimate them). For each sample we obtain the value of the test statistic, and so we construct the distribution of the test statistic. There is no need to use D; other "distances" can also be used to measure deviations from normality! The test can be used for any distribution!
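A sketch of the bootstrap KS test of normality with estimated parameters, re-estimating them in each bootstrap sample (in the spirit of the Lilliefors correction; that refinement is our reading, not spelled out in the slide):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def ks_normal_boot(x, B=999):
    """Parametric bootstrap goodness-of-fit test for normality."""
    x = np.asarray(x)
    d_obs = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1))).statistic
    d = np.empty(B)
    for i in range(B):
        xs = rng.normal(x.mean(), x.std(ddof=1), size=len(x))
        d[i] = stats.kstest(xs, 'norm',
                            args=(xs.mean(), xs.std(ddof=1))).statistic
    return (np.sum(d >= d_obs) + 1) / (B + 1)
```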

## Failures of Bootstrap

• Small data sets (because $\hat{F}_n$ is not a good approximation of F).

• Infinite moments (e.g. the mean of the Cauchy distribution).

• Dependence structures (e.g. time series, spatial problems). Bootstrap is based on the assumption of independence. Remedies exist.

• Estimating extreme values (e.g. the 99.99% percentile or $\max(X_i)$). The problem is the non-smoothness of the functional under consideration.

• Dirty data: if outliers exist in our sample, clearly we do not sample from a good estimate of F and we add variability to our estimates.

• Unsmooth quantities: there are plenty of theoretical results that relate the success of the bootstrap to the smoothness of the functional under consideration.

• Multivariate data: when the dimension of the problem is large, $\hat{F}_n$ becomes a poorer estimate of F. This may cause problems.

## Choice of B

The choice of B depends on

• Computer availability.

• The type of the problem: while B = 1000 suffices for estimating standard errors, it is perhaps not enough for confidence intervals.

• The complexity of the problem.

## Variants of Bootstrap

• Smoothed bootstrap: instead of using $\hat{f}_n$ we may use a smoothed estimate of it for simulating the bootstrap samples. Such an estimate might be a kernel density estimate.

• Iterated bootstrap: for non-smooth functionals (e.g. the median), we perform another bootstrap round using the bootstrap values $\hat{\theta}^*_i$.

• Bayesian bootstrap: we generate from $\hat{F}$, but the probabilities associated with each observation are not exactly 1/n for each bootstrap sample; they vary around this value.

## Other Resampling Schemes: The Jackknife

The method was initially introduced as a bias-reduction technique. It is quite useful for estimating standard errors. Let $\hat{\theta}_{(i)}$ denote the estimate when all the observations except the i-th are used for estimation. Define

$$\hat{\theta}_J = n \hat{\theta} - (n-1) \hat{\theta}_{(\cdot)}, \quad \text{where} \quad \hat{\theta}_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{(i)}$$

$\hat{\theta}_J$ is called the jackknifed version of $\hat{\theta}$ and usually has less bias than $\hat{\theta}$. It holds that

$$se(\hat{\theta}_J) = \sqrt{\frac{n-1}{n} \sum_{i=1}^{n} \left( \hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)} \right)^2}$$

This is a good approximation of $se(\hat{\theta})$ as well.
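A sketch of the jackknife standard error, per the formula above:

```python
import numpy as np

def jackknife_se(x, statistic):
    """Leave-one-out jackknife standard error of a statistic."""
    x = np.asarray(x)
    n = len(x)
    theta = np.array([statistic(np.delete(x, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((theta - theta.mean())**2))
```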

## Other Resampling Schemes: Subsampling

In the jackknife we omit one observation at a time; the samples are created without replacement, so we have n different samples, each of size n − 1. Generalize the idea by omitting $b \ge 1$ observations at a time. Similar formulas can be derived. Complete enumeration is now difficult: use Monte Carlo methods by taking a sample of the subsamples.

This is the idea of subsampling. Subsamples are in fact samples from F, not from $\hat{F}_n$, so subsampling can remedy the failure of the bootstrap in some cases. Subsamples are of smaller size, and thus we need to rescale them (recall the factor n − 1 in the se of the jackknifed version). One scheme splits the entire sample into subsamples of equal size.

## Bootstrap in Linear Regression

There are two different approaches:

1. Resample with replacement from the observations, where each "observation" is the entire vector (response and covariates) associated with an original data point.

2. Apply the bootstrap to the residuals of the model fitted to the original data.

The second one is preferable, since the first approach violates the assumption of a constant design matrix.

Bootstrapping in linear regression removes distributional assumptions on the residuals and hence allows inference even if the errors do not follow a normal distribution.

## Bootstrapping the Residuals

Consider the model $Y = X\beta + \varepsilon$, using the standard notation.

The bootstrap algorithm is the following (a code sketch follows the list):

• Fit the model to the original data. Obtain the estimates $\hat{\beta}$ and the residuals from the fitted model, $\hat{\varepsilon}_i$, $i = 1, \ldots, n$.

• Take a bootstrap sample $\varepsilon^* = (\varepsilon^*_1, \ldots, \varepsilon^*_n)$ from the residuals by sampling with replacement.

• Using the design matrix, create the bootstrap values for the response via $Y^* = X\hat{\beta} + \varepsilon^*$.

• Fit the model using $Y^*$ as the response and the design matrix X.

• Keep all the quantities of interest from fitting the model (e.g. MSE, F-statistic, coefficients etc.).

• Repeat the procedure B times.
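A sketch of the residual bootstrap for ordinary least squares (X is assumed to include the intercept column):

```python
import numpy as np

rng = np.random.default_rng(12)

def residual_bootstrap_ols(X, y, B=200):
    """Bootstrap the residuals of an OLS fit; returns B coefficient vectors."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        y_star = X @ beta_hat + rng.choice(resid, size=len(y), replace=True)
        betas[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return betas
```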

## Example

n = 12 observations; Y is the wing length of a bird and X is the age of the bird. We want to fit a simple linear regression $Y = \alpha + \beta X$.

Figure 7: The data and the fitted line.

Figure 8: The fitted lines for all the B = 200 bootstrap samples.

## Results

| $\theta$       | mean    | std. err. | 95% CI              | sample value |
|----------------|---------|-----------|---------------------|--------------|
| $\hat{\alpha}$ | 0.774   | 0.106     | (0.572, 0.963)      | 0.778        |
| $\hat{\beta}$  | 0.266   | 0.009     | (0.249, 0.286)      | 0.266        |
| $\hat{\sigma}$ | 0.159   | 0.023     | (0.106, 0.200)      | 0.178        |
| F-statistic    | 717.536 | 288.291   | (406.493, 1566.891) | 523.78       |
| $R^2$          | 0.984   | 0.0044    | (0.975, 0.993)      | 0.9812       |

Table 4: Bootstrap values for certain quantities. The correlation between $\hat{\alpha}$ and $\hat{\beta}$ was −0.89.

Figure 9: Histograms of the B = 200 bootstrap values ($\hat{\alpha}$, $\hat{\beta}$, $\hat{\sigma}$, F-statistic, $R^2$) and a scatterplot of the bootstrap values of $\hat{\alpha}$ and $\hat{\beta}$.

## Parametric Bootstrap in Regression

Instead of the non-parametric bootstrap we can use the parametric bootstrap in a similar fashion. This implies that we assume the errors follow some distribution (e.g. a t distribution or a mixture of normals). Then full inference is available via the bootstrap, while this is very difficult with classical approaches.

The only change is that the errors for each bootstrap sample are generated from the assumed distribution.

## Bootstrap in Principal Components Analysis

Principal components analysis is a dimension-reduction technique that starts with correlated variables and ends with uncorrelated variables, the principal components, which are in descending order of importance and preserve the total variability. If X is the vector of the original variables, the PCs are derived as Y = XA, where A is a matrix containing the normalized eigenvectors from the spectral decomposition of the covariance matrix of X. In order for the PCA to be meaningful we need to keep fewer PCs than the original variables. When working with standardized data, one criterion is to keep as many components as there are eigenvalues greater than 1. A problem of major interest with sample data is how to measure the sampling variability of the eigenvalues of the sample covariance matrix. Bootstrap can be a solution.

## Bootstrap in PCA: Example

The data represent the performance of 26 athletes in the heptathlon at the Olympic Games of Sydney, 2000. We proceed with PCA based on the correlation matrix. We resample observations with replacement. We used B = 1000.

Note: our approach is the non-parametric bootstrap. However, we could use the parametric bootstrap by assuming a multivariate normal density for the population and sampling from this multivariate normal model with parameters estimated from the data.

| order           | mean    | st. dev. | 95% CI             | observed value |
|-----------------|---------|----------|--------------------|----------------|
| 1               | 3.0558  | 0.4403   | (2.2548, 3.9391)   | 2.9601         |
| 2               | 1.6324  | 0.2041   | (1.2331, 2.0196)   | 1.5199         |
| 3               | 1.0459  | 0.1830   | (0.6833, 1.3818)   | 1.0505         |
| 4               | 0.6251  | 0.1477   | (0.3733, 0.9416)   | 0.6464         |
| 5               | 0.3497  | 0.0938   | (0.1884, 0.5441)   | 0.3860         |
| 6               | 0.1958  | 0.0658   | (0.0846, 0.3401)   | 0.2778         |
| 7               | 0.0950  | 0.0413   | (0.0306, 0.1867)   | 0.1592         |
| log-determinant | -4.1194 | 0.9847   | (-6.2859, -2.4292) | -2.9532        |

Table 5: Bootstrap estimates, standard errors and CIs for the eigenvalues, based on B = 1000 replications. An idea of the sampling variability can easily be deduced.

Figure 10: Histograms of the eigenvalues and of the logarithm of the determinant, and a boxplot representing all the eigenvalues, based on B = 1000 replications.

## Bootstrap in Kernel Density Estimation (1)

Kernel density estimation is a useful tool for estimating probability density functions. We obtain an estimate $\hat{f}$ of the unknown density f, based on a sample $(x_1, \ldots, x_n)$, as

$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\left( \frac{x - x_i}{h} \right)$$

where $K(\cdot)$ is the kernel function and h is the bandwidth, which makes the estimate more smooth ($h \to \infty$) or less smooth ($h \to 0$). There are several choices for the kernel; the standard normal density is the common one. The bandwidth h can be selected in an optimal way. For certain examples it suffices to use

$$h_{opt} = 1.059\, n^{-1/5} \min\left( \frac{Q_3 - Q_1}{1.345},\; s \right)$$

where $Q_1$ and $Q_3$ are the first and third quartiles and s is the sample standard deviation.

We want to create confidence intervals for $\hat{f}(x)$.

## Bootstrap in Kernel Density Estimation (2)

We apply the bootstrap: take B bootstrap samples from the original data. For each sample find $\hat{f}_i(x)$, where the subscript denotes the bootstrap sample. Use these values to create confidence intervals for each value of x. This creates a band around $\hat{f}(x)$, which is in fact a componentwise confidence interval, and gives us information about the curve.

Example: 130 simulated values.
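A sketch of the band construction (using SciPy's `gaussian_kde`, whose default bandwidth rule differs from the $h_{opt}$ above):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(130)

def kde_percentile_band(x, grid, B=100, a=0.05):
    """Componentwise percentile band for a kernel density estimate."""
    curves = np.empty((B, len(grid)))
    for b in range(B):
        xs = rng.choice(x, size=len(x), replace=True)
        curves[b] = gaussian_kde(xs)(grid)
    return np.quantile(curves, [a / 2, 1 - a / 2], axis=0)
```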

Figure 11: Estimated $\hat{f}(x)$ for the data of our example.

Figure 12: The B = 100 bootstrap replicates of $\hat{f}(x)$ for the data of our example.

Figure 13: Componentwise 95% confidence intervals based on the percentile bootstrap approach (B = 100).

## Bootstrap in Time Series

Consider the AR(2) model

$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \varepsilon_t$$

In the classical case we assume normality for the residuals. If this assumption is not made, we can use the bootstrap for inference.

## The Algorithm

1. Fit the AR(2) model to the data $(y_1, \ldots, y_T)$. Find $\hat{\beta}_1$, $\hat{\beta}_2$.

2. Obtain the residuals $\hat{\varepsilon}_t$ from the fitted model.

3. Form the empirical distribution $\hat{F}_T$ of the residuals.

4. Start the bootstrap and simulate a new series (a code sketch follows the steps):

   • Set $y^*_i = y_i$ for $i = 1, 2$, i.e. use the original observations at those times as starting points.

   • Construct the entire series using the recursion $y^*_t = \hat{\beta}_1 y^*_{t-1} + \hat{\beta}_2 y^*_{t-2} + \varepsilon^*_t$, where $\varepsilon^*_t$ is drawn from $\hat{F}_T$.

   • Fit the AR(2) model to the bootstrapped series and obtain the parameters.

   • Repeat the steps above B times.
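A sketch of this residual bootstrap, fitting the AR(2) by least squares (no intercept, matching the model above):

```python
import numpy as np

rng = np.random.default_rng(1898)

def ar2_bootstrap(y, B=200):
    """Residual bootstrap for an AR(2); returns B pairs (beta1, beta2)."""
    y = np.asarray(y)
    T = len(y)
    X = np.column_stack([y[1:T-1], y[0:T-2]])    # lags 1 and 2
    beta, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    resid = y[2:] - X @ beta
    betas = np.empty((B, 2))
    for b in range(B):
        ys = np.empty(T)
        ys[:2] = y[:2]                            # original starting values
        eps = rng.choice(resid, size=T, replace=True)
        for t in range(2, T):
            ys[t] = beta[0] * ys[t-1] + beta[1] * ys[t-2] + eps[t]
        Xs = np.column_stack([ys[1:T-1], ys[0:T-2]])
        betas[b], *_ = np.linalg.lstsq(Xs, ys[2:], rcond=None)
    return betas
```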

## Example: Annual Peak Flows

The data consist of n = 100 observations of the annual peak flows at a specific location on the Missouri river, in cm³/sec. The time period examined is 1898-1997.

Figure 14: The original series.

## Example: Annual Peak Flows: Results

| $\theta$         | mean  | std. err. | 95% CI          | sample value |
|------------------|-------|-----------|-----------------|--------------|
| $\hat{\beta}_1$  | 0.155 | 0.0964    | (-0.032, 0.333) | 0.112        |
| $\hat{\beta}_2$  | 0.042 | 0.0923    | (-0.118, 0.226) | 0.063        |
| $\hat{\sigma}^2$ | 1.669 | 0.415     | (1.087, 2.533)  | 1.579        |

Table 6: Bootstrap values for certain quantities. The correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$ was −0.16.

Figure 15: Histograms of the bootstrap values of the parameters (B = 200).

Figure 16: Histogram of the fitted residuals from the sample data and scatterplot of the bootstrap values of the parameters.

Figure 17: Some plots of the bootstrap series.

## Bootstrap with Dependent Data

Bootstrap is based on independent sampling from $\hat{F}_n$. For dependent data the standard bootstrap cannot be applied. The idea is that we need to mimic the dependence of the data.

Remedies (a code sketch for the first follows the list):

• Moving block bootstrap: for example, consider the data $(x_1, x_2, \ldots, x_{12})$. We construct 4 blocks of 3 observations each, namely $y_1 = (x_1, x_2, x_3)$, $y_2 = (x_4, x_5, x_6)$, $y_3 = (x_7, x_8, x_9)$ and $y_4 = (x_{10}, x_{11}, x_{12})$. Then we resample the blocks $y_i$ instead of the original observations. We keep some part of the dependence, but we lose it when connecting blocks; in some sense we add white noise to the series. Note that dependence at lags ≥ 3 vanishes.

• Overlapping blocks: we build the blocks overlapping, i.e. we define $y_1 = (x_1, x_2, x_3)$, $y_2 = (x_2, x_3, x_4)$, ..., $y_{11} = (x_{11}, x_{12}, x_1)$, $y_{12} = (x_{12}, x_1, x_2)$. This adds less white noise, but still some dependence is missing.
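A sketch of the non-overlapping block scheme from the first bullet:

```python
import numpy as np

rng = np.random.default_rng(4)

def block_bootstrap(x, block_len=3):
    """Resample whole blocks of consecutive observations, with replacement."""
    x = np.asarray(x)
    n_blocks = len(x) // block_len
    blocks = x[:n_blocks * block_len].reshape(n_blocks, block_len)
    return blocks[rng.integers(0, n_blocks, size=n_blocks)].ravel()
```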

## Bootstrap with Dependent Data (2)

• Parametric bootstrap: we apply a parametric model to capture the dependence structure. For example, for linear relationships we can, by theory, find an appropriate AR model that approximates the dependence structure of the series quite well. Or we combine parametric ideas with block bootstrapping, by fitting a model and applying the block bootstrap to the residuals of that model.

• There are several other, more complicated, methods suitable for certain types of data (e.g. spatial dependence etc.).

## More Applications

There are several more applications of the bootstrap not treated here. Bootstrap is suitable for a variety of statistical problems, such as

• Censored data

• Missing-data problems

• Finite populations

• etc.

However, before applying the bootstrap one must be sure that the bootstrap samples are drawn from a good estimate of the unknown population density. Otherwise the bootstrap does not work!

## Selected Bibliography

The bibliography on the bootstrap has been growing at a high rate in recent years. Some selected textbooks are the following (capturing both the theoretical and the practical aspects of the bootstrap):

Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. Chapman & Hall.

Efron, B. (1982) The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia.

Efron, B. and Tibshirani, R. (1986) Bootstrap methods for standard errors, confidence intervals and other measures of statistical accuracy. Statistical Science, 1, 54-77.

Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press, Cambridge.

Shao, J. and Tu, D. (1995) The Jackknife and Bootstrap. Springer.

Chernick, M.R. (1999) Bootstrap Methods: A Practitioner's Guide. Wiley.

Noreen, E.W. (1989) Computer Intensive Methods for Testing Hypotheses: An Introduction. Wiley.

Hope, A.C.A. (1968) A simplified Monte Carlo significance test procedure. Journal of the Royal Statistical Society, Series B, 30, 582-598.

Westfall, P.H. and Young, S.S. (1993) Resampling-Based Multiple Testing. Wiley.

Good, P. (1998) Resampling Methods: A Practical Guide to Data Analysis. Birkhäuser, Boston.


THE END

Bootstrap and dry bread...

i =. M random variables simulated from Fn . . So we sample by selecting randomly with replacement observations from the original sample. F ˆ Important: µ(Fn ) is an exact result. estimate of F (i. . .2 0.0
∗ φ(yi )
Figure 1: Empirical distribution function from a random sample of size n from a standard normal distribution & % &
∗ ˆ where yi . . . • In practice. Since Fn is a consistent ˆ ˆn → F if n → ∞) then µ(Fn ) → µ(F ). . .4
0. it is well known as a method for estimating standard errors.0
µ(F ) = ˆ
1 M
M i=1
0. In practice it is not easy to ﬁnd it. i = 1. Note: if some value occurs more than one time then it is given probability larger than 1/n. A value can occur more than one in the bootstrap sample! Some other values may not occur in the bootstrap sample
A quick view of Bootstrap (1)
• Appeared in 1979 by the seminal paper of Efron. Predecessors existed for a long time • Popularized in 80’s due to the introduction of computers in statistical practice • It has a strong mathematical background (though not treated here). . The density ˆ associated with Fn will be the one that gives probability 1/n to ˆ function fn all observed points xi . the idea of bootstrap used in practice is the following: ˆ Generate samples from Fn and use as an estimate of µ(F ) the quantity the
-2 -1 0 x 1 2
F(x) 0. n and 0 elsewhere. so we use a Monte Carlo approximation of it. So.6
1.e. it is based on simulation but for some few examples there are exact solutions without need of simulation • While it is a method for improving estimators.8
ˆ ˆ Use as an estimate the quantity µ(Fn ) instead of µ(F ).
%
Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
ˆ Simulation from Fn
ˆ Simulating from Fn is a relatively easy and straightforward task.Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
Example: Empirical Distribution Function Bootstrap Idea
Empirical Distribution Function
0. bias and constructing conﬁdence intervals for parameters
&
%
&
%
.

Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
A quick view of Bootstrap (2)
• It has minimum assumptions. θB ). ˆ∗ ˆ∗ ˆ∗ ˆ By this procedure we end up with bootstrap values θ∗ = (θ1 . We want to ﬁnd the standard error of the sample median. . ˆ ˆ Note that θ∗ is a sample from the unknown distribution of θ and thus it contains ˆ all the information related to θ! Consider data x = (x1 . . B time. . It is merely based on the assumption that the sample is a good representation of the unknown population • It is not a black box method. % & %
&
. We will use these bootstrap values for calculating all the quantities of interest. xn ). B times. • Non-parametric Bootstrap: We do not know the form of F and we estimate it ˆ by F the empirical distribution obtained from the data
&
%
&
%
Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
An example: Median The general bootstrap algorithm
ˆ 1. ˆ 2. not applicable in our case if n small. Compute θ∗ for this bootstrap sample 3.g. Asymptotic arguments exist but they refer to huge sample sizes. a histogram is an estimate of the unknown density of the sample median. . denote it as θ1 • Repeat steps 1 and 2. but the progress on computer speed makes it easily available in everyday practice
Types of Bootstrap
• Parametric Bootstrap: We know that F belongs to a parametric family of distributions and we just estimate its parameters from the sample. . standard deviation. Moreover. percentiles etc). Generate a sample x∗ of size n from Fn . We use bootstrap. . Repeat steps 1 and 2. θ2 . This is our ﬁrst bootstrap sample ˆ∗ • For this sample we calculate the sample median. ˆ∗ ˆ∗ ˆ∗ ˆ At the end we have B values θ∗ = (θ1 . . We can study skewness etc. . mean. θB ). This is a random sample from the distribution of the sample median and hence we can use it to approximate every quantity of interest (e. θ2 . . . It works for the majority of problems but it may be problematic for some others • In practice it is computationally demanding. • We generate a sample x1 ∗ by sampling with replacement from x. . . We generate samples from F using the estimated parameters.

i = 1. 1) ˆ ¯ and Gamma(1. θ2i ) are the bootstrap values for the two parameters taken from the i-th bootstrap sample
Table 1: Estimated covariance for sample mean and variance based on parametric bootstrap (B = 1000). .Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
Bootstrap Standard Errors
∗ Denote θi the bootstrap value from the i−th sample. 1) densities. The bootstrap ˆ estimate of the standard error of θ is calculated as
Bootstrap Estimate of Bias
ˆ Similarly an estimate of the bias of θ is obtained as ˆ ˆ ˆ Bias(θ) = θ∗ − θ ˆ Note that even if θ is an unbiased estimate. From theory. . Based on B = 1000 replications we estimated the covariance ˆ ˆ between θ1 and θ2 . since the above is merely an estimate it can be non-zero. this estimate must be seen in connection with the standard errors. in regression setting two of the regression coeﬃcients).0998 0. 200 were generated from N (1. So. for the normal distribution the covariance is 0.0104
ˆ∗ ˆ∗ θ1i − θ1
ˆ∗ ˆ∗ θ2i − θ2
ˆ∗ ˆ∗ where (θ1i .
ˆ seB (θ) = where
1 B
B i=1 B i=1
ˆ∗ ˆ θi − θ∗
2
1 θ = B ˆ∗
ˆ∗ θi
This is merely the standard deviation of the bootstrap values. % & %
&
.0007 Gamma 0. Distribution Normal n=20 n=200 0. Samples of sizes n = 20. Suppose that θ1 and θ2 are two estimates of interest (e. . θ2 ) = M
M i=1
Parametric bootstrap. suppose that θ1 = x ˆ and θ2 = s2 . B.00031 0. Both have µ = 1 and σ 2 = 1.
&
%
&
%
Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
Example: Covariance between sample mean and variance Bootstrap Estimate of Covariance
In a similar manner with the standard errors for one parameter we can obtain ˆ ˆ bootstrap estimates for the covariance of two parameters.g. . in the normal distribution they can be the mean and the variance. Then the bootstrap estimate of covariance is given by 1 ˆ ˆ CovB (θ1 .

asymptotic standard errors) we can use bootstrap to estimate them.
Gamma(1.5
0. The interval has the form
Percentile CI
The conﬁdence interval is given as κa/2 .0
$
Normal.2
1.g. κ1−a/2 ˆ∗ where κa denotes the a-th empirical quantile of the bootstrap values θi .0
0. If we do not know them in closed forms (e. θ − ζa/2 seB (θ)) where ζa is the a-th quantile of the values ξi . n=200
3
variance
variance
2
1
0
0. The computational burden is double! ˆ∗ Note that ξi are studentized values of θi .5
2. It implies that we assume that θ follows a Normal distribution.0
0.6
1.0 mean 1.0 mean
1.0
0.0 mean
1.8 1.5
Simple Bootstrap CI
0.5
1. n=20
4 4
Gamma(1.5
1.
&
%
&
%
. This is clearly non-symmetric and takes into account the distributional form of estimate.5
2.
ˆ ˆ ˆ ˆ (θ − ζ1−a/2 seB (θ).6
0.0 mean
1.5
variance
variance
1.6 1.4
0.0
Figure 2: Scatterplot of sample mean and variance based on B = 1000 replications.0
0
1
2
3
0.0 2.2 1. & % & %
Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
Percentile-t CI
Improves the simple bootstrap CI in the sense that we do not need the normality assumption. n=200
’
$
1. θ − Z1−a/2 seB (θ)) where Za denotes the a-th quantile of the standard normal distribution.4
1.8
Use the bootstrap estimate of the standard error and a normality assumption (arbitrary in many circumstances) to construct an interval of the form ˆ ˆ ˆ ˆ (θ − Z1−a/2 seB (θ).Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
Normal.8
1.4 0. This is a ˆ (1 − a) conﬁdence Interval for θ.1).8
1. n=20
2.0
0.1). where ξi = ˆ∗ ˆ θi − θ ˆ se(θ∗ )
i
ˆ∗ Note that for the ξi ’s we need the quantities se(θi ).6 0.4 1.5
1.

κp2 ) where p1 = Φ(zα/2 + 2b0 ) and p2 = Φ(z1−α/2 + 2b0 ) with Φ(·) the standard normal distribution function. Bootstrap might be a solution
0
20
10
20
30
40
60 participation
80
100
Figure 3: The data for the American Elections & % & %
. i = 1. therefore the simple percentile CI are obtained. results exist only under the assumption of a bivariate normal population. They are related to the American presidential elections in 1844. .Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
Bias Corrected CI
Improves the Percentile bootstrap ci in the sense that we take into account the bias The bootstrap bias-corrected percentile interval (BC): (κp1 . n Pearson’s correlation coeﬃcient is given by
n i=1 n i=1
(xi − x)(yi − y ) ¯ ¯
n i=1
ˆ θ=
(xi − x)2 ¯
(yi − y )2 ¯
difference
Inference is not so easy. yi ). • Resulting CI not symmetric (except for the simple CI. The observed ˆ correlation coeﬃcient is θ = −0. . then b0 = 0 and p1 = a/2 and p2 = 1 − a/2. .369
Example 1: Correlation Coeﬃcient
Consider data (xi . which is the less intuitive) • Note the connection between CI and hypothesis testing!
ˆ∗ ˆ I(θi ≤ θ)
&
ˆ∗ If the distribution of θi is symmetric. . • In order to estimate consistently extreme percentiles we need to increase B. The question is whether there is correlation between the two variables. The two variables are: the participation proportion in the election for this county and the diﬀerence between the two candidates.
%
&
%
Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
American Elections
The data refer to n = 24 counties in America. κa is the a−quantile of the distribution of the bootstrap values (similar to the notation for the percentile CI) b0 = Φ−1 1 B
B i=1
Comparison of CI
Which interval to use? things to take into account: • Percentile CI are easily applicable in many situations ˆ • Percentile-t intervals need to know se(θ).

For data from a Poisson x distribution this quantity is.1881.1. . 0. 3.6806. B. ˆ (B = 1000): θ simple bootstrap CI (0.206.0.2.1.0456.1. seB (θ) = 0.4961) (0.0. The ˆ 1−θ 2 ˆ asymptotic standard error given by normal theory as seN (θ) = √n−3 = 0.2 0. 1.7228. θ∗ = 2. Skewness is evident in both plots.2
0
50
100
150
200
ˆ We use bootstrap by resampling from the observed values. i = 1. seB (θ) = 0. . especially the one for ξi ’s & % &
Bias Corrected
Table 3: 95% bootstrap conﬁdence intervals for I %
. we calculate ˆ ˆ θ∗ − θ ξi = i ˆ∗ ) se(θi where ˆ 1 − θ∗2 ˆ∗ se(θi ) = √ i n−3 Then we found the quantiles of ξ.9077. this explains the diﬀerent left limit of the percentile-t conﬁdence intervals. 0. 3. .1.2659.8981. To do so.
Table 2: 95% bootstrap conﬁdence intervals for I
&
%
&
%
Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
Example 2: Index of Dispersion
For count data a quantity of interest is I = s2 /¯.1801.105.8762.7858)
0
100
200
300
-6
-4
-2 ksi
0
2
Percentile Percentile-t
Figure 4: Histogram of the bootstrapped values and the ξi ’s.1.Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
American Elections: Results
ˆ ˆ From B = 1000 replications we found θ∗ = −0.6 -0.0 0. The data values are: (1.0042 simple bootstrap CI Percentile Percentile-t Bias Corrected (-0. -0.5.0167) (-0.8.3656.0175)
More details in the Percentile-t intervals
In this case and since we have an estimate of the standard error of the correlation coeﬃcient (though it is an asymptotic estimate) we can use percentile-t intervals ˆ∗ without need to reiterate bootstrap.2. theoretically.6731. from the bootstrap values θi . We have accidents for n = 20 crossroads in one year period.4 bootstrap values -0.6241) (0. 3. Bias(θ) = 0.0019) (-0.6929. Using θ = I we found ˆ ˆ ˆ ˆ = 2. .3).3.8 -0.5.2.1420) (-0. Note that the distribution of ξ is skewed.4742) (1.0.0. 3. ˆ Bias(θ) = 0.
-0.0. We want to built conﬁdence intervals for I.2. 0.6901.

This is due to the small skewness of the bootstrap values. ”Extreme” depends on th form of the alternative. The idea is to construct the distribution of the test statistic using bootstrap (parametric). • For each sample calculate the value ti of the statistic. Standard t-test applicable only via the Central Limit Theorem implying a large sample size. .Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
250
4
200
Some comments
Index of dispersion
• Asymptotic theory for I is not available. • The sample estimate is biased. . Example: We have independent data (x1 .
Figure 5: Histogram and boxplot for the index of dispersion. bootstrap is an easy approach to estimate standard errors • The CI are not the same. and we know that their distribution is Gamma with some parameters. . xn ). i = 1. We wish to test a certain value of the population mean µ: H0 : µ = 1 versus H1 : µ = 1. . Important: We do not care that our statistic has a known distribution under the null hypothesis • Calculate the observed value tobs of the statistic for the sample • Generate B samples from the distribution implied by the null hypothesis.
100
150
50
0
0
1
2 Index of dispersion
3
4
1
2
3
• For the percentile-t intervals we used bootstrap to estimate standard errors for each bootstrap sample. • Accept or reject according to the signiﬁcance level. . . . . A small skewness is apparent & % & %
Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
Hypothesis Tests Hypothesis Tests
Parametric Bootstrap very suitable for hypothesis testing. the general algorithm is the following • Set the two hypotheses. So. For smaller size the situation is not so easy. B. • Find the proportion of times the sampled values are more extreme than the observed. & % & %
. • Choose a test statistic T that can discriminate between the two hypotheses.

402. −0.598 we ﬁnd x ¯ ˆ Tobs = 0.89. 0. A ˆ good strategy can be to increase B if p is close to the signiﬁcance level. ˆ Important: Rescaling is not obvious in certain hypotheses. We select as a test statistic T = |¯ − 1|. Then an approximate p-value is given by
B
p= ˆ
i=1
I(ti ≥ tobs ) + 1 B+1
Hypothesis Tests (Non-parametric Bootstrap)
The only diﬃculty is that we do not know the population density.18. Thus the ˆ samples must be taken from Fn . The main diﬀerence from permutation test to bootstrap tests is that in permutation tests we sample without replacement in order to take a permutation of the data. Thus we add in each observation 0. we need the distribution of the test statistic under the null hypothesis.775.402. we can build conﬁdence intervals for this. A remedy can be to ˆ rescale Fn so as to fulﬁll the null hypothesis. In order F10 to represent the null hypothesis we rescale our data so as to have a mean equal to 1. 1.Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
Hypothesis Tests (2)
More formally. let us assume that we reject the null hypothesis at the right tail of the distribution. The data ˆ are not from the null hypothesis.05. Since x = 0. Taking B = 100 samples we found p = 0. Alternative testing procedures are the so-called permutation or randomization tests. 1.47.402. We resample with replacement from F (xnull ).279. 0. ˆ ˜ &
i=1
I(ti ≥ tobs ) B % & %
Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
Example
Consider the following data x = (−0.96). We want to test H0 : µ = 1.
&
%
&
%
. thus Fn is not appropriate.155.0016. Several other statistic could be used. 0. 0.89. Then we test whether the observed value is extreme relative to the totality of the permutations. ˆ Note that some researchers advocate the use of the simpler estimate of the p-value.
Permutation test
Hypothesis testing via parametric bootstrap is also known as Monte Carlo tests.
p is an estimate of the true p-value. According to standard hypothesis testing theory. vs H1 : µ = 1.23. The idea is applicable when the null hypothesis implies that the data do not have any structure and thus. 1. then we take the bootstrap samples from this distribution and we built the distribution of the selected test statistic. every permutation of the sample data. namely
B
p= ˜ p has better properties than p. under the null hypothesis is equally probable. The the ˆ new sample is xnull = x + 0. In practice since the permutations are huge we take a random sample from them and we built the distribution of the test statistic under the null hypothesis from them. 1.

Note that in this case it is not so easy to construct a bootstrap test.6
-0. . Thus. .0 rho
0. H0 : µx = µy versus H1 : µx = µy . . as for example the well known two-sample t-statistic.5
-0. xn ) and y = (y1 .4
-0. i = 1. For example the t-statistic has this property as it is standardized. . x2 . i.3698) + 1 1000 =
This is an estimate of the true p-value. B Estimate the p-value of the test as
B
Goodness of Fit test using Bootstrap
Parametric Bootstrap suitable for Goodness of ﬁt tests. Other ”distances” can be also use to measure deviations from normality! The test can be used for any distribution!
B+1 Other test statistics are applicable. we need to estimate them). ym ). H0 : F = N (µ. This has asymptotically and under the null hypothesis a known and tabulated distribution. σ 2 ). . We want to test the hypothesis that H0 : ρ = 0. . A well known test ˆ statistic is the Kolmogorov . We estimate the p-values as right tail of the distribution. . . Use as a test statistic T = |¯ − y |. .e. y1 . We wish to test normality. either via asymptotic results or bootstrap etc. i. .3698. . Then an approximate p-value is given by
B
p= ˆ
i=1
I(ti ≤ to bs) + 1 B+1 =
i=1
I(ti ≤ −0. There is no need to use D. we can built conﬁdence intervals for this. σ 2 ) versus H1 : F = N (µ.0
1. . We wish to test the hypothesis that they have the same mean. Bootstrap based test do not use the asymptotic arguments but we take samples from the normal distribution in H0 (if the parameters are not known. & % &
0. a test statistic for which the distribution does not vary. The shadowed area is the area where we reject the null hypothesis %
Intro to Bootstrap
April 2004
Intro to Bootstrap
April 2004
’
$
’
$
Non-Parametric Bootstrap Hypothesis Tests

Suppose two samples x = (x1, ..., xn) and y = (y1, ..., ym). We wish to test the hypothesis that they have the same mean, H0 : µx = µy versus H1 : µx ≠ µy. Use as a test statistic T = |x̄ − ȳ|. Under the null hypothesis, a good estimate of the population distribution is the combined sample z = (x1, ..., xn, y1, ..., ym). Sample with replacement from z and, for each of the B bootstrap samples, calculate T*_i. Estimate the p-value of the test as

p̂ = ( Σ_{i=1}^{B} I(T*_i ≥ t_obs) + 1 ) / (B + 1).

Other test statistics are applicable, as for example the well-known two-sample t-statistic. A general warning for selecting test statistics is the following: we would like to select a "pivotal" test statistic, i.e. a test statistic whose distribution does not vary with the parameters. The t-statistic has this property, as it is standardized.
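A minimal sketch of this two-sample bootstrap test; the two samples below are hypothetical.

    import numpy as np

    rng = np.random.default_rng(2)

    x = rng.normal(0.0, 1.0, size=30)   # hypothetical first sample
    y = rng.normal(0.5, 1.0, size=25)   # hypothetical second sample

    t_obs = abs(x.mean() - y.mean())    # T = |x_bar - y_bar|
    z = np.concatenate([x, y])          # combined sample: estimate of F under H0

    B = 999
    t_star = np.empty(B)
    for b in range(B):
        xb = rng.choice(z, size=x.size, replace=True)
        yb = rng.choice(z, size=y.size, replace=True)
        t_star[b] = abs(xb.mean() - yb.mean())

    p = ((t_star >= t_obs).sum() + 1) / (B + 1)
    print(p)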
Permutation test – Example

Consider the data about the American elections in 1844. We want to test the hypothesis H0 : ρ = 0 vs H1 : ρ < 0. We use as a test statistic r, the sample correlation coefficient; note that in this case it is not so easy to construct a bootstrap test. We take B = 999 random permutations of the data by fixing the one variable and taking permutations of the other (sampling without replacement). The observed value is r = −0.3698. Then an approximate p-value is given by

p̂ = ( Σ_{i=1}^{B} I(t_i ≤ t_obs) + 1 ) / (B + 1) = ( Σ I(t_i ≤ −0.3698) + 1 ) / 1000 = (39 + 1)/1000 = 0.04.

This is an estimate of the true p-value and, being a proportion, we can build confidence intervals for it.

Figure 6: Density estimate of the B permuted values of r; the shaded area is the region where we reject the null hypothesis.
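A sketch of such a permutation test for a correlation coefficient; the paired data below are simulated stand-ins, since the election data are not reproduced here.

    import numpy as np

    rng = np.random.default_rng(3)

    # hypothetical paired data standing in for the two election variables
    u = rng.normal(size=20)
    v = -0.4 * u + rng.normal(size=20)

    r_obs = np.corrcoef(u, v)[0, 1]

    B = 999
    r_star = np.empty(B)
    for b in range(B):
        r_star[b] = np.corrcoef(u, rng.permutation(v))[0, 1]  # fix u, permute v

    # one-sided alternative H1: rho < 0, so "extreme" means at least as negative
    p = ((r_star <= r_obs).sum() + 1) / (B + 1)
    print(r_obs, p)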
Choice of B

The choice of B depends on:
• Computer availability.
• The type of the problem: while B = 1000 suffices for estimating standard errors, it is perhaps not enough for confidence intervals.
• The complexity of the problem.
Variants of Bootstrap

• Smoothed bootstrap: instead of using f̂n we may use a smoothed estimate of it for simulating the bootstrap samples. Such an estimate might be the kernel density estimate.
• Iterated bootstrap: for non-smooth functionals (e.g. the median), we perform another bootstrap round using the bootstrap values θ̂*_i.
• Bayesian bootstrap: we generate from F̂, but the probabilities associated with each observation are not exactly 1/n for each bootstrap sample; they vary around this value.

Other Resampling Schemes: The Jackknife

The method was initially introduced as a bias-reduction technique. Let θ̂(i) denote the estimate when all the observations except the i-th are used for estimation. Define

θ̂_J = n θ̂ − (n − 1) θ̂(·),  where  θ̂(·) = (1/n) Σ_{i=1}^{n} θ̂(i).

θ̂_J is called the jackknifed version of θ̂ and usually it has less bias than θ̂. The jackknife is also quite useful for estimating standard errors: it holds that

se(θ̂_J) = sqrt( ((n − 1)/n) Σ_{i=1}^{n} ( θ̂(i) − θ̂(·) )² ).

This is a good approximation of se(θ̂) as well.
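A small sketch of the jackknife point estimate and standard error for an arbitrary statistic; the exponential sample is hypothetical.

    import numpy as np

    def jackknife(x, estimator):
        # leave-one-out estimates theta_(i)
        n = x.size
        theta_i = np.array([estimator(np.delete(x, i)) for i in range(n)])
        theta_dot = theta_i.mean()
        theta_jack = n * estimator(x) - (n - 1) * theta_dot       # bias-corrected
        se = np.sqrt((n - 1) / n * np.sum((theta_i - theta_dot) ** 2))
        return theta_jack, se

    rng = np.random.default_rng(4)
    x = rng.exponential(2.0, size=40)     # hypothetical sample
    print(jackknife(x, np.mean))          # se should be close to s/sqrt(n) here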
Failures of Bootstrap

• Small data sets (because F̂n is not a good approximation of F).
• Infinite moments (e.g. the mean of the Cauchy distribution).
• Dependence structures (e.g. time series, spatial problems): bootstrap is based on the assumption of independence. Remedies exist.
• Estimation of extreme values (e.g. the 99% percentile or max(Xi)): the problem is the non-smoothness of the functional under consideration.
• Unsmooth quantities: there are plenty of theoretical results that relate the success of bootstrap to the smoothness of the functional under consideration.
• Multivariate data: when the dimensions of the problem are large, F̂n becomes a less good estimate of F. This may cause problems.
• Dirty data: if outliers exist in our sample, clearly we do not sample from a good estimate of F, and we add variability to our estimates.

Other Resampling Schemes: Subsampling

In the jackknife we ignore one observation at a time; thus we have n different samples, each of size n − 1. Generalize the idea by ignoring b ≥ 1 observations at a time. Complete enumeration is now difficult: we use Monte Carlo methods by taking a sample of the subsamples. Similar formulas can be derived. This is the idea of subsampling: split the entire sample into subsamples of equal size, created without replacement. Subsamples are in fact samples from F and not from F̂n! Since subsamples are of smaller size, we need to rescale the results (recall the factor (n − 1)/n in the jackknife standard error). Subsampling can remedy the failure of bootstrap in some cases.
Bootstrap in Linear Regression

There are two different approaches:
1. Resample with replacement from the observations: each "observation" is now the entire vector (yi, xi) associated with the original observation.
2. Apply the bootstrap to the residuals of the model fitted to the original data.

The second approach is preferable, since the first violates the assumption of a constant design matrix. Bootstrapping in linear regression removes any distributional assumptions on the residuals and hence allows inference even if the errors do not follow a normal distribution. Full inference is then available from the bootstrap, while this is very difficult with classical approaches.
Bootstrapping the residuals

Consider the model Y = Xβ + ε, using the standard notation. The bootstrap algorithm is the following:
• Fit the model to the original data; obtain the estimates β̂ and the residuals ε̂_i, i = 1, ..., n, from the fitted model.
• Take a bootstrap sample ε* = (ε*_1, ..., ε*_n) from the residuals, by sampling with replacement.
• Using the design matrix, create the bootstrap values for the response: Y* = Xβ̂ + ε*.
• Fit the model using Y* as the response and the design matrix X.
• Keep all the quantities of interest from fitting the model (e.g. MSE, F-statistic, coefficients, etc.).
• Repeat the procedure B times.

Example

n = 12 observations: Y is the wing length of a bird and X is the age of the bird. We want to fit the simple linear regression Y = α + βX + ε.

Figure 7: The data and the fitted line.
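A sketch of this residual bootstrap for the simple linear regression; the 12 "bird" observations below are simulated stand-ins for the real data, with made-up coefficients.

    import numpy as np

    rng = np.random.default_rng(5)

    # hypothetical stand-in for the bird data: x = age, y = wing length, n = 12
    x = np.arange(4.0, 16.0)
    y = 0.8 + 0.27 * x + rng.normal(0.0, 0.2, size=x.size)

    X = np.column_stack([np.ones_like(x), x])          # design matrix (intercept, x)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # fit to the original data
    resid = y - X @ beta_hat                           # fitted residuals

    B = 200
    boot = np.empty((B, 2))
    for b in range(B):
        eps = rng.choice(resid, size=resid.size, replace=True)  # resample residuals
        y_star = X @ beta_hat + eps                             # bootstrap response
        boot[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)

    se = boot.std(axis=0, ddof=1)                   # bootstrap se of (alpha, beta)
    ci = np.percentile(boot, [2.5, 97.5], axis=0)   # percentile 95% CIs
    print(beta_hat, se, ci)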
Figure 8: The fitted lines for all the B = 200 bootstrap samples.

Figure 9: Histograms of the B = 200 bootstrap values and a scatterplot of the bootstrap values of α̂ and β̂; the bootstrap values of α̂ and β̂ are negatively correlated.
Parametric Bootstrap in Regression

Instead of the non-parametric bootstrap we can use the parametric bootstrap in a similar fashion. This implies that we assume that the errors follow some distribution (e.g. normal, t, or a mixture of normals). The only change is that the errors for each bootstrap sample are generated from the assumed distribution.
Results

Table 4: Bootstrap mean, standard error, and 95% confidence interval for α̂, β̂, σ̂², the F-statistic, and R², together with the corresponding sample values (B = 200).
Bootstrap in Principal Components Analysis

Principal components analysis is a dimension-reduction technique that starts with correlated variables and ends with uncorrelated variables, which are in descending order of importance and preserve the total variability. If X is the vector of the original variables, the principal components are derived as Y = XA, where A is the matrix containing the normalized eigenvectors from the spectral decomposition of the covariance matrix of X. In order for the PCA to be meaningful, we need to keep fewer components than the original variables. When working with standardized data (i.e. the correlation matrix), a criterion is to select as many components as correspond to eigenvalues greater than 1.

A problem of major interest with sample data is how to measure the sampling variability in the eigenvalues of the sample covariance matrix. Bootstrap can be a solution.

Note: our approach here is non-parametric bootstrap; we resample observations with replacement. However, we could use parametric bootstrap, by assuming a multivariate normal density for the population and sampling from this multivariate normal model with parameters estimated from the data.

Bootstrap in PCA – Example

The data represent the performance of 26 athletes in the heptathlon at the Olympic Games of Sydney. We proceed with PCA based on the correlation matrix, using B = 1000 bootstrap replications. An idea of the sampling variability of the eigenvalues can easily be deduced.

Table 5: Bootstrap estimates (mean, st. dev., 95% CI) and observed values for the seven eigenvalues and the log-determinant, based on B = 1000 replications.

Figure 10: Histograms of the eigenvalues and of the logarithm of the determinant, and a boxplot representing all the eigenvalues, based on B = 1000 replications.
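A sketch of this non-parametric bootstrap for PCA eigenvalues; the 26 × 7 data matrix below is a random stand-in for the heptathlon scores.

    import numpy as np

    rng = np.random.default_rng(6)

    # hypothetical 26 x 7 score matrix standing in for the heptathlon data
    data = rng.normal(size=(26, 7))

    def eigenvalues(d):
        # PCA on the correlation matrix: eigenvalues in descending order
        return np.sort(np.linalg.eigvalsh(np.corrcoef(d, rowvar=False)))[::-1]

    lam_obs = eigenvalues(data)

    B = 1000
    lam = np.empty((B, data.shape[1]))
    for b in range(B):
        idx = rng.integers(0, data.shape[0], size=data.shape[0])  # resample athletes
        lam[b] = eigenvalues(data[idx])

    ci = np.percentile(lam, [2.5, 97.5], axis=0)        # 95% CI for each eigenvalue
    logdet_ci = np.percentile(np.log(lam).sum(axis=1), [2.5, 97.5])  # log-determinant
    print(lam_obs, ci, logdet_ci)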
Bootstrap in Kernel Density Estimation (1)

Kernel density estimation is a useful tool for estimating probability density functions: we obtain an estimate f̂(y) of the unknown density f(y) based on a sample (x1, ..., xn) as

f̂(x) = (1/n) Σ_{i=1}^{n} (1/h) K( (x − x_i)/h ),

where K(·) is the kernel function and h is the bandwidth, which makes the estimate more smooth (h → ∞) or less smooth (h → 0). There are several choices for the kernel; the standard normal density is the common one. The bandwidth h can be selected in an optimal way; for certain examples it suffices to use

h_opt = 1.059 n^(−1/5) min( s, (Q3 − Q1)/1.345 ),

where Q1 and Q3 are the first and third quartiles and s is the sample standard deviation.

Bootstrap in Kernel Density Estimation (2)

We want to create confidence intervals for f̂(x). We apply the bootstrap: take B bootstrap samples from the original data; for each sample find f̂*_i(x), where the subscript denotes the bootstrap sample; use these values to create confidence intervals for each value of x. This creates a band around f̂(x), which is in fact a componentwise confidence interval and gives us information about the curve.

Example: 130 simulated values.

Figure 11: Estimated f̂(x) for the data of our example.

Figure 12: The B = 100 bootstrap replicated estimates of f̂(x) for the data of our example.

Figure 13: Componentwise 95% confidence intervals based on the percentile bootstrap approach (B = 100).
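A sketch of the bootstrap band for a kernel density estimate, using the Gaussian kernel and the h_opt rule above; the 130 gamma-distributed values are hypothetical.

    import numpy as np

    rng = np.random.default_rng(7)

    x = rng.gamma(4.0, 1.5, size=130)   # hypothetical "130 simulated values"
    n = x.size

    q1, q3 = np.percentile(x, [25, 75])
    h = 1.059 * n ** (-1 / 5) * min(x.std(ddof=1), (q3 - q1) / 1.345)  # h_opt rule

    grid = np.linspace(x.min() - h, x.max() + h, 200)

    def kde(sample, pts, h):
        # Gaussian-kernel density estimate evaluated at pts
        z = (pts[:, None] - sample[None, :]) / h
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (sample.size * h * np.sqrt(2 * np.pi))

    f_hat = kde(x, grid, h)

    B = 100
    f_star = np.empty((B, grid.size))
    for b in range(B):
        f_star[b] = kde(rng.choice(x, size=n, replace=True), grid, h)

    band = np.percentile(f_star, [2.5, 97.5], axis=0)   # componentwise 95% band
    print(f_hat.max(), band[:, f_hat.argmax()])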
Bootstrap in Time Series

Consider the AR(2) model

Y_t = β1 Y_{t−1} + β2 Y_{t−2} + ε_t.

In the classical case we assume normality for the residuals. If this assumption is not made, we can use the bootstrap for inference.

Example: Annual peak flows

The data consist of n = 100 observations on the annual peak flow in a specific area of the Missouri river, in cm³/sec. The time period examined is 1898–1997.

Figure 14: The original series.

The algorithm

1. Fit the AR(2) model to the data (y1, ..., yT); find the estimates β̂1, β̂2.
2. Obtain the residuals ε̂_i from the fitted model.
3. Create the empirical distribution F̂T of the residuals.
4. Start the bootstrap. Simulate a new series:
   • Set y*_i = y_i, i = 1, 2, i.e. use as starting points the original observations at those times.
   • Construct the entire series using the recursion y*_t = β̂1 y*_{t−1} + β̂2 y*_{t−2} + ε*_t, where ε*_t is drawn from F̂T.
   • Fit the AR(2) model to the bootstrapped series and obtain the parameters.
   • Repeat the steps above B times.
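A sketch of this AR(2) residual bootstrap; the series is simulated, standing in for the peak-flow data, and the AR model is fitted by least squares.

    import numpy as np

    rng = np.random.default_rng(8)

    # hypothetical AR(2) series standing in for the peak-flow data
    T = 100
    y = np.zeros(T)
    for t in range(2, T):
        y[t] = 0.4 * y[t - 1] + 0.2 * y[t - 2] + rng.normal()

    def fit_ar2(y):
        # least-squares fit of y_t = b1*y_{t-1} + b2*y_{t-2} + e_t
        Y, X = y[2:], np.column_stack([y[1:-1], y[:-2]])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return beta, Y - X @ beta

    beta_hat, resid = fit_ar2(y)

    B = 200
    boot = np.empty((B, 2))
    for b in range(B):
        ys = np.empty(T)
        ys[:2] = y[:2]                                  # original starting values
        eps = rng.choice(resid, size=T, replace=True)   # draws from F̂_T
        for t in range(2, T):
            ys[t] = beta_hat[0] * ys[t - 1] + beta_hat[1] * ys[t - 2] + eps[t]
        boot[b], _ = fit_ar2(ys)

    print(beta_hat, boot.std(axis=0, ddof=1))   # estimates and bootstrap std. errors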
Example: Annual peak flows (continued)

Table 6: Bootstrap mean, standard error, and 95% confidence interval for β̂1, β̂2, and σ̂², together with the corresponding sample values (B = 200).

Figure 15: Histograms of the bootstrap values for the parameters (B = 200).

Figure 16: Histogram of the fitted residuals in the sample data and scatterplot of the bootstrap values for the parameters.

Figure 17: Some plots of the bootstrap series.
More Applications

There are several more applications of bootstrap not treated here. Bootstrap is suitable for a variety of statistical problems, like:
• Censored data
• Missing-data problems
• Finite populations
• etc.
However, before applying the bootstrap one must be sure that the bootstrap samples are drawn from a good estimator of the unknown population density; otherwise the bootstrap does not work! There are also several other, more complicated, methods suitable for certain types of data (e.g. time series, spatial dependence, etc.).
Bootstrap with dependent Data

Bootstrap is based on independent sampling from F̂n. For dependent data standard bootstrap cannot be applied: the idea is that we need to mimic the dependence of the data. Remedies:

• Moving block bootstrap: for example, consider the data (x1, ..., x12). We construct 4 blocks of 3 observations each, y1 = (x1, x2, x3), y2 = (x4, x5, x6), y3 = (x7, x8, x9), y4 = (x10, x11, x12), and then we resample the blocks y_i instead of the original observations. We keep some part of the dependence, but we lose it when connecting blocks; note that dependence at lags ≥ 3 vanishes. In some sense we add white noise to the series.

Bootstrap with dependent Data (2)

• Overlapping blocks: we build the blocks overlapping, i.e. y1 = (x1, x2, x3), y2 = (x2, x3, x4), ..., y11 = (x11, x12, x1), y12 = (x12, x1, x2). This adds less white noise, but still some dependence is missing.
• Parametric bootstrap: we apply a parametric model to catch the dependence structure. For example, for linear relationships we can find, by theory, an AR model that approximates the dependence structure of the series quite well. Or we combine parametric ideas with block bootstrapping, by fitting a model and applying the block bootstrap to the residuals of that model.
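A sketch of block resampling in both the disjoint-block and the circular overlapping-block forms described above; the series and the block length are hypothetical.

    import numpy as np

    rng = np.random.default_rng(9)

    x = rng.normal(size=120)    # hypothetical dependent series
    block = 3                   # block length, as in the x1,...,x12 illustration

    def block_bootstrap(x, block, rng, overlapping=False):
        n = x.size
        if overlapping:
            # circular overlapping blocks: any start point, indices wrap around
            starts = rng.integers(0, n, size=n // block)
            idx = (starts[:, None] + np.arange(block)) % n
        else:
            # disjoint blocks: draw whole blocks y_1, ..., y_{n/block} with replacement
            k = n // block
            idx = rng.integers(0, k, size=k)[:, None] * block + np.arange(block)
        return x[idx].ravel()

    x_star = block_bootstrap(x, block, rng)                 # moving-block version
    x_star_overlap = block_bootstrap(x, block, rng, True)   # overlapping version
    print(x_star[:6], x_star_overlap[:6])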
Selected Bibliography

The bibliography on the bootstrap has been growing at a high rate during the last years. Some selected books and papers, capturing both the theoretical and the practical aspects of the bootstrap, are the following:

Chernick, M.R. (1999) Bootstrap Methods: A Practitioner's Guide. Wiley.
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Applications. Cambridge University Press, Cambridge.
Efron, B. (1982) The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia.
Efron, B. and Tibshirani, R. (1986) Bootstrap methods for standard errors, confidence intervals and other measures of statistical accuracy. Statistical Science, 1, 54–77.
Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. Chapman & Hall.
Good, P. (1998) Resampling Methods: A Practical Guide to Data Analysis. Birkhäuser, Boston.
Hope, A.C.A. (1968) A simplified Monte Carlo significance test procedure. Journal of the Royal Statistical Society B, 30, 582–598.
Noreen, E.W. (1989) Computer Intensive Methods for Testing Hypotheses: An Introduction. Wiley.
Shao, J. and Tu, D. (1995) The Jackknife and Bootstrap. Springer.
Westfall, P.H. and Young, S.S. (1993) Resampling-Based Multiple Testing. Wiley.

THE END