Resampling Techniques
Resampling
Introduction
We have relied on idealized models of the origins of
our data (e.g., data ~ Normal) to make inferences

But these models can be inadequate

Resampling techniques allow us to base the analysis
of a study solely on the design of that study, rather
than on a poorly-fitting model
Resampling
Why resampling?
Fewer assumptions: resampling methods do not require that
distributions be Normal or that sample sizes be large

Generality: resampling methods are remarkably similar for a
wide range of statistics and do not require new formulas for
every statistic

Promote understanding: bootstrap procedures build intuition
by providing concrete analogies to theoretical concepts
Resampling

Collection of procedures to make statistical inferences
without relying on parametric assumptions, e.g. for:
- bias
- variance, measures of error
- parameter estimation
- hypothesis testing

Resampling Techniques
PART 0: Monte Carlo Simulation

PART 1: Bootstrap
Resampling with replacement

PART 2: Jackknife
Resampling without replacement
PART 1: The Bootstrap
The bootstrap technique was proposed by
Bradley Efron (1979, 1981, 1982).

Bootstrapping is an application of intensive
computing to traditional inferential methods.

Resampling
Bootstrap
Uses: hypothesis testing, parameter estimation, assigning
measures of accuracy to sample estimates
(e.g., standard errors, confidence intervals)

Useful when:
- formulas for parameter estimates are based on
  assumptions that are not met
- computational formulas are only valid for large samples
- computational formulas do not exist
Resampling
Bootstrap
Assume that the sample is representative of the population.

Approximate the distribution of the population by repeatedly
resampling (with replacement) from the sample.
What is bootstrapping?
Randomly sampling, with replacement, from an original
dataset to obtain statistical estimates.

- Start with a set of values.
- Randomly draw a value from the original dataset;
  the value stays in the available pool of values.
- Randomly draw another value, and repeat until you have
  drawn n values to fill your bootstrap dataset.
- Perform your analysis on the bootstrap dataset.
- Repeat the whole process many times, say B = 10,000.
- Use the results of your B = 10,000 analyses to draw
  conclusions (as sketched below).
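A minimal sketch of this procedure in Python with NumPy (the dataset, the statistic, and the seed below are illustrative choices, not from the slides):

import numpy as np

rng = np.random.default_rng(0)               # seeded so the sketch is reproducible
data = rng.exponential(scale=2.0, size=30)   # stand-in for the original dataset
n = data.size
B = 10_000                                   # number of bootstrap resamples

boot_means = np.empty(B)
for b in range(B):
    # draw n values WITH replacement: each drawn value stays in the pool
    resample = rng.choice(data, size=n, replace=True)
    boot_means[b] = resample.mean()          # the analysis; here, the sample mean

# the B resample statistics approximate the sampling distribution of the mean
print("bootstrap mean:", boot_means.mean())
print("bootstrap standard error:", boot_means.std(ddof=1))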
An example
Original sample:
X = (3.12, 0, 1.57, 19.67, 0.22, 2.20), mean = 4.46

Bootstrap resamples:
X1* = (1.57, 0.22, 19.67, 0, 0, 2.20, 3.12), mean = 4.13
X2* = (0, 2.20, 2.20, 2.20, 19.67, 1.57), mean = 4.64
X3* = (0.22, 3.12, 1.57, 3.12, 2.20, 0.22), mean = 1.74
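Resamples like these can be generated directly (a sketch in Python with NumPy; the draws are random, so the resamples and their means will not match the ones above exactly):

import numpy as np

rng = np.random.default_rng(1)
X = np.array([3.12, 0.0, 1.57, 19.67, 0.22, 2.20])
print("original mean:", round(X.mean(), 2))        # 4.46

for b in range(1, 4):
    Xb = rng.choice(X, size=X.size, replace=True)  # one bootstrap resample
    print(f"X{b}* = {Xb}, mean = {Xb.mean():.2f}")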
Illustration of Bootstrap

Population --(sampling)--> sample $X_1, X_2, \ldots, X_n$, giving the statistic $\hat{\theta}$

Sample $X_1, X_2, \ldots, X_n$ --(resampling, $B$ times)--> bootstrap samples $X_1^*, X_2^*, \ldots, X_n^*$

Bootstrap samples --(inference)--> bootstrap statistics $\hat{\theta}_1^*, \hat{\theta}_2^*, \ldots, \hat{\theta}_B^*$

Estimate the sampling distribution of $\hat{\theta}$ by the distribution of $\hat{\theta}_1^*, \ldots, \hat{\theta}_B^*$.

Summary of the Bootstrap Method

$M^* = \frac{\hat{\theta}_1^* + \hat{\theta}_2^* + \cdots + \hat{\theta}_B^*}{B} = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}_b^*$

$\widehat{\mathrm{Var}}^* = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}_b^* - M^*\right)^2$

$\widehat{\mathrm{bias}}^* = M^* - \hat{\theta}$

$\widehat{\mathrm{MSE}}^* = \left(\widehat{\mathrm{bias}}^*\right)^2 + \widehat{\mathrm{Var}}^*$
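These summaries might be computed as follows (a sketch in Python with NumPy, using the sample mean as the statistic; variable names mirror the formulas above):

import numpy as np

rng = np.random.default_rng(2)
data = np.array([3.12, 0.0, 1.57, 19.67, 0.22, 2.20])
theta_hat = data.mean()                     # estimate from the original sample

B = 10_000
theta_star = np.array([rng.choice(data, size=data.size, replace=True).mean()
                       for _ in range(B)])  # bootstrap replicates

M_star    = theta_star.mean()               # M* = (1/B) sum of the replicates
var_star  = theta_star.var(ddof=1)          # Var* with the 1/(B-1) factor
bias_star = M_star - theta_hat              # bias* = M* minus theta-hat
mse_star  = bias_star**2 + var_star         # MSE* = (bias*)^2 + Var*

print(M_star, var_star, bias_star, mse_star)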

PART 2: The Jackknife
Why the funny name "jackknife"?

Mosteller and Tukey (1977, p. 133) described a
predecessor resampling method, the jackknife, in
the following way:

"The name jackknife is intended to suggest the
broad usefulness of a technique as a substitute
for specialized tools that may not be available,
just as the Boy Scout's trustworthy tool serves so
variedly."
The Jack-knife
The jackknife was the first of the
computer-based methods for estimating
standard errors and biases.

First proposed in 1956 by Maurice
Quenouille as a method for bias reduction.

In 1958, John Tukey proposed a jackknife
estimate of the standard error.
The Jack-knife
The jackknife is a special kind of bootstrap.
The jackknife is a tool for estimating standard errors and
the bias of estimators.
As its name suggests, the jackknife is a small, handy tool.
Both the jackknife and the bootstrap involve resampling
data; that is, repeatedly creating new data sets from the
original data.
The Jack-knife
Each jackknife subsample has all but one of the
original elements of the list.
The jackknife deletes each observation in turn and
calculates an estimate based on the remaining
(n − 1) of them.
It uses this collection of estimates to do things
like estimate the bias and the standard error.
The Jack-knife



Have a data set $x = (x_1, x_2, \ldots, x_n)$.

The $i$-th jackknife sample, denoted $x_{(i)}$,
is defined to be the original data set, $x$,
with the $i$-th point removed:

$x_{(i)} = (x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), \quad i = 1, 2, \ldots, n$
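A sketch of this construction in Python with NumPy (np.delete returns a copy of the array with the i-th point removed; the data and the choice of statistic are illustrative):

import numpy as np

x = np.array([3.12, 0.0, 1.57, 19.67, 0.22, 2.20])
n = x.size

# the i-th jackknife sample x_(i) is x with the i-th point removed
jackknife_samples = [np.delete(x, i) for i in range(n)]   # each has n - 1 points

# jackknife replicates of a statistic, here the sample mean
theta_i = np.array([s.mean() for s in jackknife_samples])
print(theta_i)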
Illustration of Jackknife

Population --(sampling)--> sample $X_1, X_2, \ldots, X_n$

Sample --(resampling, $n$ leave-one-out samples)-->
$(X_2, X_3, \ldots, X_n),\ (X_1, X_3, \ldots, X_n),\ \ldots,\ (X_1, X_2, \ldots, X_{n-1})$

Jackknife samples --(inference)--> jackknife statistics $\hat{\theta}_{(1)}, \hat{\theta}_{(2)}, \ldots, \hat{\theta}_{(n)}$

Estimate the distribution of $\hat{\theta}$ by the distribution of the $\hat{\theta}_{(i)}$.
Summary of the Jackknife Method

$\hat{\theta}_{(\cdot)} = \frac{\hat{\theta}_{(1)} + \hat{\theta}_{(2)} + \cdots + \hat{\theta}_{(n)}}{n} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}$

$\widehat{\mathrm{bias}}_J = (n-1)\left(\hat{\theta}_{(\cdot)} - \hat{\theta}\right)$

$\widehat{se}_J^{\,2} = \frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)}\right)^2$

$\widehat{\mathrm{MSE}}_J = \widehat{\mathrm{bias}}_J^{\,2} + \widehat{se}_J^{\,2}$
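These formulas might be computed as follows (a sketch in Python with NumPy, again with the sample mean as the statistic; variable names mirror the formulas above):

import numpy as np

x = np.array([3.12, 0.0, 1.57, 19.67, 0.22, 2.20])
n = x.size
theta_hat = x.mean()                                              # estimate from the full sample
theta_i   = np.array([np.delete(x, i).mean() for i in range(n)])  # leave-one-out replicates
theta_dot = theta_i.mean()                                        # average of the replicates

bias_J = (n - 1) * (theta_dot - theta_hat)                 # jackknife bias estimate
se_J   = np.sqrt((n - 1) / n * np.sum((theta_i - theta_dot)**2))  # jackknife standard error
mse_J  = bias_J**2 + se_J**2                               # MSE_J = bias_J^2 + se_J^2

print(bias_J, se_J, mse_J)

(For the sample mean the jackknife bias estimate is exactly zero, as the printout will show.)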
