Burns Statistics -- Statistical Bootstrap and Other Resampling Methods http://www.burns-stat.com/pages/Tutor/bootstrap_resampling.html
The Statistical Bootstrap and Other Resampling Methods
This page has the following section:
Preliminaries
The Bootstrap
R and S-PLUS Software
The Bootstrap More Formally
Permutation Tests
Cross Validation
Simulation
Preliminaries
The purpose of this document is to introduce the statistical bootstrap and related
techniques in order to encourage their use in practice. The examples work in R and
S-PLUS -- see An Introduction to the S Language for an explanation of these, and
Some hints for the R beginner for an introduction to using R. However, you need not
be a user to follow the discussion. On the other hand, the S language (including R) is
arguably the best environment in which to perform these techniques.
A dataset of the daily returns of IBM and the S&P 500 index for 2006 is used in the
examples. A tab-separated file of the data is available at:
http://www.burns-stat.com/pages/Tutor/spx_ibm.txt
An R command to create the data as used in the examples is:
spxibm <- as.matrix(read.table(
    url('http://www.burns-stat.com/pages/Tutor/spx_ibm.txt'),
    header=TRUE, sep='\t', row.names=1))
The above command reads the file from the Burns Statistics website and creates a
two column matrix with 251 rows. The following two commands each extract one
column to create a vector.
spxret <- spxibm[, 'spx']
ibmret <- spxibm[, 'ibm']
The calculations we talk about are random. If you want to be able to reproduce them
exactly, there are two choices: you can save the .Random.seed object in existence
just before you start the computation, or you can use the set.seed function.
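A minimal sketch of both choices (the seed value and the use of sample here are arbitrary illustrations, not from the original text):

```r
# Choice 1: set the seed explicitly; the same seed always
# gives the same random samples.
set.seed(353)                 # 353 is an arbitrary choice
s1 <- sample(10)              # a reproducible permutation of 1:10

# Choice 2: save .Random.seed just before the computation
# and restore it to repeat the computation exactly.
saved.seed <- .Random.seed
s2 <- sample(10)
.Random.seed <- saved.seed    # restore the saved generator state
s3 <- sample(10)
identical(s2, s3)             # TRUE: the computation is repeated exactly
```

set.seed is the simpler choice when you know in advance that you want reproducibility; saving .Random.seed lets you reproduce a computation you have already started.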
The Bootstrap
The idea: We have just one dataset. When we compute a statistic on the data, we
only know that one statistic -- we don't see how variable that statistic is. The
bootstrap creates a large number of datasets that we might have seen and computes
the statistic on each of these datasets. Thus we get a distribution of the statistic.
The key is the strategy to create data that "we might have seen".
Our example data are log returns (also known as continuously compounded
returns). The log return for the year is the sum of the daily log returns. The log
return for the year for the S&P is 12.8%. We can use the bootstrap to get an idea of
the variability of that figure.
There are 251 daily returns in the year. One bootstrap sample is 251 randomly
sampled daily returns. The sampling is with replacement, so some of the days will be
in the bootstrap sample multiple times and other days will not appear at all. Once
we have a bootstrap sample, we perform the calculation of interest on it -- in this
case the sum of the values. We don't stop at just one bootstrap sample though,
typically hundreds or thousands of bootstrap samples are created.
Below is some simple code to perform this bootstrap with 1000 bootstrap samples.
spx.boot.sum <- numeric(1000) # create a numeric vector 1000 long
for(i in 1:1000) {
    this.samp <- spxret[ sample(251, 251, replace=TRUE) ]
    spx.boot.sum[i] <- sum(this.samp)
}
The key step of the code above is the call to the sample function. The command says
to sample from the integers from 1 to 251, make the sample size 251 and sample
with replacement. The effect is that this.samp is a year of daily returns that might
have happened (but probably didn't). In the subsequent line we collect the annual
return from each of the hypothetical years. We can then plot the distribution of the
bootstrapped annual returns.
hist(spx.boot.sum, col='yellow')
abline(v=sum(spxret), col='blue')
[Figure: Histogram of spx.boot.sum, with a blue vertical line at the actual annual return]
The plot shows the annual return to be quite variable. It could have easily been
anything from 0 to 25 percent. The actual annual return is squarely in the middle of the
distribution. That doesn't always happen -- there can be substantial bias for some
statistics and datasets.
More than numbers can be bootstrapped. In the following example we bootstrap a
smooth function over time.
spx.varsmu <- array(NA, c(251, 20)) # make 251 by 20 matrix
for(i in 1:20) {
    this.samp <- spxret[ sample(251, 251, replace=TRUE) ]
    spx.varsmu[, i] <- supsmu(1:251,
        (this.samp - mean(this.samp))^2)$y
}
plot(supsmu(1:251, (spxret - mean(spxret))^2), type='l',
    xlab='Days', ylab='Variance')
matlines(1:251, spx.varsmu, lty=2, col='red')
[Figure: smooth of the real S&P squared returns (black) with 20 bootstrap smooths (red); x-axis Days, y-axis Variance]
The black line is a smooth of the variance of the real S&P data over the year, while
the red lines are smooths from bootstrapped samples. It isn't absolutely clear that
the black line is different from the red lines, but it is. Market data experience the
phenomenon of "volatility clustering”. There are periods of low volatility, and
periods of high volatility. Since the bootstrapping does not preserve time ordering,
the bootstrap samples will not have volatility clustering. We will return to volatility
clustering later.
A statistic that can be of interest is the slope of the linear regression of a stock's
returns explained by the returns of the "market", that is, of an index like the S&P. In
finance this number is called the beta of the stock. This is often thought of as a fixed
number for each stock. In fact betas change continuously, but we will ignore that
complication here.
The command below shows that our data give an estimate of IBM's beta of about
0.65.
> coef(lm(ibmret ~ spxret))
(Intercept)      spxret
 0.02887969  0.64553741
What is after the ">" on the first line is what the user typed; the rest is the
response. Starting from the inside of the command, ibmret ~ spxret is a formula
that says ibmret is to be explained by spxret. "lm" stands for linear model. The
result of the call to lm is an object representing the linear regression. We then
extract the coefficients from that object. To get just the beta, we can subscript for
the second element of the coefficients:
> coef(lm(ibmret ~ spxret))[2]
    spxret
0.6455374
We are now ready to try bootstrapping beta in order to get a sense of its variability.
beta.obs.boot <- numeric(1000)
for(i in 1:1000) {
    this.ind <- sample(251, 251, replace=TRUE)
    beta.obs.boot[i] <- coef(lm(
        ibmret[this.ind] ~ spxret[this.ind]))[2]
}
hist(beta.obs.boot, col='yellow')
abline(v=coef(lm(ibmret ~ spxret))[2], col='blue')
[Figure: Histogram of beta.obs.boot, with a blue vertical line at the estimated beta]
Each bootstrap takes a sample of the indices of the days in the year, creates the IBM
return vector based on those indices, creates the matching return vector for the
S&P, and then performs the regression. Basically, a number of hypothetical years
are created.
There is another approach to bootstrapping the regression coefficient. The response
in a regression is identically equal to the fit of the regression plus the residuals (the
regression line plus distance to the line). We can take the viewpoint that only the
residuals are random.
Here's the alternative bootstrapping approach. Sample from the residuals of the
regression on the original data, and then create synthetic response data by adding
the bootstrapped residuals to the fitted value. The explanatory variables are still the
original data.
In our case we will use the original S&P returns because that is our explanatory
variable. For each day we will create a new IBM return by adding the fit of the
regression for that day to the residual from some day. This second method is
performed below.
ibm.lm <- lm(ibmret ~ spxret)
ibm.fit <- fitted(ibm.lm)
ibm.resid <- resid(ibm.lm)
beta.resid.boot <- numeric(1000)
for(i in 1:1000) {
    this.ind <- sample(251, 251, replace=TRUE)
    beta.resid.boot[i] <- coef(lm(
        ibm.fit + ibm.resid[this.ind] ~ spxret))[2]
}
hist(beta.resid.boot, col='yellow')
abline(v=coef(lm(ibmret ~ spxret))[2], col='blue')
[Figure: Histogram of beta.resid.boot, with a blue vertical line at the estimated beta]
In this case the results are equivalent. (An experiment with 50,000 bootstrap
samples showed the two bootstrap densities to be almost identical.) There are
times, though, when the residual method is forced upon us. If we are modeling
volatility clustering, then sampling the observations will not work -- that destroys
the phenomenon we are trying to study. We need to fit our model of volatility
clustering and then sample from the residuals of that model.
R and S-PLUS Software
For data exploration the techniques that have just been presented are likely to be
sufficient. If you are using R, S-PLUS or a few other languages, then there is no need
for any specialized software -- you can just write a simple loop. Often it is a good
exercise to decide how to bootstrap your data. You are not likely to understand your
data unless you know how to mimic its variability with a bootstrap.
If formal inference is sought, then there are some technical tricks, such as bias
corrected confidence intervals, that are often desirable. Specialized software does
generally make sense in this case.
There are a number of R packages that are either confined to or touch upon
bootstrapping or its relatives. These include:
boot: This package incorporates quite a wide variety of bootstrapping tricks.
bootstrap: A package of relatively simple functions for bootstrapping and related
techniques.
coin: A package for permutation tests (which are discussed below).
Design: This package includes bootcov for bootstrapping the covariance of estimated
regression parameters and validate for testing the quality of fitted models by cross
validation or bootstrapping.
MChtest: This package is for Monte Carlo hypothesis tests, that is, tests using some
form of resampling. This includes code for sampling rules where the number of
samples taken depends on how certain the result is.
meboot: Provides a method of bootstrapping a time series.
permtest: A package containing a function for permutation tests of microarray data.
resper: A package for doing restricted permutations.
scaleboot: This package produces approximately unbiased hypothesis tests via
bootstrapping.
simpleboot: A package of a few functions that perform (or present) bootstraps in
simple situations, such as one and two samples, and linear regression.
There are a large number of R packages that include bootstrapping. Examples
include multtest, which has the boot.resample function, and Matching, which has a
function for a bootstrapped Kolmogorov-Smirnov test (testing the equality of two
probability distributions).
There is an S-PLUS library for bootstrapping and related techniques called
S+Resample. The S+Resample home page is on the Insightful website.
The Bootstrap More Formally
Bootstrapping is an alternative to the traditional statistical technique of assuming a
particular probability distribution. For example, it would be reasonably common
practice to assume that our return data are normally distributed. This is clearly not
the case. However, there is decidedly no consensus on what distribution would be
believable. Bootstrapping outflanks this discussion by letting the data speak for
itself.
As long as there are more than a few observations, the data will reveal their
distribution to a reasonable extent. One way of describing bootstrapping is that it
is sampling from the empirical distribution of the data.
A sort of a compromise is to do "smoothed bootstrapping”. This produces
observations that are close to a specific data observation rather than exactly equal
to the data observation. An R statement that performs smoothed bootstrapping on
the S&P return vector is:
rnorm(251, mean=sample(spxret, 251, replace=TRUE), sd=.05)
This generates 251 random numbers with a normal distribution where the mean
values are a standard bootstrap sample and the standard deviation is small (relative
to the spread of the original data). If zero were given for sd, then it would be exactly
the standard bootstrap sample.
Some statistics are quite sensitive to tied values (which are inherent in bootstrap
samples). Smoothed bootstrapping can be an improvement over the standard
bootstrap for such statistics.
The usual assumption to make about data that are being bootstrapped is that the
observations are independent and identically distributed. If this is not the case, then
the bootstrap can be misleading.
Let's look at this assumption in the case of bootstrapping the annual return. If you
consider just location, returns are close to independent. However, independence is
definitely shattered by volatility clustering. It is probably easiest to think in terms of
predictability. The predictability of the returns is close to (but not exactly) zero.
There is quite a lot of predictability to the squared returns though. The amount that
the bootstrap is distorted by predictability of the returns is infinitesimal. Distortion
due to volatility clustering could be appreciable, though unlikely to be
overwhelming.
There are a number of books that discuss bootstrapping. Here are a few:
A book that briefly discusses bootstrapping along with a large number of additional
topics in the context of R and S-PLUS is Modern Applied Statistics with S, Fourth
Edition by Venables and Ripley.
An Introduction to the Bootstrap by Efron and Tibshirani. (Efron is the inventor of
the bootstrap.)
Bootstrap Methods and their Application by Davison and Hinkley.
Bootstrap Methods: A Practitioner's Guide by Chernick.
Permutation Tests
The idea: Permutation tests are restricted to the case where the null hypothesis
really is null -- that is, that there is no effect. If changing the order of the data
destroys the effect (whatever it is), then a random permutation test can be done.
The test checks if the statistic with the actual data is unusual relative to the
distribution of the statistic for permuted data.
Our example permutation test is to test volatility clustering of the S&P returns.
Below is an R function that computes the statistic for Engle's ARCH test.
engle.arch.test <- function(x, order=10)
{
    xsq <- x^2
    nobs <- length(x)
    inds <- outer(0:(nobs - order - 1), order:1, "+")
    xmat <- xsq[inds]
    dim(xmat) <- dim(inds)
    xreg <- lm(xsq[-1:-order] ~ xmat)
    summary(xreg)$r.squared * (nobs - order)
}
All you need to know is that the function returns the test statistic and that a big
value means there is volatility clustering, but here is an explanation of it if you are
interested. The test does a regression with the squared returns as the response and
some number of lags (most recent previous data) of the squared returns as
explanatory variables. (An estimate of the mean of the returns is generally removed
first, but this has little impact in practice.) If the last few squared returns have
power to predict tomorrow's squared return, then there must be volatility clustering.
The tricky part of the function is the line that creates inds. This object is a matrix of
the desired indices of the squared returns for the matrix of explanatory variables.
Once the explanatory matrix is created, the regression is performed and the desired
statistic is returned. The default number of lags to use is 10.
A random permutation test compares the value of the test statistic for the original
data to the distribution of test statistics when the data are permuted. We do this
below for our example.
spx.arch.perm <- numeric(1000)
for(i in 1:1000) {
    spx.arch.perm[i] <- engle.arch.test(sample(spxret))
}
hist(spx.arch.perm, col='yellow')
abline(v=engle.arch.test(spxret), col='blue')
[Figure: Histogram of spx.arch.perm, with a blue vertical line at the statistic from the real data]
The simplest way to get a random permutation in R is to put your data vector as the
only argument to the sample function. The call:
sample(spxret)
is equivalent to:
sample(spxret, size=length(spxret), replace=FALSE)
A simple calculation for the p-value of the permutation test is to count the number
of statistics from permuted data that exceed the statistic for the original data and
then divide by the number of permutations performed. In R a succinct form of this
calculation is:
mean(spx.arch.perm >= engle.arch.test(spxret))
In my case I get 0.016. That is, 16 of the 1000 permutations produced a test
statistic larger than the statistic from the real data. A test using 100,000
permutations gave a value of 0.0111.
There is a more pedantic version of the p-value computation that adds 1 to both
the numerator and the denominator. The p-value is a calculation assuming that the
null hypothesis is true. Under this assumption the real data are just one more
permutation, and that permutation produces a statistic at least as extreme as the
statistic from the real data (that is, itself). Hence the extra 1 in each count.
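In R the pedantic version is one line. Here is a minimal sketch with made-up numbers standing in for the permutation distribution and the observed statistic:

```r
perm.stats <- c(1.2, 3.4, 0.8, 5.6, 2.1)  # statistics from permuted data (made up)
obs.stat <- 5.0                           # statistic from the real data (made up)
# Count the real data as one more "permutation" in both counts:
p.val <- (1 + sum(perm.stats >= obs.stat)) / (1 + length(perm.stats))
p.val   # (1 + 1) / (1 + 5) = 1/3
```

With the objects from the test above, the same calculation would be (1 + sum(spx.arch.perm >= engle.arch.test(spxret))) / (1 + length(spx.arch.perm)). A side benefit of the add-one version is that it can never report a p-value of exactly zero.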
The reason to do a permutation test is so that we don't need to depend on an
assumption about the distribution of the data. In this case the standard assumption
that the statistic follows a chi-square distribution gives a p-value of 0.0096, which is
in quite good agreement with the permutation test. But we wouldn't necessarily
know beforehand that they would agree.
In the example test that we performed, we showed evidence of volatility clustering
because the statistic from the actual data was in the right tail of the statistic's
distribution with permuted data. If our null hypothesis had been that there were
some given amount of volatility clustering, then we couldn't use a permutation test.
Permuting the data gives zero volatility clustering, and we would need data that had
that certain amount of volatility clustering.
Given current knowledge of market data, performing this volatility test is of little
interest. Market data do have volatility clustering. If a test does not show significant
volatility clustering, then either it is a small sample or the data are during a
quiescent time period. In this case we have both.
We could have performed bootstrap sampling in our test rather than random
permutations. The difference is that bootstraps sample with replacement, and
permutations sample without replacement. In either case, the time order of the
observations is lost and hence volatility clustering is lost -- thus assuring that the
samples are under the null hypothesis of no volatility clustering. The permutations
always have all of the same observations, so they are more like the original data
than bootstrap samples. The expectation is that the permutation test should be
more sensitive than a bootstrap test. The permutations destroy volatility clustering
but do not add any other variability.
Permutations cannot always be used. If we are looking at the annual return
(meaning the sum of the observations), then all permutations of the data will yield
the same answer. We get zero variability in this case.
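A quick check with made-up numbers confirms that the sum is invariant under permutation:

```r
x <- c(3, 1, 4, 1, 5)                     # made-up data
sums <- replicate(100, sum(sample(x)))    # sums of 100 random permutations
all(sums == sum(x))                       # TRUE: permuting never changes the sum
```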
The paper Permuting Super Bowl Theory has a further discussion of random
permutation tests.
Cross Validation
The idea: Models should be tested with data that were not used to fit the model. If
you have enough data, it is best to hold back a random portion of the data to use for
testing. Cross validation is a trick to get out-of-sample tests but still use all the data.
The sleight of hand is to do a number of fits, each time leaving out a different
portion of the data.
Cross validation is perhaps most often used in the context of prediction. Everyone
wants to predict the stock market. So let's do it.
Below we predict tomorrow's IBM return with a quadratic function of today's IBM
and S&P returns.
predictors <- cbind(spxret, ibmret, spxret^2, ibmret^2,
    spxret * ibmret)[-251, ]
predicted <- ibmret[-1]
predicted.lm <- lm(predicted ~ predictors)
The p-value for this regression is 0.048, so it is good enough to publish in a journal.
However, you might want to hold off putting money on it until we have tested it a bit.
In cross validation we divide the data into some number of groups. Then for each
group we fit the model with all of the data that are not in the group, and test that fit
with the data that are in the group. Below we divide the data into 5 groups.
group <- rep(1:5, length=250)
group <- sample(group)
mse.group <- numeric(5)
for(i in 1:5) {
    group.lm <- lm(predicted[group != i] ~
        predictors[group != i, ])
    mse.group[i] <- mean((predicted[group == i] -
        cbind(1, predictors[group == i, ]) %*% coef(group.lm))^2)
}
The first command above repeats the numbers 1 through 5 to get a vector of length
250. We do not want to use this as our grouping because we may be capturing
systematic effects. In our case a group would be on one particular day of the week
until a holiday interrupted the pattern. Hence the next line permutes the vector so
we have random assignment, but still an equal number of observations in each
group. The for loop estimates each of the five models and computes the out-of-
sample mean squared error.
The mean squared error of the predicted vector taking only its mean into account is
0.794. The in-sample mean squared error from the regression is 0.759. This is a
modest improvement, but in the context of market prediction might be of use if the
improvement were real. However, the cross validation mean squared error is 0.800
-- even higher than from the constant model. This is evidence of overfitting (which
this regression model surely is doing).
(Practical note: Cross validation destroys the time order of the data, and is not the
best way to test this sort of model. Better is to do a backtest -- fit the model up to
some date, test the performance on the next period, move the "current" date
forward, and repeat until reaching the end of the data.)
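As a sketch of that backtest loop (not from the original text), here is a rolling one-step backtest on a toy return series, with a deliberately trivial model that predicts tomorrow's return by the mean of all returns seen so far:

```r
set.seed(42)
ret <- rnorm(100, sd=0.01)     # toy daily returns
start <- 50                    # first day predicted out of sample
pred <- numeric(100 - start)
for(i in seq_along(pred)) {
    # fit only on data up to the "current" date, then predict the next day
    pred[i] <- mean(ret[1:(start + i - 1)])
}
actual <- ret[(start + 1):100]
mean((actual - pred)^2)        # out-of-sample mean squared error
```

A real backtest would replace the mean with the model being tested (the quadratic regression above, say), but the structure of the loop is the same: the fit at each step never sees the data it is asked to predict.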
Doing several cross validations (with different group assignments) and averaging
can be useful. See for instance:
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf
Simulation
In a loose sense of the word, all the techniques we've discussed can be called
simulations. Often the word is restricted to the situation of getting output from a
model given random inputs. For example, we might create a large number of series
that are possible 20-day price paths of a stock. Such paths might be used, for
instance, to price an option.
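Such a simulation can be done by bootstrapping daily log returns. The sketch below uses a stand-in return vector so that it is self-contained; in practice the spxret vector created at the top of this document could be used, and the starting price and number of paths are arbitrary choices:

```r
set.seed(1)
dailyret <- rnorm(251, sd=0.01)   # stand-in for a vector of daily log returns
npath <- 100
paths <- matrix(NA, 21, npath)    # day 0 through day 20 for each path
paths[1, ] <- 100                 # arbitrary starting price
for(j in 1:npath) {
    # bootstrap 20 daily log returns and compound them into a price path
    paths[-1, j] <- 100 * exp(cumsum(sample(dailyret, 20, replace=TRUE)))
}
# matplot(paths, type='l', lty=1, xlab='Day', ylab='Price')
```

Note that, as with the annual-return bootstrap earlier, this sampling ignores volatility clustering; paths that respect it would need a time-series bootstrap or a fitted volatility model.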
Random Portfolios
The idea: We can mimic the creation of a financial portfolio (containing, for example,
stocks or bonds) by satisfying a set of constraints, but otherwise allocating
randomly.
Generating random portfolios is a technique in finance that is quite similar to
bootstrapping. The process is to produce a number of portfolios of assets that
satisfy some set of constraints. The constraints may be those under which a fund
actually operates, or could be constraints that the fund hypothesizes to be useful.
If the task is to decide on the bound that should be imposed on some constraint,
then the random portfolios are precisely analogous to bootstrapping. The
distribution (of the returns or volatility or ...) is found, showing location and
variability.
Another use of random portfolios is to test if a fund exhibits skill. Random portfolios
are generated which match the constraints of the fund but have no predictive ability.
The fraction of random portfolios that outperform the actual fund is a p-value of the
null hypothesis that the fund has zero skill. This is quite similar to a permutation
test with restrictions on the permutations.
See Random Portfolios in Finance for more.
Summary
When bootstrapping was invented in the late 1970s, it was outrageously
computationally intense. Now a bootstrap can sometimes be performed in the blink
of an eye. Bootstrapping, random permutation tests and cross validation should be
standard tools for anyone analyzing data.
Links
More material is available elsewhere. Examples include:
Bootstrapping Regression Models (pdf)
Bootstrap Methods and Permutation Tests (pdf)
S-PLUS/R Library: Introduction to Bootstrapping (html)
Go to Burns Statistics Home.
Direct access to this article is http://www.burns-stat.com/pages/Tutor
/bootstrap_resampling.html
First Version: 2007 February 27 Last Modified: 2010 August 02