# Bootstrapping

30/01/2002

Let's begin with a dictionary definition of bootstrap:

A data-based simulation method for statistical inference, which can be used to study the variability of values of a
set of observations and provide confidence intervals for parameters in situations where these are difficult or
impossible to derive analytically. The basic idea involves sampling with replacement to produce random
samples of size n from the original data. Each of these samples is known as a bootstrap sample and each provides
an estimate of the parameter of interest. Repeating the sampling a large number of times provides information
on the variability of the estimator.
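
The resampling idea in the definition can be sketched in a few lines of Python. This is only an illustration of the general technique, not what Stata runs internally: the function name `bootstrap_se`, the seed, and the small `scores` list are all assumptions made up for the example (the scores are not the hsb2 data).

```python
import random
import statistics

def bootstrap_se(data, statistic, reps=400, seed=12345):
    """Return (replicates, standard error) of `statistic` under the bootstrap."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(data)
    replicates = []
    for _ in range(reps):
        # One bootstrap sample: n draws from the data, with replacement.
        sample = rng.choices(data, k=n)
        # Each bootstrap sample gives one estimate of the parameter.
        replicates.append(statistic(sample))
    # The spread of the estimates measures the variability of the estimator.
    return replicates, statistics.stdev(replicates)

# Hypothetical scores for illustration only (NOT the hsb2 data).
scores = [35, 41, 44, 46, 49, 52, 52, 54, 54, 57, 59, 59, 60, 62, 65, 67]

replicates, se = bootstrap_se(scores, statistics.median)
print(f"observed median = {statistics.median(scores)}")
print(f"bootstrap SE    = {se:.3f}")
```

Any function of a sample can be passed as `statistic`, which is why the same procedure works for the median, the mean, or a ratio of two statistics.
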

Suppose we want to test whether the writing test score (write) comes from a population with a median of 50. Let's try the
bs (bootstrap) command by estimating the standard error of the median.

    use http://www.ats.ucla.edu/stat/stata/notes/hsb2
    bs "summarize write, detail" "r(p50)", reps(400)

    command:      summarize write, detail
    statistic:    r(p50)
    (obs=200)

    Bootstrap statistics

    Variable |   Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
    ---------+---------------------------------------------------------------------
         bs1 |    400         54        .11    .8838002    52.26251   55.73749  (N)
             |                                                   52         57  (P)
             |                                                   54         57  (BC)
    -------------------------------------------------------------------------------
                                    N = normal, P = percentile, BC = bias-corrected
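
The (N) and (P) rows can be unpacked with a little arithmetic. The sketch below first backs out the critical value implied by the reported normal interval, using only numbers from the table. The result is about 1.966, slightly larger than 1.96, which is what one would expect if the interval uses a t critical value with reps - 1 = 399 degrees of freedom (an inference from the numbers, not something the output states). It then shows one common way to form a percentile interval from bootstrap replicates. The helper `percentile_interval` and the `scores` list are hypothetical, and Stata's exact percentile rule may differ in how it interpolates.

```python
import random
import statistics

# Values copied from the (N) row of the table above.
observed, se = 54, 0.8838002
lower, upper = 52.26251, 55.73749

# The normal interval is symmetric about the observed statistic,
# and its half-width is a critical value times the bootstrap SE.
half_width = (upper - lower) / 2
multiplier = half_width / se
print(f"implied critical value = {multiplier:.4f}")

def percentile_interval(replicates):
    """95% percentile interval: 2.5th and 97.5th percentiles of the replicates."""
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(replicates, n=40, method="inclusive")
    return cuts[0], cuts[-1]

# Hypothetical scores for illustration only (NOT the hsb2 data).
scores = [35, 41, 44, 46, 49, 52, 52, 54, 54, 57, 59, 59, 60, 62, 65, 67]
rng = random.Random(1)
medians = [statistics.median(rng.choices(scores, k=len(scores)))
           for _ in range(400)]
print("percentile interval:", percentile_interval(medians))
```

Unlike the normal interval, the percentile interval need not be symmetric about the observed value, which matches the table: the (P) row runs from 52 to 57 around an observed median of 54.
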

Now let's compare a bootstrap estimate with an analytically derived standard error. The ci command reports the analytic
standard error of the mean, so this time we bootstrap the mean rather than the median.

    ci write

    Variable |    Obs       Mean   Std. Err.   [95% Conf. Interval]
    ---------+-------------------------------------------------------
       write |    200     52.775    .6702372    51.45332   54.09668

    bs "summarize write" "r(mean)", reps(400)

    command:      summarize write
    statistic:    r(mean)
    (obs=200)

    Bootstrap statistics

    Variable |   Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
    ---------+---------------------------------------------------------------------
         bs1 |    400     52.775  -.0054515    .6444644    51.50803   54.04197  (N)
             |                                                51.49    53.9525  (P)
             |                                               51.485     53.945  (BC)
    -------------------------------------------------------------------------------
                                    N = normal, P = percentile, BC = bias-corrected
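
The two standard errors agree closely: .6702372 from ci and .6444644 from the bootstrap. The same comparison can be sketched in Python, where the analytic standard error of the mean is s / sqrt(n) and the bootstrap standard error is the standard deviation of the resampled means. The 200 scores below are randomly generated stand-ins (an assumption for the example, not the hsb2 data), so the printed numbers will not match the tables above.

```python
import math
import random
import statistics

rng = random.Random(2002)
# Hypothetical sample of 200 scores for illustration only (NOT the hsb2 data).
scores = [rng.randint(31, 67) for _ in range(200)]
n = len(scores)

# Analytic standard error of the mean: s / sqrt(n).
analytic_se = statistics.stdev(scores) / math.sqrt(n)

# Bootstrap standard error: SD of the mean across 400 resamples.
boot_means = [statistics.mean(rng.choices(scores, k=n)) for _ in range(400)]
bootstrap_se = statistics.stdev(boot_means)

print(f"analytic SE  = {analytic_se:.4f}")
print(f"bootstrap SE = {bootstrap_se:.4f}")
```

With only 400 replications the bootstrap value carries some simulation noise of its own, so a difference of a few percent between the two figures is unremarkable.
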

One last time, let's find the standard error of the coefficient of variation.

    bs "summarize write" "r(sd)/r(mean)*100", reps(400)
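
The statistic here is a ratio of two summary values, sd / mean * 100, recomputed on every bootstrap sample. A minimal Python sketch of that calculation follows. The `coef_var` helper and the `scores` list are hypothetical (not the hsb2 data), so the output illustrates the mechanics only.

```python
import random
import statistics

def coef_var(sample):
    """Coefficient of variation in percent: SD / mean * 100."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100

# Hypothetical scores for illustration only (NOT the hsb2 data).
scores = [35, 41, 44, 46, 49, 52, 52, 54, 54, 57, 59, 59, 60, 62, 65, 67]

rng = random.Random(400)
# Recompute the coefficient of variation on each of 400 bootstrap samples.
cv_reps = [coef_var(rng.choices(scores, k=len(scores))) for _ in range(400)]

print(f"observed CV  = {coef_var(scores):.2f}")
print(f"bootstrap SE = {statistics.stdev(cv_reps):.2f}")
```
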

