
Bootstrapping and Confidence Intervals

CS109A Introduction to Data Science


Pavlos Protopapas, Kevin Rader and Chris Tanner
Bootstrap

In the absence of active imagination, parallel universes, and the like, we need an alternative way of producing fake data sets that resemble those parallel universes.

Bootstrapping is the practice of sampling from the observed data (X, Y) to estimate statistical properties.

CS109A, PROTOPAPAS, RADER, TANNER 3


Bootstrap

Imagine we have 5 billiard balls in a bucket.



Bootstrap

We first randomly pick a ball and replicate it. This is called sampling with
replacement. We move the replicated ball to another bucket.


Bootstrap

We then randomly pick another ball and replicate it as well.


As before, we move the replicated ball to the other bucket.


Bootstrap

We repeat this process.



Bootstrap

Again



Bootstrap

And again



Bootstrap
We continue until the “other” bucket has the same number of balls as the
original one.

This new bucket represents a new parallel universe.
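In code, this ball-drawing procedure is simply sampling N indices with replacement. A minimal sketch using NumPy (the ball labels are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# The original bucket: 5 labeled billiard balls.
balls = np.array([1, 2, 3, 4, 5])

# Draw len(balls) times with replacement: each draw copies a ball
# into the "other" bucket, so some balls may appear more than once
# and others not at all.
bootstrap_sample = rng.choice(balls, size=len(balls), replace=True)
print(bootstrap_sample)
```

Each run of `rng.choice` produces one "parallel universe" bucket of the same size as the original.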



Bootstrap

We repeat the same process and acquire another sample.





Bootstrap

[Figure: the original dataset of size N is sampled with replacement to produce bootstrap samples 1, 2, 3, …, each also of size N.]
Bootstrapping for Estimating Sampling Error

Definition
Bootstrapping is the practice of estimating properties of an estimator
by measuring those properties on samples drawn from the observed data.

For example, we can compute $\hat{\beta}_0$ and $\hat{\beta}_1$ multiple times by randomly
sampling from our data set. We then use the variance of our multiple
estimates to approximate the true variance of $\hat{\beta}_0$ and $\hat{\beta}_1$.
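As a concrete sketch of this procedure (the synthetic dataset and the choice of s = 1000 bootstrap samples are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic dataset: y = 2 + 0.5 x + noise (illustrative values).
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

s = 1000  # number of bootstrap samples
betas = np.empty((s, 2))
for i in range(s):
    idx = rng.integers(0, n, size=n)  # sample n indices with replacement
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    betas[i] = (intercept, slope)

# The spread of the bootstrap estimates approximates the true
# sampling variance of beta0_hat and beta1_hat.
mu_beta = betas.mean(axis=0)
var_beta = betas.var(axis=0)
print(mu_beta, var_beta)
```

Refitting the regression on each resampled dataset gives a whole distribution of coefficient estimates, whose mean and variance we can read off directly.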



Bootstrap

[Figure: the dataset of size N is sampled with replacement s times; a model is trained on each bootstrap sample, giving estimates $\hat{\beta}^{(1)}, \ldots, \hat{\beta}^{(s)}$.]

$$\mu_{\hat{\beta}} = \frac{1}{s}\sum_{i=1}^{s} \hat{\beta}^{(i)}$$

$$\sigma^2_{\hat{\beta}} = \frac{1}{s}\sum_{i=1}^{s} \left( \hat{\beta}^{(i)} - \mu_{\hat{\beta}} \right)^2$$


Confidence intervals for the predictor estimates: Standard Errors

We can empirically estimate the standard deviations of the estimators, called the
standard errors, through bootstrapping.

Alternatively: if we know the variance $\sigma^2_\epsilon$ of the noise $\epsilon$, we can compute the standard errors analytically using the formulas below (no need to bootstrap):

$$SE(\hat{\beta}_0) = \sigma_\epsilon \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}$$

$$SE(\hat{\beta}_1) = \frac{\sigma_\epsilon}{\sqrt{\sum_i (x_i - \bar{x})^2}}$$

where $n$ is the number of observations and $\bar{x}$ is the mean value of the predictor.
Standard Errors

$$SE(\hat{\beta}_0) = \sigma_\epsilon \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}} \qquad SE(\hat{\beta}_1) = \frac{\sigma_\epsilon}{\sqrt{\sum_i (x_i - \bar{x})^2}}$$

More data: a larger $n$.
Larger coverage: a larger spread $\sum_i (x_i - \bar{x})^2$.
Better data: a smaller noise level $\sigma_\epsilon$.
Better model: a better fit $\hat{f}$ reduces the residuals and hence the estimate

$$\sigma(\epsilon) = \sqrt{\frac{\sum_i \left( \hat{f}(x_i) - y_i \right)^2}{n - 2}}$$

Question: What happens to the standard errors under these scenarios?
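A quick numeric check of these scenarios using the analytic formula for the slope's standard error (the sample sizes, x-ranges, and noise levels below are arbitrary illustrations):

```python
import numpy as np

def se_slope(x, sigma_eps):
    """Analytic SE of the slope: sigma_eps / sqrt(sum((x_i - xbar)^2))."""
    return sigma_eps / np.sqrt(np.sum((x - x.mean()) ** 2))

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)

base = se_slope(x, sigma_eps=1.0)

# More data: a larger n increases sum((x - xbar)^2), shrinking the SE.
more_data = se_slope(rng.uniform(0, 10, size=200), sigma_eps=1.0)

# Larger coverage: a wider x range also increases the spread, shrinking the SE.
wider = se_slope(rng.uniform(0, 100, size=50), sigma_eps=1.0)

# Better data: a smaller noise level shrinks the SE proportionally.
less_noise = se_slope(x, sigma_eps=0.5)

print(base, more_data, wider, less_noise)
```

All three interventions reduce the standard error relative to the baseline, matching the scenarios on this slide.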


Standard Errors

In practice, we do not know the value of $\sigma_\epsilon$ since we do not know the exact distribution of
the noise $\epsilon$.
However, if we make the following assumptions,
• the errors $\epsilon_i$ and $\epsilon_j$ are uncorrelated, for $i \neq j$,
• each $\epsilon_i$ has mean 0 and variance $\sigma^2_\epsilon$,
then we can empirically estimate $\sigma_\epsilon$ from the data and our regression line:

$$\sigma_\epsilon = \sqrt{\frac{n \cdot MSE}{n - 2}} = \sqrt{\frac{\sum_i \left( \hat{f}(x_i) - y_i \right)^2}{n - 2}}$$

Remember: $MSE = \frac{1}{n}\sum_i \left( \hat{f}(x_i) - y_i \right)^2$.
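Putting the pieces together: a sketch that estimates the noise level from the fitted line's residuals and plugs it into the analytic SE formulas (the synthetic data with true noise standard deviation 1 is an assumption of the example):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=n)

# Fit the regression line and compute residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = (intercept + slope * x) - y

# Empirical noise estimate: sigma_eps = sqrt(sum(residuals^2) / (n - 2)).
sigma_eps = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# Plug into the analytic standard-error formulas.
sxx = np.sum((x - x.mean()) ** 2)
se_beta0 = sigma_eps * np.sqrt(1.0 / n + x.mean() ** 2 / sxx)
se_beta1 = sigma_eps / np.sqrt(sxx)
print(sigma_eps, se_beta0, se_beta1)
```

With enough data, `sigma_eps` recovers the true noise level closely, and the resulting SEs agree well with what a bootstrap would give, as the comparison tables below illustrate.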


Standard Errors
The following results are for the coefficient for TV advertising:

Method             SE
Analytic Formula   0.0061
Bootstrap          0.0061

The coefficients for TV advertising, but restricting the coverage of x (SE increases):

Method             SE
Analytic Formula   0.0068
Bootstrap          0.0068

The coefficients for TV advertising, but with extra noise added (SE increases):

Method             SE
Analytic Formula   0.028
Bootstrap          0.023


Confidence intervals for the predictor estimates (cont.)

We can now estimate the mean and standard deviation of the estimates of $\beta$.


Confidence intervals for the predictor estimates (cont.)

The standard errors give us a sense of our uncertainty about our estimates.
Typically we express this uncertainty as a 95% confidence interval, which is the
range of values such that the true value of $\beta$ is contained in this interval with 95%
probability.

[Figure: distribution of $\hat{\beta}$; roughly 68% of the mass lies within one standard error of the mean, and 95% within two.]

$$CI_{\hat{\beta}} = \left( \hat{\beta} - 2\sigma_{\hat{\beta}},\; \hat{\beta} + 2\sigma_{\hat{\beta}} \right)$$
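Given a collection of bootstrap estimates of a coefficient, computing this two-standard-error interval is a one-liner each way (the synthetic normal draws below stand in for real bootstrap output):

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for s bootstrap estimates of a coefficient beta_hat
# (illustrative center and spread).
beta_samples = rng.normal(0.047, 0.006, size=1000)

beta_hat = beta_samples.mean()
sigma_beta = beta_samples.std()

# 95% CI: roughly two standard errors on either side of the estimate.
ci_lower = beta_hat - 2 * sigma_beta
ci_upper = beta_hat + 2 * sigma_beta
print((ci_lower, ci_upper))
```

About 95% of the bootstrap estimates fall inside this interval, which is exactly the intuition behind the picture above.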

