
Bootstrapping and Confidence Intervals

CS109A Introduction to Data Science


Pavlos Protopapas, Kevin Rader and Chris Tanner
Bootstrap

In the absence of active imagination, parallel universes, and the like, we need an alternative way of producing fake data sets that resemble those parallel universes.

Bootstrapping is the practice of sampling from the observed data (X, Y) to estimate statistical properties.

CS109A, PROTOPAPAS, RADER, TANNER 3


Bootstrap

Imagine we have 5 billiard balls in a bucket.



Bootstrap

We first randomly pick a ball and replicate it. This is called sampling with
replacement. We move the replicated ball to another bucket.


Bootstrap

We then randomly pick another ball and replicate it as well.


As before, we move the replicated ball to the other bucket.


Bootstrap

We repeat this process.



Bootstrap

Again



Bootstrap

And again



Bootstrap
We continue until the “other” bucket has the same number of balls as the
original one.

This new bucket represents a new parallel universe.
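In code, this ball-drawing procedure is simply sampling N indices with replacement. A minimal sketch using NumPy (the ball labels are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# The original bucket: 5 labeled billiard balls.
balls = np.array([1, 2, 3, 4, 5])

# Draw len(balls) times with replacement: each draw copies a ball
# into the "other" bucket, so some balls may appear more than once
# and others not at all.
bootstrap_sample = rng.choice(balls, size=len(balls), replace=True)
print(bootstrap_sample)
```

Each run of `rng.choice` produces one "parallel universe" bucket of the same size as the original.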



Bootstrap

We repeat the same process and acquire another sample.





Bootstrap

[Figure: the original dataset of size N is sampled with replacement to produce bootstrap samples 1, 2, 3, …, each also of size N.]
Bootstrapping for Estimating Sampling Error

Definition
Bootstrapping is the practice of estimating properties of an estimator
by measuring those properties on samples drawn from the observed data.

For example, we can compute $\hat{\beta}_0$ and $\hat{\beta}_1$ multiple times by randomly
sampling from our data set. We then use the variance of our multiple
estimates to approximate the true variance of $\hat{\beta}_0$ and $\hat{\beta}_1$.
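As a concrete sketch of this procedure (the synthetic dataset and the choice of s = 1000 bootstrap samples are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic dataset: y = 2 + 0.5 x + noise (illustrative values).
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

s = 1000  # number of bootstrap samples
betas = np.empty((s, 2))
for i in range(s):
    idx = rng.integers(0, n, size=n)  # sample n indices with replacement
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    betas[i] = (intercept, slope)

# The spread of the bootstrap estimates approximates the true
# sampling variance of beta0_hat and beta1_hat.
mu_beta = betas.mean(axis=0)
var_beta = betas.var(axis=0)
print(mu_beta, var_beta)
```

Refitting the regression on each resampled dataset gives a whole distribution of coefficient estimates, whose mean and variance we can read off directly.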



Bootstrap

[Figure: the dataset of size N is sampled with replacement s times; a model is trained on each bootstrap sample, giving estimates $\hat{\beta}^{(1)}, \ldots, \hat{\beta}^{(s)}$.]

$$\mu_{\hat{\beta}} = \frac{1}{s}\sum_{i=1}^{s} \hat{\beta}^{(i)}$$

$$\sigma^2_{\hat{\beta}} = \frac{1}{s}\sum_{i=1}^{s} \left( \hat{\beta}^{(i)} - \mu_{\hat{\beta}} \right)^2$$


Confidence intervals for the predictor estimates: Standard Errors

We can empirically estimate the standard deviations of the estimators, called the
standard errors, through bootstrapping.

Alternatively: if we know the variance $\sigma^2_\epsilon$ of the noise $\epsilon$, we can compute the standard errors analytically using the formulas below (no need to bootstrap):

$$SE(\hat{\beta}_0) = \sigma_\epsilon \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}$$

$$SE(\hat{\beta}_1) = \frac{\sigma_\epsilon}{\sqrt{\sum_i (x_i - \bar{x})^2}}$$

where $n$ is the number of observations and $\bar{x}$ is the mean value of the predictor.
Standard Errors

$$SE(\hat{\beta}_0) = \sigma_\epsilon \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}} \qquad SE(\hat{\beta}_1) = \frac{\sigma_\epsilon}{\sqrt{\sum_i (x_i - \bar{x})^2}}$$

More data: a larger $n$.
Larger coverage: a larger spread $\sum_i (x_i - \bar{x})^2$.
Better data: a smaller noise level $\sigma_\epsilon$.
Better model: a better fit $\hat{f}$ reduces the residuals and hence the estimate

$$\sigma(\epsilon) = \sqrt{\frac{\sum_i \left( \hat{f}(x_i) - y_i \right)^2}{n - 2}}$$

Question: What happens to the standard errors under these scenarios?
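A quick numeric check of these scenarios using the analytic formula for the slope's standard error (the sample sizes, x-ranges, and noise levels below are arbitrary illustrations):

```python
import numpy as np

def se_slope(x, sigma_eps):
    """Analytic SE of the slope: sigma_eps / sqrt(sum((x_i - xbar)^2))."""
    return sigma_eps / np.sqrt(np.sum((x - x.mean()) ** 2))

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)

base = se_slope(x, sigma_eps=1.0)

# More data: a larger n increases sum((x - xbar)^2), shrinking the SE.
more_data = se_slope(rng.uniform(0, 10, size=200), sigma_eps=1.0)

# Larger coverage: a wider x range also increases the spread, shrinking the SE.
wider = se_slope(rng.uniform(0, 100, size=50), sigma_eps=1.0)

# Better data: a smaller noise level shrinks the SE proportionally.
less_noise = se_slope(x, sigma_eps=0.5)

print(base, more_data, wider, less_noise)
```

All three interventions reduce the standard error relative to the baseline, matching the scenarios on this slide.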


Standard Errors

In practice, we do not know the value of $\sigma_\epsilon$ since we do not know the exact distribution of
the noise $\epsilon$.
However, if we make the following assumptions,
• the errors $\epsilon_i$ and $\epsilon_j$ are uncorrelated, for $i \neq j$,
• each $\epsilon_i$ has mean 0 and variance $\sigma^2_\epsilon$,
then we can empirically estimate $\sigma_\epsilon$ from the data and our regression line:

$$\sigma_\epsilon = \sqrt{\frac{n \cdot MSE}{n - 2}} = \sqrt{\frac{\sum_i \left( \hat{f}(x_i) - y_i \right)^2}{n - 2}}$$

Remember: $MSE = \frac{1}{n}\sum_i \left( \hat{f}(x_i) - y_i \right)^2$.
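Putting the pieces together: a sketch that estimates the noise level from the fitted line's residuals and plugs it into the analytic SE formulas (the synthetic data with true noise standard deviation 1 is an assumption of the example):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=n)

# Fit the regression line and compute residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = (intercept + slope * x) - y

# Empirical noise estimate: sigma_eps = sqrt(sum(residuals^2) / (n - 2)).
sigma_eps = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# Plug into the analytic standard-error formulas.
sxx = np.sum((x - x.mean()) ** 2)
se_beta0 = sigma_eps * np.sqrt(1.0 / n + x.mean() ** 2 / sxx)
se_beta1 = sigma_eps / np.sqrt(sxx)
print(sigma_eps, se_beta0, se_beta1)
```

With enough data, `sigma_eps` recovers the true noise level closely, and the resulting SEs agree well with what a bootstrap would give, as the comparison tables below illustrate.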


Standard Errors
The following results are for the coefficient for TV advertising:

Method             SE
Analytic Formula   0.0061
Bootstrap          0.0061

The coefficients for TV advertising, but restricting the coverage of x (SE increases):

Method             SE
Analytic Formula   0.0068
Bootstrap          0.0068

The coefficients for TV advertising, but with extra noise added (SE increases):

Method             SE
Analytic Formula   0.028
Bootstrap          0.023


Confidence intervals for the predictor estimates (cont.)

We can now estimate the mean and standard deviation of the estimates of $\beta$.


Confidence intervals for the predictor estimates (cont.)

The standard errors give us a sense of our uncertainty about our estimates.
Typically we express this uncertainty as a 95% confidence interval, which is the
range of values such that the true value of $\beta$ is contained in this interval with 95%
probability.

[Figure: distribution of $\hat{\beta}$; roughly 68% of the mass lies within one standard error of the mean, and 95% within two.]

$$CI_{\hat{\beta}} = \left( \hat{\beta} - 2\sigma_{\hat{\beta}},\; \hat{\beta} + 2\sigma_{\hat{\beta}} \right)$$
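Given a collection of bootstrap estimates of a coefficient, computing this two-standard-error interval is a one-liner each way (the synthetic normal draws below stand in for real bootstrap output):

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for s bootstrap estimates of a coefficient beta_hat
# (illustrative center and spread).
beta_samples = rng.normal(0.047, 0.006, size=1000)

beta_hat = beta_samples.mean()
sigma_beta = beta_samples.std()

# 95% CI: roughly two standard errors on either side of the estimate.
ci_lower = beta_hat - 2 * sigma_beta
ci_upper = beta_hat + 2 * sigma_beta
print((ci_lower, ci_upper))
```

About 95% of the bootstrap estimates fall inside this interval, which is exactly the intuition behind the picture above.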

