
Bias, Variance and Resampling

Peter Caya

November 23, 2016


Chapter 1

1.1 Bias and Variance


Sources:
1. Elements of Statistical Learning: Chapter 7

Basic Definitions:
Define the following:

Our statistical model as:

Y = f(X) + \varepsilon    (1.1)

The training data as:

\mathcal{T} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}    (1.2)

When fitting a model to a data set, a trade-off must be made between how closely the model tracks the training data and the accuracy it maintains when generalizing to new data. This is the bias-variance tradeoff.

Definition 1: Bias:
The bias, B(\hat{f}), is the difference between the expected value of the estimator and the true value of the quantity being estimated. Defined as:

B(\hat{f}) = E[\hat{f}(X)] - f(X)    (1.3)


Definition 2: Variance:
The variance, V(\hat{f}), measures the degree to which small changes in the data may impact the predictions and fit of a model:

V(\hat{f}(X)) = E[\hat{f}(X)^2] - E[\hat{f}(X)]^2    (1.4)


The error of a statistical model can be expressed in terms of the bias, the variance and irreducible noise. Consider a regression model for Y fitted under squared-error loss:

L(X) = [Y - \hat{f}(X)]^2    (1.5)

The bias-variance decomposition of the error of a regression model at an arbitrary point X_0, Err(X_0), proceeds by adding and subtracting E[\hat{f}(X_0)]:

Err(X_0) = E[(Y - \hat{f}(X_0))^2] = E[(Y - E[\hat{f}(X_0)] + E[\hat{f}(X_0)] - \hat{f}(X_0))^2]    (1.6)

= E[(Y - E[\hat{f}(X_0)])^2] + E[(E[\hat{f}(X_0)] - \hat{f}(X_0))^2]    (1.7)

= \sigma_\varepsilon^2 + (E[\hat{f}(X_0)] - f(X_0))^2 + Var(\hat{f}(X_0)),

that is, irreducible error plus Bias^2(\hat{f}(X_0)) plus Var(\hat{f}(X_0)). The cross term from expanding (1.6) vanishes because the noise in the new observation Y is independent of the fitted model \hat{f}.
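To make the decomposition concrete, here is a minimal simulation sketch (my own illustration with assumed toy values, not an example from ESL). It uses a deliberately shrunken, and therefore biased, estimator of f(x0) and checks that the total error at x0 splits into the three terms of (1.7):

# Minimal sketch: all names and constants here are assumed for illustration.
set.seed(1)
f_true <- function(x) sin(2 * pi * x)   # an assumed "true" f
sigma  <- 0.5                           # irreducible noise sd
x0     <- 0.3
# Each simulated "fit" predicts f(x0) by a shrunken mean of 10 noisy
# observations at x0; the shrinkage factor 0.8 introduces bias on purpose.
fhat  <- replicate(10000, 0.8 * mean(f_true(x0) + rnorm(10, 0, sigma)))
y_new <- f_true(x0) + rnorm(10000, 0, sigma)   # fresh test responses at x0
total_err <- mean((y_new - fhat)^2)            # left-hand side of (1.6)
bias_sq   <- (mean(fhat) - f_true(x0))^2       # squared-bias term
variance  <- var(fhat)                         # variance term
c(total = total_err, decomposed = sigma^2 + bias_sq + variance)
# The two numbers agree up to simulation noise.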

Note to self: This relates back to my thesis work on inverse problems, where I studied the inclusion of a regularization parameter in the OLS, MOLS and EE models in order to reduce the noise in a model. As a result, bias was introduced into the model (i.e., it could never be exactly dead on), but the jaggedness of the model was reduced. In the statistical context, I was studying ridge regression and the identification of the proper regularization parameter which balanced the bias-variance tradeoff to provide the best fit.

Model Complexity: Look up a definition here.

1.2 Training and Testing Error

Let X be the population of independent variables and Y be the population of dependent variables that we seek to predict. Assume that X and Y are drawn at random from their joint distribution. Under these assumptions the following definitions are introduced:
Definition 3: Training Error:
The training error, \overline{err}, is the average loss over the training sample:

\overline{err} = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{f}(x_i))    (1.8)
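As a quick illustration (using the mtcars regression that appears later in these notes), the training error under squared-error loss is just the mean of the squared residuals:

# Training error, equation (1.8), for a simple linear model.
fit <- lm(mpg ~ wt, data = mtcars)
mean((mtcars$mpg - fitted(fit))^2)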
Definition 4: Test Error:
The test error, Err_{\mathcal{T}}, is the prediction error of a model over an independent test sample:

Err_{\mathcal{T}} = E[L(Y, \hat{f}(X)) \mid \mathcal{T}]    (1.9)

In this case we assume the training set \mathcal{T} is fixed.


The measure of how useful a model really is involves determining what this value will be. However, this isn't always possible, so we instead try to find the expected prediction error as an approximation:

Err = E[L(Y, \hat{f}(X))] = E[Err_{\mathcal{T}}]    (1.10)
The training error is a poor predictor of the testing error, since we are estimating the inaccuracies in terms of a model which was built using both the signal and the noise of the training data. As a result, aspects of the training set which are coincidental may end up in the parameters of the fitted model, which will lead to greater inaccuracy when the model is used on the testing data.
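A quick sketch of this optimism in R (my own illustration): as the polynomial degree of an mpg-on-wt fit grows, the training error keeps shrinking, even though the added flexibility is increasingly fitting noise rather than signal:

# Training error decreases monotonically with model complexity,
# whether or not the extra complexity generalizes.
sapply(1:6, function(d) {
  fit <- lm(mpg ~ poly(wt, d), data = mtcars)
  mean((mtcars$mpg - fitted(fit))^2)
})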

1.3 Resampling Methods


The next few sections discuss resampling techniques which can be used to esti-
mate the bias and variance of a model.

1.4 Jackknife Resampling


The most basic resampling method is the jackknife technique. It is the earliest of the techniques discussed here, developed in the 1940s and 1950s. The jackknife estimate of a parameter \theta can be found as follows:

1. For a set of N observations, construct a subsample omitting observation i. Call this subsample x_{(i)}.

2. Parametrize the model using the subsample to obtain the leave-one-out estimate \hat{\theta}_{(i)}.

3. Use the leave-one-out estimates, together with the estimate \hat{\theta} computed using all observations, to estimate the bias B and the variance V (estimates denoted with the hat symbol). Writing \hat{\theta}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^{N} \hat{\theta}_{(i)}:

\hat{B} = (N - 1)(\hat{\theta}_{(\cdot)} - \hat{\theta})

\hat{V} = \frac{N - 1}{N} \sum_{i=1}^{N} (\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)})^2
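These steps translate directly into a few lines of base R; a minimal sketch (the helper name jackknife_est is my own, not a package function):

# Jackknife bias and standard-error estimates for a statistic theta.
jackknife_est <- function(x, theta) {
  n <- length(x)
  theta_hat <- theta(x)                                      # all-observation estimate
  theta_i   <- sapply(seq_len(n), function(i) theta(x[-i]))  # leave-one-out estimates
  theta_dot <- mean(theta_i)
  list(bias = (n - 1) * (theta_dot - theta_hat),
       se   = sqrt((n - 1) / n * sum((theta_i - theta_dot)^2)))
}
# jackknife_est(mtcars$mpg, mean) should match the package output below.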

Let's go through an example. In R, one package that can be used is bootstrap, whose jackknife function is shown below. First, let's just use the jackknife function to estimate the standard error and bias of the mean of mpg:

> pacman::p_load(bootstrap)  # or library(bootstrap)
> # Calculate the jackknife estimates, the SE and the bias for the mean.
> jack_mpg <- jackknife(mtcars$mpg, theta = mean)
> jack_mpg$jack.se

[1] 1.065424

> jack_mpg$jack.bias

[1] 0

Here's a more complex example. Let's express mpg as a linear function of weight:

> mtcars_model <- formula(mpg~wt)


> theta <- function(x,xdata,coef_number){ coef(lm(mtcars_model,
+ data = xdata[x,]))[coef_number]}
> jack_reg<- jackknife(1:dim(mtcars)[1],
+ theta,xdata = mtcars, coef_number =2)
> jack_reg$jack.se

[1] 0.7263368

> jack_reg$jack.bias

wt
-0.08087151

> plot(jack_reg$jack.values, type = "l",
+      main = "Weight Coefficient Based on Omitted Observations",
+      xlab = "Obs.", ylab = "")

[Figure: line plot of jack_reg$jack.values — "Weight Coefficient Based on Omitted Observations"; x-axis: Obs. (1 to 32), y-axis: the estimated weight coefficient]

1.5 Bootstrapping

Bootstrapping is a resampling technique in which random subsampling with replacement is applied to a sample. A simple example occurs when a large sample of size N is resampled randomly with replacement and the resamples are then used for statistical testing. When the sample size is large, the resampled data will not be identical to the original set. The procedure is iterated a large number of times and is used to build a distribution of results. One big advantage of bootstrapping is that it allows the user to approximate the sampling distribution of a statistic without making assumptions about the underlying population.
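A minimal base-R sketch of the idea (the helper name boot_est is mine, not a package function):

# Resample with replacement many times and collect the statistic each time.
boot_est <- function(x, theta, n_boot = 1000) {
  replicate(n_boot, theta(sample(x, replace = TRUE)))
}
set.seed(1)
boot_means <- boot_est(mtcars$mpg, mean)
sd(boot_means)                         # bootstrap estimate of the standard error
mean(boot_means) - mean(mtcars$mpg)    # bootstrap estimate of the bias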

For this very simple application of bootstrapping, the bootstrap package can be used directly; consider the example below:

> mean_boot <- bootstrap(mtcars$mpg, 1000, mean)
> hist(mean_boot$thetastar, main = "Bootstrap Results of MPG Mean",
+      xlab = "Mean", ylim = c(0, 225))

[Figure: histogram of mean_boot$thetastar — "Bootstrap Results of MPG Mean"; x-axis: Mean (roughly 17 to 24), y-axis: Frequency]

The example from the jackknife section regarding the regression coefficients is repeated here with 1000 bootstrap replications:

> reg_boot <- bootstrap(x = 1:dim(mtcars)[1], nboot = 1000,
+                       theta, xdata = mtcars, coef_number = 2)
> hist(reg_boot$thetastar, xlab = "Coefficient", ylab = "",
+      main = "Bootstrap Estimates of the Weight Coefficient")

[Figure: histogram of reg_boot$thetastar — bootstrap estimates of the weight coefficient; y-axis: Frequency]

1.6 Cross Validation


Cross-validation is the third major resampling technique. Instead of iteratively working with the entire sample, as in jackknife and bootstrap resampling, cross-validation breaks the data into a set which is used for training and a set which is used to test the model's fit.

1.6.1 Leave P Out Cross-Validation


This method of cross-validation begins with a sample of N observations. Of these observations, p are omitted from the fitting. The fit is performed repeatedly, each time leaving out a different combination of p observations, until all of the possible combinations of omissions are exhausted.
Two things should be noted (see the short check after this list):

- This is essentially the jackknife method with more than one omission.

- This becomes computationally intensive extremely quickly: the number of cross-validation sets produced with this method is \binom{N}{p}.
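A quick check of that growth in R, using the mtcars sample size N = 32 as an example:

# Number of distinct leave-p-out training sets for N = 32 observations.
choose(32, 1)   #       32  (the jackknife / leave-one-out case)
choose(32, 3)   #     4960
choose(32, 8)   # 10518300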

1.6.2 K-Folds Cross Validation


A more commonly used method to estimate the amount of test error present in a model follows from the method described previously. Consider the case where we have a dataset of N observations. Of these N observations, we can create K different subsets of the data, with observations allocated to each group randomly. After partitioning the data, a model is trained on all but one partition. The partition which was left out of model training is then used to estimate the test error of the model. This is done repeatedly, with each of the K partitions being left out in turn.
After this procedure is completed, the estimate of the testing error is created:

CV(\hat{f}) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{f}^{-\kappa(i)}(x_i))    (1.11)

where \hat{f}^{-\kappa(i)} denotes the model fit with the partition containing observation i left out.
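Before turning to the packaged implementation in the next subsection, here is a minimal hand-rolled sketch of K-fold CV for the regression example used throughout these notes (base R; the helper name kfold_cv is mine):

# Hand-rolled K-fold cross-validation estimate of test error for
# the mpg ~ wt regression, under squared-error loss.
kfold_cv <- function(data, K = 5) {
  n <- nrow(data)
  fold  <- sample(rep(seq_len(K), length.out = n))  # random partition into K folds
  preds <- numeric(n)
  for (k in seq_len(K)) {
    fit <- lm(mpg ~ wt, data = data[fold != k, ])   # train with fold k held out
    preds[fold == k] <- predict(fit, newdata = data[fold == k, ])
  }
  mean((data$mpg - preds)^2)                        # equation (1.11)
}
set.seed(1)
kfold_cv(mtcars)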

1.6.3 Implementation
In the bootstrap package the function which implements cross-validation is crossval(). Let's fit the mpg-on-weight regression using 5-fold cross-validation:
> theta_func <- function(x, y) {lm(y ~ x)}
> theta_predict <- function(model_fit, x) {cbind(1, x) %*% model_fit$coef}
> crossval(x = mtcars$wt, y = mtcars$mpg, theta.fit = theta_func,
+          theta.predict = theta_predict, ngroup = 5)

$cv.fit
[1] 23.915987 21.651761 25.330239 19.986467 18.830932 18.728217 18.081384
[8] 19.625645 19.850435 19.107575 19.107575 15.512767 17.259427 17.610438
[15] 8.048937 8.084572 7.515059 25.119394 28.203608 27.240421 24.758539
[22] 17.771124 19.485795 16.885196 16.668645 26.678445 26.525180 28.727451
[29] 20.926287 22.830061 17.490136 22.220502

$ngroup
[1] 5

$leave.out
[1] 6

$groups
$groups[[1]]
[1] 32 5 4 19 28 6

$groups[[2]]
[1] 27 14 23 21 1 29

$groups[[3]]
[1] 10 30 3 24 16 11

$groups[[4]]
[1] 2 7 18 12 25 13

$groups[[5]]
[1] 31 8 15 20 9 17 26 22

$call

crossval(x = mtcars$wt, y = mtcars$mpg, theta.fit = theta_func,
theta.predict = theta_predict, ngroup = 5)
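Note that crossval() returns the cross-validated predictions (cv.fit) rather than the error itself. The CV estimate (1.11) under squared-error loss can then be computed from them; a sketch, with the stored-result name cv_out assumed here (re-running crossval() draws a fresh random partition):

cv_out <- crossval(x = mtcars$wt, y = mtcars$mpg, theta.fit = theta_func,
                   theta.predict = theta_predict, ngroup = 5)
mean((mtcars$mpg - cv_out$cv.fit)^2)  # CV estimate of the test error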

1.7 Exercises
- Program a jackknife algorithm which takes a function and a sample data set and returns an estimate of the testing error.

- Program a bootstrapping algorithm which takes a function and a sample data set and returns an estimate of the testing error.

- Program a cross-validation algorithm which takes a function and a sample data set and returns an estimate of the testing error.

1.8 ESL Exercises


Derive the estimate of in-sample error:

E_y(Err_{in}) = E_y(\overline{err}) + 2 \frac{d}{N} \sigma_\varepsilon^2    (1.12)
