Peter Caya
November 23, 2016
Chapter 1
1.1 The Bias-Variance Tradeoff
Sources:
1. Elements of Statistical Learning: Chapter 7
Basic Definitions:
Define the following:

Our statistical model as:
\[ Y = f(X) + \varepsilon \tag{1.1} \]

The training data as:
\[ \mathcal{T} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\} \tag{1.2} \]
When fitting a model to a data set, a trade-off must be made between how closely the model fits the training data and how accurately it generalizes to new data. This is the bias-variance tradeoff.
Definition 1: Bias:
The bias, $B(\hat{f})$, is the difference between the expected value of the estimator and the true parameter. It is defined as:
\[ B(\hat{f}) = E[\hat{f}(X)] - f(X) \tag{1.3} \]
Definition 2: Variance:
The variance, $V(\hat{f})$, is the degree to which small changes in the data may impact the predictions and fit of a model:
\[ V(\hat{f}(X)) = E[\hat{f}(X)^2] - E[\hat{f}(X)]^2 \tag{1.4} \]
The expected prediction error of a statistical model can be expressed in terms of the bias, the variance, and irreducible noise. Consider a regression model for $Y$ fitted under squared-error loss:
\[ L(X) = [Y - \hat{f}(X)]^2 \tag{1.5} \]
The bias-variance decomposition of the error at an arbitrary point $X_0$, $\mathrm{Err}(X_0)$, can be derived as follows:
\[ \mathrm{Err}(X_0) = E\left[(Y - \hat{f}(X_0))^2\right] = E\left[(Y - E[\hat{f}(X_0)] + E[\hat{f}(X_0)] - \hat{f}(X_0))^2\right] \tag{1.6} \]
\[ = E\left[(Y - E[\hat{f}(X_0)])^2\right] + E\left[(E[\hat{f}(X_0)] - \hat{f}(X_0))^2\right] = \sigma_\varepsilon^2 + B(\hat{f}(X_0))^2 + V(\hat{f}(X_0)) \tag{1.7} \]
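To see how these quantities behave, here is a minimal R simulation sketch (the true function, noise level, and sample size are illustrative assumptions): a linear model is fit to data generated from a quadratic truth, and the bias and variance of its prediction at a single point are estimated over repeated samples.

> # Truth is quadratic; the linear fit is therefore biased at most points.
> set.seed(1)
> f <- function(x) x^2
> x0 <- 0.5
> preds <- replicate(2000, {
+   x <- runif(50)
+   y <- f(x) + rnorm(50, sd = 0.1)
+   predict(lm(y ~ x), newdata = data.frame(x = x0))
+ })
> mean(preds) - f(x0)   # estimated bias B(f_hat(x0))
> var(preds)            # estimated variance V(f_hat(x0))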
Note to self: This relates back to my thesis work on inverse problems where
I studied the inclusion of a regularization parameter in the OLS, MOLS and
1.2 Training Error
The training error is the average loss over the training sample:
\[ \overline{\mathrm{err}} = \frac{1}{N}\sum_{i=1}^{N} L(y_i, \hat{f}(x_i)) \tag{1.8} \]
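Under squared-error loss this is simply the mean squared residual of the fitted model. For example, for a linear fit in R:

> fit <- lm(mpg ~ wt, data = mtcars)
> mean(resid(fit)^2)   # training error under squared-error loss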
1.3 Resampling Methods
The next few sections discuss resampling techniques which can be used to estimate the bias and variance of a model.
1.4 Jackknife Resampling
The most basic resampling method is the jackknife technique. It is the earliest such technique, developed in the 1940s and 1950s. The jackknife estimates of a parameter's bias and variance are computed as follows:
Let $\hat{\theta}_{(i)}$ denote the estimate computed with observation $i$ removed, and let $\bar{\theta}_{(\cdot)} = \frac{1}{N}\sum_{i=1}^{N}\hat{\theta}_{(i)}$. Then:
\[ \hat{B}_{\mathrm{jack}} = (N-1)\left(\bar{\theta}_{(\cdot)} - \hat{\theta}\right) \]
\[ \hat{V}_{\mathrm{jack}} = \frac{N-1}{N}\sum_{i=1}^{N}\left(\hat{\theta}_{(i)} - \bar{\theta}_{(\cdot)}\right)^2 \]
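To make the formulas concrete, here is a by-hand sketch for the mean of mtcars$mpg using only base R; it should reproduce the package results shown below.

> x <- mtcars$mpg; N <- length(x)
> theta_i <- sapply(1:N, function(i) mean(x[-i]))       # leave-one-out estimates
> (N - 1) * (mean(theta_i) - mean(x))                   # jackknife bias (0 for the mean)
> sqrt((N - 1) / N * sum((theta_i - mean(theta_i))^2))  # jackknife SE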
Let's go through an example. In R, one package that can be used is bootstrap, whose jackknife function is shown below. First, let's use the jackknife function to get the jackknife standard error and bias for the mean of mpg:

> library(bootstrap)
> # Calculate the jackknife means, the SE and the bias for the mean.
> jack_mpg <- jackknife(mtcars$mpg, theta = mean)
> jack_mpg$jack.se
[1] 1.065424
> jack_mpg$jack.bias
[1] 0
Here's a more complex example: let's express mpg as a linear function of weight. The jackknife function needs a statistic that takes a vector of observation indices, so one way to set this up (the theta_wt helper below is one possible formulation) is:

> # Statistic: the weight coefficient from a regression of mpg on wt,
> # fitted on the rows indexed by i.
> theta_wt <- function(i, data){ coef(lm(mpg ~ wt, data = data[i, ]))["wt"] }
> jack_reg <- jackknife(1:nrow(mtcars), theta_wt, data = mtcars)
> jack_reg$jack.se
[1] 0.7263368
> jack_reg$jack.bias
         wt
-0.08087151
> plot(jack_reg$jack.values, type = "l",
+      main = "Weight Coefficient Based on Omitted Observations",
+      xlab = "Obs.", ylab = "")
[Figure: jackknife estimates of the weight coefficient, plotted against the index of the omitted observation.]
1.5 Bootstrapping
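The bootstrap estimates the sampling distribution of a statistic by resampling the data with replacement. A minimal sketch of bootstrapping the mean of mpg with the bootstrap package (the resample count of 1000 and the seed are assumptions):

> set.seed(1)
> boot_mpg <- bootstrap(mtcars$mpg, nboot = 1000, theta = mean)
> sd(boot_mpg$thetastar)                            # bootstrap SE of the mean
> hist(boot_mpg$thetastar, xlab = "Mean", main = "")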
[Figure: histogram of the bootstrap estimates of the mean of mpg.]
The example from the jackknife section regarding the regression coefficient is repeated with the bootstrap, resampling 1000 times:
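A sketch of how this could be set up, reusing the theta_wt statistic from the jackknife example (the seed is an assumption):

> set.seed(1)
> boot_reg <- bootstrap(1:nrow(mtcars), nboot = 1000, theta = theta_wt, data = mtcars)
> sd(boot_reg$thetastar)   # bootstrap SE of the weight coefficient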
[Figure: bootstrap estimates of the weight coefficient.]
1.6 Cross Validation
1.6.1 K-Fold Cross Validation
In K-fold cross validation, the data is partitioned into K groups, with observations assigned to each group randomly. After partitioning the data, a model is trained on all but one of the partitions. The partition which was left out of model training is then used to estimate the prediction error of the model. This is done repeatedly, with each of the K partitions being left out in turn.
After this procedure is completed, the estimate of the test error is:
\[ CV(\hat{f}) = \frac{1}{N}\sum_{i=1}^{N} L\left(y_i, \hat{f}^{-k(i)}(x_i)\right) \tag{1.11} \]
where $\hat{f}^{-k(i)}$ denotes the model fitted with the partition containing observation $i$ left out.
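A by-hand version of this procedure in base R (using mpg ~ wt and K = 5 as illustrative choices) might look like:

> K <- 5; N <- nrow(mtcars)
> set.seed(1)
> folds <- sample(rep(1:K, length.out = N))   # random partition into K groups
> pred <- numeric(N)
> for (k in 1:K) {
+   fit <- lm(mpg ~ wt, data = mtcars[folds != k, ])        # train on K-1 folds
+   pred[folds == k] <- predict(fit, mtcars[folds == k, ])  # predict held-out fold
+ }
> mean((mtcars$mpg - pred)^2)   # CV estimate of prediction error, as in (1.11)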
1.6.2 Implementation
In the bootstrap package, the function which implements cross-validation is crossval(). Let's compute cross-validated predictions of mpg using 5 folds:
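The call that produces output of the shape below might look like the following sketch; the lsfit-based helpers, regressing mpg on wt, are assumptions.

> theta.fit <- function(x, y){ lsfit(x, y) }                      # fit a linear model
> theta.predict <- function(fit, x){ cbind(1, x) %*% fit$coef }   # predict at new x
> cv_mpg <- crossval(mtcars$wt, mtcars$mpg, theta.fit, theta.predict, ngroup = 5)
> cv_mpg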
$cv.fit
 [1] 21.651761 19.850435  8.084572 19.485795 22.830061 25.330239 19.107575
 [8]  7.515059 16.885196 17.490136 19.986467 19.107575 25.119394 16.668645
[15] 22.220502 18.830932 15.512767 28.203608 26.678445 18.728217 17.259427
[22] 27.240421 26.525180 18.081384 17.610438 24.758539 28.727451

$ngroup
[1] 5

$leave.out
[1] 6

$groups
$groups[[1]]
[1] 32  5  4 19 28

$groups[[2]]
[1] 27 14 23 21  1 29

$groups[[3]]
[1] 10 30  3 24 16 11

$groups[[4]]
[1]  2  7 18 12 25 13

$groups[[5]]
[1] 31  8 15 20  9 17 26 22

$call
1.7 Exercises
1.8 ESL Exercises
For a linear model with $d$ parameters fit to $N$ observations, the optimism of the training error is:
\[ \frac{2d}{N}\sigma_\varepsilon^2 \tag{1.12} \]