
Module 2. Introduction to machine learning econometrics

Computational issues
Introduction

In a data-rich situation, the best approach is to split the data into three parts:

▪ Training set (50%): used to fit the models
▪ Validation set (25%): used to estimate prediction error for model selection
▪ Test set (25%): used for assessment of the final chosen model
Computational issues
The validation set approach

The n observations (1, 2, 3, …, n-2, n-1, n) are randomly split in half:

▪ n/2 observations form the training set, on which the machine learning model is fitted
▪ n/2 observations form the validation set, on which the fitted model is evaluated
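A minimal sketch of the validation set approach in Python, assuming scikit-learn and purely illustrative placeholder data X, y:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Placeholder data: 100 observations of one predictor (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(size=100)

# Randomly split the n observations in half
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Fit on the training set, evaluate on the validation set
model = LinearRegression().fit(X_train, y_train)
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(f"Validation MSE: {val_mse:.3f}")
```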
Computational issues
The validation set approach

Disadvantages:
▪ The validation estimates can be highly variable, depending on which observations are included in the training set
▪ The validation set error rate may overestimate the test error rate, since statistical methods tend to perform worse when trained on fewer observations
Computational issues
Cross-validation

Alternatives to the validation set approach:

▪ Leave-one-out cross-validation (LOOCV)

▪ K-Fold cross-validation

Computational issues
Cross-validation: leave-one-out cross-validation

Each observation takes its turn as the validation set:

▪ 1 observation forms the validation set
▪ The remaining n-1 observations form the training set, on which the machine learning model is fitted; the fitted model is then evaluated on the held-out observation
Computational issues
Cross-validation: leave-one-out cross-validation

LOOCV estimate for the test MSE:

$$\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{MSE}_i$$

where $\mathrm{MSE}_i = (y_i - \hat{y}_i)^2$ is the error on the i-th held-out observation.
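A minimal LOOCV sketch, again assuming scikit-learn and reusing the placeholder X, y from the earlier snippet:

```python
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LinearRegression

# cross_val_score returns negative MSE by scikit-learn convention
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
cv_n = -scores.mean()  # CV(n): average of the n held-out MSE_i values
print(f"LOOCV estimate of test MSE: {cv_n:.3f}")
```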
Computational issues
Cross-validation: leave-one-out cross-validation

Advantages:

▪ It does not overestimate the test error rate
▪ There is no randomness in the training/validation split

Disadvantage:
▪ It can be very time consuming if n is large
Computational issues
Cross-validation: K-Fold cross-validation

The observations are divided into k folds (here k = 5), and each fold takes its turn as the validation set:

▪ 1 fold forms the validation set
▪ The remaining k-1 folds form the training set, on which the machine learning model is fitted; the fitted model is then evaluated on the held-out fold
Computational issues
Cross-validation: K-Fold cross-validation

K-fold CV estimate for the test MSE:

$$\mathrm{CV}_{(k)} = \frac{1}{k}\sum_{i=1}^{k} \mathrm{MSE}_i$$

If k = n, K-fold CV reduces to LOOCV.
Usually k = 5 or k = 10.
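A corresponding K-fold sketch under the same assumptions (scikit-learn, placeholder X, y):

```python
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

kf = KFold(n_splits=5, shuffle=True, random_state=0)  # k = 5
scores = cross_val_score(LinearRegression(), X, y,
                         cv=kf, scoring="neg_mean_squared_error")
cv_k = -scores.mean()  # CV(k): average MSE over the k folds
print(f"5-fold CV estimate of test MSE: {cv_k:.3f}")
```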
Computational issues
Cross-validation: K-Fold cross-validation

Advantages:

▪ Much less time consuming than LOOCV
▪ More accurate estimates of the test error rate
▪ Lower variance than LOOCV
Computational issues
Assessing model fitting

▪ There is no single best method

▪ Mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{f}(x_i)\bigr)^2$$

where $\hat{f}(x_i)$ is the prediction that $\hat{f}$ gives on $x_i$. Computed on the training observations, this is the training MSE.

▪ The model fits the data well if the MSE is low

▪ We are more interested in the test MSE (the MSE we obtain when applying the method to unseen data)
Computational issues
Assessing model fitting

▪ Goal: minimise the test MSE

▪ Problems: (I) sometimes test observations are not available;
(II) sometimes the training MSE is small while the test MSE is much larger
Computational issues
Assessing model fitting

Example (figure: three fits to simulated data, with Y plotted against X):

▪ Black curve: the true f
▪ Orange curve: linear regression fit
▪ Blue and green curves: smoothing splines with two different levels of smoothness
Computational issues
Assessing model fitting
(Figure: mean squared error plotted against model flexibility.)

▪ Grey curve: training MSE
▪ Red curve: test MSE
▪ Square dots: MSE values for the three methods of the previous slide
▪ Horizontal line: the irreducible error Var(ε), i.e. the minimum value that the test MSE can attain

Overfitting the data:
The blue curve from the previous slide is the model that best fits the data.
Computational issues
Assessing model fitting

Remarks:

▪ If the training MSE is small but the test MSE is large, the model is overfitting

▪ The training MSE is almost always smaller than the test MSE, because machine learning methods are fitted so as to minimise the training MSE

▪ Usually test data are not available
Computational issues
Replication or resampling: Bootstrap

Bootstrap: a general procedure for assessing the statistical accuracy of a parameter estimate or prediction.

Notation:
▪ $z_i = (x_i, y_i)$: the i-th training observation
▪ $B$: number of bootstrapped samples
▪ $S(Z^{*b})$: quantity of interest computed on the b-th bootstrapped sample $Z^{*b}$
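A minimal bootstrap sketch (numpy only; the statistic S here is the sample mean of placeholder data y, purely as an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=100)  # placeholder data

B = 1000  # number of bootstrapped samples
n = len(y)
stats = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)   # sample n indices with replacement
    stats[b] = y[idx].mean()           # S(Z*b): statistic on the b-th sample

# Bootstrap estimate of the standard error of the sample mean
print(f"Bootstrap SE of the mean: {stats.std(ddof=1):.3f}")
```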
Computational issues
Replication or resampling: Bagging

Bagging (Bootstrap AGGregation): an improvement on the bootstrap; a procedure for reducing the variance of a prediction.

Training data: $Z = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$

$\hat{f}^{*b}(x)$: our function fitted on the b-th bootstrapped training data set

$$\hat{f}_{bag}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{f}^{*b}(x)$$

Bagging averages the predictions $\hat{f}^{*b}(x)$ over the B bootstrap samples.
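A minimal bagging sketch. A decision-tree base learner from scikit-learn is assumed here for illustration, since averaging mainly helps high-variance learners; X, y are again placeholders:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))          # placeholder training data
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=100)

B, n = 100, len(y)
models = []
for b in range(B):
    idx = rng.integers(0, n, size=n)           # bootstrapped training set Z*b
    models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

def f_bag(x_new):
    # Average the B predictions f*b(x) over the bootstrap samples
    return np.mean([m.predict(x_new) for m in models], axis=0)

print(f_bag(np.array([[5.0]])))
```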
Computational issues
Replication or resampling: Bumping

Bumping: a technique for finding a better single model. It uses bootstrap sampling to choose the model that best fits the data.

$Z^{*1}, Z^{*2}, \dots, Z^{*b}, \dots, Z^{*B}$: bootstrapped samples

$\hat{f}^{*b}(x)$: the function fitted on the b-th bootstrapped training data set, for each $b = 1, \dots, B$

The best model is the one that produces the smallest prediction error, averaged over the original training set.
Computational issues
Replication or resampling: Bumping
The best model comes from the $\hat{b}$-th bootstrap sample, where:

$$\hat{b} = \arg\min_{b} \sum_{i=1}^{N} \bigl(y_i - \hat{f}^{*b}(x_i)\bigr)^2$$

The model predictions are then $\hat{f}^{*\hat{b}}(x)$.

Remark:
▪ The original training sample is included among the bootstrapped samples, so the procedure can pick it if it yields the lowest training error
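A minimal bumping sketch under the same assumptions (numpy, decision-tree base learner, placeholder X, y; the original sample is added as one of the candidates):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=100)

B, n = 50, len(y)
candidates = []
for b in range(B):
    idx = rng.integers(0, n, size=n)            # bootstrapped sample Z*b
    candidates.append(
        DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx]))
# Include the model fitted on the original training sample
candidates.append(DecisionTreeRegressor(max_depth=3).fit(X, y))

# b_hat: candidate with the smallest error on the ORIGINAL training set
errors = [np.sum((y - m.predict(X)) ** 2) for m in candidates]
f_best = candidates[int(np.argmin(errors))]
```

The tree depth is capped here on purpose: an unconstrained tree fitted on the original sample would interpolate the training data and trivially always be selected.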
Machine learning linear estimation
Introduction

Standard linear model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$

The standard method to fit this model is least squares.

Remarks:
▪ Prediction accuracy:
➢ low bias
➢ if n ≫ p, also low variance
However:
➢ if n is not much larger than p, high variance
➢ if p > n, the method cannot be used
Machine learning linear estimation
Introduction

▪ Model interpretability:
➢ The model sometimes includes irrelevant variables; the complexity increases and interpretability suffers
Machine learning linear estimation
Introduction

Alternative: shrinkage methods

▪ Techniques for improving a least-squares estimator that reduce the variance by adding constraints on the values of the coefficients.

▪ Only those variables that improve the fit deserve a nonzero coefficient, and consequently only they appear in the fitted linear model.
Machine learning linear estimation
Shrinkage methods

The main shrinkage techniques:

▪ Ridge regression

▪ Lasso

Machine learning linear estimation
Shrinkage methods – Ridge regression

In ridge regression, the estimates $\hat{\beta}^R$ minimise:

$$\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2 + \lambda \sum_{j=1}^{p}\beta_j^2$$

where $\lambda \geq 0$ is the tuning parameter and $\lambda \sum_{j=1}^{p}\beta_j^2$ is the shrinkage penalty.

For each value of $\lambda$ there is a set of coefficient estimates $\hat{\beta}_\lambda^R$.

If:
▪ $\lambda = 0$: the shrinkage penalty vanishes and $\hat{\beta}^R = \hat{\beta}$, the least-squares estimates
▪ $\lambda \to \infty$: the shrinkage penalty grows and $\hat{\beta}^R \to 0$
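A minimal ridge sketch, assuming scikit-learn (where the tuning parameter $\lambda$ is called alpha) and the placeholder X, y from earlier:

```python
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Standardising the predictors matters for ridge (see next slide)
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))  # alpha = lambda
ridge.fit(X, y)
print(ridge.named_steps["ridge"].coef_)
```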
Machine learning linear estimation
Shrinkage methods – Ridge regression

The $\hat{\beta}^R$ are not scale invariant, so standardise the predictors $x_{ij}$:

$$\tilde{x}_{ij} = \frac{x_{ij}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(x_{ij} - \bar{x}_j\bigr)^2}}$$
Machine learning linear estimation
Shrinkage methods – Ridge regression

Advantages:

▪ As $\lambda$ increases, the variance decreases (but the bias increases)
▪ It can be used when p > n

(Figure: squared bias (black line), variance (green line) and test mean squared error (purple line) plotted against $\lambda$.)

Disadvantage:

▪ All p predictors are included in the model, which is a challenge for interpretability when p is high
Machine learning linear estimation
Shrinkage methods – Lasso regression

In lasso regression, the estimates $\hat{\beta}^L$ minimise:

$$\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2 + \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert$$

where $\lambda$ is the tuning parameter and the shrinkage penalty $\lambda \sum_{j=1}^{p}\lvert\beta_j\rvert$ is an $\ell_1$ penalty.

The $\ell_1$ penalty forces some $\hat{\beta}^L = 0$ when $\lambda$ is sufficiently high.
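The corresponding lasso sketch under the same assumptions (scikit-learn, alpha playing the role of $\lambda$, placeholder X, y):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
lasso.fit(X, y)
coefs = lasso.named_steps["lasso"].coef_
print("Nonzero coefficients:", np.flatnonzero(coefs))  # l1 penalty zeroes some out
```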
Machine learning linear estimation
Shrinkage methods – Selection of 𝜆

Steps:
▪ Define a grid of values for $\lambda$
▪ Calculate the cross-validation error for each value of $\lambda$
▪ Select the $\lambda$ for which the cross-validation error is smallest
▪ Re-fit the model with ridge or lasso regression using the selected $\lambda$, as sketched below
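A minimal sketch of these steps, assuming scikit-learn's LassoCV (RidgeCV works analogously) and the placeholder X, y:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Grid of candidate lambda values, 5-fold CV error for each,
# then re-fit on the full data with the best lambda
lambdas = np.logspace(-3, 1, 50)
model = make_pipeline(StandardScaler(), LassoCV(alphas=lambdas, cv=5))
model.fit(X, y)
print("Selected lambda:", model.named_steps["lassocv"].alpha_)
```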
