
Module 3.
Non-linear machine learning econometrics:
Tree-based estimation

Machine-learning non-linear estimation methods
Introduction

▪ Linear estimation methods are relatively simple to describe and implement; however, they have limitations in terms of predictive power

▪ The linearity assumption is almost always an approximation

▪ Ridge regression and the lasso can improve the models, but they are still linear models

Machine-learning non-linear estimation methods
Introduction

When the assumption of linearity is relaxed, we obtain non-linear models:

▪ Polynomial regression
▪ Generalized additive models
▪ Decision trees
▪ Support vector machines
▪ etc.

Tree-based estimation
Introduction

▪ Tree methods are commonly used in data science to understand patterns within data and to build predictive models

▪ Tree-based methods involve segmenting the predictor space into a number of simple binary regions

The building blocks of a tree are:
▪ Internal nodes
▪ Branches
▪ Terminal nodes (leaves)
Tree-based estimation
Introduction

Graphically: [figure: an example decision tree with internal nodes, branches and terminal nodes]
Tree-based estimation
Introduction
Advantages:

▪ Can be displayed graphically
▪ Very easy to explain to the public
▪ Mirrors human decision-making more closely than other methods
▪ Can handle qualitative predictors without creating dummy variables

Disadvantage:

▪ Trees typically have a lower level of predictive accuracy than other methods
Tree-based estimation
Introduction

[figure: tree-method illustration]
Source: "Making data science accessible - Machine Learning – Tree Methods", posted by Dan Kellett on 12.04.2016
Tree-based estimation
Introduction

Tree-based methods can be used for:

▪ Regression for quantitative response variables

▪ Classification for qualitative response variables

Tree-based estimation
Regression trees

How it works:

1. Divide the predictor space into J distinct and non-overlapping regions R_1, R_2, ..., R_J.

2. For every test observation that falls into region R_j, predict the mean of the response values for the training observations in R_j.
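A minimal sketch of these two steps using scikit-learn's DecisionTreeRegressor (the data below is simulated purely for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))                 # two predictors
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# Step 1: fitting the tree partitions the predictor space into regions R_1..R_J
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

# Step 2: a test observation is predicted with the mean training response
# of the region (leaf) it falls into
x_new = np.array([[2.5, 7.0]])
print(tree.predict(x_new))
```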

Tree-based estimation
Regression trees

R_1, R_2, ..., R_J have to be defined such that they minimise the residual sum of squares:

RSS = \sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2

where \hat{y}_{R_j} is the mean response for the training observations within the j-th region.
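For a fixed partition, the RSS objective can be computed directly; a small numpy sketch, where the region assignment `region_ids` is a hypothetical pre-computed labelling:

```python
import numpy as np

def partition_rss(y, region_ids):
    """RSS = sum_j sum_{i in R_j} (y_i - mean of y over R_j)^2."""
    rss = 0.0
    for j in np.unique(region_ids):
        y_j = y[region_ids == j]                # responses falling in region R_j
        rss += np.sum((y_j - y_j.mean()) ** 2)  # squared deviations from region mean
    return rss
```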

Problem:
It is computationally infeasible to consider every possible partition of the feature space into J regions.

Solution: recursive binary splitting

Tree-based estimation
Regression trees: Recursive binary splitting

▪ Top-down approach (it begins at the top of the tree, where all observations fall into a single region)

▪ At each step of the tree-building process, the best split is made:

• for each splitting variable j and threshold s, the pair of half-planes R_1(j, s) = \{X \mid X_j < s\} and R_2(j, s) = \{X \mid X_j \geq s\} is defined, and

• we seek the values of j and s that minimise:

RSS = \sum_{i:\, x_i \in R_1(j,s)} \left( y_i - \hat{y}_{R_1} \right)^2 + \sum_{i:\, x_i \in R_2(j,s)} \left( y_i - \hat{y}_{R_2} \right)^2

where \hat{y}_{R_1} and \hat{y}_{R_2} are the mean responses for the training observations in R_1(j, s) and R_2(j, s)

• the process stops when a stopping criterion is reached (for instance, when no region contains more than 5 observations); a sketch of one splitting step follows below
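A minimal, unoptimised sketch of one splitting step: an exhaustive search over variables j and thresholds s, as described above (function and variable names are illustrative):

```python
import numpy as np

def best_split(X, y):
    """Find the pair (j, s) minimising RSS(R1) + RSS(R2)."""
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(X.shape[1]):                 # every candidate variable
        for s in np.unique(X[:, j]):            # every candidate threshold
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue                        # skip degenerate splits
            rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss
```

Recursive binary splitting then applies this same search within each of the two resulting regions, and so on, until the stopping criterion is met.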
Tree-based estimation
Regression trees

Disadvantages:

▪ The tree may be too complex

▪ Overfitting of the training data

▪ Poor fit on test data

Tree-based estimation
Regression trees – tree pruning

▪ Tree pruning means growing a very large tree T_0 and pruning it back to obtain a simpler subtree (with a smaller number of terminal nodes)

▪ To prune the tree we use cost-complexity pruning (weakest-link pruning): for each value of the tuning parameter \alpha there corresponds a subtree T \subset T_0 that minimises:

\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|

where |T| is the number of terminal nodes of T, R_m is the subset of the predictor space corresponding to the m-th terminal node, and \hat{y}_{R_m} is the mean of the training observations in R_m
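In scikit-learn, cost-complexity pruning is exposed through cost_complexity_pruning_path and the ccp_alpha parameter; a sketch with simulated data and illustrative settings:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# Grow the large tree T0 and compute the sequence of effective alphas,
# each corresponding to a weakest-link pruned subtree of T0
big_tree = DecisionTreeRegressor(random_state=0)
path = big_tree.cost_complexity_pruning_path(X, y)

# One pruned subtree per candidate alpha
subtrees = [
    DecisionTreeRegressor(random_state=0, ccp_alpha=a).fit(X, y)
    for a in path.ccp_alphas
]
```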
Tree-based estimation
Regression trees – tree pruning

▪ \alpha controls the trade-off between the subtree's complexity (|T|) and its goodness-of-fit to the training data:

• \alpha = 0: T = T_0
• as \alpha increases, the criterion is minimised by a smaller subtree

▪ \alpha plays a similar role to \lambda in lasso regression (cost function) and is selected by cross-validation:

• divide the training observations into K folds and, for each k = 1, ..., K:
• repeat the tree-growing and pruning steps on all the data except the k-th fold
• evaluate the mean squared prediction error on the data in the left-out k-th fold, as a function of \alpha
• average the results for each value of \alpha and pick \alpha so as to minimise the average error (see the sketch below)
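A sketch of the cross-validation step with scikit-learn's GridSearchCV (K = 5 folds; the alpha grid here is illustrative, in practice it would come from the pruning path above):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

cv = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": np.linspace(0.0, 0.05, 20)},  # illustrative grid
    cv=5,                                  # K = 5 folds
    scoring="neg_mean_squared_error",      # mean squared prediction error
)
cv.fit(X, y)
print(cv.best_params_)                     # alpha minimising the average CV error
```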

Tree-based estimation
Classification trees

▪ For qualitative response variables

▪ The predicted response is given by the most commonly occurring (modal) class of the training observations in the region (instead of the mean response of the training observations)

▪ The class proportions among the training observations in each region are also of interest

▪ Classification trees can be grown just as regression trees

▪ Recursive binary splitting can be used, but instead of RSS we use the Gini index or cross-entropy (a sketch follows below)
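A minimal scikit-learn sketch with simulated data; criterion="gini" is the default, while criterion="entropy" uses cross-entropy instead:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # two classes

clf = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)

x_new = np.array([[0.5, -0.2]])
print(clf.predict(x_new))                  # modal class of the leaf region
print(clf.predict_proba(x_new))            # class proportions in that region
```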
Tree-based estimation
Classification trees

Gini index:

G = \sum_{k=1}^{K} \hat{p}_{mk} \left( 1 - \hat{p}_{mk} \right)

where \hat{p}_{mk} is the proportion of training observations in the m-th region that are from the k-th class.

Cross-entropy:

D = - \sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}

Both are measures of node purity: a low value indicates that the node contains predominantly observations from a single class.
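Both measures are easy to compute from a node's class proportions; a short numpy sketch:

```python
import numpy as np

def gini(p):
    """G = sum_k p_mk * (1 - p_mk) for one node's class proportions p."""
    p = np.asarray(p)
    return float(np.sum(p * (1 - p)))

def cross_entropy(p):
    """D = -sum_k p_mk * log(p_mk), with 0 * log(0) taken as 0."""
    p = np.asarray(p)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# A pure node scores 0 on both; an evenly mixed node scores high
print(gini([1.0, 0.0]), cross_entropy([1.0, 0.0]))   # 0.0 0.0
print(gini([0.5, 0.5]), cross_entropy([0.5, 0.5]))   # 0.5 ~0.693
```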

Tree-based estimation
Bootstrap and bagging

▪ As presented in Module 2, the bootstrap and bagging can be useful for improving decision trees

Notation:
B = number of samples bootstrapped (regression trees) from the training data set
\hat{f}^{*b}(x) = our function fitted on the b-th bootstrapped training data set

\hat{f}_{bag}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)

▪ Trees are not pruned

▪ Each individual tree has high variance but low bias; averaging them reduces the variance (a sketch follows below)
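A minimal sketch of bagging regression trees that follows the formula directly (simulated data; B = 100 here):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

B, n = 100, len(y)
trees = []
for b in range(B):
    idx = rng.integers(0, n, size=n)        # bootstrap sample, drawn with replacement
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))  # grown deep, not pruned

# f_bag(x) = (1/B) * sum_b f*b(x): average the B individual predictions
x_new = np.array([[2.5, 7.0]])
print(np.mean([t.predict(x_new) for t in trees], axis=0))
```

scikit-learn packages the same idea as sklearn.ensemble.BaggingRegressor.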
Tree-based estimation
Bootstrap and bagging

▪ With qualitative response variables we record the class predicted for a given test observation by each tree and take a majority vote (see the sketch after this list)

▪ B is chosen sufficiently large (e.g. 100)

Disadvantage:
▪ The resulting model is difficult to interpret
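A tiny sketch of the majority vote itself (the predicted classes below are made up for illustration):

```python
import numpy as np

# Suppose B = 5 trees predicted these classes for one test observation
votes = np.array(["A", "B", "A", "A", "B"])
values, counts = np.unique(votes, return_counts=True)
print(values[np.argmax(counts)])           # "A" wins the majority vote
```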

Tree-based estimation
Bootstrap and bagging

Variable importance measure:

▪ Quantitative response: the total amount by which the RSS is decreased due to splits over a given predictor, averaged over all B trees

▪ Qualitative response: the total amount by which the Gini index is decreased due to splits over a given predictor, averaged over all B trees

A high value means an important variable.
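scikit-learn exposes this impurity-based measure as feature_importances_; a sketch with simulated data in which only the first predictor matters (max_features=None makes the forest behave like bagging):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 4))
y = 3 * X[:, 0] + 0.1 * rng.standard_normal(300)   # only predictor 0 matters

bagged = RandomForestRegressor(n_estimators=100, max_features=None, random_state=0)
bagged.fit(X, y)
# Average decrease in RSS attributable to splits on each predictor,
# normalised to sum to 1; predictor 0 should dominate here
print(bagged.feature_importances_)
```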

Tree-based estimation
Random forest

▪ The random forest method is an improvement of the bagging method

▪ It considers only a subset (random sample) of m predictors at each split (bagging considers all the predictors)

▪ Usually m \approx \sqrt{p}, where p = total number of predictors (if m = p then we have bagging)

▪ This decorrelates the trees (helpful when we have a large number of correlated predictors)

Tree-based estimation
Random forest

How random forest works:

▪ Construct B regression trees using B bootstrapped training sets

▪ For each tree b = 1, ..., B, only a random subset of m of the p available predictors is considered at each split

▪ Average the resulting predictions (a sketch follows below)
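A sketch with scikit-learn's RandomForestRegressor (simulated data; max_features="sqrt" gives m ≈ √p candidate predictors per split, while max_features=None would reduce to bagging):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 9))                  # p = 9 predictors
y = X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(300)

forest = RandomForestRegressor(
    n_estimators=500,          # B = 500 bootstrapped trees
    max_features="sqrt",       # m = sqrt(p) = 3 predictors tried at each split
    random_state=0,
).fit(X, y)

print(forest.predict(X[:3]))                       # average over the 500 trees
```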
