EUC1502 Module3 Machine-Learning
Non-linear machine learning econometrics: Tree-based estimation
THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION
Eurostat
Machine-learning non-linear estimation methods
Introduction
Non-linear models
Polynomial regression
Generalized additive models
Decision Trees
Support Vector Machines
Etc.
Tree-based estimation
Introduction
Internal nodes
Branches
Terminal nodes or leaves
Graphically:
Advantages:
Disadvantage:
Source: "Making data science accessible - Machine Learning – Tree Methods", posted by Dan Kellett on 12.04.2016
Regression trees
How it works:
Residual Sum of Squares:
$\mathrm{RSS} = \sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2$
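As a minimal sketch of the RSS criterion, assuming each region R_j is simply given as a list of its observed responses y_i and that the prediction for a region is the mean response in that region. The function names and the toy values are illustrative, not taken from the slides.

```python
def region_mean(values):
    """Prediction for a region: the mean of the responses that fall in it."""
    return sum(values) / len(values)


def rss(regions):
    """Sum over regions R_j of the squared deviations from the region mean.

    `regions` is a list of lists; each inner list holds the y_i of one region.
    """
    total = 0.0
    for ys in regions:
        y_hat = region_mean(ys)
        total += sum((y - y_hat) ** 2 for y in ys)
    return total


# Hypothetical partition into J = 2 regions (toy values for illustration).
regions = [[1.0, 2.0, 3.0], [10.0, 12.0]]
total_rss = rss(regions)
```

A partition that groups similar responses together gives a small RSS, which is what the tree-growing procedure tries to achieve.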
Problem:
It is computationally infeasible to consider every possible
partition of the feature space into J regions
▪ Top-down approach (it begins at the top of the tree, i.e. all
observations fall into only one region)
How it works:
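One step of the top-down, greedy search can be sketched as follows, assuming a single numeric predictor and candidate cutpoints placed midway between consecutive distinct values. The helper names (`rss_of`, `best_split`) and the toy data are hypothetical; a full tree would repeat the same search inside each resulting region until a stopping rule is met.

```python
def rss_of(ys):
    """RSS of one region when predicting its mean response."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (cutpoint, rss) for the split {x < s} / {x >= s} with lowest RSS."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no cutpoint can separate identical x values
        s = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        total = rss_of(left) + rss_of(right)
        if total < best[1]:
            best = (s, total)
    return best


# Toy data (illustrative): two clearly separated groups of responses.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
cut, split_rss = best_split(xs, ys)
```

The search only compares splits of the current region, so it never has to enumerate every possible partition into J regions.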
Disadvantages:
▪ Overfitting
Regression trees – tree pruning
$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|$
▪ $\alpha = 0$: $T = T_0$
▪ As $\alpha$ increases, the quantity above will be minimised for a smaller subtree
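The effect of the penalty can be illustrated with a small sketch, assuming a handful of nested candidate subtrees whose number of leaves |T| and training RSS are already known. The candidate values below are hypothetical and chosen only to show the trade-off; the function names are likewise illustrative.

```python
def cost_complexity(rss, n_leaves, alpha):
    """Penalised criterion: training RSS plus alpha times the number of leaves."""
    return rss + alpha * n_leaves


def select_subtree(candidates, alpha):
    """Pick the (n_leaves, rss) pair that minimises the penalised criterion."""
    return min(candidates, key=lambda c: cost_complexity(c[1], c[0], alpha))


# Hypothetical subtrees of a full tree T0 with 8 leaves: (|T|, RSS).
candidates = [(8, 10.0), (5, 14.0), (3, 20.0), (1, 50.0)]

# Number of leaves of the selected subtree for several values of alpha.
chosen = {
    alpha: select_subtree(candidates, alpha)[0]
    for alpha in (0.0, 2.0, 5.0, 20.0)
}
```

With alpha = 0 the full tree is selected, and larger values of alpha move the choice towards smaller subtrees. In practice alpha is usually chosen by cross-validation.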
Classification trees
Gini index:
$G = \sum_{k=1}^{K} \hat{p}_{mk} \left( 1 - \hat{p}_{mk} \right)$
Cross-entropy:
$D = - \sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$
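Both measures can be computed directly from the class proportions in a node. The sketch below assumes the proportions of the K classes in region m are passed as a list that sums to 1, and it treats 0 * log(0) as 0; the function names are illustrative.

```python
import math


def gini(proportions):
    """Gini index: sum over classes of p * (1 - p)."""
    return sum(p * (1 - p) for p in proportions)


def cross_entropy(proportions):
    """Cross-entropy: minus the sum over classes of p * log(p), skipping p = 0."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


pure = [1.0, 0.0]   # node containing a single class
mixed = [0.5, 0.5]  # node with two equally frequent classes

gini_pure, gini_mixed = gini(pure), gini(mixed)
entropy_pure, entropy_mixed = cross_entropy(pure), cross_entropy(mixed)
```

Both measures are zero for a pure node and largest when the classes are equally mixed, which is why they are used as node-purity criteria.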
Bootstrap and bagging
$\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$
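The averaging in the bagging formula can be sketched as follows, assuming one numeric predictor and, as a stand-in for a full regression tree, a one-split tree (stump) fitted to each bootstrap sample. The helper names, the toy data, and the choice B = 50 are assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_stump(sample):
    """Fit a one-split regression tree to a list of (x, y) pairs."""
    pts = sorted(sample)
    best = None
    for k in range(1, len(pts)):
        if pts[k - 1][0] == pts[k][0]:
            continue  # identical x values cannot be separated
        left = [y for _, y in pts[:k]]
        right = [y for _, y in pts[k:]]
        ml, mr = mean(left), mean(right)
        rss = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or rss < best[0]:
            best = (rss, (pts[k - 1][0] + pts[k][0]) / 2, ml, mr)
    if best is None:  # every x in the bootstrap sample is the same
        m = mean([y for _, y in pts])
        return lambda x: m
    _, cut, ml, mr = best
    return lambda x: ml if x < cut else mr


def bagged_predictor(data, B, seed=0):
    """Average the predictions of B stumps, each fitted to a bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(B):
        boot = [rng.choice(data) for _ in data]  # sample with replacement
        models.append(fit_stump(boot))
    return lambda x: sum(f(x) for f in models) / B


# Toy data (illustrative): (x, y) pairs.
data = [(1, 1.0), (2, 1.2), (3, 0.8), (10, 5.0), (11, 5.2), (12, 4.8)]
f_bag = bagged_predictor(data, B=50)
low, high = f_bag(2), f_bag(11)
```

Averaging over many bootstrap fits reduces the variance of a single tree, at the cost of no longer having one tree to read.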
Disadvantage:
▪ Difficult to interpret the resulting model
Random forest
▪ For each tree b = 1, ..., B, fitted on its own bootstrap sample, every split considers only a random subset of m of the available p predictors
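The predictor subsampling can be sketched as below, assuming predictors are identified by their column index 0..p-1 and that m is set near the square root of p, a commonly used choice that is not stated on the slide. The function name and the seed are illustrative.

```python
import random


def candidate_predictors(p, m, rng):
    """Draw m of the p predictor indices at random, without replacement."""
    return sorted(rng.sample(range(p), m))


rng = random.Random(42)
p = 9  # number of available predictors (toy value)
m = 3  # assumed choice, roughly sqrt(p)

# One fresh draw per split; five draws shown here for illustration.
subsets = [candidate_predictors(p, m, rng) for _ in range(5)]
```

Restricting the candidates in this way makes the individual trees less correlated with each other, so averaging them reduces variance more than plain bagging does.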