
Module 4.
Non-linear machine learning econometrics: Support Vector Machines

THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION

Eurostat
Machine-learning non-linear estimation methods: Support Vector Machines
Introduction

When the assumption of linearity is relaxed, a number of non-linear models become available:
Polynomial regression
Generalized additive models
Decision Trees
Support Vector Machines
Etc.

Introduction: hyperplanes

Hyperplane:
In a p-dimensional space, a hyperplane is a “flat” affine subspace of dimension p − 1:
p = 2: a line
p = 3: a plane

Definition:
p = 2: β0 + β1X1 + β2X2 = 0 (the equation of a line)
p dimensions: β0 + β1X1 + β2X2 + … + βpXp = 0

Introduction: hyperplanes

Geometric interpretation:
If X = (X1, X2, …, Xp)T satisfies the equation above, then X lies on the hyperplane.

If
β0 + β1X1 + β2X2 + … + βpXp > 0 or
β0 + β1X1 + β2X2 + … + βpXp < 0

then X lies on one side or the other of the hyperplane.

We can think of a hyperplane as dividing p-dimensional space into two halves.
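The half-space test can be sketched in a few lines of Python (a minimal illustration; the coefficients β0 = 1, β = (2, 3) are those of the example that follows):

```python
def f(beta0, beta, x):
    """Evaluate the hyperplane function f(x) = beta0 + sum_j beta_j * x_j."""
    return beta0 + sum(b * xj for b, xj in zip(beta, x))

# Hyperplane 1 + 2*X1 + 3*X2 = 0 in two dimensions
beta0, beta = 1.0, [2.0, 3.0]

print(f(beta0, beta, [1.0, 1.0]))    # 6.0  (> 0: one side of the hyperplane)
print(f(beta0, beta, [-1.0, -1.0]))  # -4.0 (< 0: the other side)
print(f(beta0, beta, [1.0, -1.0]))   # 0.0  (on the hyperplane)
```

The sign of f(x) tells us which of the two halves the point lies in.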
Introduction: hyperplanes

Example:

[Figure: the line 1 + 2X1 + 3X2 = 0 in the (X1, X2) plane; points in the region where 1 + 2X1 + 3X2 > 0 lie on one side, points where 1 + 2X1 + 3X2 < 0 on the other.]
Introduction: hyperplanes

Separating hyperplanes:

Define y1, y2, …, yn ∈ {−1, 1} as the class labels of the n training observations. A separating hyperplane satisfies

f(xi) = β0 + β1xi1 + β2xi2 + … + βpxip > 0 if yi = 1

f(xi) = β0 + β1xi1 + β2xi2 + … + βpxip < 0 if yi = −1

A test observation x* is assigned a class (either 1 or −1) depending on which side of the hyperplane it is located.

Magnitude of f(x*): if f(x*) is far from 0, then x* lies far from the hyperplane, giving a reliable class assignment for x*.
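The separating-hyperplane property amounts to yi · f(xi) > 0 for every training observation. A minimal sketch (the toy data and coefficients are made up for illustration):

```python
def f(beta0, beta, x):
    """f(x) = beta0 + sum_j beta_j * x_j."""
    return beta0 + sum(b * xj for b, xj in zip(beta, x))

def separates(beta0, beta, X, y):
    """True if y_i * f(x_i) > 0 for every training observation."""
    return all(yi * f(beta0, beta, xi) > 0 for xi, yi in zip(X, y))

# Toy 2-D data: class +1 above the line X2 = X1, class -1 below it
X = [[0.0, 1.0], [1.0, 3.0], [1.0, 0.0], [3.0, 1.0]]
y = [1, 1, -1, -1]

print(separates(0.0, [-1.0, 1.0], X, y))  # True: f(x) = X2 - X1 separates the classes
print(separates(1.0, [1.0, 1.0], X, y))   # False: every f(xi) > 0 here, so the -1 class is misclassified
```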
Introduction: hyperplanes

Problem:
If a separating hyperplane exists, then there exists an infinite number of other hyperplanes that could also separate the data.

Possible solution:
select the one that is the farthest from the data:
the maximal margin hyperplane

Maximal margin classifier

Maximal margin hyperplane:
the separating hyperplane for which the margin is largest.

Margin: the minimal distance from the observations to the hyperplane.

[Figure: a two-dimensional example in the (X1, X2) plane showing the maximal margin hyperplane, its margin, and the distance from the observations to the hyperplane.]

Note: there is a similarity with fitting a regression hyperplane by least squares.
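The margin can be computed directly (a minimal sketch with made-up toy points): the distance from a point xi to the hyperplane is |f(xi)| / ‖β‖, and the margin is the smallest of these distances.

```python
import math

def distance_to_hyperplane(beta0, beta, x):
    """Distance from point x to the hyperplane beta0 + beta . x = 0."""
    fx = beta0 + sum(b * xj for b, xj in zip(beta, x))
    norm = math.sqrt(sum(b * b for b in beta))
    return abs(fx) / norm

def margin(beta0, beta, X):
    """Margin: minimal distance from the observations to the hyperplane."""
    return min(distance_to_hyperplane(beta0, beta, x) for x in X)

# Toy observations around the hyperplane X2 - X1 = 0
X = [[0.0, 2.0], [2.0, 0.0], [1.0, 4.0]]
print(margin(0.0, [-1.0, 1.0], X))  # 2 / sqrt(2), about 1.414
```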
Maximal margin classifier

Maximal margin classifier:
A test observation is classified depending on which side of the maximal margin hyperplane it lies.

[Figure: the maximal margin hyperplane in the (X1, X2) plane; the support vectors lie on the edge of the margin.]
Maximal margin classifier
• n training observations x1, x2, …, xn
• p dimensions
• y1, y2, …, yn ∈ {1, −1}
• M: width of the margin
• Optimisation problem: maximise M over β0, β1, β2, …, βp
• subject to:
• Σ_{j=1}^p βj² = 1
• yi(β0 + β1xi1 + β2xi2 + … + βpxip) ≥ M, for each i = 1, …, n

• Once M has been maximised, we classify a test observation according to the sign of
f(x*) = β0 + β1x*1 + β2x*2 + … + βpx*p
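Given candidate coefficients satisfying the unit-norm constraint, the achieved margin is min_i yi·f(xi), and a test point is classified by the sign of f(x*). A minimal sketch (toy data and hand-picked β, not the output of an actual solver):

```python
import math

def f(beta0, beta, x):
    return beta0 + sum(b * xj for b, xj in zip(beta, x))

# Hand-picked unit-norm coefficients: beta = (-1/sqrt(2), 1/sqrt(2))
b = 1 / math.sqrt(2)
beta0, beta = 0.0, [-b, b]
assert abs(sum(bj * bj for bj in beta) - 1.0) < 1e-9  # constraint sum_j beta_j^2 = 1

# Toy separable data
X = [[0.0, 2.0], [1.0, 4.0], [2.0, 0.0], [4.0, 1.0]]
y = [1, 1, -1, -1]

M = min(yi * f(beta0, beta, xi) for xi, yi in zip(X, y))  # achieved margin
print(M)  # sqrt(2), about 1.414

# Classify a test observation by the sign of f(x*)
x_star = [0.0, 1.0]
print(1 if f(beta0, beta, x_star) > 0 else -1)  # 1
```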
Maximal margin classifier

Problems:

▪ It is not robust to individual observations

▪ It cannot be applied if no separating hyperplane exists

Solution:
the support vector classifier
Support vector classifier

▪ Based on a hyperplane that does not perfectly separate the two classes

▪ Soft margin (it can be violated by some of the training observations)

▪ Robust to individual observations

▪ Better classification of most of the training observations

Support vector classifier
How it works:
• Optimisation problem:

• Maximise M over β0, β1, β2, …, βp, ε1, …, εn
• subject to:
• Σ_{j=1}^p βj² = 1
• yi(β0 + β1xi1 + β2xi2 + … + βpxip) ≥ M(1 − εi), for each i = 1, …, n
• εi ≥ 0, Σ_{i=1}^n εi ≤ C

ε1, …, εn = slack variables that allow individual observations to be on the wrong side of the margin or of the hyperplane
C = non-negative tuning parameter
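The slack variables can be read off from a fitted hyperplane: rearranging the constraint gives εi = max(0, 1 − yi·f(xi)/M). A sketch with made-up values (the hyperplane and margin width are hand-picked, not fitted):

```python
def slack(beta0, beta, x, yi, M):
    """epsilon_i = max(0, 1 - y_i * f(x_i) / M) for margin width M."""
    fx = beta0 + sum(b * xj for b, xj in zip(beta, x))
    return max(0.0, 1.0 - yi * fx / M)

beta0, beta, M = 0.0, [-1.0, 1.0], 2.0  # hand-picked hyperplane X2 - X1 = 0

print(slack(beta0, beta, [0.0, 3.0], 1, M))  # 0.0: correct side of the margin
print(slack(beta0, beta, [0.0, 1.0], 1, M))  # 0.5: violates the margin
print(slack(beta0, beta, [1.0, 0.0], 1, M))  # 1.5: wrong side of the hyperplane
```

The three cases (εi = 0, 0 < εi ≤ 1, εi > 1) correspond to the interpretation on the next slide.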
Support vector classifier

εi = 0: the ith observation is on the correct side of the margin

εi > 0: the ith observation is on the wrong side of the margin (it violates the margin)

εi > 1: the ith observation is on the wrong side of the hyperplane

C determines the number and severity of the violations to the margin (and to the hyperplane) that are tolerated:

C = 0: no violations are accepted

C > 0: no more than C observations can be on the wrong side of the hyperplane
Support vector classifier

About C:

▪ Tuning parameter, generally chosen via cross-validation

▪ It controls the bias-variance trade-off

▪ If C is small, we seek narrow margins that are rarely violated: the classifier is highly fit to the data (low bias but high variance)

▪ If C is larger, the margin is wider and we allow more violations: the classifier fits the data less closely (higher bias but lower variance)

Support vector classifier

[Figure: support vector classifiers fitted with a higher value of C (wider margin, more tolerated violations) and with a lower value of C (narrower margin, fewer violations).]
Support vector classifier

Property:

▪ An observation that lies on the correct side of the margin does not affect the support vector classifier

▪ Only the support vectors affect the classifier

Support Vector Machines

▪ Extension of the support vector classifier

▪ A method to enlarge the feature space to accommodate non-linear boundaries

▪ One approach uses quadratic, cubic, or even higher-order polynomial functions of the predictors:

X1, X1², X2, X2², …, Xp, Xp²
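Enlarging the feature space with squared terms can be sketched as follows (a toy illustration):

```python
def expand_quadratic(x):
    """Map (x1, ..., xp) to (x1, x1^2, x2, x2^2, ..., xp, xp^2)."""
    out = []
    for xj in x:
        out.extend([xj, xj * xj])
    return out

print(expand_quadratic([2.0, 3.0]))  # [2.0, 4.0, 3.0, 9.0]
```

A hyperplane that is linear in the enlarged feature space corresponds to a non-linear (quadratic) boundary in the original space.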

Support Vector Machines

Maximise M over β0, β11, β12, …, βp1, βp2, ε1, …, εn

subject to

Σ_{j=1}^p Σ_{k=1}^2 βjk² = 1

yi(β0 + Σ_{j=1}^p βj1 xij + Σ_{j=1}^p βj2 xij²) ≥ M(1 − εi), for each i = 1, …, n

εi ≥ 0, Σ_{i=1}^n εi ≤ C

Support Vector Machines

▪ Introducing kernels (functions that quantify the similarity of two observations):

K(xi, xi′) = Σ_{j=1}^p xij xi′j   linear kernel (the inner product)

K(xi, xi′) = (1 + Σ_{j=1}^p xij xi′j)^d   polynomial kernel of degree d

K(xi, xi′) = exp(−γ Σ_{j=1}^p (xij − xi′j)²)   radial kernel
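The three kernels can be written directly from their definitions (a minimal sketch; d and γ are the kernel hyperparameters):

```python
import math

def linear_kernel(x, z):
    """K(x, z) = sum_j x_j * z_j (the inner product)."""
    return sum(xj * zj for xj, zj in zip(x, z))

def polynomial_kernel(x, z, d):
    """K(x, z) = (1 + sum_j x_j * z_j)^d."""
    return (1.0 + linear_kernel(x, z)) ** d

def radial_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * sum_j (x_j - z_j)^2)."""
    return math.exp(-gamma * sum((xj - zj) ** 2 for xj, zj in zip(x, z)))

x, z = [1.0, 2.0], [3.0, 0.0]
print(linear_kernel(x, z))             # 3.0
print(polynomial_kernel(x, z, d=2))    # 16.0
print(radial_kernel(x, x, gamma=1.0))  # 1.0: identical observations are maximally similar
```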

Support Vector Machines

▪ The SVM combines a non-linear (e.g. polynomial) kernel with a support vector classifier

▪ The linear support vector classifier can be represented as:

f(x) = β0 + Σ_{i∈S} αi ⟨x, xi⟩

where ⟨x, xi⟩ is the inner product, S is the set of indices of the support vectors, and αi is a parameter that is non-zero only if the ith training observation is a support vector.

▪ The SVM replaces the inner product with a kernel (e.g. a polynomial kernel):

f(x) = β0 + Σ_{i∈S} αi K(x, xi)
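Evaluating this decision function is a short loop. A sketch with made-up support vectors and αi values (not a fitted model), using the radial kernel as the example K:

```python
import math

def radial_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * sum_j (x_j - z_j)^2)."""
    return math.exp(-gamma * sum((xj - zj) ** 2 for xj, zj in zip(x, z)))

def svm_decision(x, beta0, support_vectors, alphas, gamma):
    """f(x) = beta0 + sum over support vectors of alpha_i * K(x, x_i)."""
    return beta0 + sum(a * radial_kernel(x, sv, gamma)
                       for a, sv in zip(alphas, support_vectors))

# Made-up fitted quantities: two support vectors, one per class
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, -1.0]
beta0 = 0.0

x_star = [0.1, 0.1]
fx = svm_decision(x_star, beta0, support_vectors, alphas, gamma=1.0)
print(1 if fx > 0 else -1)  # 1: x* is close to the first support vector
```

Only the support vectors enter the sum, which is why observations on the correct side of the margin do not affect the classifier.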
Support Vector Machines

Examples:

[Figure: non-linear decision boundaries in the (X1, X2) plane, obtained with a polynomial kernel with d = 3 (left) and with a radial kernel (right).]
References

“An Introduction to Statistical Learning”, G. James, D. Witten, T. Hastie, R. Tibshirani; Springer, 2013.

“The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, T. Hastie, R. Tibshirani, J. Friedman; Springer, 2009.

“Introduction to Machine Learning”, E. Alpaydın; The MIT Press, 2010.

