
Module 4.
Non-linear machine learning econometrics: Support Vector Machines

THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION

Eurostat
Machine-learning non-linear estimation methods: Support Vector Machines
Introduction

When the assumption of linearity is relaxed, a number of non-linear models become available:
Polynomial regression
Generalized additive models
Decision Trees
Support Vector Machines
Etc.

Introduction: hyperplanes

Hyperplane:
In a p-dimensional space, a hyperplane is a “flat” affine subspace of dimension p − 1:
p = 2: a line
p = 3: a plane

Definition:
p = 2: β0 + β1X1 + β2X2 = 0 (the equation of a line)
p dimensions: β0 + β1X1 + β2X2 + … + βpXp = 0

Introduction: hyperplanes

Geometric interpretation:
If X = (X1, X2, …, Xp)T satisfies the equation above, then X lies on the hyperplane.

If
β0 + β1X1 + β2X2 + … + βpXp > 0 or
β0 + β1X1 + β2X2 + … + βpXp < 0

then X lies on one side or the other of the hyperplane.

We can think of a hyperplane as dividing p-dimensional space into two halves.
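The half-space test can be sketched in a few lines of Python (a minimal illustration; the coefficients β0 = 1, β = (2, 3) are those of the example that follows):

```python
def f(beta0, beta, x):
    """Evaluate the hyperplane function f(x) = beta0 + sum_j beta_j * x_j."""
    return beta0 + sum(b * xj for b, xj in zip(beta, x))

# Hyperplane 1 + 2*X1 + 3*X2 = 0 in two dimensions
beta0, beta = 1.0, [2.0, 3.0]

print(f(beta0, beta, [1.0, 1.0]))    # 6.0  (> 0: one side of the hyperplane)
print(f(beta0, beta, [-1.0, -1.0]))  # -4.0 (< 0: the other side)
print(f(beta0, beta, [1.0, -1.0]))   # 0.0  (on the hyperplane)
```

The sign of f(x) tells us which of the two halves the point lies in.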
Introduction: hyperplanes

Example:

[Figure: the line 1 + 2X1 + 3X2 = 0 in the (X1, X2) plane; points in the region where 1 + 2X1 + 3X2 > 0 lie on one side, points where 1 + 2X1 + 3X2 < 0 on the other.]
Introduction: hyperplanes

Separating hyperplanes:

Define y1, y2, …, yn ∈ {−1, 1} as the class labels of the n training observations. A separating hyperplane satisfies

f(xi) = β0 + β1xi1 + β2xi2 + … + βpxip > 0 if yi = 1

f(xi) = β0 + β1xi1 + β2xi2 + … + βpxip < 0 if yi = −1

A test observation x* is assigned a class (either 1 or −1) depending on which side of the hyperplane it is located.

Magnitude of f(x*): if f(x*) is far from 0, then x* lies far from the hyperplane, giving a reliable class assignment for x*.
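The separating-hyperplane property amounts to yi · f(xi) > 0 for every training observation. A minimal sketch (the toy data and coefficients are made up for illustration):

```python
def f(beta0, beta, x):
    """f(x) = beta0 + sum_j beta_j * x_j."""
    return beta0 + sum(b * xj for b, xj in zip(beta, x))

def separates(beta0, beta, X, y):
    """True if y_i * f(x_i) > 0 for every training observation."""
    return all(yi * f(beta0, beta, xi) > 0 for xi, yi in zip(X, y))

# Toy 2-D data: class +1 above the line X2 = X1, class -1 below it
X = [[0.0, 1.0], [1.0, 3.0], [1.0, 0.0], [3.0, 1.0]]
y = [1, 1, -1, -1]

print(separates(0.0, [-1.0, 1.0], X, y))  # True: f(x) = X2 - X1 separates the classes
print(separates(1.0, [1.0, 1.0], X, y))   # False: every f(xi) > 0 here, so the -1 class is misclassified
```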
Introduction: hyperplanes

Problem:
If a separating hyperplane exists, then there exists an infinite number of other hyperplanes that could also separate the data.

Possible solution:
select the one that is the farthest from the data:
the maximal margin hyperplane

Maximal margin classifier

Maximal margin hyperplane:
the separating hyperplane for which the margin is largest.

Margin: the minimal distance from the observations to the hyperplane.

[Figure: a two-dimensional example in the (X1, X2) plane showing the maximal margin hyperplane, its margin, and the distance from the observations to the hyperplane.]

Note: there is a similarity with fitting a regression hyperplane by least squares.
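The margin can be computed directly (a minimal sketch with made-up toy points): the distance from a point xi to the hyperplane is |f(xi)| / ‖β‖, and the margin is the smallest of these distances.

```python
import math

def distance_to_hyperplane(beta0, beta, x):
    """Distance from point x to the hyperplane beta0 + beta . x = 0."""
    fx = beta0 + sum(b * xj for b, xj in zip(beta, x))
    norm = math.sqrt(sum(b * b for b in beta))
    return abs(fx) / norm

def margin(beta0, beta, X):
    """Margin: minimal distance from the observations to the hyperplane."""
    return min(distance_to_hyperplane(beta0, beta, x) for x in X)

# Toy observations around the hyperplane X2 - X1 = 0
X = [[0.0, 2.0], [2.0, 0.0], [1.0, 4.0]]
print(margin(0.0, [-1.0, 1.0], X))  # 2 / sqrt(2), about 1.414
```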
Maximal margin classifier

Maximal margin classifier:
A test observation is classified depending on which side of the maximal margin hyperplane it lies.

[Figure: the maximal margin hyperplane in the (X1, X2) plane; the support vectors lie on the edge of the margin.]
Maximal margin classifier
• n training observations x1, x2, …, xn
• p dimensions
• y1, y2, …, yn ∈ {1, −1}
• M: width of the margin
• Optimisation problem: maximise M over β0, β1, β2, …, βp
• subject to:
• Σ_{j=1}^p βj² = 1
• yi(β0 + β1xi1 + β2xi2 + … + βpxip) ≥ M, for each i = 1, …, n

• Once M has been maximised, we classify a test observation according to the sign of
f(x*) = β0 + β1x*1 + β2x*2 + … + βpx*p
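Given candidate coefficients satisfying the unit-norm constraint, the achieved margin is min_i yi·f(xi), and a test point is classified by the sign of f(x*). A minimal sketch (toy data and hand-picked β, not the output of an actual solver):

```python
import math

def f(beta0, beta, x):
    return beta0 + sum(b * xj for b, xj in zip(beta, x))

# Hand-picked unit-norm coefficients: beta = (-1/sqrt(2), 1/sqrt(2))
b = 1 / math.sqrt(2)
beta0, beta = 0.0, [-b, b]
assert abs(sum(bj * bj for bj in beta) - 1.0) < 1e-9  # constraint sum_j beta_j^2 = 1

# Toy separable data
X = [[0.0, 2.0], [1.0, 4.0], [2.0, 0.0], [4.0, 1.0]]
y = [1, 1, -1, -1]

M = min(yi * f(beta0, beta, xi) for xi, yi in zip(X, y))  # achieved margin
print(M)  # sqrt(2), about 1.414

# Classify a test observation by the sign of f(x*)
x_star = [0.0, 1.0]
print(1 if f(beta0, beta, x_star) > 0 else -1)  # 1
```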
Maximal margin classifier

Problems:

▪ It is not robust to individual observations

▪ It cannot be applied if no separating hyperplane exists

Solution:
the support vector classifier
Support vector classifier

▪ Based on a hyperplane that does not perfectly separate the two classes

▪ Soft margin (it can be violated by some of the training observations)

▪ Robust to individual observations

▪ Better classification of most of the training observations

Support vector classifier
How it works:
• Optimisation problem:

• Maximise M over β0, β1, β2, …, βp, ε1, …, εn
• subject to:
• Σ_{j=1}^p βj² = 1
• yi(β0 + β1xi1 + β2xi2 + … + βpxip) ≥ M(1 − εi), for each i = 1, …, n
• εi ≥ 0, Σ_{i=1}^n εi ≤ C

ε1, …, εn = slack variables that allow individual observations to be on the wrong side of the margin or of the hyperplane
C = non-negative tuning parameter
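The slack variables can be read off from a fitted hyperplane: rearranging the constraint gives εi = max(0, 1 − yi·f(xi)/M). A sketch with made-up values (the hyperplane and margin width are hand-picked, not fitted):

```python
def slack(beta0, beta, x, yi, M):
    """epsilon_i = max(0, 1 - y_i * f(x_i) / M) for margin width M."""
    fx = beta0 + sum(b * xj for b, xj in zip(beta, x))
    return max(0.0, 1.0 - yi * fx / M)

beta0, beta, M = 0.0, [-1.0, 1.0], 2.0  # hand-picked hyperplane X2 - X1 = 0

print(slack(beta0, beta, [0.0, 3.0], 1, M))  # 0.0: correct side of the margin
print(slack(beta0, beta, [0.0, 1.0], 1, M))  # 0.5: violates the margin
print(slack(beta0, beta, [1.0, 0.0], 1, M))  # 1.5: wrong side of the hyperplane
```

The three cases (εi = 0, 0 < εi ≤ 1, εi > 1) correspond to the interpretation on the next slide.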
Support vector classifier

εi = 0: the ith observation is on the correct side of the margin

εi > 0: the ith observation is on the wrong side of the margin (it violates the margin)

εi > 1: the ith observation is on the wrong side of the hyperplane

C determines the number and severity of the violations to the margin (and to the hyperplane) that are tolerated:

C = 0: no violations are accepted

C > 0: no more than C observations can be on the wrong side of the hyperplane
Support vector classifier

About C:

▪ Tuning parameter, generally chosen via cross-validation

▪ It controls the bias-variance trade-off

▪ If C is small, we seek narrow margins that are rarely violated: the classifier is highly fit to the data (low bias but high variance)

▪ If C is larger, the margin is wider and we allow more violations: the classifier fits the data less closely (higher bias but lower variance)

Support vector classifier

[Figure: support vector classifiers fitted with a higher value of C (wider margin, more tolerated violations) and with a lower value of C (narrower margin, fewer violations).]
Support vector classifier

Property:

▪ An observation that lies on the correct side of the margin does not affect the support vector classifier

▪ Only the support vectors affect the classifier

Support Vector Machines

▪ Extension of the support vector classifier

▪ A method to enlarge the feature space to accommodate non-linear boundaries

▪ One approach uses quadratic, cubic, or even higher-order polynomial functions of the predictors:

X1, X1², X2, X2², …, Xp, Xp²
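Enlarging the feature space with squared terms can be sketched as follows (a toy illustration):

```python
def expand_quadratic(x):
    """Map (x1, ..., xp) to (x1, x1^2, x2, x2^2, ..., xp, xp^2)."""
    out = []
    for xj in x:
        out.extend([xj, xj * xj])
    return out

print(expand_quadratic([2.0, 3.0]))  # [2.0, 4.0, 3.0, 9.0]
```

A hyperplane that is linear in the enlarged feature space corresponds to a non-linear (quadratic) boundary in the original space.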

Support Vector Machines

Maximise M over β0, β11, β12, …, βp1, βp2, ε1, …, εn

subject to

Σ_{j=1}^p Σ_{k=1}^2 βjk² = 1

yi(β0 + Σ_{j=1}^p βj1 xij + Σ_{j=1}^p βj2 xij²) ≥ M(1 − εi), for each i = 1, …, n

εi ≥ 0, Σ_{i=1}^n εi ≤ C

Support Vector Machines

▪ Introducing kernels (functions that quantify the similarity of two observations):

K(xi, xi′) = Σ_{j=1}^p xij xi′j   linear kernel (the inner product)

K(xi, xi′) = (1 + Σ_{j=1}^p xij xi′j)^d   polynomial kernel of degree d

K(xi, xi′) = exp(−γ Σ_{j=1}^p (xij − xi′j)²)   radial kernel
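The three kernels can be written directly from their definitions (a minimal sketch; d and γ are the kernel hyperparameters):

```python
import math

def linear_kernel(x, z):
    """K(x, z) = sum_j x_j * z_j (the inner product)."""
    return sum(xj * zj for xj, zj in zip(x, z))

def polynomial_kernel(x, z, d):
    """K(x, z) = (1 + sum_j x_j * z_j)^d."""
    return (1.0 + linear_kernel(x, z)) ** d

def radial_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * sum_j (x_j - z_j)^2)."""
    return math.exp(-gamma * sum((xj - zj) ** 2 for xj, zj in zip(x, z)))

x, z = [1.0, 2.0], [3.0, 0.0]
print(linear_kernel(x, z))             # 3.0
print(polynomial_kernel(x, z, d=2))    # 16.0
print(radial_kernel(x, x, gamma=1.0))  # 1.0: identical observations are maximally similar
```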

Support Vector Machines

▪ The SVM combines a non-linear (e.g. polynomial) kernel with a support vector classifier

▪ The linear support vector classifier can be represented as:

f(x) = β0 + Σ_{i∈S} αi ⟨x, xi⟩

where ⟨x, xi⟩ is the inner product, S is the set of indices of the support vectors, and αi is a parameter that is non-zero only if the ith training observation is a support vector.

▪ The SVM replaces the inner product with a kernel (e.g. a polynomial kernel):

f(x) = β0 + Σ_{i∈S} αi K(x, xi)
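Evaluating this decision function is a short loop. A sketch with made-up support vectors and αi values (not a fitted model), using the radial kernel as the example K:

```python
import math

def radial_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * sum_j (x_j - z_j)^2)."""
    return math.exp(-gamma * sum((xj - zj) ** 2 for xj, zj in zip(x, z)))

def svm_decision(x, beta0, support_vectors, alphas, gamma):
    """f(x) = beta0 + sum over support vectors of alpha_i * K(x, x_i)."""
    return beta0 + sum(a * radial_kernel(x, sv, gamma)
                       for a, sv in zip(alphas, support_vectors))

# Made-up fitted quantities: two support vectors, one per class
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, -1.0]
beta0 = 0.0

x_star = [0.1, 0.1]
fx = svm_decision(x_star, beta0, support_vectors, alphas, gamma=1.0)
print(1 if fx > 0 else -1)  # 1: x* is close to the first support vector
```

Only the support vectors enter the sum, which is why observations on the correct side of the margin do not affect the classifier.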
Support Vector Machines

Examples:

[Figure: non-linear decision boundaries in the (X1, X2) plane, obtained with a polynomial kernel with d = 3 (left) and with a radial kernel (right).]
References

“An Introduction to Statistical Learning”, G. James, D. Witten, T. Hastie, R. Tibshirani; Springer, 2013.

“The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, T. Hastie, R. Tibshirani, J. Friedman; Springer, 2009.

“Introduction to Machine Learning”, E. Alpaydın; The MIT Press, 2010.

