
Support Vector Machines
Large Margin Classifier
Recap: logistic regression

If y = 1, we want h_θ(x) ≈ 1, i.e. θᵀx ≫ 0.
If y = 0, we want h_θ(x) ≈ 0, i.e. θᵀx ≪ 0.
Recap: logistic regression

Cost of a single example:
  cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x)),  with h_θ(x) = 1 / (1 + e^(−θᵀx))

If y = 1 (want θᵀx ≫ 0): the cost is −log(h_θ(x)).
If y = 0 (want θᵀx ≪ 0): the cost is −log(1 − h_θ(x)).
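As a quick check of this recap, here is a minimal NumPy sketch of the per-example cost (the helper names are mine, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + exp(-theta^T x))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, x, y):
    # Per-example cost: -y*log(h) - (1 - y)*log(1 - h), with y in {0, 1}
    h = sigmoid(np.dot(theta, x))
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

theta = np.array([1.0, 2.0])
x = np.array([0.5, 1.0])           # theta^T x = 2.5 >> 0
print(logistic_cost(theta, x, 1))  # small cost: y = 1 agrees with theta^T x >> 0
print(logistic_cost(theta, x, 0))  # large cost: y = 0 contradicts theta^T x >> 0
```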
Support vector machine

Logistic regression minimizes the regularized log loss:
  min_θ (1/m) Σ_{i=1}^{m} [ −y^(i) log h_θ(x^(i)) − (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/2m) Σ_j θ_j²

Support vector machine: replace the smooth log loss with the piecewise-linear costs cost₁ (for positive examples) and cost₀ (for negative examples), defined on the next slides, traded off against the regularizer by a parameter C:
  min_θ C Σ_{i=1}^{m} [ cost₁(θᵀx^(i)) if y^(i) = 1, else cost₀(θᵀx^(i)) ] + ‖θ‖²

Support Vector Machine

With labels y ∈ {−1, 1}:
If y = 1, we want θᵀx ≥ 1 (not just θᵀx ≥ 0).
If y = −1, we want θᵀx ≤ −1 (not just θᵀx < 0).

[Plots: cost₁(z) and cost₀(z) against z = θᵀx, flat at zero beyond z = 1 and z = −1 respectively.]
SVM Cost Function

Whenever y^(i) = 1:
  cost₁(θᵀx^(i)) = max(0, 1 − θᵀx^(i))
  [Plot: zero for θᵀx^(i) ≥ 1, rising linearly to the left of 1.]

Whenever y^(i) = −1:
  cost₀(θᵀx^(i)) = max(0, 1 − (−1)·θᵀx^(i)) = max(0, 1 + θᵀx^(i))
  [Plot: zero for θᵀx^(i) ≤ −1, rising linearly to the right of −1.]
SVM Cost Function

Whenever y^(i) = 1:
  cost₁(θᵀx^(i)) = max(0, 1 − y^(i)·(θᵀx^(i)))

Whenever y^(i) = −1:
  cost₀(θᵀx^(i)) = max(0, 1 − y^(i)·(θᵀx^(i)))

In both cases the per-example cost is the same expression, max(0, 1 − y^(i)·(θᵀx^(i))).
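A one-line NumPy sketch of this unified per-example cost (the function name is mine, not from the slides):

```python
import numpy as np

def hinge_cost(theta, x, y):
    # Unified SVM per-example cost: max(0, 1 - y * theta^T x), with y in {-1, +1}
    return max(0.0, 1.0 - y * np.dot(theta, x))

theta = np.array([2.0, -1.0])
x = np.array([1.0, 0.5])           # theta^T x = 1.5
print(hinge_cost(theta, x, +1))    # 0.0: already past the +1 margin
print(hinge_cost(theta, x, -1))    # 2.5: should be <= -1, so penalized
```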
SVM Hypothesis and Cost Function

Hypothesis:
  h_θ(x) = sign(θᵀx)

Total cost:
  TotalCost = min_θ C Σ_{i=1}^{m} max(0, 1 − y^(i)·(θᵀx^(i))) + ‖θ‖²
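Putting the hypothesis and cost together, a short NumPy sketch of the total cost over a dataset (helper names are mine):

```python
import numpy as np

def svm_total_cost(theta, X, y, C):
    # C * sum_i max(0, 1 - y_i * theta^T x_i) + ||theta||^2
    margins = y * (X @ theta)                     # y_i * theta^T x_i for all i
    hinge = np.maximum(0.0, 1.0 - margins).sum()  # summed hinge losses
    return C * hinge + np.dot(theta, theta)       # plus the ||theta||^2 term

def h(theta, x):
    # SVM hypothesis: sign(theta^T x)
    return np.sign(np.dot(theta, x))

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5]])
y = np.array([1, 1, -1])
theta = np.array([0.5, 0.5])
print(svm_total_cost(theta, X, y, C=1.0))
print(h(theta, X[0]))                             # 1.0: predicted positive
```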
SVM Margin

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint. Maximizing this margin is attractive because:
1. Intuitively this feels safest.
2. If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. Empirically it works very, very well.

Support vectors are those datapoints that the margin pushes up against.
SVM Margin

Margin of an example (x, y), with y ∈ {−1, 1}:
  Margin = y·(θ·x) / ‖θ‖

• The margin is proportional to the distance of the point from the decision boundary.
• The binary classifier makes an error exactly when the margin is less than zero.
• The margin also gives a measure of confidence in the classification.

Distance from the decision boundary of a data point x:
  θ·x / ‖θ‖    if θ·x > 0
  −θ·x / ‖θ‖   if θ·x < 0
which for a correctly classified point equals y·(θ·x) / ‖θ‖.

Since θ·x = ‖θ‖‖x‖ cos α, where α is the angle between θ and x:
  x₊ makes an acute angle with θ, hence θ·x > 0;
  x₋ makes an obtuse angle with θ, hence θ·x < 0.
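A small sketch of the margin formula (illustrative values, helper name mine):

```python
import numpy as np

def margin(theta, x, y):
    # y * (theta . x) / ||theta||: positive iff x is on the correct side,
    # with magnitude equal to the distance from the decision boundary.
    return y * np.dot(theta, x) / np.linalg.norm(theta)

theta = np.array([3.0, 4.0])                    # ||theta|| = 5
print(margin(theta, np.array([2.0, 1.0]), +1))  #  2.0: correct, 2 units away
print(margin(theta, np.array([2.0, 1.0]), -1))  # -2.0: error (margin < 0)
```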
SVM Margin

Constraint for all plus data points:
  θ·x₊ ≥ 1

For positive support vectors:
  θ·x₊ = 1
SVM Margin

Constraint for all minus data points:
  θ·x₋ ≤ −1

For negative support vectors:
  θ·x₋ = −1
SVM Margin

Constraint for all plus data points:
  θ·x₊ ≥ 1
For positive support vectors:
  θ·x₊ = 1

Constraint for all minus data points:
  θ·x₋ ≤ −1
For negative support vectors:
  θ·x₋ = −1

Distance between the positive and negative support vectors:
  D = θ·x₊/‖θ‖ − θ·x₋/‖θ‖ = (1 − (−1))/‖θ‖ = 2/‖θ‖
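A quick numerical check of D = 2/‖θ‖ (the vectors below are made-up support vectors satisfying θ·x₊ = 1 and θ·x₋ = −1):

```python
import numpy as np

theta = np.array([1.0, 1.0])       # ||theta|| = sqrt(2)
x_plus = np.array([0.5, 0.5])      # theta . x+ =  1 (positive support vector)
x_minus = np.array([-0.5, -0.5])   # theta . x- = -1 (negative support vector)

norm = np.linalg.norm(theta)
D = np.dot(theta, x_plus) / norm - np.dot(theta, x_minus) / norm
print(D, 2.0 / norm)               # both print 1.4142...; D = 2 / ||theta||
```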
SVM Margin

Let x₊ and x₋ be the support vectors. We want a classifier that maximizes the distance between the support vectors,
  D = θ·x₊/‖θ‖ − θ·x₋/‖θ‖ = 2/‖θ‖,
or equivalently minimizes ½‖θ‖², subject to the constraint that
  y^(i)·(θ·x^(i)) ≥ 1 for all i.

Relaxing the hard constraint into the hinge penalty recovers the cost function from before:
  J = min_θ C Σ_{i=1}^{m} max(0, 1 − y^(i)·(θᵀx^(i))) + ‖θ‖²
where C is the trade-off parameter.

This is why the SVM is called a large margin classifier.
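One simple way to minimize J is subgradient descent on the hinge terms. The sketch below is my own illustration on made-up data, not an algorithm from the slides:

```python
import numpy as np

def subgradient_step(theta, X, y, C, lr):
    # Subgradient of C * sum_i max(0, 1 - y_i * theta^T x_i) + ||theta||^2:
    # only examples with margin < 1 contribute -C * y_i * x_i;
    # the regularizer contributes 2 * theta.
    margins = y * (X @ theta)
    active = margins < 1.0
    grad = -C * (y[active, None] * X[active]).sum(axis=0) + 2.0 * theta
    return theta - lr * grad

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
theta = np.zeros(2)
for _ in range(500):
    theta = subgradient_step(theta, X, y, C=1.0, lr=0.01)
print(theta)   # a separating direction for this toy dataset
```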
Large margin classifier in presence of outliers

[Figures: two plots of training data in the (x1, x2) plane, one without and one with an outlier, showing how an outlier can pull the large-margin boundary.]
SVM parameters:
C (plays the role of 1/λ in regularized logistic regression).
  Large C: lower bias, high variance.
  Small C: higher bias, low variance.
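A hedged sketch of this trade-off in practice, using scikit-learn's SVC on made-up data (none of this comes from the slides):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs, labels in {-1, +1} (illustrative data only).
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Large C: margin violations are expensive -> tighter fit (lower bias,
    # higher variance). Small C: wider margin, more violations tolerated.
    print(C, clf.n_support_.sum(), clf.score(X, y))
```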
