
Support Vector Machines
Large Margin Classifier
Recap: logistic regression

If y = 1, we want h_θ(x) ≈ 1, i.e. θᵀx ≫ 0.
If y = 0, we want h_θ(x) ≈ 0, i.e. θᵀx ≪ 0.
Recap: logistic regression

Cost of a single example:
  cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x)),  with h_θ(x) = 1 / (1 + e^(−θᵀx))

If y = 1 (want θᵀx ≫ 0): the cost is −log(h_θ(x)).
If y = 0 (want θᵀx ≪ 0): the cost is −log(1 − h_θ(x)).
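As a quick check of this recap, here is a minimal NumPy sketch of the per-example cost (the helper names are mine, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + exp(-theta^T x))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, x, y):
    # Per-example cost: -y*log(h) - (1 - y)*log(1 - h), with y in {0, 1}
    h = sigmoid(np.dot(theta, x))
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

theta = np.array([1.0, 2.0])
x = np.array([0.5, 1.0])           # theta^T x = 2.5 >> 0
print(logistic_cost(theta, x, 1))  # small cost: y = 1 agrees with theta^T x >> 0
print(logistic_cost(theta, x, 0))  # large cost: y = 0 contradicts theta^T x >> 0
```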
Support vector machine

Logistic regression minimizes the regularized log loss:
  min_θ (1/m) Σ_{i=1}^{m} [ −y^(i) log h_θ(x^(i)) − (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/2m) Σ_j θ_j²

Support vector machine: replace the smooth log loss with the piecewise-linear costs cost₁ (for positive examples) and cost₀ (for negative examples), defined on the next slides, traded off against the regularizer by a parameter C:
  min_θ C Σ_{i=1}^{m} [ cost₁(θᵀx^(i)) if y^(i) = 1, else cost₀(θᵀx^(i)) ] + ‖θ‖²

Support Vector Machine

With labels y ∈ {−1, 1}:
If y = 1, we want θᵀx ≥ 1 (not just θᵀx ≥ 0).
If y = −1, we want θᵀx ≤ −1 (not just θᵀx < 0).

[Plots: cost₁(z) and cost₀(z) against z = θᵀx, flat at zero beyond z = 1 and z = −1 respectively.]
SVM Cost Function

Whenever y^(i) = 1:
  cost₁(θᵀx^(i)) = max(0, 1 − θᵀx^(i))
  [Plot: zero for θᵀx^(i) ≥ 1, rising linearly to the left of 1.]

Whenever y^(i) = −1:
  cost₀(θᵀx^(i)) = max(0, 1 − (−1)·θᵀx^(i)) = max(0, 1 + θᵀx^(i))
  [Plot: zero for θᵀx^(i) ≤ −1, rising linearly to the right of −1.]
SVM Cost Function

Whenever y^(i) = 1:
  cost₁(θᵀx^(i)) = max(0, 1 − y^(i)·(θᵀx^(i)))

Whenever y^(i) = −1:
  cost₀(θᵀx^(i)) = max(0, 1 − y^(i)·(θᵀx^(i)))

In both cases the per-example cost is the same expression, max(0, 1 − y^(i)·(θᵀx^(i))).
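A one-line NumPy sketch of this unified per-example cost (the function name is mine, not from the slides):

```python
import numpy as np

def hinge_cost(theta, x, y):
    # Unified SVM per-example cost: max(0, 1 - y * theta^T x), with y in {-1, +1}
    return max(0.0, 1.0 - y * np.dot(theta, x))

theta = np.array([2.0, -1.0])
x = np.array([1.0, 0.5])           # theta^T x = 1.5
print(hinge_cost(theta, x, +1))    # 0.0: already past the +1 margin
print(hinge_cost(theta, x, -1))    # 2.5: should be <= -1, so penalized
```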
SVM Hypothesis and Cost Function

Hypothesis:
  h_θ(x) = sign(θᵀx)

Total cost:
  TotalCost = min_θ C Σ_{i=1}^{m} max(0, 1 − y^(i)·(θᵀx^(i))) + ‖θ‖²
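Putting the hypothesis and cost together, a short NumPy sketch of the total cost over a dataset (helper names are mine):

```python
import numpy as np

def svm_total_cost(theta, X, y, C):
    # C * sum_i max(0, 1 - y_i * theta^T x_i) + ||theta||^2
    margins = y * (X @ theta)                     # y_i * theta^T x_i for all i
    hinge = np.maximum(0.0, 1.0 - margins).sum()  # summed hinge losses
    return C * hinge + np.dot(theta, theta)       # plus the ||theta||^2 term

def h(theta, x):
    # SVM hypothesis: sign(theta^T x)
    return np.sign(np.dot(theta, x))

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5]])
y = np.array([1, 1, -1])
theta = np.array([0.5, 0.5])
print(svm_total_cost(theta, X, y, C=1.0))
print(h(theta, X[0]))                             # 1.0: predicted positive
```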
SVM Margin

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint. Maximizing this margin is attractive because:
1. Intuitively this feels safest.
2. If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. Empirically it works very, very well.

Support vectors are those datapoints that the margin pushes up against.
SVM Margin

Margin of an example (x, y), with y ∈ {−1, 1}:
  Margin = y·(θ·x) / ‖θ‖

• The margin is proportional to the distance of the point from the decision boundary.
• The binary classifier makes an error exactly when the margin is less than zero.
• The margin also gives a measure of confidence in the classification.

Distance from the decision boundary of a data point x:
  θ·x / ‖θ‖    if θ·x > 0
  −θ·x / ‖θ‖   if θ·x < 0
which for a correctly classified point equals y·(θ·x) / ‖θ‖.

Since θ·x = ‖θ‖‖x‖ cos α, where α is the angle between θ and x:
  x₊ makes an acute angle with θ, hence θ·x > 0;
  x₋ makes an obtuse angle with θ, hence θ·x < 0.
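A small sketch of the margin formula (illustrative values, helper name mine):

```python
import numpy as np

def margin(theta, x, y):
    # y * (theta . x) / ||theta||: positive iff x is on the correct side,
    # with magnitude equal to the distance from the decision boundary.
    return y * np.dot(theta, x) / np.linalg.norm(theta)

theta = np.array([3.0, 4.0])                    # ||theta|| = 5
print(margin(theta, np.array([2.0, 1.0]), +1))  #  2.0: correct, 2 units away
print(margin(theta, np.array([2.0, 1.0]), -1))  # -2.0: error (margin < 0)
```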
SVM Margin

Constraint for all plus data points:
  θ·x₊ ≥ 1

For positive support vectors:
  θ·x₊ = 1
SVM Margin

Constraint for all minus data points:
  θ·x₋ ≤ −1

For negative support vectors:
  θ·x₋ = −1
SVM Margin

Constraint for all plus data points:
  θ·x₊ ≥ 1
For positive support vectors:
  θ·x₊ = 1

Constraint for all minus data points:
  θ·x₋ ≤ −1
For negative support vectors:
  θ·x₋ = −1

Distance between the positive and negative support vectors:
  D = θ·x₊/‖θ‖ − θ·x₋/‖θ‖ = (1 − (−1))/‖θ‖ = 2/‖θ‖
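A quick numerical check of D = 2/‖θ‖ (the vectors below are made-up support vectors satisfying θ·x₊ = 1 and θ·x₋ = −1):

```python
import numpy as np

theta = np.array([1.0, 1.0])       # ||theta|| = sqrt(2)
x_plus = np.array([0.5, 0.5])      # theta . x+ =  1 (positive support vector)
x_minus = np.array([-0.5, -0.5])   # theta . x- = -1 (negative support vector)

norm = np.linalg.norm(theta)
D = np.dot(theta, x_plus) / norm - np.dot(theta, x_minus) / norm
print(D, 2.0 / norm)               # both print 1.4142...; D = 2 / ||theta||
```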
SVM Margin

Let x₊ and x₋ be the support vectors. We want a classifier that maximizes the distance between the support vectors,
  D = θ·x₊/‖θ‖ − θ·x₋/‖θ‖ = 2/‖θ‖,
or equivalently minimizes ½‖θ‖², subject to the constraint that
  y^(i)·(θ·x^(i)) ≥ 1 for all i.

Relaxing the hard constraint into the hinge penalty recovers the cost function from before:
  J = min_θ C Σ_{i=1}^{m} max(0, 1 − y^(i)·(θᵀx^(i))) + ‖θ‖²
where C is the trade-off parameter.

This is why the SVM is called a large margin classifier.
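One simple way to minimize J is subgradient descent on the hinge terms. The sketch below is my own illustration on made-up data, not an algorithm from the slides:

```python
import numpy as np

def subgradient_step(theta, X, y, C, lr):
    # Subgradient of C * sum_i max(0, 1 - y_i * theta^T x_i) + ||theta||^2:
    # only examples with margin < 1 contribute -C * y_i * x_i;
    # the regularizer contributes 2 * theta.
    margins = y * (X @ theta)
    active = margins < 1.0
    grad = -C * (y[active, None] * X[active]).sum(axis=0) + 2.0 * theta
    return theta - lr * grad

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
theta = np.zeros(2)
for _ in range(500):
    theta = subgradient_step(theta, X, y, C=1.0, lr=0.01)
print(theta)   # a separating direction for this toy dataset
```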
Large margin classifier in presence of outliers

[Figures: two plots of training data in the (x1, x2) plane, one without and one with an outlier, showing how an outlier can pull the large-margin boundary.]
SVM parameters:
C (plays the role of 1/λ in regularized logistic regression).
  Large C: lower bias, high variance.
  Small C: higher bias, low variance.
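A hedged sketch of this trade-off in practice, using scikit-learn's SVC on made-up data (none of this comes from the slides):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs, labels in {-1, +1} (illustrative data only).
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Large C: margin violations are expensive -> tighter fit (lower bias,
    # higher variance). Small C: wider margin, more violations tolerated.
    print(C, clf.n_support_.sum(), clf.score(X, y))
```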
