MACHINE LEARNING
• Many lines can separate the two classes, but is an arbitrary one desirable? No: a line that passes close to one class generalizes poorly.
Support Vector Machine (Cont.)
Large Margin Linear Classifier:
• Our objective is to find a plane that has the maximum margin, i.e., the maximum distance between data points of both classes.
• The SVM algorithm will select the line that not only separates the two classes but also stays as far away from the closest samples as possible.
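As a concrete sketch of this idea, a linear SVM can be fit with scikit-learn's SVC (assumed available here; the toy points are illustrative, not from the slides) and the margin width recovered as 2/||w||:

```python
# Minimal sketch: fit a max-margin linear classifier on toy 2-D data.
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)   # geometric margin width, 2 / ||w||
print("w =", w, "b =", b)
print("margin width =", margin)
```

The fitted `w` and `b` define the separating line wᵀx + b = 0, and the margin is the distance between the two parallel planes wᵀx + b = ±1.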
• Hyperplanes are decision boundaries that help classify the data points.
• If the number of input features is 2, then the hyperplane is just a line.
• If the number of input features is 3, then the hyperplane becomes a two-dimensional plane.
• It becomes difficult to imagine when the number of features exceeds 3.
• The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N = the number of features) that distinctly classifies the data points.
Hyperplanes and Support Vectors
• Support vectors are the data points that lie closest to the hyperplane and influence its position and orientation.
• Using these support vectors, we maximize the margin of the classifier.
• Deleting a support vector will change the position of the hyperplane.
• These are the points that help us build our SVM.
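A small sketch of this property (using scikit-learn's SVC, assumed available; the data is illustrative): refitting on the support vectors alone leaves the hyperplane essentially unchanged, showing the other points do not influence it.

```python
# Sketch: the fitted hyperplane depends only on the support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],   # class 0
              [3.0, 3.0], [3.5, 2.5], [2.5, 3.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("support vectors:\n", clf.support_vectors_)

# Refit on the support vectors alone: the boundary is (numerically)
# the same, because only the support vectors constrain the solution.
sv_idx = clf.support_
clf_sv = SVC(kernel="linear", C=1e6).fit(X[sv_idx], y[sv_idx])
print(np.allclose(clf.coef_, clf_sv.coef_, atol=1e-2))
```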
• Aim: Learn a maximum margin classifier
Computing the margin width:
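The slide's own derivation is not preserved in this text; a standard reconstruction of the margin-width computation for the hyperplane wᵀx + b = 0 is:

```latex
% Margin width between the two bounding planes (standard derivation).
\begin{align*}
&\text{Plus-plane: } w^\top x + b = +1, \qquad
 \text{Minus-plane: } w^\top x + b = -1 \\
&\text{Take } x^- \text{ on the minus-plane; } x^+ = x^- + \lambda w
 \text{ lies on the plus-plane.} \\
&w^\top (x^- + \lambda w) + b = 1
 \;\Rightarrow\; -1 + \lambda \|w\|^2 = 1
 \;\Rightarrow\; \lambda = \frac{2}{\|w\|^2} \\
&\text{Margin width } M = \|x^+ - x^-\| = \lambda \|w\| = \frac{2}{\|w\|}
\end{align*}
```

Maximizing the margin 2/||w|| is therefore equivalent to minimizing ||w||², which is what the SVM optimization problem does.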
Lagrange Duality in brief:
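The equations on these slides are not preserved in this text; a standard sketch of Lagrange duality for a constrained minimization problem is:

```latex
% Lagrange duality in brief (standard form; reconstructed).
\begin{align*}
\text{Primal: }\quad & \min_{w}\; f(w)
 \quad \text{s.t. } g_i(w) \le 0, \quad h_j(w) = 0 \\
\text{Lagrangian: }\quad & L(w, \alpha, \beta)
 = f(w) + \sum_i \alpha_i g_i(w) + \sum_j \beta_j h_j(w),
 \quad \alpha_i \ge 0 \\
\text{Dual function: }\quad & \theta(\alpha, \beta)
 = \min_{w}\, L(w, \alpha, \beta) \\
\text{Dual problem: }\quad & \max_{\alpha \ge 0,\, \beta}\;
 \theta(\alpha, \beta)
\end{align*}
```

Weak duality gives θ(α, β) ≤ f(w*) always; when the problem is convex and a constraint qualification (e.g. Slater's condition) holds, strong duality makes the two optima equal, which is the case for the SVM problem.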
Karush-Kuhn-Tucker (KKT) Conditions
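For a constrained problem min f(w) subject to gᵢ(w) ≤ 0 and hⱼ(w) = 0 with multipliers αᵢ, βⱼ, the KKT conditions (standard statement; the slide content is not preserved here) are:

```latex
% KKT conditions at an optimum (w^*, \alpha^*, \beta^*).
\begin{align*}
&\nabla_w L(w^*, \alpha^*, \beta^*) = 0
 && \text{(stationarity)} \\
&g_i(w^*) \le 0, \quad h_j(w^*) = 0
 && \text{(primal feasibility)} \\
&\alpha_i^* \ge 0
 && \text{(dual feasibility)} \\
&\alpha_i^*\, g_i(w^*) = 0
 && \text{(complementary slackness)}
\end{align*}
```

In the SVM, complementary slackness is why most αᵢ* are zero: only the points on the margin (the support vectors) have active constraints and nonzero multipliers.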
Non-linear SVM
[Figure: 1-D data points on the x-axis that cannot be separated by a single threshold]
• We can map it to a higher-dimensional space:
[Figure: the same points mapped to (x, x²), where a line now separates the two classes]
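The 1-D picture can be sketched in code: points that no single threshold on x can separate become linearly separable after the lift x → (x, x²). The data and threshold below are illustrative:

```python
# 1-D data: outer points (class 1) surround inner points (class 0),
# so no single threshold on x separates them.
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 1, 1])

# Lift each point to 2-D with phi(x) = (x, x^2).
phi = np.column_stack([x, x**2])

# In the lifted space, the horizontal line x^2 = 2.5 separates the classes.
pred = (phi[:, 1] > 2.5).astype(int)
print(pred)
```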
Non-linear SVM (Cont.)
The Kernel trick
• Linear discriminant function:
g(x) = wᵀΦ(x) + b
• The dual form of the SVM optimization problem is:
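The dual itself is not preserved in this text; the standard hard-margin dual in the lifted feature space, consistent with g(x) = wᵀΦ(x) + b, is:

```latex
% Dual of the hard-margin SVM in the feature space \Phi (standard form).
\begin{align*}
\max_{\alpha}\quad & \sum_{i=1}^{n} \alpha_i
 - \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n}
   \alpha_i \alpha_j\, y_i y_j\, \Phi(x_i)^\top \Phi(x_j) \\
\text{s.t.}\quad & \alpha_i \ge 0, \quad i = 1, \dots, n, \qquad
 \sum_{i=1}^{n} \alpha_i y_i = 0
\end{align*}
```

Note that Φ enters only through the dot products Φ(xᵢ)ᵀΦ(xⱼ), which is exactly what the kernel trick exploits: Φ itself never has to be computed.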
• A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space:
K(xᵢ, xⱼ) = Φ(xᵢ)ᵀ Φ(xⱼ)
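This identity can be checked numerically for the polynomial kernel K(u, v) = (uᵀv)², whose explicit feature map in 2-D is Φ(u) = (u₁², √2·u₁u₂, u₂²) (a standard example, not taken from the slides):

```python
# Verify K(u, v) = phi(u)^T phi(v) for the degree-2 polynomial kernel.
import numpy as np

def poly_kernel(u, v):
    # Kernel evaluated directly in the original 2-D space.
    return float(u @ v) ** 2

def phi(u):
    # Explicit feature map for (u^T v)^2 in 2-D.
    return np.array([u[0]**2, np.sqrt(2) * u[0] * u[1], u[1]**2])

u = np.array([1.0, 2.0])
v = np.array([3.0, 0.5])

lhs = poly_kernel(u, v)        # cheap: one dot product, then square
rhs = float(phi(u) @ phi(v))   # expensive: dot product in the lifted space
print(lhs, rhs)                # the two agree
```

The two values agree exactly, even though the left-hand side never constructs the expanded feature vectors.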
Examples of commonly used Kernels
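The kernel examples themselves are not preserved in this text; the usual choices (linear, polynomial, RBF/Gaussian, and sigmoid) can be written directly in NumPy. The parameter values below are illustrative defaults, not prescriptive:

```python
# Commonly used SVM kernels, implemented directly.
import numpy as np

def linear(u, v):
    return u @ v

def polynomial(u, v, degree=3, c=1.0):
    return (u @ v + c) ** degree

def rbf(u, v, gamma=0.5):
    # Gaussian / radial basis function kernel.
    return np.exp(-gamma * np.sum((u - v) ** 2))

def sigmoid(u, v, kappa=1.0, c=-1.0):
    return np.tanh(kappa * (u @ v) + c)

u = np.array([1.0, 2.0])
v = np.array([2.0, 1.0])
for k in (linear, polynomial, rbf, sigmoid):
    print(k.__name__, k(u, v))
```

Each of these corresponds to a dot product in some (possibly infinite-dimensional, in the RBF case) feature space, so any of them can be substituted for Φ(xᵢ)ᵀΦ(xⱼ) in the dual.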