
ARTIFICIAL INTELLIGENCE &

MACHINE LEARNING

Dr. K Shyam Sunder Reddy


Associate Professor
Department of Information Technology
Vasavi College of Engineering
shyamd4@staff.vce.ac.in
Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm capable of
performing both classification and regression.
Linear Discriminant Function (Linear Classifier):
• Let us consider a two-class problem: we have two classes,
C1 and C2.
• The linear discriminant function:
g(X) = wTX + b
• For every feature vector X, we compute the linear
function: wTX + b.
• If X lies on the positive side of the hyperplane/line,
then g(X) = wTX + b > 0.
• If X lies on the negative side of the hyperplane/line,
then g(X) = wTX + b < 0.
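As a minimal sketch of this decision rule (the weight vector w and bias b below are arbitrary values chosen only for illustration, not learned from data):

```python
import numpy as np

# Hypothetical weight vector and bias, for illustration only
w = np.array([2.0, -1.0])
b = -0.5

def g(x):
    """Linear discriminant function g(x) = w^T x + b."""
    return np.dot(w, x) + b

x = np.array([1.0, 0.5])
label = "C1" if g(x) > 0 else "C2"   # positive side -> C1, negative side -> C2
print(g(x), label)
```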
Support Vector Machine (Cont.)
• How do you classify these points using a
linear discriminant function in order to
minimize the error rate?
• To separate the two classes of data points,
there are many possible hyperplanes that
could be chosen.
• How do we know which line will do the
best job of classifying the data?
• The linear classifier with the maximum
margin is the best classifier.
Support Vector Machine (Cont.)

• Is such a separating hyperplane desirable?
NO.
• Our objective is to find a plane that has
the maximum margin.
Support Vector Machine (Cont.)
Large Margin Linear Classifier:
• Our objective is to find a
plane that has the maximum
margin, i.e., the maximum
distance between data points
of both classes.
• The SVM algorithm will
select a line that not only
separates the two classes but
stays as far away from the
closest samples as possible.

d+ : the shortest distance from the hyperplane to the closest positive point.

d- : the shortest distance from the hyperplane to the closest negative point.
Margin = d+ + d- ; SVM selects the hyperplane that maximizes this margin.
Support Vector Machine (Cont.)
• SVM constructs a hyperplane in multidimensional space to separate different classes.
• SVM generates the optimal hyperplane in an iterative manner so as to minimize the classification error.
Support Vectors
• Support vectors are the data points that are closest to the hyperplane. These points define the
separating line, since the margin is calculated from them, and they are the most relevant points for
the construction of the classifier.
Hyperplane
• A hyperplane is a decision surface that separates a set of objects having different
class memberships.
Margin
• A margin is the gap between the two classes' closest points. It is calculated as the
perpendicular distance from the separating line to the support vectors (the closest points). A larger
margin between the classes is considered a good margin; a smaller margin is a bad
margin.
Support Vector Machine (Cont.)
Hyperplanes and Support Vectors

• Hyperplanes are decision boundaries that help classify the data points.
• If the number of input features is 2, then the hyperplane is just a line.
• If the number of input features is 3, then the hyperplane becomes a two-dimensional plane.
• It becomes difficult to imagine when the number of features exceeds 3.
• The objective of the support vector machine algorithm is to find a hyperplane in an N-
dimensional space (N — the number of features) that distinctly classifies the data points.
Support Vector Machine (Cont.)
Hyperplanes and Support Vectors

• Support vectors are the data points that lie closest to the hyperplane and influence the position
and orientation of the hyperplane.
• Using these support vectors, we maximize the margin of the classifier.
• Deleting the support vectors will change the position of the hyperplane.
• These are the points that help us to build our SVM.
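An illustrative sketch using scikit-learn (the tiny 2-D dataset below is made up for illustration): after fitting, the model exposes exactly these closest points as its support vectors, together with the w and b of the separating hyperplane.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny illustrative 2-D dataset with two linearly separable classes
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)        # the closest points that define the margin
print(clf.coef_, clf.intercept_)   # w and b of the separating hyperplane
```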
Support Vector Machine (Cont.)
• Aim: Learn a maximum margin classifier
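One standard way to write this aim (a sketch assuming linearly separable training data (xᵢ, yᵢ), i = 1, …, n, with labels yᵢ ∈ {−1, +1} and the canonical scaling in which the closest points satisfy |g(x)| = 1):

```latex
\min_{\mathbf{w},\, b}\ \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{T}\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n .
```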
Support Vector Machine (Cont.)
Computing the margin width:
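A common derivation, assuming the scaling above in which the closest points on each side satisfy wᵀx + b = ±1:

```latex
d_{+} = d_{-} = \frac{1}{\lVert \mathbf{w} \rVert},
\qquad
\text{margin width} = d_{+} + d_{-} = \frac{2}{\lVert \mathbf{w} \rVert} .
```

So maximizing the margin is equivalent to minimizing ½‖w‖², which gives the optimization problem stated above.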
Support Vector Machine (Cont.)
Lagrange Duality in brief:
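A brief sketch of the general construction (standard convex-optimization notation, consistent with the KKT conditions listed next): for a primal problem min_w f(w) subject to gᵢ(w) ≤ 0, i = 1, …, k, and hᵢ(w) = 0, i = 1, …, l, define the generalized Lagrangian

```latex
L(w, \alpha, \beta) \;=\; f(w) \;+\; \sum_{i=1}^{k} \alpha_i\, g_i(w) \;+\; \sum_{i=1}^{l} \beta_i\, h_i(w), \qquad \alpha_i \ge 0 .
```

The dual function is θ_D(α, β) = min_w L(w, α, β), and the dual problem max_{α ≥ 0, β} θ_D(α, β) lower-bounds the primal optimum (weak duality); under suitable conditions (e.g. convexity plus Slater's condition) the two optima coincide, and the optimal point (w*, α*, β*) satisfies the KKT conditions below.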
Karush-Kuhn-Tucker (KKT) Conditions

1. $\frac{\partial}{\partial w_i} L(w^*, \alpha^*, \beta^*) = 0; \quad i = 1, \dots, m$
2. $\frac{\partial}{\partial \beta_i} L(w^*, \alpha^*, \beta^*) = 0; \quad i = 1, \dots, l$
3. $\alpha_i^* \, g_i(w^*) = 0; \quad i = 1, \dots, k$
4. $g_i(w^*) \le 0; \quad i = 1, \dots, k$
5. $\alpha_i^* \ge 0; \quad i = 1, \dots, k$
Support Vector Machine (Cont.)
Solving the Optimization Problem for SVM:
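A sketch of the standard result obtained by forming the Lagrangian of the primal problem, setting its derivatives with respect to w and b to zero, and substituting back (same conventions as above; the original slides may carry out the algebra in slightly different notation):

```latex
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i \;-\; \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, \mathbf{x}_i^{T}\mathbf{x}_j
\quad \text{subject to} \quad
\alpha_i \ge 0, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
```

The solution gives w = Σᵢ αᵢ yᵢ xᵢ; only the support vectors have αᵢ > 0, b is recovered from any one of them via yᵢ(wᵀxᵢ + b) = 1, and a new point x is classified by the sign of wᵀx + b.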
Types of SVM
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified
into two classes by a single straight line, the data is termed linearly separable, and the
classifier used is called a Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot
be classified by a single straight line, the data is termed non-linear, and the classifier used is
called a Non-linear SVM classifier.
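An illustrative comparison of the two types (a sketch using scikit-learn and synthetic datasets; the parameter values are arbitrary): a linear SVM on linearly separable blobs versus a kernel SVM on data that no straight line can separate.

```python
from sklearn.datasets import make_blobs, make_moons
from sklearn.svm import SVC

# Linearly separable data -> Linear SVM classifier
X_lin, y_lin = make_blobs(n_samples=100, centers=2, random_state=0)
linear_svm = SVC(kernel="linear").fit(X_lin, y_lin)

# Non-linearly separable data -> Non-linear (kernel) SVM classifier
X_non, y_non = make_moons(n_samples=100, noise=0.1, random_state=0)
nonlinear_svm = SVC(kernel="rbf", gamma=1.0).fit(X_non, y_non)

print(linear_svm.score(X_lin, y_lin), nonlinear_svm.score(X_non, y_non))
```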
Non-linear SVM
Dealing with non-linear and inseparable planes
• Some problems cannot be solved using a linear hyperplane.

• The data points are not linearly separable.


• In case of non-linearly separable data, the simple SVM algorithm cannot be used. Rather, a
modified version of SVM, called Kernel SVM, is used.
Non-linear SVM (Cont.)
The Kernel trick
• SVM uses a kernel trick to transform the input space into a higher-dimensional space (called
the feature space) using a special function called the kernel.
• Here, the kernel takes a low-dimensional input space and transforms it into a higher-
dimensional space.
• In other words, it converts a non-separable problem into a separable one by adding more
dimensions.
• Use a linear model in the new high-dimensional feature space. The linear model in the feature
space corresponds to a non-linear model in the input space.
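A minimal sketch of this idea (with a hand-picked mapping Φ(x) = (x, x²), chosen only for illustration): 1-D data that no single threshold can separate becomes linearly separable after mapping to 2-D, so the linear SVM in the feature space acts as a non-linear classifier in the original space.

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data: the negative class surrounds the positive class,
# so no single threshold on x can separate them.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])

# Explicit mapping to a higher-dimensional feature space: Phi(x) = (x, x^2)
Phi = np.column_stack([x, x**2])

clf = SVC(kernel="linear").fit(Phi, y)
print(clf.score(Phi, y))  # linearly separable in the feature space -> 1.0
```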
Non-linear SVM (Cont.)
The Kernel trick
• Datasets that are linearly separable work out great.
• But a linear classifier cannot learn some functions: what if the dataset is just too hard to
separate with a straight line?
• We can map it to a higher-dimensional space, e.g. x → (x, x²).
(Figure: a 1-D dataset along the x-axis that is not linearly separable becomes separable after
mapping each point x to (x, x²).)
Non-linear SVM (Cont.)
The Kernel trick
• Linear discriminant function:
g(x) = wTΦ(x) + b
• Dual form of SVM optimization problem is:
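In the standard form (same conventions as the linear dual above), the inner product is simply taken in the feature space:

```latex
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i \;-\; \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, \Phi(\mathbf{x}_i)^{T}\Phi(\mathbf{x}_j)
\quad \text{subject to} \quad
\alpha_i \ge 0, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
```

The data enter only through the inner products Φ(xᵢ)ᵀΦ(xⱼ), which is exactly what the kernel function computes.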
Non-linear SVM (Cont.)
The Kernel trick
• A kernel function is defined as a function that corresponds to a dot product of two feature
vectors in some expanded feature space.
K(xi, xj) = Φ(xi)T Φ(xj)
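A quick numerical check of this definition (a sketch; the quadratic mapping used here is one standard example): for the degree-2 polynomial kernel K(xᵢ, xⱼ) = (xᵢᵀxⱼ)² in 2-D, the explicit expansion is Φ(x) = (x₁², √2·x₁x₂, x₂²), and the kernel value equals the dot product of the expanded feature vectors.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def K(xi, xj):
    """Kernel computed directly in the original input space."""
    return np.dot(xi, xj) ** 2

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 0.5])

print(K(xi, xj), np.dot(phi(xi), phi(xj)))  # both equal (x_i . x_j)^2 = 16.0
```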
Non-linear SVM (Cont.)
Examples of commonly used Kernels
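Commonly cited examples (standard forms; p, σ, κ, and θ are kernel hyperparameters, and the exact list on the original slide may differ):

```latex
\text{Linear:}\quad K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{T}\mathbf{x}_j
\qquad
\text{Polynomial:}\quad K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i^{T}\mathbf{x}_j)^{p}

\text{Gaussian (RBF):}\quad K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\tfrac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}}{2\sigma^{2}}\right)
\qquad
\text{Sigmoid:}\quad K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\kappa\, \mathbf{x}_i^{T}\mathbf{x}_j + \theta)
```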
