
Basics of the SVM Algorithm

Support Vector Machines (SVMs) are among the most popular machine-learning algorithms: with a little tuning they deliver high performance, and they are one of the most robust prediction methods. SVM is implemented quite differently from other ML algorithms. An SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.
SVM is a supervised learning algorithm used for both classification and regression problems, although it is most commonly applied to classification in machine learning. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using the so-called kernel trick, which implicitly maps the inputs into high-dimensional feature spaces.
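As a quick illustration of the kernel idea, here is a minimal sketch using scikit-learn (assuming it is installed; the toy dataset and parameter values are illustrative only, not part of the original text):

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# A toy dataset that is not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM struggles here, while the RBF kernel implicitly maps the
# inputs into a higher-dimensional feature space where a separating
# hyperplane does exist.
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy:", rbf_svm.score(X, y))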
A related unsupervised variant also exists. When data is unlabelled, supervised learning is not possible, and an unsupervised approach is required that tries to find a natural clustering of the data into groups and then maps new data onto those groups. The support-vector clustering algorithm, created by Hava Siegelmann and Vladimir Vapnik, applies the statistics of support vectors, developed in the support vector machine algorithm, to categorize unlabelled data, and is one of the most widely used clustering algorithms in industrial applications. Here, however, we will stick to the supervised learning model. A Support Vector Machine is a discriminative classifier formally defined by a separating hyperplane: given labelled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In a two-dimensional space, this hyperplane is a line dividing the plane into two parts, with each class lying on one side.
In simple terms, an SVM is a model that represents the data points in space, mapped such that the data points of the separate categories are divided by a clear gap that is as wide as possible. For one-dimensional data the support vector classifier is a point, for two-dimensional data it is a line, and for three-dimensional data it is a plane. In geometry, a hyperplane is a subspace whose dimension is one less than that of its ambient space: if the space is three-dimensional, its hyperplanes are the two-dimensional planes, while if the space is two-dimensional, its hyperplanes are the one-dimensional lines. This notion can be used in any general space in which the concept of the dimension of a subspace is defined.
Mathematics of support vector machines
The key points for you to understand about support vector machines are:
1. Support vector machines find a hyperplane (that classifies data) by maximizing the distance between the plane and the nearest input data points (called support vectors).
2. This is done by minimizing the weight vector w, which is used to define the hyperplane (the optimization problem is written out just after this list).
3. Step 2 relies on optimization theory and certain assumptions (which are detailed below).
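Written out, point 2 corresponds to the standard hard-margin optimization problem, using the margin constraints that are formalized below:

\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1 \quad \text{for all } i.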
The equation of a hyperplane is w · x + b = 0, where w is a vector normal to the hyperplane and b is an offset.

Figure 1 Equation of a hyperplane


Let's say we are given a set of two-dimensional feature vectors and asked to classify them into two classes. The natural question to ask is: what is the best decision boundary for our data? There is a near-infinite number of lines we could pass through the intervening space. SVMs make the assumption that the best line is the one which maximizes the margin between the classes. In other words, we want a line (or a hyperplane, in more than two dimensions) which is as far as possible from both sides. The boundary-defining points of each side are the ones which lie exactly at the minimum allowed margin from our line. A typical SVM solution therefore identifies the boundary-defining data points (the support vectors, circled in the image) and finds the maximum-margin boundary between the two classes.

That is all good, but how do we arrive at this solution? How do we get w and b? First, let's formalize the problem. We want a line which best separates our data. How do we define it? We can define a line as

w · x + b = 0
Drawing this in a two-dimensional space gives us all the possible solutions which satisfy this equation. w defines the orientation and b the position in relation to the origin. Note that the parameter vector w is perpendicular to the line itself. If a point x satisfies this condition, we know it lies on the boundary itself. If the result is larger than 0, the point is on the positive side of the boundary; if it is less than 0, it is on the negative side. This is now our decision rule.
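As a tiny sketch of this decision rule in code (the values of w and b below are arbitrary, chosen only for illustration):

import numpy as np

# Hypothetical hyperplane parameters, chosen only for illustration.
w = np.array([1.0, 0.0])
b = -3.0

def decide(x):
    """Return +1 if x lies on the positive side of the boundary, -1 otherwise."""
    return 1 if np.dot(w, x) + b > 0 else -1

print(decide(np.array([5.0, 1.0])))   # w.x + b = 2 > 0  -> +1
print(decide(np.array([1.0, -1.0])))  # w.x + b = -2 < 0 -> -1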
Now, let's add a few constraints. First, if we have two classes, they are denoted as yi = +1 or yi = -1. Second, we want every sample to have a value indicative of its class:

w · xi + b ≥ +1 for yi = +1
w · xi + b ≤ -1 for yi = -1

Figure 2 Boundary for positive and negative samples

This allows us to squeeze the above two expressions into one:

yi (w · xi + b) - 1 ≥ 0

If the above happens to give us 0, then we know xi lies exactly at margin's length from the boundary:

yi (w · xi + b) - 1 = 0

As a matter of fact, if we're not interested in the sign we can re-write this as:

|w · xi + b| = 1
This will come in handy later on.


Next we need to get a handle on the distance from the nearest point xi to the boundary, since this distance (the margin) is precisely what the method maximizes. To do this, imagine the boundary line as a set of points in the feature space. Take any point x from this set and form the vector connecting it with xi, which gives us the vector x - xi. Bear in mind that the Euclidean distance between these points, ||x - xi||, isn't what we want; it is just helping us get there. What we want is the length of the component in the direction of w, in other words the projection of (x - xi) onto w. This is given by the projection (x - xi) · w / ||w||.
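Carrying this projection through, and using the facts that w · x = -b for any point x on the boundary and |w · xi + b| = 1 for the nearest point (from the margin condition above), the distance works out to:

\frac{(x - x_i)\cdot w}{\lVert w \rVert}
= \frac{w \cdot x - w \cdot x_i}{\lVert w \rVert}
= \frac{-(w \cdot x_i + b)}{\lVert w \rVert},
\qquad
\left| \frac{-(w \cdot x_i + b)}{\lVert w \rVert} \right| = \frac{1}{\lVert w \rVert}.

Maximizing this margin is therefore equivalent to minimizing ||w||, which is exactly point 2 from the list above.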

Steps used in the SVM algorithm


Step 1: Load the important libraries.

Step 2: Import the dataset and extract the X variables and Y separately.

Step 3: Explore the data to figure out what they look like.

Step 4: Pre-process the data.

Step 5: Divide the dataset into train and test sets.

Step 6: Initialize the SVM classifier model.

Step 7: Fit the SVM classifier model.

Step 8: Come up with predictions (a minimal code sketch of these steps follows).
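A minimal end-to-end sketch of these eight steps in Python, assuming scikit-learn and its bundled Iris data set (any labelled data set would work; the parameter choices are illustrative, not prescriptive):

# Step 1: load the important libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 2: import the dataset and extract the X variables and Y separately
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Step 3: explore the data to figure out what they look like
print(X.shape, y.shape)

# Step 4: pre-process the data (feature scaling)
X = StandardScaler().fit_transform(X)

# Step 5: divide the dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 6: initialize the SVM classifier model
model = SVC(kernel="linear", C=1.0)

# Step 7: fit the SVM classifier model
model.fit(X_train, y_train)

# Step 8: come up with predictions on the test set
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))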


Outline for proposed approach

Figure 3 Outline of proposed work (flowchart: Start → load the important libraries → import the dataset and extract the X variables and Y separately → explore the data to figure out what they look like → pre-process the data → divide the dataset into train and test → initialize the SVM classifier model → fit the SVM classifier model on the training set → run the SVM prediction process on the testing data set → come up with predictions)


Illustration with an example
Consider the simple data set with eight objects shown in Table 3.1.

Table 3.1 Simple data set with eight objects

S. No.    x     y
1         1     1
2         2     1
3         4     0
4         5     1
5         5    -1
6         6     0
7         1    -1
8         2    -1

[Scatter plot of the eight data points; the axes are labelled "Quality of Life Before Chemotherapy" (x) and "Quality of Life After Chemotherapy" (y).]

Figure 4 Sample data set


From the above graph it is clear that there are two classes, and that there are three points which can be treated as support vectors, named S1, S2, and S3. Of these three support vectors, two are negatively labelled (S1 and S2) and one is positively labelled (S3):

S1 = (2, 1)

S2 = (2, -1)

S3 = (4, 0)
[Scatter plot highlighting the three support vectors S1, S2, and S3.]

Figure 5 Selecting support vectors from the given data set

Each support vector is first augmented with a bias entry of 1:

s̃1 = (2, 1, 1)

s̃2 = (2, -1, 1)

s̃3 = (4, 0, 1)
Now we need to find three parameters α1, α2 and α3 from three linear equations:

α1 s̃1·s̃1 + α2 s̃2·s̃1 + α3 s̃3·s̃1 = -1
α1 s̃1·s̃2 + α2 s̃2·s̃2 + α3 s̃3·s̃2 = -1
α1 s̃1·s̃3 + α2 s̃2·s̃3 + α3 s̃3·s̃3 = +1

Substituting the augmented support vectors and evaluating the dot products gives:

6α1 + 4α2 + 9α3 = -1
4α1 + 6α2 + 9α3 = -1
9α1 + 9α2 + 17α3 = 1
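As a quick numerical check, this 3×3 system can be solved directly, for example with NumPy (a sketch; the matrix simply collects the coefficients above):

import numpy as np

# Coefficient matrix and right-hand side of the three equations in alpha 1..3.
A = np.array([[6.0, 4.0, 9.0],
              [4.0, 6.0, 9.0],
              [9.0, 9.0, 17.0]])
rhs = np.array([-1.0, -1.0, 1.0])

alphas = np.linalg.solve(A, rhs)
print(alphas)   # expected: [-3.25 -3.25  3.5 ]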

After solving these equations we get the values of α1, α2 and α3:

α1 = α2 = -3.25 and α3 = 3.5

Now we find the hyperplane which discriminates the positive class from the negative class, given by

w̃ = Σi αi s̃i

w̃ = (-3.25)(2, 1, 1) + (-3.25)(2, -1, 1) + (3.5)(4, 0, 1)

Evaluating this sum gives the augmented weight vector

w̃ = (1, 0, -3)
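The same values can be used to recompute the augmented weight vector numerically (a sketch continuing the NumPy check above):

import numpy as np

alphas = np.array([-3.25, -3.25, 3.5])

# Augmented support vectors (bias entry of 1 appended to each point).
S = np.array([[2.0,  1.0, 1.0],
              [2.0, -1.0, 1.0],
              [4.0,  0.0, 1.0]])

w_aug = alphas @ S    # weighted sum of the augmented support vectors
print(w_aug)          # expected: [ 1.  0. -3.]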

The separating hyperplane equation is therefore

w · x + b = 0, with w = (1, 0) and b = -3,

i.e. the vertical line x = 3, which lies halfway between the two classes.
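To confirm the result, the decision rule sign(w · x + b) can be checked against all eight points of Table 3.1 (a sketch; the class labels below are read off Figure 4, left group negative and right group positive, which is an assumption since the table itself does not list them):

import numpy as np

points = np.array([[1, 1], [2, 1], [4, 0], [5, 1],
                   [5, -1], [6, 0], [1, -1], [2, -1]], dtype=float)

# Assumed labels from the plot: points with x <= 2 negative, x >= 4 positive.
labels = np.array([-1, -1, 1, 1, 1, 1, -1, -1])

w = np.array([1.0, 0.0])
b = -3.0

predictions = np.sign(points @ w + b)
print(predictions)                           # matches `labels`
print(bool(np.all(predictions == labels)))   # True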

[Scatter plot of the data set with the separating hyperplane drawn between the two classes.]

Figure 6 Hyperplane for the given data set
