
Basics of the SVM Algorithm

Support Vector Machines (SVMs) are among the most popular machine-learning algorithms: with a little tuning they deliver high performance, and they are one of the most robust prediction methods. SVM is implemented quite differently from other ML algorithms. An SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.
SVM is a supervised learning algorithm used for both classification and regression problems, although it is most commonly applied to classification in machine learning. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using the so-called kernel trick, which implicitly maps the inputs into high-dimensional feature spaces.
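As a quick illustration of the kernel idea, here is a minimal sketch using scikit-learn (assuming it is installed; the toy dataset and parameter values are illustrative only, not part of the original text):

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# A toy dataset that is not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM struggles here, while the RBF kernel implicitly maps the
# inputs into a higher-dimensional feature space where a separating
# hyperplane does exist.
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy:", rbf_svm.score(X, y))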
A related unsupervised variant also exists. When data is unlabelled, supervised learning is not possible, and an unsupervised approach is required that tries to find a natural clustering of the data into groups and then maps new data onto those groups. The support-vector clustering algorithm, created by Hava Siegelmann and Vladimir Vapnik, applies the statistics of support vectors, developed in the support vector machine algorithm, to categorize unlabelled data, and is one of the most widely used clustering algorithms in industrial applications. Here, however, we will stick to the supervised learning model. A Support Vector Machine is a discriminative classifier formally defined by a separating hyperplane: given labelled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In a two-dimensional space, this hyperplane is a line dividing the plane into two parts, with each class lying on one side.
In simple terms, an SVM is a model that represents the data points in space, mapped such that the data points of the separate categories are divided by a clear gap that is as wide as possible. For one-dimensional data the support vector classifier is a point, for two-dimensional data it is a line, and for three-dimensional data it is a plane. In geometry, a hyperplane is a subspace whose dimension is one less than that of its ambient space: if the space is three-dimensional, its hyperplanes are the two-dimensional planes, while if the space is two-dimensional, its hyperplanes are the one-dimensional lines. This notion can be used in any general space in which the concept of the dimension of a subspace is defined.
Mathematics of support vector machines
The key points for you to understand about support vector machines are:
1. Support vector machines find a hyperplane (that classifies data) by maximizing the distance between the plane and the nearest input data points (called support vectors).
2. This is done by minimizing the weight vector w, which is used to define the hyperplane (the optimization problem is written out just after this list).
3. Step 2 relies on optimization theory and certain assumptions (which are detailed below).
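Written out, point 2 corresponds to the standard hard-margin optimization problem, using the margin constraints that are formalized below:

\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1 \quad \text{for all } i.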
The equation of a hyperplane is w · x + b = 0, where w is a vector normal to the hyperplane and b is an offset.

Figure 1 Equation of a hyperplane


Let's say we are given a set of two-dimensional feature vectors and asked to classify them into two classes. The natural question to ask is: what is the best decision boundary for our data? There is a near-infinite number of lines we could pass through the intervening space. SVMs make the assumption that the best line is the one which maximizes the margin between the classes. In other words, we want a line (or a hyperplane, in more than two dimensions) which is as far as possible from both sides. The boundary-defining points of each side are the ones which lie exactly at the minimum allowed margin from our line. A typical SVM solution therefore identifies the boundary-defining data points (the support vectors, circled in the image) and finds the maximum-margin boundary between the two classes.

That is all good, but how do we arrive at this solution? How do we get w and b? First, let's formalize the problem. We want a line which best separates our data. How do we define it? We can define a line as

w · x + b = 0
Drawing this in a two-dimensional space gives us all the possible solutions which satisfy this equation. w defines the orientation and b the position in relation to the origin. Note that the parameter vector w is perpendicular to the line itself. If a point x satisfies this condition, we know it lies on the boundary itself. If the result is larger than 0, the point is on the positive side of the boundary; if it is less than 0, it is on the negative side. This is now our decision rule.
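As a tiny sketch of this decision rule in code (the values of w and b below are arbitrary, chosen only for illustration):

import numpy as np

# Hypothetical hyperplane parameters, chosen only for illustration.
w = np.array([1.0, 0.0])
b = -3.0

def decide(x):
    """Return +1 if x lies on the positive side of the boundary, -1 otherwise."""
    return 1 if np.dot(w, x) + b > 0 else -1

print(decide(np.array([5.0, 1.0])))   # w.x + b = 2 > 0  -> +1
print(decide(np.array([1.0, -1.0])))  # w.x + b = -2 < 0 -> -1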
Now, let's add a few constraints. First, if we have two classes, they are denoted as yi = +1 or yi = -1. Second, we want every sample to have a value indicative of its class:

w · xi + b ≥ +1 for yi = +1
w · xi + b ≤ -1 for yi = -1

Figure 2 Boundary for positive and negative samples

This allows us to squeeze the above two expressions into one:

yi (w · xi + b) - 1 ≥ 0

If the above happens to give us 0, then we know xi lies exactly at margin's length from the boundary:

yi (w · xi + b) - 1 = 0

As a matter of fact, if we're not interested in the sign we can re-write this as:

|w · xi + b| = 1
This will come in handy later on.


Next we need to get a handle on the distance from the nearest point xi to the boundary, since this distance (the margin) is precisely what the method maximizes. To do this, imagine the boundary line as a set of points in the feature space. Take any point x from this set and form the vector connecting it with xi, which gives us the vector x - xi. Bear in mind that the Euclidean distance between these points, ||x - xi||, isn't what we want; it is just helping us get there. What we want is the length of the component in the direction of w, in other words the projection of (x - xi) onto w. This is given by the projection (x - xi) · w / ||w||.
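Carrying this projection through, and using the facts that w · x = -b for any point x on the boundary and |w · xi + b| = 1 for the nearest point (from the margin condition above), the distance works out to:

\frac{(x - x_i)\cdot w}{\lVert w \rVert}
= \frac{w \cdot x - w \cdot x_i}{\lVert w \rVert}
= \frac{-(w \cdot x_i + b)}{\lVert w \rVert},
\qquad
\left| \frac{-(w \cdot x_i + b)}{\lVert w \rVert} \right| = \frac{1}{\lVert w \rVert}.

Maximizing this margin is therefore equivalent to minimizing ||w||, which is exactly point 2 from the list above.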

Steps used in the SVM algorithm


Step 1: Load the important libraries.

Step 2: Import the dataset and extract the X variables and Y separately.

Step 3: Explore the data to figure out what they look like.

Step 4: Pre-process the data.

Step 5: Divide the dataset into train and test sets.

Step 6: Initialize the SVM classifier model.

Step 7: Fit the SVM classifier model.

Step 8: Come up with predictions (a minimal code sketch of these steps follows).
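A minimal end-to-end sketch of these eight steps in Python, assuming scikit-learn and its bundled Iris data set (any labelled data set would work; the parameter choices are illustrative, not prescriptive):

# Step 1: load the important libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 2: import the dataset and extract the X variables and Y separately
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Step 3: explore the data to figure out what they look like
print(X.shape, y.shape)

# Step 4: pre-process the data (feature scaling)
X = StandardScaler().fit_transform(X)

# Step 5: divide the dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 6: initialize the SVM classifier model
model = SVC(kernel="linear", C=1.0)

# Step 7: fit the SVM classifier model
model.fit(X_train, y_train)

# Step 8: come up with predictions on the test set
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))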


Outline for proposed approach

Figure 3 Outline of proposed work (flowchart: Start → load the important libraries → import the dataset and extract the X variables and Y separately → explore the data to figure out what they look like → pre-process the data → divide the dataset into train and test → initialize the SVM classifier model → fit the SVM classifier model on the training set → run the SVM prediction process on the testing data set → come up with predictions)


Illustration with an example
Consider the simple data set with eight objects shown in Table 3.1.

Table 3.1 Simple data set with eight objects

S. No.    x     y
1         1     1
2         2     1
3         4     0
4         5     1
5         5    -1
6         6     0
7         1    -1
8         2    -1

[Scatter plot of the eight data points; the axes are labelled "Quality of Life Before Chemotherapy" (x) and "Quality of Life After Chemotherapy" (y).]

Figure 4 Sample data set


From the above graph it is clear that there are two classes, and that there are three points which can be treated as support vectors, named S1, S2, and S3. Of these three support vectors, two are negatively labelled (S1 and S2) and one is positively labelled (S3):

S1 = (2, 1)

S2 = (2, -1)

S3 = (4, 0)
[Scatter plot highlighting the three support vectors S1, S2, and S3.]

Figure 5 Selecting support vectors from the given data set

Each support vector is first augmented with a bias entry of 1:

s̃1 = (2, 1, 1)

s̃2 = (2, -1, 1)

s̃3 = (4, 0, 1)
Now we need to find three parameters α1, α2 and α3 from three linear equations:

α1 s̃1·s̃1 + α2 s̃2·s̃1 + α3 s̃3·s̃1 = -1
α1 s̃1·s̃2 + α2 s̃2·s̃2 + α3 s̃3·s̃2 = -1
α1 s̃1·s̃3 + α2 s̃2·s̃3 + α3 s̃3·s̃3 = +1

Substituting the augmented support vectors and evaluating the dot products gives:

6α1 + 4α2 + 9α3 = -1
4α1 + 6α2 + 9α3 = -1
9α1 + 9α2 + 17α3 = 1
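As a quick numerical check, this 3×3 system can be solved directly, for example with NumPy (a sketch; the matrix simply collects the coefficients above):

import numpy as np

# Coefficient matrix and right-hand side of the three equations in alpha 1..3.
A = np.array([[6.0, 4.0, 9.0],
              [4.0, 6.0, 9.0],
              [9.0, 9.0, 17.0]])
rhs = np.array([-1.0, -1.0, 1.0])

alphas = np.linalg.solve(A, rhs)
print(alphas)   # expected: [-3.25 -3.25  3.5 ]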

After solving these equations we get the values of α1, α2 and α3:

α1 = α2 = -3.25 and α3 = 3.5

Now we find the hyperplane which discriminates the positive class from the negative class, given by

w̃ = Σi αi s̃i

w̃ = (-3.25)(2, 1, 1) + (-3.25)(2, -1, 1) + (3.5)(4, 0, 1)

Evaluating this sum gives the augmented weight vector

w̃ = (1, 0, -3)
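The same values can be used to recompute the augmented weight vector numerically (a sketch continuing the NumPy check above):

import numpy as np

alphas = np.array([-3.25, -3.25, 3.5])

# Augmented support vectors (bias entry of 1 appended to each point).
S = np.array([[2.0,  1.0, 1.0],
              [2.0, -1.0, 1.0],
              [4.0,  0.0, 1.0]])

w_aug = alphas @ S    # weighted sum of the augmented support vectors
print(w_aug)          # expected: [ 1.  0. -3.]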

The separating hyperplane equation is therefore

w · x + b = 0, with w = (1, 0) and b = -3,

i.e. the vertical line x = 3, which lies halfway between the two classes.
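To confirm the result, the decision rule sign(w · x + b) can be checked against all eight points of Table 3.1 (a sketch; the class labels below are read off Figure 4, left group negative and right group positive, which is an assumption since the table itself does not list them):

import numpy as np

points = np.array([[1, 1], [2, 1], [4, 0], [5, 1],
                   [5, -1], [6, 0], [1, -1], [2, -1]], dtype=float)

# Assumed labels from the plot: points with x <= 2 negative, x >= 4 positive.
labels = np.array([-1, -1, 1, 1, 1, 1, -1, -1])

w = np.array([1.0, 0.0])
b = -3.0

predictions = np.sign(points @ w + b)
print(predictions)                           # matches `labels`
print(bool(np.all(predictions == labels)))   # True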

[Scatter plot of the data set with the separating hyperplane drawn between the two classes.]

Figure 6 Hyperplane for the given data set
