
Support Vector Machine:

Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. Primarily, it
is used for Classification problems in Machine Learning.
SVMs are different from other classification algorithms because of the way they choose
the decision boundary that maximizes the distance from the nearest data points of all the
classes. The decision boundary created by SVMs is called the maximum margin classifier or the maximum margin hyperplane.
How does an SVM work?
A simple linear SVM classifier works by drawing a straight line between two classes: all of the data points on one side of the line belong to one category, and the data points on the other side belong to the other. However, there can be an infinite number of such lines to choose from.
The goal of the SVM algorithm is to create the best line or decision boundary. This best
decision boundary is called a hyperplane.
Support vectors are the data points that lie closest to the hyperplane and influence its position and orientation. Using these support vectors, we maximize the margin of the classifier.
Example: Face Detection
An SVM can classify parts of an image as face or non-face. The training data consists of n x n pixel patches labelled with two classes: face (+1) and non-face (-1). The classifier extracts features from the pixels, draws a square boundary around detected faces on the basis of pixel brightness, and classifies each new image using the same process.
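As an illustration only, here is a minimal sketch of that face/non-face setup using scikit-learn's SVC. The n x n pixel patches below are synthetic stand-ins (random brighter vs. darker patches), not a real face dataset.

# Sketch: classifying flattened n x n pixel patches as face (+1) / non-face (-1).
# The data is synthetic; real face detection would use actual image patches.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 8                                                          # patch size: n x n pixels
faces = rng.normal(loc=0.7, scale=0.1, size=(50, n * n))       # brighter patches stand in for "face"
non_faces = rng.normal(loc=0.3, scale=0.1, size=(50, n * n))   # darker patches stand in for "non-face"

X = np.vstack([faces, non_faces])          # each row is one flattened n x n patch
y = np.array([+1] * 50 + [-1] * 50)        # +1 = face, -1 = non-face

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(rng.normal(0.7, 0.1, size=(1, n * n))))      # expected: [1] (face)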

The SVM algorithm can be used for face detection, image classification, text categorization, etc.

Note:

The dimension of the hyperplane depends on the number of features in the dataset: if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a two-dimensional plane.

We always create the hyperplane that has the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of each class.

Types of SVM

The SVM algorithm is of two types:

Linear SVM: When the data points are linearly separable into two classes, the data is called linearly separable data. We use the linear SVM classifier to classify such data.

Non-linear SVM: When the data is not linearly separable, we use the non-linear SVM classifier to separate the data points.

How does a Linear SVM work?

The working of the SVM algorithm can be understood with an example. Suppose we have a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue.

Since this is a 2-D space, we can separate the two classes by just using a straight line. But there can be multiple lines that separate these classes.

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary is called a hyperplane.

The SVM algorithm finds the points from both classes that lie closest to this boundary. These points are called support vectors.

The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
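A minimal sketch of this, assuming scikit-learn: the two blobs below stand in for the green and blue tags with features x1 and x2, and the fitted linear SVC exposes both the support vectors and the margin width.

# Sketch: fit a linear SVM on two separable clusters and inspect it.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=6)   # two linearly separable classes

clf = SVC(kernel="linear", C=1000).fit(X, y)                  # large C ~ (almost) hard margin

# Only the support vectors (the points closest to the hyperplane)
# determine its position and orientation.
print("support vectors:\n", clf.support_vectors_)

# For a linear SVM the margin width is 2 / ||w||, where w = clf.coef_[0].
w = clf.coef_[0]
print("margin width:", 2 / np.linalg.norm(w))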
What are Non-linear SVMs?

We cannot separate such data points with a single straight line. Also, if we have more than two classes, it is impossible to separate them with a single straight line. Consider the following example: there are two classes of data points which cannot be separated by a straight line, but they can be separated by a circular boundary. Hence we can introduce a third coordinate Z, computed from X and Y as Z = X² + Y².

After introducing this third dimension, the data points become linearly separable and can be separated by a straight (planar) hyperplane.
This representation is in 3-D with a z-axis; projected back into 2-D, the separating hyperplane appears as a circular boundary.

This is what a non-linear SVM does: it implicitly takes the data points to a higher dimension so that they become linearly separable there, and then the algorithm classifies them.
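A sketch of that lifting step, assuming scikit-learn's make_circles for the toy data: we add the coordinate Z = X² + Y² by hand, and a plain linear SVM then separates the two rings.

# Sketch: two concentric circles are not linearly separable in (x, y),
# but become separable after adding z = x**2 + y**2.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

z = (X ** 2).sum(axis=1)                  # the new coordinate Z = X^2 + Y^2
X3d = np.column_stack([X, z])             # lift the points into 3-D

clf = SVC(kernel="linear").fit(X3d, y)    # a flat hyperplane now separates the two classes
print("training accuracy in 3-D:", clf.score(X3d, y))   # close to 1.0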
Note:

We use a kernel function to move from the lower dimension to a higher dimension and to perform these calculations efficiently.

A kernel helps to form the hyperplane in the higher dimension without raising the
complexity.

Kernels are a way to solve non-linear problems with the help of linear classifiers; this is known as the kernel trick. The kernel function is passed as a parameter to the SVM code, and it helps to determine the shape of the hyperplane and decision boundary.
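A minimal sketch of that, assuming scikit-learn's SVC (the kernel names below are scikit-learn's): instead of mapping the points by hand, the kernel name is passed as a parameter and the mapping happens implicitly.

# Sketch: the same circular data, separated by choosing a kernel parameter.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ("linear", "rbf", "poly"):
    # degree/coef0 only matter for the polynomial kernel; a degree-2
    # polynomial can represent the x^2 + y^2 term from the text.
    clf = SVC(kernel=kernel, degree=2, coef0=1.0, gamma="scale").fit(X, y)
    print(kernel, "training accuracy:", round(clf.score(X, y), 2))

# The linear kernel cannot separate the two rings, while the rbf and
# degree-2 polynomial kernels do so almost perfectly.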

SVM Regression

Support Vector Regression (SVR), as the name suggests, is a regression algorithm that works on the principle of the Support Vector Machine. SVR differs from SVM in that SVM is a classifier used for predicting discrete categorical labels, while SVR is a regressor used for predicting continuous ordered variables.

Note:

In simple regression, the idea is to minimize the error rate, i.e. to minimize the sum of squared errors:

min Σᵢ (yᵢ − wᵢxᵢ)²

where yᵢ is the target, wᵢ is the coefficient, and xᵢ is the predictor (feature).

while in SVR the idea is to fit the error inside a certain threshold, which means the job of SVR is to approximate the best value within a given margin called the ε-tube:

Minimize:
½ ||w||²

Constraints:
|yᵢ − wᵢxᵢ| ≤ ε

This algorithm does not work for all data points. It solves the objective function as well as possible, but some of the points still fall outside the margins. With the help of slack variables, we need to account for the possibility of errors that are larger than ε.

The concept of slack variables is simple: for any value that falls outside of ε, we can denote its deviation from the margin as ξ.

We know that these deviations have the potential to exist, but we would still like to
minimize them as much as possible. Thus, we can add these deviations to the objective
function.

Minimize:
½ ||w||² + C Σᵢ |ξᵢ|

Constraints:
|yᵢ − wᵢxᵢ| ≤ ε + |ξᵢ|
Illustrative Example:

We now have an additional hyperparameter, C, that we can tune. C controls how heavily deviations larger than ε are penalized, and therefore how many points are tolerated outside of the ε-tube. As C grows very large, the slack terms are suppressed and the problem collapses back into the simplified (although sometimes infeasible) formulation above.
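A minimal SVR sketch, assuming scikit-learn: epsilon sets the half-width of the ε-tube and C weights the slack penalty for points that fall outside it; the sine data below is synthetic.

# Sketch: fit an SVR with an explicit epsilon (tube width) and C (slack weight).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)   # noisy continuous target

reg = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
print("R^2 on training data:", reg.score(X, y))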

1. Hyperplane: It is a separating line between two data classes in a higher dimension than the actual dimension. In SVR it is defined as the line that helps in predicting the target value.

2. Kernel: In SVR the regression is performed in a higher dimension. To do that we need a function that maps the data points into that higher dimension. This function is termed the kernel. Types of kernels used in SVR are the Sigmoid Kernel, Polynomial Kernel, Gaussian (RBF) Kernel, etc.

3. Boundary Lines: These are the two lines drawn around the hyperplane at a distance of ε (epsilon). They are used to create a margin around the data points.

4. Support Vectors: These are the data points used to define the hyperplane, i.e. the extreme data points in the dataset that help in defining it. These data points lie close to the boundary lines, as illustrated in the sketch after this list.
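A small sketch tying these pieces together, assuming scikit-learn: after fitting, the model's support vectors are the training points that lie on or outside the two ε boundary lines.

# Sketch: inspect which training points end up as support vectors in SVR.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(100, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.3, size=100)    # noisy linear target

reg = SVR(kernel="linear", C=10.0, epsilon=0.5).fit(X, y)

# Support vectors are the training points lying on or outside the ε boundary
# lines; points strictly inside the ε-tube are not support vectors.
print("number of support vectors:", len(reg.support_))
inside = np.abs(y - reg.predict(X)) < reg.epsilon
print("points strictly inside the ε-tube:", int(inside.sum()), "of", len(y))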

Note:

The objective of SVR is to fit as many data points as possible without violating the margin. Note that in SVM (classification) the support vectors are used to define the hyperplane, while in SVR they are used to define the regression line.
