
Support Vector Machines

What is a Support Vector Machine?

• A support vector machine (SVM) is a supervised machine learning
model that is used for classification, regression and outlier detection.
• However, it is primarily used for classification problems in Machine
Learning.
• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that
we can easily put the new data point in the correct category in the
future.
• This best decision boundary is called a hyperplane.
How Does SVM Work?
• The basics of Support Vector Machines and how they work are best understood
with a simple example. Let’s imagine we have two tags: red and blue, and our
data has two features: x and y.
• We want a classifier that, given a pair of (x,y) coordinates, outputs whether
it’s red or blue. We plot our already labeled training data on a plane:
• A support vector machine takes these data points and outputs the
hyperplane (which in two dimensions is simply a line) that best
separates the tags.
• This line is the decision boundary: anything that falls to one side of it
we will classify as blue, and anything that falls to the other as red.
• But what exactly is the best hyperplane?
• For SVM, it’s the one that maximizes the margins from both tags.
• In other words: the hyperplane (remember, it’s a line in this case)
whose distance to the nearest element of each tag is the largest.
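The red/blue example above can be sketched in a few lines with scikit-learn's `SVC`. The coordinates and labels below are made up purely for illustration; label 0 stands for red and 1 for blue.

```python
import numpy as np
from sklearn.svm import SVC

# Two features (x, y) per point; label 0 = red, 1 = blue.
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 1.0],   # red cluster
              [5.0, 6.0], [6.0, 5.5], [5.5, 6.5]])  # blue cluster
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel fits the maximum-margin separating line.
clf = SVC(kernel="linear")
clf.fit(X, y)

# A new point is classified by which side of the line it falls on.
print(clf.predict([[2.0, 2.0]]))  # lands on the red side
print(clf.predict([[6.0, 6.0]]))  # lands on the blue side
```

Anything on one side of the fitted line is predicted red, anything on the other side blue, exactly as described above.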
Mathematical formulation

• A support vector machine constructs a hyperplane or set of hyperplanes in a high-
or infinite-dimensional space, which can be used for classification, regression or
other tasks.
• A good separation is achieved by the hyperplane that has the largest distance
to the nearest training data points of any class (the so-called functional margin),
since in general the larger the margin, the lower the generalization error of the
classifier.
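The maximum-margin idea can be written as an optimization problem. This is the standard hard-margin primal form for linearly separable data, not stated explicitly in the slides:

```latex
% Find the hyperplane w·x + b = 0 that separates the two classes
% (labels y_i in {-1, +1}) with the widest margin 2/||w||.
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{subject to}\quad & y_i \left( w \cdot x_i + b \right) \ge 1,
\qquad i = 1, \dots, n
\end{aligned}
```

Minimizing ‖w‖² is equivalent to maximizing the margin 2/‖w‖, and the constraints keep every training point on the correct side of the margin boundaries.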
• The figure below shows the decision function for a linearly separable problem,
with three samples on the margin boundaries, called “support vectors”:
• SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed a Support Vector Machine. 
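In scikit-learn, the support vectors that were selected during fitting can be inspected directly. The toy 2-D data below is illustrative, in the spirit of the red/blue example:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [1, 2],
              [5, 5], [6, 6], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the extreme points on the margin boundaries are retained;
# the decision function depends on these alone.
print(clf.support_vectors_)  # coordinates of the support vectors
print(clf.n_support_)        # number of support vectors per class
```

This is also why SVMs are memory efficient: predictions use only this subset of the training points.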
Kernel Function - Trick
• A kernel function is a method that takes data as input and transforms it
into the form required for processing.
• The name “kernel” refers to the set of mathematical functions used in
Support Vector Machines that provide a window to manipulate the
data.
• A kernel function generally transforms the training data so that a
non-linear decision surface becomes a linear equation in a
higher-dimensional space.
• Basically, it returns the inner product between two points in a
suitable feature space. 
Kernel Function - Trick
• SVM algorithms use a set of mathematical functions that are defined
as the kernel.
• The function of the kernel is to take data as input and transform it into
the required form.
• Different SVM algorithms use different types of kernel functions: for
example linear, polynomial, radial basis function (RBF), and sigmoid.
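The kernel is just a parameter of `SVC`, so the kernel types listed above can be compared on the same data. The sketch below uses a synthetic two-moons dataset (an assumption for illustration); exact scores depend on the noise and random seed:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# A dataset the linear kernel cannot separate well.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# Fit the same data with each kernel and compare training accuracy.
scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X, y)
    scores[kernel] = clf.score(X, y)
    print(kernel, scores[kernel])
```

On curved class boundaries like these, the non-linear kernels (especially RBF) typically separate the classes better than the linear one.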
Kernel Function
• Kernel functions can be defined for sequence data, graphs, text and images,
as well as vectors.
• The most commonly used kernel function is the RBF, because it has a
localized and finite response along the entire x-axis.
• Kernel functions return the inner product between two points in a
suitable feature space, thus defining a notion of similarity at little
computational cost even in very high-dimensional spaces.
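The "inner product in a feature space" claim can be checked numerically. For the homogeneous degree-2 polynomial kernel k(x, z) = (x · z)², the explicit feature map in 2-D is φ(x) = (x₁², √2·x₁x₂, x₂²), so the kernel value must equal the inner product of the mapped points:

```python
import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

kernel_value = (x @ z) ** 2        # computed without ever leaving 2-D
explicit_value = phi(x) @ phi(z)   # inner product in the 3-D feature space

print(kernel_value, explicit_value)  # both equal 25.0
```

This is the computational advantage: the kernel evaluates a 3-D (or much higher-dimensional) inner product while only ever touching the original 2-D coordinates.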
Examples of SVM Kernels

• Polynomial kernel: popular in image processing.
• Gaussian kernel: a general-purpose kernel; used when there is no prior
knowledge about the data.
• Gaussian radial basis function (RBF): a general-purpose kernel; used when
there is no prior knowledge about the data.
• Hyperbolic tangent kernel: can be used in neural networks.
• Sigmoid kernel: can be used as a proxy for neural networks.
• Linear splines kernel in one dimension: useful when dealing with large sparse
data vectors. It is often used in text categorization. The splines kernel also
performs well in regression problems.
The advantages of support vector machines are:

• Effective in high dimensional spaces.

• Still effective in cases where the number of dimensions is greater than the
number of samples.

• Uses a subset of training points in the decision function (called support vectors), so it
is also memory efficient.

• Versatile: different Kernel functions can be specified for the decision function.
Common kernels are provided, but it is also possible to specify custom kernels.
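The "custom kernels" point can be sketched concretely: scikit-learn's `SVC` accepts a callable as the kernel, which it calls with two data matrices and from which it expects the Gram matrix back. The kernel and dataset below are made up for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def my_kernel(A, B):
    # A simple hand-rolled kernel: linear kernel plus a constant offset.
    # Must return the Gram matrix of shape (len(A), len(B)).
    return A @ B.T + 1.0

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

clf = SVC(kernel=my_kernel).fit(X, y)
print(clf.score(X, y))  # training accuracy with the custom kernel
```

Any symmetric positive semi-definite function of two inputs can be plugged in this way, which is what makes kernels usable for graphs, text and other non-vector data.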
The disadvantages of support vector machines include:

• If the number of features is much greater than the number of
samples, avoiding over-fitting in the choice of kernel functions and
regularization term is crucial.

• SVMs do not directly provide probability estimates; these are
calculated using an expensive five-fold cross-validation (see Scores
and probabilities, below).
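In scikit-learn this trade-off is exposed through the `probability` flag of `SVC`: by default only class labels and decision values are available, and enabling probabilities triggers the internal cross-validated calibration, which slows down fitting. The dataset below is synthetic, for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# probability=True fits an extra calibration step via internal
# cross-validation, so training is noticeably more expensive.
clf = SVC(probability=True, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:3])
print(proba)  # one row per sample; each row sums to 1 over the classes
```

Without `probability=True`, calling `predict_proba` raises an error; only `predict` and `decision_function` are available.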
