
Support Vector Machine
Deepika Kamboj

Support Vector Machine


• SVM is a model that can perform linear classification as well as regression.

• SVM is based on the concept of a surface, called a hyperplane, which draws a boundary between
data instances plotted in the multi-dimensional feature space.

• The output prediction of an SVM is one of the two possible classes that are already defined in
the training data.

NOTE: Don't confuse SVM with logistic regression. Both algorithms try to find the best
hyperplane, but the main difference is that logistic regression takes a probabilistic approach,
whereas the support vector machine is based on the geometrical and statistical properties of the data.


Classification using hyperplanes



Support Vectors

• Support vectors are the data points (representing the classes) that lie closest to the identified
hyperplane; they are the critical components of the data set.

• Margin: the distance between the hyperplane and the observations closest to it (the support
vectors). In SVM, a large margin is considered a good margin.

• There are two types of margins: hard margin and soft margin.


Hard Margin & Soft Margin

Hard Margin: A hard margin SVM seeks to find a decision boundary that completely separates
two classes of data, with no data points allowed in the margin or on the wrong side of the
boundary. It is more rigid and works well when the data is linearly separable and there are no
outliers.

Soft Margin: A soft margin SVM allows for a margin that may contain some misclassified data
points or outliers. It introduces a trade-off between maximizing the margin and minimizing
classification errors. Soft margin SVMs are more flexible and can handle cases where the data is
not perfectly separable.

Support Vector Machine
• There may be many possible hyperplanes,
and one of the challenges with the SVM
model is to find the optimal hyperplane.

• A hard margin in terms of SVM means that the model is inflexible in classification and tries
to fit the training set exceptionally well, thereby causing overfitting.

Identifying the correct hyperplane in SVM

1. The hyperplane should segregate the data instances belonging to the two classes in the best
possible way (Dual Formulation).

2. It should maximize the distances between the nearest data points of both the classes, i.e.,
maximize the margin.

Identifying the correct hyperplane in SVM

• Doing so helps us achieve better generalization and hence fewer issues in the classification
of unknown data.

• Modelling a problem using SVM is nothing but identifying the support vectors and the
maximum margin hyperplane (MMH) corresponding to the problem space.

Dot Product

A . B = |A| cosθ * |B|

In SVM
A ⋅ (B/|B|) = |A| cosθ   (the projection of A onto B)

Dot Product
Here A and B are 2 vectors, to find the dot product between these 2 vectors we first find the
magnitude of both the vectors and to find magnitude we use the Pythagorean theorem or the
distance formula.
After finding the magnitude we simply multiply it with the cosine angle between both the vectors.
Mathematically it can be written as:
A . B = |A| cosθ * |B|
Where |A| cosθ is the projection of A on B and |B| is the magnitude of vector B
Now in SVM we just need the projection of A not the magnitude of B. To just get the projection we
can simply take the unit vector of B because it will be in the direction of B but its magnitude will
be 1. Hence now the equation becomes:
A.B = |A| cosθ * unit vector of B
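As a quick illustration, here is a minimal NumPy sketch of the same idea (the vectors A and B are made-up example values, not from the slides):

    import numpy as np

    A = np.array([3.0, 4.0])   # example vector A (assumed values)
    B = np.array([6.0, 0.0])   # example vector B (assumed values)

    # Magnitudes via the distance formula (Pythagorean theorem)
    mag_A = np.linalg.norm(A)
    mag_B = np.linalg.norm(B)

    # Full dot product: |A| * |B| * cos(theta)
    dot_AB = np.dot(A, B)

    # Projection of A onto B = |A| * cos(theta) = A . (unit vector of B)
    unit_B = B / mag_B
    projection = np.dot(A, unit_B)

    print(dot_AB)       # 18.0
    print(projection)   # 3.0  (= dot_AB / |B|)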


Use of Dot Product in SVM

Consider a random point X for which we want to know whether it lies on the right side of the
plane or the left side of the plane (positive or negative). To find this, we first treat the point
as a vector X and then construct a vector w that is perpendicular to the hyperplane. Let's say the
distance from the origin to the decision boundary along w is 'c'. Now we take the projection of
the vector X onto w.


Use of Dot Product in SVM

We already know that the projection of one vector onto another is given by the dot product.
Hence, we take the dot product of the x and w vectors. If the dot product is greater than 'c',
the point lies on the right side; if it is less than 'c', the point lies on the left side; and if
it is equal to 'c', the point lies on the decision boundary.
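This check can be written as a small sketch (the values of w, c, and the point x below are hypothetical, chosen only for illustration):

    import numpy as np

    w = np.array([1.0, 1.0])   # assumed vector perpendicular to the hyperplane
    c = 2.0                    # assumed distance from the origin to the decision boundary
    x = np.array([3.0, 1.0])   # a random point we want to classify

    # Projection of x onto w (dot product with the unit vector of w)
    projection = np.dot(x, w / np.linalg.norm(w))

    if projection > c:
        print("right side (positive class)")
    elif projection < c:
        print("left side (negative class)")
    else:
        print("on the decision boundary")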


Dot product of x and w vectors

You may wonder why we take this vector w perpendicular to the hyperplane. What we want is the
distance of the vector X from the decision boundary, and there are infinitely many points on the
boundary from which that distance could be measured. So we adopt a standard: we simply take the
perpendicular vector w as a reference, project all the other data points onto it, and then
compare the distances.

Decision Rule for –ve and +ve points
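The slide's figure is not reproduced here; in the standard formulation, taking b = −c, the comparison with 'c' described above becomes a simple sign test:

    w ⋅ x + b ≥ 0  →  positive (+ve) class
    w ⋅ x + b < 0  →  negative (−ve) class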

Margin in Support Vector Machine

Maximize Margin
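The figure for this slide is not reproduced; in the usual derivation the two marginal hyperplanes are w ⋅ x + b = +1 and w ⋅ x + b = −1, so the width of the margin between them is

    margin = 2 / ∥w∥

and maximizing the margin is therefore equivalent to minimizing ∥w∥.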

Optimization Function and its Constraints

For all red points:


For all green points:

Rather than carrying two constraints forward, we now simplify them into one. We assume that the
green class has y = −1 and the red class has y = +1.
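The constraints referred to above are shown as images on the slide; in their standard form they are:

    For all red points (y = +1):    w ⋅ x_i + b ≥ +1
    For all green points (y = −1):  w ⋅ x_i + b ≤ −1

Multiplying each constraint by its label y_i combines the two into the single condition y_i (w ⋅ x_i + b) ≥ 1.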

Optimization Function and its Constraints

Hard SVM
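The optimization function referred to in the next paragraph is, in its standard hard-margin form (restated here because the slide shows it as an image):

    minimize    (1/2) ∥w∥²
    subject to  y_i (w ⋅ x_i + b) ≥ 1   for every training point i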

We have now found our optimization function, but there is a catch: perfectly linearly separable data is
hardly ever found in practice, so the condition derived here rarely applies directly. The type of problem
we just studied is called Hard Margin SVM. Next we study the Soft Margin SVM, which is similar but uses a
few more interesting tricks.

Optimization Function and its Constraints

Soft SVM

To form the soft margin equation, we add a slack term ζ (zeta) for each data point to this equation
and multiply the sum of these slacks by a hyperparameter C.
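In its standard soft-margin form (restated here because the slide shows it as an image), the problem becomes:

    minimize    (1/2) ∥w∥² + C ⋅ Σ ζ_i
    subject to  y_i (w ⋅ x_i + b) ≥ 1 − ζ_i  and  ζ_i ≥ 0   for every training point i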

Optimization Function and its Constraints

For all correctly classified points, ζ is equal to 0. For incorrectly classified points, ζ is simply the
distance of that particular point from its correct hyperplane: for the wrongly classified green points,
ζ is their distance from the L1 hyperplane, and for a wrongly classified red point, ζ is its distance
from the L2 hyperplane.
Types of Support Vector Machine Algorithms

• Linear SVM: only when the data is perfectly linearly separable can we use a linear SVM.

• Non-Linear SVM: when the data is not linearly separable, we use a non-linear SVM, which applies
advanced techniques such as kernel tricks to classify it.
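A minimal scikit-learn sketch of the two cases (the synthetic dataset and parameter values are illustrative assumptions, not from the slides):

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    # Synthetic binary classification data (illustrative)
    X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=0)

    # Linear SVM: suitable when the data is (near) linearly separable
    linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)

    # Non-linear SVM: the RBF kernel trick handles non-linearly separable data
    nonlinear_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

    print(linear_svm.score(X, y), nonlinear_svm.score(X, y))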


Linearly inseparable classes



Kernels

• To deal with the above non-linearly separable data points, SVM uses the idea of "kernels".

• A kernel is a function used to map the input data into a higher-dimensional space where the
data is easier to classify using a linear boundary.

A polynomial kernel with degree 2 has been applied to transform the data from 1-dimensional to
2-dimensional data.

In the 2-dimensional case, the kernel trick is applied as below with the polynomial kernel with degree 2.
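A small sketch of the idea for the 1-dimensional case (the exact mapping used on the slide is not reproduced; the feature map x → (x, x²) below is an assumed, common degree-2 example):

    import numpy as np

    x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])  # 1-D input data (illustrative)

    # Map each 1-D point to 2-D: (x, x^2). Points that cannot be separated by a
    # threshold on x alone can become separable by a straight line in (x, x^2) space.
    X_mapped = np.column_stack([x, x ** 2])
    print(X_mapped)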
Kernel Methods

• The main idea behind kernel methods in SVMs is to map the input data into a higher-dimensional
feature space, where it is possible to find a hyperplane that separates the classes with maximum
margin.

• The mapping from the original input space to the feature space is performed using a kernel
function.

• Types of kernel functions:

  ✓ Linear kernel

  ✓ Polynomial kernel

  ✓ Radial basis function (RBF) kernel

  ✓ Sigmoid kernel


Linear Kernel
• The linear kernel is the simplest and most straightforward kernel.

• It calculates the dot product of the input data points in their original feature space.

• This kernel is suitable when the data is already linearly separable, meaning a straight line
can effectively separate the classes.

Linear Kernel Equation: K(X1,X2)=X1⋅X2



Polynomial Kernel
• The polynomial kernel transforms data into a higher-dimensional space using polynomial
functions.

• The kernel has a hyperparameter, usually denoted as "d," which represents the degree of
the polynomial.

• Higher degrees introduce more non-linearity.

Polynomial Kernel Equation: K(X1, X2) = (X1 ⋅ X2 + c)^d



Radial Basis Function (RBF) Kernel


• The RBF kernel is also known as the Gaussian kernel.
• It transforms data into an infinite-dimensional space by applying a Gaussian (radial) basis
function.
• The RBF kernel is highly flexible and can capture complex, non-linear relationships in the
data.
• It has a hyperparameter, "γ" (gamma), that controls the shape of the kernel and influences
the smoothness of the decision boundary.

RBF Kernel Equation: K(X1, X2) = exp(−γ ⋅ ∥X1 − X2∥²)



Sigmoid Kernel
• The sigmoid kernel is inspired by the sigmoid activation function used in neural networks.

• It maps data into a higher-dimensional space using a sigmoid function.

• This kernel can be useful when the data has a sigmoid-like shape, though it's less
commonly used than the other kernels.

Sigmoid Kernel Equation: K(X1,X2)=tanh(αX1⋅X2+c)
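For reference, the four kernel equations above can be written as a small NumPy sketch (the parameter values below are illustrative defaults, not prescribed by the slides):

    import numpy as np

    def linear_kernel(x1, x2):
        return np.dot(x1, x2)

    def polynomial_kernel(x1, x2, c=1.0, d=2):
        return (np.dot(x1, x2) + c) ** d

    def rbf_kernel(x1, x2, gamma=0.5):
        return np.exp(-gamma * np.linalg.norm(x1 - x2) ** 2)

    def sigmoid_kernel(x1, x2, alpha=0.1, c=0.0):
        return np.tanh(alpha * np.dot(x1, x2) + c)

    x1, x2 = np.array([1.0, 2.0]), np.array([2.0, 0.5])
    print(linear_kernel(x1, x2), polynomial_kernel(x1, x2),
          rbf_kernel(x1, x2), sigmoid_kernel(x1, x2))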



Hyperparameters in SVM

• Kernel Type (Kernel Function)

• Kernel Parameters

• Regularization Parameter (C)
Hyperparameters in SVM

Kernel Type (Kernel Function): Common choices are Linear, Polynomial, RBF (Radial Basis Function),
and Sigmoid kernels.

Kernel Parameters: For example, in the RBF kernel you have the γ parameter; in the Polynomial
kernel you have the degree d and a constant c.

Regularization Parameter (C): The regularization parameter, denoted as "C", controls the trade-off
between maximizing the margin and minimizing classification errors. Smaller values of C will allow
for a wider margin but may lead to misclassification of some training examples, while larger values
of C will aim to classify all training examples correctly but may result in a narrower margin.
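These hyperparameters are typically tuned together, for example with a grid search. A minimal scikit-learn sketch (the parameter grid and example dataset are illustrative assumptions):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)  # small example dataset

    # Candidate values for kernel type, kernel parameters, and C (illustrative)
    param_grid = {
        "kernel": ["linear", "poly", "rbf"],
        "C": [0.1, 1, 10],
        "gamma": ["scale", 0.1, 1.0],
        "degree": [2, 3],          # only used by the polynomial kernel
    }

    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_)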

Strengths of SVM

• SVM can be used for both classification and regression.

• It is robust, i.e., not much impacted by data with noise or outliers.

• The prediction results using this model are very promising.



Weaknesses of SVM
• SVM is directly applicable only to binary classification.

• The SVM model is very complex – almost like a black box when it deals
with a high-dimensional data set.

• It is slow for a large dataset, i.e., a data set with either a large number of features or a
large number of instances.
• It is quite memory-intensive.

Applications of SVM

• Image-based analysis and classification tasks

• Geo-spatial data-based applications

• Text-based applications

• Computational biology

• Security-based applications

• Chaotic systems control

Logistic Regression vs Support Vector Machine

S. No. | Logistic Regression | Support Vector Machine
1. | It is an algorithm used for solving classification problems. | It is a model used for both classification and regression.
2. | It is not used to find the best margin; instead, it can have different decision boundaries with different weights that are near the optimal point. | It tries to find the "best" margin that separates the classes and thus reduces the risk of error on the data.
4. | It is based on a probabilistic approach. | It is based on geometrical or statistical properties of the data.
5. | It is vulnerable to overfitting. | The risk of overfitting is less in SVM.

Properties of SVM

• Effective for High-Dimensional Data: SVMs are effective for high-dimensional data,
making them suitable for tasks like text classification and image recognition.

• Versatility: SVMs can be used for both classification and regression tasks. In
classification, they are particularly known for their ability to handle non-linear
separable data using various kernel functions.

• Maximizing Margin: SVMs aim to find a decision boundary (hyperplane) that maximizes the margin
between classes. This can lead to better generalization.

Properties of SVM

• Robust to Overfitting: SVMs can be less prone to overfitting when appropriate regularization
is applied through the "C" parameter.

• Ability to Handle Non-Linear Data: Using kernel functions (e.g., RBF, polynomial), SVMs
can model non-linear relationships in the data by implicitly mapping the data to a higher-
dimensional space.

• Binary Classification: While SVMs are originally designed for binary classification, they can
be extended to multi-class problems using techniques like one-vs-all or one-vs-one strategies.
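A brief sketch of the one-vs-rest strategy with scikit-learn (illustrative only; scikit-learn's SVC can also handle multi-class data on its own, using a one-vs-one scheme internally):

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)  # 3-class example dataset

    # Wrap the binary SVM in a one-vs-rest meta-classifier: one SVM is trained per class
    ovr_svm = OneVsRestClassifier(SVC(kernel="rbf", C=1.0)).fit(X, y)
    print(ovr_svm.predict(X[:5]))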

Properties of SVM

• Kernel Flexibility: SVMs offer a variety of kernel functions, allowing you to choose the most
appropriate one for your specific problem.

Issues in SVM

• Choice of Kernel and Parameters: Selecting the appropriate kernel function and setting the hyperparameters (e.g.,
"C" for regularization, kernel-specific parameters) can be challenging. The performance of the SVM is sensitive to
these choices.

• Computational Complexity: SVMs can be computationally expensive, especially when working with large datasets
or complex kernel functions. Training times can be long, and memory requirements can be significant.

• Sensitivity to Noise: SVMs can be sensitive to noisy data, as they aim to fit a margin that best separates classes.
Noisy or mislabelled data points near the decision boundary can have a significant impact on the model.

Issues in SVM

• Handling Imbalanced Data: SVMs are not inherently well-suited for imbalanced datasets, where
one class has significantly more data points than the other.

• Multi-Class Classification: SVMs are originally designed for binary classification and need
extensions like one-vs-all or one-vs-one for multi-class problems, which can be computationally
expensive.

• Memory Usage: Storing the support vectors, which are the critical data points for defining the
decision boundary, can consume substantial memory, especially in high-dimensional spaces.
