
Support Vector Machine
Deepika Kamboj

Support Vector Machine


• SVM is a model that can perform linear classification as well as regression.

• SVM is based on the concept of a surface, called a hyperplane, which draws a boundary between
data instances plotted in the multi-dimensional feature space.

• The output prediction of an SVM is one of the two possible classes that are already defined in
the training data.

NOTE: Don't confuse SVM with logistic regression. Both algorithms try to find the best
hyperplane, but the main difference is that logistic regression takes a probabilistic approach,
whereas the support vector machine is based on the geometrical and statistical properties of the data.


Classification using hyperplanes



Support Vectors

• Support vectors are the data points (representing the classes) that lie closest to the identified
hyperplane; they are the critical components of the data set.

• Margin: the distance between the hyperplane and the observations closest to it (the support
vectors). In SVM, a large margin is considered a good margin.

• There are two types of margins: hard margin and soft margin.


Hard Margin & Soft Margin

Hard Margin: A hard margin SVM seeks to find a decision boundary that completely separates
two classes of data, with no data points allowed in the margin or on the wrong side of the
boundary. It is more rigid and works well when the data is linearly separable and there are no
outliers.

Soft Margin: A soft margin SVM allows for a margin that may contain some misclassified data
points or outliers. It introduces a trade-off between maximizing the margin and minimizing
classification errors. Soft margin SVMs are more flexible and can handle cases where the data is
not perfectly separable.

Support Vector Machine
• There may be many possible hyperplanes,
and one of the challenges with the SVM
model is to find the optimal hyperplane.

• A hard margin in terms of SVM means that the model is inflexible in classification and tries
to fit the training set exceptionally well, thereby causing overfitting.

Identifying the correct hyperplane in SVM

1. The hyperplane should segregate the data instances belonging to the two classes in the best
possible way (Dual Formulation).

2. It should maximize the distances between the nearest data points of both the classes, i.e.,
maximize the margin.

Identifying the correct hyperplane in SVM

• Doing so helps us achieve better generalization and hence fewer issues in the classification
of unknown data.

• Modelling a problem using SVM is nothing but identifying the support vectors and the
maximum margin hyperplane (MMH) corresponding to the problem space.

Dot Product

A . B = |A| cosθ * |B|

In SVM
A ⋅ (B/|B|) = |A| cosθ   (the projection of A onto B)

Dot Product
Here A and B are 2 vectors, to find the dot product between these 2 vectors we first find the
magnitude of both the vectors and to find magnitude we use the Pythagorean theorem or the
distance formula.
After finding the magnitude we simply multiply it with the cosine angle between both the vectors.
Mathematically it can be written as:
A . B = |A| cosθ * |B|
Where |A| cosθ is the projection of A on B and |B| is the magnitude of vector B
Now in SVM we just need the projection of A not the magnitude of B. To just get the projection we
can simply take the unit vector of B because it will be in the direction of B but its magnitude will
be 1. Hence now the equation becomes:
A.B = |A| cosθ * unit vector of B
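As a quick illustration, here is a minimal NumPy sketch of the same idea (the vectors A and B are made-up example values, not from the slides):

    import numpy as np

    A = np.array([3.0, 4.0])   # example vector A (assumed values)
    B = np.array([6.0, 0.0])   # example vector B (assumed values)

    # Magnitudes via the distance formula (Pythagorean theorem)
    mag_A = np.linalg.norm(A)
    mag_B = np.linalg.norm(B)

    # Full dot product: |A| * |B| * cos(theta)
    dot_AB = np.dot(A, B)

    # Projection of A onto B = |A| * cos(theta) = A . (unit vector of B)
    unit_B = B / mag_B
    projection = np.dot(A, unit_B)

    print(dot_AB)       # 18.0
    print(projection)   # 3.0  (= dot_AB / |B|)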


Use of Dot Product in SVM

Consider a random point X for which we want to know whether it lies on the right side of the
plane or the left side of the plane (positive or negative). To find this, we first treat the point
as a vector X and then construct a vector w that is perpendicular to the hyperplane. Let's say the
distance from the origin to the decision boundary along w is 'c'. Now we take the projection of
the vector X onto w.


Use of Dot Product in SVM

We already know that the projection of one vector onto another is given by the dot product.
Hence, we take the dot product of the x and w vectors. If the dot product is greater than 'c',
the point lies on the right side; if it is less than 'c', the point lies on the left side; and if
it is equal to 'c', the point lies on the decision boundary.
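This check can be written as a small sketch (the values of w, c, and the point x below are hypothetical, chosen only for illustration):

    import numpy as np

    w = np.array([1.0, 1.0])   # assumed vector perpendicular to the hyperplane
    c = 2.0                    # assumed distance from the origin to the decision boundary
    x = np.array([3.0, 1.0])   # a random point we want to classify

    # Projection of x onto w (dot product with the unit vector of w)
    projection = np.dot(x, w / np.linalg.norm(w))

    if projection > c:
        print("right side (positive class)")
    elif projection < c:
        print("left side (negative class)")
    else:
        print("on the decision boundary")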


Dot product of x and w vectors

You may wonder why we take this vector w perpendicular to the hyperplane. What we want is the
distance of the vector X from the decision boundary, and there are infinitely many points on the
boundary from which that distance could be measured. So we adopt a standard: we simply take the
perpendicular vector w as a reference, project all the other data points onto it, and then
compare the distances.

Decision Rule for –ve and +ve points
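The slide's figure is not reproduced here; in the standard formulation, taking b = −c, the comparison with 'c' described above becomes a simple sign test:

    w ⋅ x + b ≥ 0  →  positive (+ve) class
    w ⋅ x + b < 0  →  negative (−ve) class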

Margin in Support Vector Machine

Maximize Margin
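The figure for this slide is not reproduced; in the usual derivation the two marginal hyperplanes are w ⋅ x + b = +1 and w ⋅ x + b = −1, so the width of the margin between them is

    margin = 2 / ∥w∥

and maximizing the margin is therefore equivalent to minimizing ∥w∥.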

Optimization Function and its Constraints

For all red points:


For all green points:

Rather than carrying two constraints forward, we now simplify them into one. We assume that the
green class has y = −1 and the red class has y = +1.
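The constraints referred to above are shown as images on the slide; in their standard form they are:

    For all red points (y = +1):    w ⋅ x_i + b ≥ +1
    For all green points (y = −1):  w ⋅ x_i + b ≤ −1

Multiplying each constraint by its label y_i combines the two into the single condition y_i (w ⋅ x_i + b) ≥ 1.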

Optimization Function and its Constraints

Hard SVM
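The optimization function referred to in the next paragraph is, in its standard hard-margin form (restated here because the slide shows it as an image):

    minimize    (1/2) ∥w∥²
    subject to  y_i (w ⋅ x_i + b) ≥ 1   for every training point i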

We have now found our optimization function, but there is a catch: perfectly linearly separable data is
hardly ever found in practice, so the condition derived here rarely applies directly. The type of problem
we just studied is called Hard Margin SVM. Next we study the Soft Margin SVM, which is similar but uses a
few more interesting tricks.

Optimization Function and its Constraints

Soft SVM

To form the soft margin equation, we add a slack term ζ (zeta) for each data point to this equation
and multiply the sum of these slacks by a hyperparameter C.
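In its standard soft-margin form (restated here because the slide shows it as an image), the problem becomes:

    minimize    (1/2) ∥w∥² + C ⋅ Σ ζ_i
    subject to  y_i (w ⋅ x_i + b) ≥ 1 − ζ_i  and  ζ_i ≥ 0   for every training point i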

Optimization Function and its Constraints

For all correctly classified points, ζ is equal to 0. For incorrectly classified points, ζ is simply the
distance of that particular point from its correct hyperplane: for the wrongly classified green points,
ζ is their distance from the L1 hyperplane, and for a wrongly classified red point, ζ is its distance
from the L2 hyperplane.
Types of Support Vector Machine Algorithms

• Linear SVM: only when the data is perfectly linearly separable can we use a linear SVM.

• Non-Linear SVM: when the data is not linearly separable, we use a non-linear SVM, which applies
advanced techniques such as kernel tricks to classify it.
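A minimal scikit-learn sketch of the two cases (the synthetic dataset and parameter values are illustrative assumptions, not from the slides):

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    # Synthetic binary classification data (illustrative)
    X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=0)

    # Linear SVM: suitable when the data is (near) linearly separable
    linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)

    # Non-linear SVM: the RBF kernel trick handles non-linearly separable data
    nonlinear_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

    print(linear_svm.score(X, y), nonlinear_svm.score(X, y))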


Linearly inseparable classes



Kernels

• To deal with the above non-linearly separable data points, SVM uses the idea of "kernels".

• A kernel is a function used to map the input data into a higher-dimensional space where the
data is easier to classify using a linear boundary.

A polynomial kernel with degree 2 has been applied to transform the data from 1-dimensional to
2-dimensional data.

In the 2-dimensional case, the kernel trick is applied as below with the polynomial kernel with degree 2.
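A small sketch of the idea for the 1-dimensional case (the exact mapping used on the slide is not reproduced; the feature map x → (x, x²) below is an assumed, common degree-2 example):

    import numpy as np

    x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])  # 1-D input data (illustrative)

    # Map each 1-D point to 2-D: (x, x^2). Points that cannot be separated by a
    # threshold on x alone can become separable by a straight line in (x, x^2) space.
    X_mapped = np.column_stack([x, x ** 2])
    print(X_mapped)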
Kernel Methods

• The main idea behind kernel methods in SVMs is to map the input data into a higher-dimensional
feature space, where it is possible to find a hyperplane that separates the classes with maximum
margin.

• The mapping from the original input space to the feature space is performed using a kernel
function.

• Types of kernel functions:

  ✓ Linear kernel

  ✓ Polynomial kernel

  ✓ Radial basis function (RBF) kernel

  ✓ Sigmoid kernel


Linear Kernel
• The linear kernel is the simplest and most straightforward kernel.

• It calculates the dot product of the input data points in their original feature space.

• This kernel is suitable when the data is already linearly separable, meaning a straight line
can effectively separate the classes.

Linear Kernel Equation: K(X1,X2)=X1⋅X2



Polynomial Kernel
• The polynomial kernel transforms data into a higher-dimensional space using polynomial
functions.

• The kernel has a hyperparameter, usually denoted as "d," which represents the degree of
the polynomial.

• Higher degrees introduce more non-linearity.

Polynomial Kernel Equation: K(X1, X2) = (X1 ⋅ X2 + c)^d



Radial Basis Function (RBF) Kernel


• The RBF kernel is also known as the Gaussian kernel.
• It transforms data into an infinite-dimensional space by applying a Gaussian (radial) basis
function.
• The RBF kernel is highly flexible and can capture complex, non-linear relationships in the
data.
• It has a hyperparameter, "γ" (gamma), that controls the shape of the kernel and influences
the smoothness of the decision boundary.

RBF Kernel Equation: K(X1, X2) = exp(−γ ⋅ ∥X1 − X2∥²)



Sigmoid Kernel
• The sigmoid kernel is inspired by the sigmoid activation function used in neural networks.

• It maps data into a higher-dimensional space using a sigmoid function.

• This kernel can be useful when the data has a sigmoid-like shape, though it's less
commonly used than the other kernels.

Sigmoid Kernel Equation: K(X1,X2)=tanh(αX1⋅X2+c)
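For reference, the four kernel equations above can be written as a small NumPy sketch (the parameter values below are illustrative defaults, not prescribed by the slides):

    import numpy as np

    def linear_kernel(x1, x2):
        return np.dot(x1, x2)

    def polynomial_kernel(x1, x2, c=1.0, d=2):
        return (np.dot(x1, x2) + c) ** d

    def rbf_kernel(x1, x2, gamma=0.5):
        return np.exp(-gamma * np.linalg.norm(x1 - x2) ** 2)

    def sigmoid_kernel(x1, x2, alpha=0.1, c=0.0):
        return np.tanh(alpha * np.dot(x1, x2) + c)

    x1, x2 = np.array([1.0, 2.0]), np.array([2.0, 0.5])
    print(linear_kernel(x1, x2), polynomial_kernel(x1, x2),
          rbf_kernel(x1, x2), sigmoid_kernel(x1, x2))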



Hyperparameters in SVM

• Kernel Type (Kernel Function)

• Kernel Parameters

• Regularization Parameter (C)
Hyperparameters in SVM

Kernel Type (Kernel Function): Common choices are Linear, Polynomial, RBF (Radial Basis Function),
and Sigmoid kernels.

Kernel Parameters: For example, in the RBF kernel you have the γ parameter; in the Polynomial
kernel you have the degree d and a constant c.

Regularization Parameter (C): The regularization parameter, denoted as "C", controls the trade-off
between maximizing the margin and minimizing classification errors. Smaller values of C will allow
for a wider margin but may lead to misclassification of some training examples, while larger values
of C will aim to classify all training examples correctly but may result in a narrower margin.
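These hyperparameters are typically tuned together, for example with a grid search. A minimal scikit-learn sketch (the parameter grid and example dataset are illustrative assumptions):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)  # small example dataset

    # Candidate values for kernel type, kernel parameters, and C (illustrative)
    param_grid = {
        "kernel": ["linear", "poly", "rbf"],
        "C": [0.1, 1, 10],
        "gamma": ["scale", 0.1, 1.0],
        "degree": [2, 3],          # only used by the polynomial kernel
    }

    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_)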

Strengths of SVM

• SVM can be used for both classification and regression.

• It is robust, i.e., not much impacted by data with noise or outliers.

• The prediction results using this model are very promising.



Weaknesses of SVM
• SVM is directly applicable only to binary classification.

• The SVM model is very complex – almost like a black box when it deals
with a high-dimensional data set.

• It is slow for a large dataset, i.e., a data set with either a large number of features or a
large number of instances.
• It is quite memory-intensive.

Applications of SVM

• Image-based analysis and classification tasks

• Geo-spatial data-based applications

• Text-based applications

• Computational biology

• Security-based applications

• Chaotic systems control

Logistic Regression vs Support Vector Machine

S. No. | Logistic Regression | Support Vector Machine
1. | It is an algorithm used for solving classification problems. | It is a model used for both classification and regression.
2. | It is not used to find the best margin; instead, it can have different decision boundaries with different weights that are near the optimal point. | It tries to find the "best" margin that separates the classes and thus reduces the risk of error on the data.
4. | It is based on a probabilistic approach. | It is based on geometrical or statistical properties of the data.
5. | It is vulnerable to overfitting. | The risk of overfitting is less in SVM.

Properties of SVM

• Effective for High-Dimensional Data: SVMs are effective for high-dimensional data,
making them suitable for tasks like text classification and image recognition.

• Versatility: SVMs can be used for both classification and regression tasks. In
classification, they are particularly known for their ability to handle non-linear
separable data using various kernel functions.

• Maximizing Margin: SVMs aim to find a decision boundary (hyperplane) that maximizes the margin
between classes. This can lead to better generalization.

Properties of SVM

• Robust to Overfitting: SVMs can be less prone to overfitting when appropriate regularization
is applied through the "C" parameter.

• Ability to Handle Non-Linear Data: Using kernel functions (e.g., RBF, polynomial), SVMs
can model non-linear relationships in the data by implicitly mapping the data to a higher-
dimensional space.

• Binary Classification: While SVMs are originally designed for binary classification, they can
be extended to multi-class problems using techniques like one-vs-all or one-vs-one strategies.
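A brief sketch of the one-vs-rest strategy with scikit-learn (illustrative only; scikit-learn's SVC can also handle multi-class data on its own, using a one-vs-one scheme internally):

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)  # 3-class example dataset

    # Wrap the binary SVM in a one-vs-rest meta-classifier: one SVM is trained per class
    ovr_svm = OneVsRestClassifier(SVC(kernel="rbf", C=1.0)).fit(X, y)
    print(ovr_svm.predict(X[:5]))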

Properties of SVM

• Kernel Flexibility: SVMs offer a variety of kernel functions, allowing you to choose the most
appropriate one for your specific problem.

Issues in SVM

• Choice of Kernel and Parameters: Selecting the appropriate kernel function and setting the hyperparameters (e.g.,
"C" for regularization, kernel-specific parameters) can be challenging. The performance of the SVM is sensitive to
these choices.

• Computational Complexity: SVMs can be computationally expensive, especially when working with large datasets
or complex kernel functions. Training times can be long, and memory requirements can be significant.

• Sensitivity to Noise: SVMs can be sensitive to noisy data, as they aim to fit a margin that best separates classes.
Noisy or mislabelled data points near the decision boundary can have a significant impact on the model.

Issues in SVM

• Handling Imbalanced Data: SVMs are not inherently well-suited for imbalanced datasets, where
one class has significantly more data points than the other.

• Multi-Class Classification: SVMs are originally designed for binary classification and need
extensions like one-vs-all or one-vs-one for multi-class problems, which can be computationally
expensive.

• Memory Usage: Storing the support vectors, which are the critical data points for defining the
decision boundary, can consume substantial memory, especially in high-dimensional spaces.
