Support Vector Machine
Deepika Kamboj
• SVM is based on the concept of a surface, called a hyperplane, that draws a boundary between data instances plotted in a multi-dimensional feature space.
• The output prediction of an SVM is one of the two classes already defined in the training data.
NOTE: Don't confuse SVM with logistic regression. Both algorithms try to find the best hyperplane, but logistic regression takes a probabilistic approach, whereas SVM is based on the geometrical and statistical properties of the data.
Support Vectors
• Support vectors are the data points (representing the classes) that lie closest to the identified hyperplane; they are the critical component of a data set.
• Margin: the distance between the hyperplane and the observations closest to it (the support vectors). In SVM, a large margin is considered a good margin.
• There are two types of margins: hard margin and soft margin.
Hard Margin: A hard margin SVM seeks to find a decision boundary that completely separates
two classes of data, with no data points allowed in the margin or on the wrong side of the
boundary. It is more rigid and works well when the data is linearly separable and there are no
outliers.
Soft Margin: A soft margin SVM allows for a margin that may contain some misclassified data
points or outliers. It introduces a trade-off between maximizing the margin and minimizing
classification errors. Soft margin SVMs are more flexible and can handle cases where the data is
not perfectly separable.
Support Vector Machine
• There may be many possible hyperplanes,
and one of the challenges with the SVM
model is to find the optimal hyperplane.
Dot Product in SVM
Dot Product
Here A and B are two vectors. To find the dot product between them, we first find the magnitude of each vector; to find a magnitude we use the Pythagorean theorem (the distance formula).
After finding the magnitudes, we multiply them by the cosine of the angle between the two vectors. Mathematically:
A · B = |A| |B| cosθ
where |A| cosθ is the projection of A on B and |B| is the magnitude of vector B.
In SVM we need only the projection of A, not the magnitude of B. To get just the projection, we can take the dot product with the unit vector of B (call it B̂), which points in the direction of B but has magnitude 1. The equation then becomes:
A · B̂ = |A| cosθ
We already know that the projection of one vector on another is given by the dot product. Hence, we take the dot product of the x and w vectors. If the dot product is greater than c, the point lies on the right side of the decision boundary; if it is less than c, the point lies on the left side; and if it equals c, the point lies on the decision boundary.
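As a minimal sketch of this decision rule (the vector w, the threshold c, and the sample points below are made-up values for illustration, not taken from the slides):

```python
# Dot-product decision rule: project x onto w and compare against c.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def side(x, w, c):
    p = dot(x, w)            # projection of x on w (scaled by |w|)
    if p > c:
        return "right"       # positive side of the boundary
    if p < c:
        return "left"        # negative side of the boundary
    return "on boundary"

w = [1.0, 1.0]               # vector perpendicular to the decision boundary
c = 2.0
print(side([3.0, 1.0], w, c))   # dot = 4.0 > 2.0 -> "right"
print(side([0.5, 0.5], w, c))   # dot = 1.0 < 2.0 -> "left"
```

Because w is only a reference direction, the scale of w does not change which side a point falls on, only the value of c that sits on the boundary.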
You may wonder why we took this vector w perpendicular to the hyperplane. What we want is the distance of a vector x from the decision boundary, and there are infinitely many points on the boundary we could measure from. So we adopt a standard: we take the perpendicular vector as a reference, project all the data points onto it, and then compare the resulting distances.
Decision Rule for −ve and +ve Points
Margin in Support Vector Machine
Maximize Margin
Rather than carrying the two constraints forward, we now simplify them into one. We assume that the green class has label y = −1 and the red class has label y = +1.
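In symbols, this merging works as a standard sketch (using w for the weight vector, b for the bias, and (xᵢ, yᵢ) for the labelled points; the symbols are assumed, since the slides do not fix notation):

```latex
% Separate constraints for the two classes:
%   w . x_i + b >= +1   for red points   (y_i = +1)
%   w . x_i + b <= -1   for green points (y_i = -1)
% Multiplying each inequality by its label y_i gives a single constraint:
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \;\ge\; 1 \quad \text{for all } i
% Maximizing the margin 2/\lVert\mathbf{w}\rVert is then equivalent to:
\min_{\mathbf{w},\,b}\; \tfrac{1}{2}\,\lVert\mathbf{w}\rVert^{2}
\quad \text{subject to} \quad y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1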
Hard SVM
We have now found our optimization function, but there is a catch: perfectly linearly separable data is rarely found in industry, so the condition we derived here seldom applies. The type of problem we just studied is called Hard Margin SVM. Next we study the Soft Margin, which is similar but uses a few more interesting tricks.
Soft SVM
To form the soft margin equation, we add a slack term ζ (zeta) for each point to this equation and multiply the sum of the slacks by a hyperparameter C.
For all correctly classified points, ζ equals 0. For incorrectly classified points, ζ is the distance of that point from its correct margin hyperplane: for the wrongly classified green points, ζ is their distance from the L1 hyperplane, and for a wrongly classified red point, ζ is its distance from the L2 hyperplane.
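A minimal sketch of the slack term: for a labelled point (x, y) with y in {−1, +1}, ζ can be computed as the hinge loss max(0, 1 − y(w·x + b)), which is zero for points beyond their correct margin line and grows with the distance past it. The values of w and b below are made up for illustration:

```python
# Slack (zeta) as hinge loss: zero for correctly placed points,
# positive for points inside the margin or on the wrong side.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def slack(x, y, w, b):
    return max(0.0, 1.0 - y * (dot(x, w) + b))

w, b = [1.0, 0.0], 0.0
print(slack([2.0, 0.0], +1, w, b))   # 0.0 -- outside the margin, no penalty
print(slack([-1.0, 0.0], +1, w, b))  # 2.0 -- on the wrong side, penalized
```

In the soft margin objective, the hyperparameter C weights the sum of these slacks against the margin width: a large C punishes misclassification heavily, a small C tolerates it.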
Types of Support Vector Machine Algorithms
• Linear SVM: can be used only when the data is perfectly linearly separable.
• Non-Linear SVM: used when the data is not linearly separable. Here we apply advanced techniques such as the kernel trick to classify the points.
✓ Linear kernel
✓ Polynomial kernel
✓ Sigmoid kernel
Kernel Methods
Linear Kernel
• The linear kernel is the simplest and most straightforward kernel.
• It calculates the dot product of the input data points in their original feature space.
• This kernel is suitable when the data is already linearly separable, meaning a straight line
can effectively separate the classes.
Polynomial Kernel
• The polynomial kernel transforms data into a higher-dimensional space using polynomial
functions.
• The kernel has a hyperparameter, usually denoted as "d," which represents the degree of
the polynomial.
Sigmoid Kernel
• The sigmoid kernel is inspired by the sigmoid activation function used in neural networks.
• This kernel can be useful when the data has a sigmoid-like shape, though it's less
commonly used than the other kernels.
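The three kernels above can be sketched as plain functions of two vectors. The hyperparameter values (degree, coef0, gamma) below are illustrative defaults, not values prescribed by the slides:

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def linear_kernel(a, b):
    # Plain dot product in the original feature space.
    return dot(a, b)

def polynomial_kernel(a, b, degree=2, coef0=1.0):
    # (a . b + coef0)^degree implicitly maps to a higher-dimensional space.
    return (dot(a, b) + coef0) ** degree

def sigmoid_kernel(a, b, gamma=0.5, coef0=0.0):
    # tanh of a scaled dot product, inspired by neural-network activations.
    return math.tanh(gamma * dot(a, b) + coef0)

a, b = [1.0, 2.0], [3.0, 0.5]
print(linear_kernel(a, b))       # 4.0
print(polynomial_kernel(a, b))   # (4.0 + 1.0)**2 = 25.0
```

The key point of the kernel trick is that each function returns the dot product the two points would have in the transformed space, without ever computing the transformed coordinates explicitly.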
Weaknesses of SVM
• SVM is applicable only for binary classification.
• The SVM model is very complex – almost like a black box when it deals
with a high-dimensional data set.
• It is slow for large datasets, i.e., datasets with either a large number of features or a large number of instances.
• It is quite memory-intensive.
Applications: security-based applications.

Logistic Regression vs. Support Vector Machine:
• Logistic regression is based on a probabilistic approach.
• SVM is based on the geometrical or statistical properties of the data.
Properties of SVM
• Effective for High-Dimensional Data: SVMs are effective for high-dimensional data,
making them suitable for tasks like text classification and image recognition.
• Versatility: SVMs can be used for both classification and regression tasks. In classification, they are particularly known for their ability to handle non-linearly separable data using various kernel functions.
• Robust to Overfitting: SVMs can be less prone to overfitting when appropriate regularization
is applied through the "C" parameter.
• Ability to Handle Non-Linear Data: Using kernel functions (e.g., RBF, polynomial), SVMs
can model non-linear relationships in the data by implicitly mapping the data to a higher-
dimensional space.
• Binary Classification: While SVMs are originally designed for binary classification, they can
be extended to multi-class problems using techniques like one-vs-all or one-vs-one strategies.
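A minimal sketch of the one-vs-rest strategy mentioned above: one binary SVM per class is reduced to a scoring function fₖ(x) = wₖ·x + bₖ, and the class with the highest score wins. The (w, b) pairs below are hypothetical, standing in for already-trained classifiers:

```python
# One-vs-rest prediction: evaluate every binary classifier's margin score
# and return the label of the classifier that scores highest.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def one_vs_rest_predict(x, classifiers):
    # classifiers: {label: (w, b)} with one binary SVM per class
    scores = {label: dot(w, x) + b for label, (w, b) in classifiers.items()}
    return max(scores, key=scores.get)

classifiers = {
    "A": ([1.0, 0.0], 0.0),    # hypothetical trained (w, b) pairs
    "B": ([0.0, 1.0], 0.0),
    "C": ([-1.0, -1.0], 0.5),
}
print(one_vs_rest_predict([2.0, 0.5], classifiers))  # "A": score 2.0 is highest
```

For k classes this trains k binary SVMs; the alternative one-vs-one strategy trains k(k−1)/2 pairwise SVMs and takes a majority vote, which is part of why multi-class SVM can be computationally expensive.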
Issues in SVM
• Choice of Kernel and Parameters: Selecting the appropriate kernel function and setting the hyperparameters (e.g.,
"C" for regularization, kernel-specific parameters) can be challenging. The performance of the SVM is sensitive to
these choices.
• Computational Complexity: SVMs can be computationally expensive, especially when working with large datasets
or complex kernel functions. Training times can be long, and memory requirements can be significant.
• Sensitivity to Noise: SVMs can be sensitive to noisy data, as they aim to fit a margin that best separates classes.
Noisy or mislabelled data points near the decision boundary can have a significant impact on the model.
• Handling Imbalanced Data: SVMs are not inherently well-suited for imbalanced datasets, where
one class has significantly more data points than the other.
• Multi-Class Classification: SVMs are originally designed for binary classification and need
extensions like one-vs-all or one-vs-one for multi-class problems, which can be computationally
expensive.
• Memory Usage: Storing the support vectors, which are the critical data points for defining the
decision boundary, can consume substantial memory, especially in high-dimensional spaces.