
SUPPORT VECTOR MACHINE

(Optimization in Machine Learning)

Presented By
Anisha N Rao & Aditi Goel

Department Of Data Science


Prasanna School Of Public Health
Manipal Academy Of Higher Education

May 13, 2022

Table of Contents

1 Introduction

2 Support Vector Classification

3 Support Vector Regression

4 Case Study

5 SVC Implementation

6 Conclusion

Introduction

Support Vector Machine (SVM) is one of the most popular supervised learning algorithms; it is used for classification as well as regression problems.
The whole concept of SVM relies on the appropriate construction of a hyperplane, in both classification and regression. SVM generates this hyperplane iteratively so that the error is minimized.

Support Vector Classification

Basic Concepts
• Support Vectors : Data points that are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
• Hyperplane : As the diagram shows, it is the decision plane or boundary that divides a set of objects belonging to different classes.
• Margin : The gap between the two lines through the closest data points of different classes. It can be calculated as the perpendicular distance from the separating line to the support vectors. A large margin is considered a good margin and a small margin a bad one.

Figure: Classification Visualization Using SVC
Hyperplane

• In geometry, a hyperplane is a subspace whose dimension is one less than that of its ambient space. A hyperplane separates the space into two half-spaces.
• If a space is 3-dimensional, then its hyperplanes are 2-dimensional planes.
• If a space is 2-dimensional, then its hyperplanes are 1-dimensional lines.
• If a space is 1-dimensional, then its hyperplanes are single points.

Figure: Hyperplane in 3D
Hyperplane Visualization

Figure: Hyperplane in 2D

Figure: Hyperplane in 1D

Geometry

• The distance between a point (x₀, y₀) and a line ax + by + c = 0 is equal to:

d = |ax₀ + by₀ + c| / √(a² + b²)

Figure: Perpendicular distance
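As a quick numerical check, a minimal Python sketch of this formula (the function name here is ours, for illustration):

```python
import math

def point_line_distance(x0, y0, a, b, c):
    """Perpendicular distance from the point (x0, y0) to the line ax + by + c = 0."""
    return abs(a * x0 + b * y0 + c) / math.sqrt(a ** 2 + b ** 2)

# Example: distance from (1, 2) to the line x + y - 1 = 0
print(point_line_distance(1, 2, 1, 1, -1))  # 2 / sqrt(2) ≈ 1.4142
```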


Types Of SVC

• Linear SVC : Used for linearly separable data. If a data set can be classified into two classes by a single straight line, the data are termed linearly separable, and the classifier used is called a linear SVM classifier.
• Non-linear SVC : Used for non-linearly separable data. If a data set cannot be classified by a straight line, the data are termed non-linear, and the classifier used is called a non-linear SVM classifier. A sketch contrasting the two follows below.
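A minimal scikit-learn sketch contrasting the two classifiers (the data set and parameter values are illustrative assumptions, not from the original slides):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: a classic non-linearly separable data set
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_clf = SVC(kernel="linear").fit(X_train, y_train)  # linear SVC
rbf_clf = SVC(kernel="rbf").fit(X_train, y_train)        # non-linear SVC

print("Linear SVC accuracy:", linear_clf.score(X_test, y_test))
print("RBF SVC accuracy:", rbf_clf.score(X_test, y_test))
```

On data like this, the linear classifier cannot follow the curved class boundary, while the RBF classifier can.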

Maximum Margin Classifier - Hard Margin

• The maximum margin classifier (MMC) is the hyperplane that, among all separating hyperplanes, makes the biggest gap (margin) between the two classes.
• The core idea of the hard margin is to maximize the margin under the constraint that the classifier does not make any mistake.

Figure: MMC
If we numerically define blue circles as +1 and green circles as −1, any good linear model f(x) = wᵀx + b is expected to satisfy:

yᵢ(wᵀxᵢ + b) ≥ 1 for every training point (xᵢ, yᵢ)

The optimization problem is therefore:

minimize (1/2)‖w‖² over w, b subject to yᵢ(wᵀxᵢ + b) ≥ 1, i = 1, …, n
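scikit-learn has no explicit hard-margin mode, but on separable data a very large C approximates one; a sketch under that assumption:

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data: two well-separated groups labelled -1 / +1
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A huge C makes any margin violation prohibitively expensive,
# so the solution approaches the maximum margin classifier
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)
```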

Support Vector Classifier - Soft Margin

We can extend the concept of a separating hyperplane in order to develop a hyperplane that almost separates the classes, using a so-called soft margin.
• The generalization of the maximal margin classifier using a soft margin is known as the support vector classifier (SVC).
• It can be worthwhile to misclassify a few training observations in order to do a better job of classifying the remaining observations.
• Slack variables ξᵢ allow some observations to fall on the wrong side of the margin, but such violations are penalized through the parameter C, the cost of misclassification. The standard soft-margin problem is:

minimize (1/2)‖w‖² + C Σᵢ ξᵢ subject to yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0
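A small sketch of how C trades margin width against misclassification (the data and C values are illustrative):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping classes, so some slack is unavoidable
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # A small C tolerates more margin violations, giving more support vectors
    print(f"C={C}: {len(clf.support_vectors_)} support vectors")
```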

Figure: Soft Margin
The Kernel Trick

Sometimes a linear boundary simply won't work, no matter what the value of C is.

Figure: Classification In Higher Dimension

The kernel is a function that quantifies the similarity between observations by summarizing the relationship between every pair of points in the training set.
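For instance, the widely used RBF kernel scores how similar two observations are; a minimal sketch:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2): 1 for identical points, -> 0 as they move apart."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 2.0])
z = np.array([2.0, 3.0])
print(rbf_kernel(x, z))  # exp(-2) ≈ 0.1353
```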

Types Of Kernel

Figure: Kernels

Comparing Kernels

Figure: Accuracy Of Kernels
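The figure reports the accuracies of the different kernels; a sketch of how such a comparison can be produced (the data set here is an illustrative stand-in, not the one from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Cross-validated accuracy for each built-in kernel
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f"{kernel:8s} mean accuracy: {scores.mean():.3f}")
```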


Support Vector Regression

SVR Overview
• A regression model estimates a continuous-valued multivariate function.
• SVMs formulate binary classification as a convex optimization problem.
Optimization problem goals:
• find the maximum-margin separating hyperplane, and
• at the same time, correctly classify as many training points as possible.
• SVMs represent this optimal hyperplane with support vectors.

SVR: Concepts, Mathematical Model, and Graphical
Representation

• Consider the following image where the middle line,

f (x) = wx + b

represents a hyperplane and the two dotted lines around the hyperplane

y = f (x) + ϵ
and
y = f (x) − ϵ
represent the decision boundaries, where ϵ is the distance from the hyperplane.

Graphical Presentation

• The main aim of the support vector regression model is to place the decision boundaries at a distance ϵ from the original hyperplane such that the data points closest to the hyperplane, the support vectors, lie within those boundary lines. Any hyperplane that satisfies the SVR should therefore also satisfy:

−ϵ ≤ y − (wx + b) ≤ ϵ for every training point
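A minimal scikit-learn SVR sketch showing the role of ϵ (all values are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# Points strictly inside the epsilon-tube incur no loss,
# so a wider tube needs fewer support vectors
for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel="rbf", epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")
```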

SVR generalization to SVC

Introduce an ϵ-insensitive region around the function, called the ϵ-tube. This tube reformulates the optimization problem: find the tube that best approximates the continuous-valued function while balancing model complexity and prediction error.
• The hyperplane is represented in terms of support vectors.
• Training and test data are assumed to be independent and identically distributed (iid), drawn from the same fixed but unknown probability distribution, in a supervised-learning context.
Adopting a soft-margin approach similar to that employed in SVM classification, slack variables ξ, ξ∗ can be added to guard against outliers. These variables determine how many points can be tolerated outside the tube, giving the standard formulation:

minimize (1/2)‖w‖² + C Σᵢ(ξᵢ + ξᵢ∗) subject to yᵢ − (wᵀxᵢ + b) ≤ ϵ + ξᵢ, (wᵀxᵢ + b) − yᵢ ≤ ϵ + ξᵢ∗, ξᵢ, ξᵢ∗ ≥ 0
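The corresponding ϵ-insensitive loss, in a short sketch, is zero inside the tube and grows linearly outside it:

```python
import numpy as np

def epsilon_insensitive_loss(y, f_x, epsilon=0.1):
    # Residuals smaller than epsilon cost nothing; larger ones cost linearly
    return np.maximum(0.0, np.abs(y - f_x) - epsilon)

print(epsilon_insensitive_loss(np.array([1.0, 2.0]), np.array([1.05, 3.0])))
# -> [0.  0.9]
```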

Basic details on Loss Function

Interpret L(x, y, f(x)) as the cost, or loss, of predicting y by f(x) when x is observed.
• The smaller the value of L(x, y, f(x)), the better f(x) predicts y in the sense of L.
• L penalizes predictions whose signs disagree with that of y.
• Constant loss functions, such as L := 0, are rather meaningless for our purposes.
Loss functions should be convex to ensure that the optimization problem
has a unique solution that can be found in a finite number of steps. A few
examples of loss functions:

A few examples of loss functions

(a) Linear loss function
(b) Quadratic loss function
(c) Huber loss function
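Sketches of these three losses as functions of the residual r = y − f(x) (the Huber threshold δ is an assumed parameter):

```python
import numpy as np

def linear_loss(r):
    return np.abs(r)       # (a) linear: robust to outliers, non-smooth at 0

def quadratic_loss(r):
    return r ** 2          # (b) quadratic: smooth, but outlier-sensitive

def huber_loss(r, delta=1.0):
    # (c) Huber: quadratic near zero, linear in the tails
    return np.where(np.abs(r) <= delta,
                    0.5 * r ** 2,
                    delta * (np.abs(r) - 0.5 * delta))

r = np.linspace(-3.0, 3.0, 7)
print(linear_loss(r), quadratic_loss(r), huber_loss(r), sep="\n")
```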
Figure: Solutions For SVR with various orders of polynomial
This graph visualizes how the magnitude of the weights can be interpreted as a measure of flatness.
• The horizontal line is a 0th-order polynomial solution; it deviates greatly from the desired outputs, so the error is large.
• The linear function produces better approximations for a portion of the data but still underfits the training data.
• The 4th-order solution produces the best tradeoff between function flatness and prediction error.
• The higher-order solution achieves zero training error at the price of high complexity, and overfits on yet-unseen data; see the sketch below.
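Below, polynomial kernels of increasing degree reproduce that tradeoff (the data set and degrees are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = np.linspace(-1, 1, 40).reshape(-1, 1)
y = np.sin(3 * X).ravel() + rng.normal(0, 0.1, 40)

# coef0=1 gives the full polynomial basis up to the chosen degree
for degree in (1, 4, 12):
    svr = SVR(kernel="poly", degree=degree, coef0=1, C=100).fit(X, y)
    # Training fit improves with degree, but very high degrees overfit
    print(f"degree={degree}: train R^2 = {svr.score(X, y):.3f}")
```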

Case Study of SVM for Handwriting Recognition

(We also discuss the difference between offline sensing and online recognition.)


• We present here a case study on developing an efficient writer-independent handwriting recognition (HWR) system for isolated letters, using pen-trajectory modeling for feature extraction.
• The proposed HWR workflow is composed of preprocessing; feature extraction; and a hierarchical, three-stage classification phase.
• Preprocessing comprises correcting the slant, normalizing the dimensions of the letter, and shifting the coordinates with respect to the center of mass.

Preprocessing

Figure: Examples of letters before (left) and after (right) preprocessing

Feature Extraction

The preprocessed data consist of strokes of coordinate pairs [x(t), y(t)]. Modeling these data with a pen-trajectory technique, a set of features is obtained by averaging the following functions:

Hierarchical, Three-Stage SVM

A three-stage classifier recognizes one of the 52 classes (26 lowercase and 26 uppercase letters).
• Using a binary SVM classifier, the first stage classifies the instance as one of two classes: uppercase or lowercase letter.
• Using a one-against-all (OAA) SVM, the second stage classifies the instance into one of the manually determined clusters.
• Using an OAA SVM with a simple majority vote, the third stage identifies the letter as one of the 52 classes (or subclusters). A schematic sketch follows below.
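A schematic sketch of the hierarchy with synthetic data standing in for the letter features (the real system's features and clusters differ, and stages two and three are collapsed into one per-case classifier here):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in features; labels 0-25 = lowercase, 26-51 = uppercase
X = rng.normal(size=(520, 10))
letters = np.repeat(np.arange(52), 10)
is_upper = (letters >= 26).astype(int)

# Stage 1: binary SVM decides uppercase vs lowercase
stage1 = SVC(kernel="rbf").fit(X, is_upper)

# Stages 2-3 (collapsed): one-against-all SVMs within each case group
lower_clf = OneVsRestClassifier(SVC()).fit(X[letters < 26], letters[letters < 26])
upper_clf = OneVsRestClassifier(SVC()).fit(X[letters >= 26], letters[letters >= 26])

def predict_letter(x):
    # Route the instance through stage 1, then classify within its case group
    x = x.reshape(1, -1)
    clf = upper_clf if stage1.predict(x)[0] == 1 else lower_clf
    return clf.predict(x)[0]

print(predict_letter(X[0]))
```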

Clustering for the uppercase and lowercase letters of the alphabet.

3-stage Hierarchical SVM Block Diagram

Confusion plot for classified label versus true label

Table Of Experimental Results

Experimental results showed an average accuracy of 91.7%. The three stages of the classifier achieved 99.3%, 95.7%, and 96.5% accuracy, respectively. Our proposed preprocessing helped improve the overall accuracy of the recognizer by approximately 1.5% to 2%.
Recognition rates comparison:

SVC Implementation
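A self-contained end-to-end SVC workflow in scikit-learn; the data set, split, and parameter grid below are illustrative assumptions, not the original slides' code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load a binary classification data set and hold out a test split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Scale features first: SVMs are sensitive to feature magnitudes
pipe = make_pipeline(StandardScaler(), SVC())

# Search over kernel, regularization strength C, and RBF width gamma
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(classification_report(y_test, search.predict(X_test)))
```

Wrapping the scaler and classifier in one pipeline ensures the scaler is fitted only on the training folds during the grid search, avoiding data leakage.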

Conclusion

SVM Advantages
• Support vector machines are very effective even with high-dimensional data.
• When the number of features exceeds the number of rows of data, SVM can still perform well.
• When the classes in the data are well separated, SVM works really well.
• SVM can be used for both regression and classification problems.
• Last but not least, SVM can work well with image data.

SVM Disadvantages
• When the classes in the data are not well separated, that is, when there are overlapping classes, SVM does not perform well.
• We need to choose an optimal kernel for SVM, and this task is difficult.
• SVM comparatively takes more time to train on large data sets.
• SVM is not a probabilistic model, so we cannot explain the classification in terms of probability.
• The SVM model is more difficult to understand and interpret than, for example, a decision tree, as SVM is more complex.

THANK YOU
