
Support Vector Machine

Dr. Manohar Kase


mk10oct@gmail.com
9977999446
Support Vector Machine
Developed by Cortes and Vapnik (1995)
Suppose we have to find a line that divides these points into two groups.
We could use logistic regression or discriminant analysis.
Logistic regression focuses on maximizing the probability of the data: the farther a point lies from the separating hyperplane, the higher its predicted probability.

An SVM tries to find the separating hyperplane that maximizes the margin, i.e., the distance from the hyperplane to the closest points.

LDA tries to maximize the distance between the means of the two groups, while SVM tries to maximize the margin between the two groups.
Support Vector Machine
A Support Vector Machine (SVM) performs classification by finding the hyperplane that maximizes the margin between the two classes.
There can be more than two classes, and the features can be scale (continuous), categorical (converted to dummy variables), or a combination of both.
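As a minimal sketch of this idea (assuming scikit-learn, which the slides do not name), a linear SVM can be fit to a small set of points:

```python
# Minimal sketch: fitting a maximum-margin linear classifier.
# scikit-learn is an assumed choice; the slides name no library.
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable groups.
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")    # linear kernel -> maximum-margin hyperplane
clf.fit(X, y)
print(clf.predict([[4, 4]]))  # classify a new point
```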
How many lines can be drawn?
• Which of the linear separators (hyperplanes) is optimal?
Hyperplane
What is a Support Vector?
Support vectors are the data points nearest to the hyperplane.
Support Vector Machine (SVM)
SVMs maximize the margin around the separating hyperplane.
The decision function is fully specified by a subset of the training samples, the support vectors.
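Continuing the same hypothetical scikit-learn sketch, the fitted model exposes exactly this subset of training samples:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = SVC(kernel="linear").fit(X, y)

# The decision function is fully specified by the support vectors:
print(clf.support_vectors_)       # the points nearest the hyperplane
print(clf.n_support_)             # support-vector count per class
print(clf.coef_, clf.intercept_)  # w and b of the hyperplane w.x + b = 0
```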
Classifier Margin
(Figure: data points labeled +1 and -1.)
Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a data point.
Maximum Margin
(Figure: data points labeled +1 and -1.)
The maximum margin linear classifier is the linear classifier with the maximum margin.
This is the simplest kind of SVM (called an LSVM).
||w|| is the Euclidean norm (length) of the weight vector w; the margin width is 2/||w||.
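For reference, a standard statement of the maximum-margin problem (assumed from the literature; the slide shows only the norm): since the margin width is 2/||w||, maximizing the margin amounts to minimizing ||w||^2:

```latex
\min_{\mathbf{w},\,b} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n
```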
Is it a good linear separator?

Misclassification
There is an assumption that the data is 100% separable by a linear function. This is also called C-SVM classification.
Parameter tuning: C is the cost (penalty or constraint) variable.
• C decreases → margin increases → more misclassification → high bias, low variance → underfitting (soft margin).
• C increases → margin decreases → less misclassification → low bias, high variance → overfitting (hard margin).
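A small sketch (again assuming scikit-learn) of this trade-off: as C grows, the margin hardens and fewer training points are misclassified, at the risk of overfitting:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two slightly overlapping groups, so some misclassification is unavoidable.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Low C  -> soft margin: wider margin, more misclassification.
    # High C -> hard margin: narrower margin, less misclassification.
    print(f"C={C}: train accuracy={clf.score(X, y):.2f}, "
          f"support vectors={len(clf.support_vectors_)}")
```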
Non-linear SVM
The simplest way to separate two groups of data is with a straight line (one-dimensional), a flat plane (two-dimensional), or an N-dimensional hyperplane.
However, there are situations where a nonlinear region can separate the groups more efficiently.
Is a linear separator possible?
SVM handles this by using a kernel function (nonlinear) to map the data into a different space, where a hyperplane (linear) can be used to do the separation.

This is called the kernel trick: the kernel function transforms the data into a higher-dimensional feature space to make it possible to perform the linear separation.

It means a non-linear function is learned by a linear learning machine in a high-dimensional feature space, while the capacity of the system is controlled by a parameter that does not depend on the dimensionality of the space.
2-Dimensional Data
Convert the 2-D data into 3-D
3-D Data
Solution
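One way to sketch the conversion the slides illustrate (the mapping (x1, x2) → (x1, x2, x1² + x2²) is an assumed, commonly used choice): concentric rings that no line can split in 2-D become plane-separable in 3-D:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line separates them in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Explicit feature map: add x1^2 + x2^2 as a third coordinate.
X3 = np.column_stack([X[:, 0], X[:, 1], X[:, 0]**2 + X[:, 1]**2])

# In 3-D a flat (linear) plane now separates the two classes.
clf = SVC(kernel="linear").fit(X3, y)
print(f"accuracy after mapping to 3-D: {clf.score(X3, y):.2f}")
```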
Kernels
• Why use kernels?
– Make non-separable problems separable.
– Map data into a better representational space.
• Common kernels
– Linear
– Polynomial: K(x, z) = (1 + xᵀz)^d
• Gives feature conjunctions
– Radial basis function (infinite-dimensional space)
Kernels
• Linear: K(x_i, x_j) = x_iᵀx_j
– Mapping Φ: x → φ(x), where φ(x) is x itself
• Polynomial of power p: K(x_i, x_j) = (1 + x_iᵀx_j)^p
– Mapping Φ: x → φ(x), where φ(x) has C(d+p, p) dimensions (for d-dimensional input x)
• Gaussian (radial-basis function): K(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²))
– Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional: every point is mapped to a function (a Gaussian); the combination of functions for the support vectors is the separator.
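These kernel values can be checked directly, as in this small NumPy sketch (σ = 1 is an arbitrary choice for the Gaussian width):

```python
import numpy as np

xi = np.array([1.0, 2.0])
xj = np.array([2.0, 0.5])

linear = xi @ xj                # K(xi, xj) = xi . xj
poly = (1 + xi @ xj) ** 2       # K(xi, xj) = (1 + xi . xj)^p, with p = 2
sigma = 1.0                     # assumed Gaussian width
rbf = np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

print(linear, poly, rbf)
```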
Kernels
Parameters: cost (C) and gamma are identified using cross-validation and a confusion matrix, as sketched below.
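A sketch of that search (scikit-learn assumed): cross-validate over a grid of C and gamma values, then evaluate the winner with a confusion matrix on held-out data:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Cross-validate over a small grid of C and gamma values.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X_tr, y_tr)

print(grid.best_params_)                           # chosen C and gamma
print(confusion_matrix(y_te, grid.predict(X_te)))  # held-out performance
```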
A Few More Kernels
Types of SVM
• Classification SVM Type 1 (also known as C-SVM classification): Y is categorical
• Classification SVM Type 2 (also known as nu-SVM classification): Y is categorical
• Regression SVM Type 1 (also known as epsilon-SVM regression): Y is scale (continuous)
• Regression SVM Type 2 (also known as nu-SVM regression): Y is scale (continuous)
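In scikit-learn these four types map roughly onto SVC, NuSVC, SVR, and NuSVR (an assumed correspondence with the libsvm formulations the slide names):

```python
from sklearn.svm import SVC, NuSVC, SVR, NuSVR

c_svc  = SVC(C=1.0)         # Classification Type 1 (C-SVM): categorical Y
nu_svc = NuSVC(nu=0.5)      # Classification Type 2 (nu-SVM): categorical Y
e_svr  = SVR(epsilon=0.1)   # Regression Type 1 (epsilon-SVM): scale Y
nu_svr = NuSVR(nu=0.5)      # Regression Type 2 (nu-SVM): scale Y
```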
