

TCS Research Scholar
Machine Intelligence Research Laboratory

What does SVM mean?
What is it actually meant for?
To be precise, what is an SVM?

Machine learning is a method of teaching computers to make and
improve predictions or behaviors based on data.
Another way to think about machine learning is as pattern
recognition: the act of teaching a program to react to or recognize patterns.
The study of statistical learning theory was started in the 1960s by Vapnik.
Statistical learning theory is the theory of machine learning
from a small sample size.
SVM is a practical learning method based on statistical learning theory.

SVM belongs to the class of supervised learning algorithms.
SVMs provide a learning technique for:
- Pattern Recognition
- Regression Estimation
The solution provided by SVM is:
- Theoretically elegant
- Computationally efficient
- Very effective in many large practical problems
It has a geometrical interpretation in a high-dimensional feature space that
is nonlinearly related to the input space.

Which Hyperplane?

Separate the training set with maximal margin

Understanding the basics

Maximum margin

The Margin

Maximizing the Margin
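
A minimal sketch of what maximizing the margin means in the standard hard-margin formulation. The symbols $w$, $b$, the labels $y_i \in \{-1, +1\}$ and the sample count $n$ are assumptions, not notation from the slides. For a hyperplane $w^T x + b = 0$ scaled so that the closest training points satisfy $y_i (w^T x_i + b) = 1$, the margin width is $2 / \|w\|$, so maximizing it is equivalent to:

```latex
\min_{w,\,b}\ \frac{1}{2}\,\|w\|^2
\quad \text{subject to} \quad
y_i\,(w^T x_i + b) \ge 1, \qquad i = 1, \dots, n
```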

Nonlinear Classification

The Kernel Trick

The linear classifier relies on the dot product between vectors: $K(x_i, x_j) = x_i^T x_j$
If every data point is mapped into a high-dimensional space via some
transformation $\Phi: x \to \varphi(x)$, the dot product becomes:
$K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$
A kernel function is a function that corresponds to an inner product in
some expanded feature space.
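
The claim that a kernel equals an inner product in an expanded feature space can be checked numerically. The sketch below assumes a degree-2 polynomial kernel, $(1 + x^T z)^2$, on 2-D inputs and one possible explicit feature map for it. The names `phi`, `poly2_kernel` and the sample vectors are illustrative, not taken from the slides.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """One explicit feature map for the degree-2 polynomial kernel on 2-D input."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

def poly2_kernel(x, z):
    """K(x, z) = (1 + x^T z)^2, computed entirely in the 2-D input space."""
    return (1.0 + dot(x, z)) ** 2

x, z = [1.0, 2.0], [3.0, -1.0]
explicit = dot(phi(x), phi(z))  # inner product in the 6-D feature space
implicit = poly2_kernel(x, z)   # same quantity without ever building phi
print(explicit, implicit)
```

Both numbers agree up to floating-point rounding, which is the point of the trick: the 6-D vectors never have to be constructed.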

Examples of Kernel Functions

Linear: $K(x_i, x_j) = x_i^T x_j$

Polynomial of power p: $K(x_i, x_j) = (1 + x_i^T x_j)^p$

Gaussian (radial-basis function network): $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$

Sigmoid: $K(x_i, x_j) = \tanh(\beta_0 x_i^T x_j + \beta_1)$
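
The four kernels can be written as short functions. This is a sketch in plain Python, assuming the usual Gaussian form $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ and sigmoid parameters named `beta0` and `beta1`. The function names and default parameter values are illustrative choices, not values given in the slides.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(xi, xj):
    return dot(xi, xj)

def polynomial_kernel(xi, xj, p=2):
    # (1 + xi^T xj)^p; p is the polynomial power
    return (1.0 + dot(xi, xj)) ** p

def gaussian_kernel(xi, xj, sigma=1.0):
    # exp(-||xi - xj||^2 / (2 sigma^2)); sigma controls the width
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_kernel(xi, xj, beta0=0.5, beta1=0.0):
    # tanh(beta0 * xi^T xj + beta1); default values are arbitrary
    return math.tanh(beta0 * dot(xi, xj) + beta1)

a, b = [1.0, 2.0], [3.0, 4.0]
for k in (linear_kernel, polynomial_kernel, gaussian_kernel, sigmoid_kernel):
    print(k.__name__, k(a, b))
```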

Nonlinear SVM

SVM locates a separating hyperplane in the feature space and classifies points in that space.

It does not need to represent the space explicitly; defining a kernel function is enough.

The kernel function plays the role of the dot product in the feature space.
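
To illustrate a classifier that works in the feature space using only kernel evaluations, the sketch below trains a kernel perceptron on the XOR pattern, which no straight line in the input space can separate. A kernel perceptron is swapped in here for brevity and is not an SVM solver: it finds some separating hyperplane in the feature space, not the maximum-margin one. The data set, the Gaussian kernel with `sigma=1.0`, and all function names are assumptions made for the example.

```python
import math

def gaussian_kernel(u, v, sigma=1.0):
    """exp(-||u - v||^2 / (2 sigma^2))"""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

# XOR pattern: not linearly separable in the 2-D input space.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [-1, 1, 1, -1]

def decision(alpha, x):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x); the feature space stays implicit."""
    return sum(a * yi * gaussian_kernel(xi, x) for a, yi, xi in zip(alpha, y, X))

def train(max_epochs=100):
    """Kernel perceptron: bump alpha_i whenever point i is misclassified."""
    alpha = [0] * len(X)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            if yi * decision(alpha, xi) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return alpha

alpha = train()
predictions = [1 if decision(alpha, x) > 0 else -1 for x in X]
print(alpha, predictions)
```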

Properties of SVM

Flexibility in choosing a similarity function

Sparseness of solution when dealing with large data sets
- only support vectors are used to specify the separating hyperplane
Ability to handle large feature spaces
- complexity does not depend on the dimensionality of the feature space
Overfitting can be controlled by the soft margin approach
Nice math property: a simple convex optimization problem that is
guaranteed to converge to a single global solution
Feature Selection
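
The sparseness property can be seen on a tiny 1-D example, sketched below. The data set is made up for illustration, and the dual coefficients `alpha` were worked out by hand for the hard-margin linear SVM on this set (w = 1, b = 0); they are not produced by a solver here. Only the two points lying on the margin have nonzero coefficients, so the other points can be dropped without changing the decision function.

```python
# 1-D toy set: two points on the margin, two far from it.
X = [-1.0, 1.0, -3.0, 3.0]
y = [-1, 1, -1, 1]

# Hand-derived dual coefficients for the hard-margin linear SVM on this set:
# w = sum(alpha_i * y_i * x_i) = 1 and b = 0.
alpha = [0.5, 0.5, 0.0, 0.0]
b = 0.0

def decision(x, indices):
    """f(x) = sum over the given training points of alpha_i * y_i * x_i * x, plus b."""
    return sum(alpha[i] * y[i] * X[i] * x for i in indices) + b

all_idx = list(range(len(X)))
sv_idx = [i for i in all_idx if alpha[i] > 0]  # the support vectors

queries = [-2.0, -0.5, 0.25, 4.0]
full = [decision(q, all_idx) for q in queries]
sparse = [decision(q, sv_idx) for q in queries]
margins = [y[i] * decision(X[i], sv_idx) for i in all_idx]
print(sv_idx, full == sparse, margins)
```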

Florian Markowetz, Max-Planck Institute for Molecular Genetics,
Classification by Support Vector Machine.ppt, Practical DNA
Microarray Analysis, 2003.
Mingyue Tan, The University of British Columbia, Support Vector
Machine & its Application.ppt, 2004.
K. P. Soman, R. Loganathan, V. Ajay, Machine Learning with SVM and
Other Kernel Methods, PHI Learning Private Limited, 2009.