
IME 672

Data Mining & Knowledge Discovery

Dr. Faiz Hamid
Department of Industrial & Management Engineering
Indian Institute of Technology Kanpur
Email: fhamid@iitk.ac.in
Support Vector Machines
• A classification method for both linear and nonlinear data (Vladimir Vapnik, 1992)
• Uses a nonlinear mapping to transform the original training
data into a higher dimension
• In the new space, it searches for the linear optimal separating hyperplane
• With an appropriate nonlinear mapping to a sufficiently high
dimension, data from two classes can always be separated by a
hyperplane
• SVM finds this hyperplane using support vectors (“essential”
training tuples) and margins (defined by the support vectors)
Support Vector Machines
• There are infinitely many possible separating hyperplanes
• Objective: find the “best” hyperplane, i.e., the one expected to have the minimum classification error on unseen tuples
• Maximum marginal hyperplane (MMH): the hyperplane with the largest margin

Support Vector Machines
• A separating hyperplane can be written as
      W · X + b = 0
  where W = {w1, w2, …, wn} is a weight vector and b is a scalar (bias)
• For 2-D data it can be written as
      w0 + w1 x1 + w2 x2 = 0
• The hyperplanes defining the sides of the margin:
      H1: w0 + w1 x1 + w2 x2 ≥ +1 for yi = +1
      H2: w0 + w1 x1 + w2 x2 ≤ –1 for yi = –1
• Any training tuples that fall on hyperplanes H1 or H2 (i.e., the sides defining the margin) are support vectors
• Finding the MMH is a constrained (convex) quadratic optimization problem: a quadratic objective function with linear constraints (see the sketch below)
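A minimal sketch of these ideas, assuming scikit-learn and NumPy (the slides do not prescribe any library, and the toy data is made up for illustration): fit a linear SVM with a large C, which approximates the hard-margin problem above, and check that the support vectors lie on the margin hyperplanes w0 + w1 x1 + w2 x2 = ±1.

# Minimal sketch (assumes scikit-learn and NumPy; toy data is illustrative only)
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable 2-D toy set: three tuples per class
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],   # class +1
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])  # class -1
y = np.array([+1, +1, +1, -1, -1, -1])

# A very large C approximates the hard-margin formulation on the slide
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # weight vector (w1, w2)
b = clf.intercept_[0]   # bias, i.e., w0 in the slide's 2-D notation
print("hyperplane: %.3f + %.3f x1 + %.3f x2 = 0" % (b, w[0], w[1]))

# The support vectors should satisfy w0 + w1 x1 + w2 x2 = +1 or -1 (H1 and H2)
for sv in clf.support_vectors_:
    print(sv, "->", round(float(w @ sv + b), 3))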
Support Vector Machines
• Maximum separating hyperplane: the separating hyperplane with the maximum distance to the nearest training tuples
• [Figure: the support vectors are shown with a thick red border]
SVM and High Dimensional Data
• Complexity of the trained classifier is characterized by the number of support vectors rather than by the dimensionality of the data
• Support vectors are the essential or critical training examples; they lie closest to the decision boundary (MMH)
• If all training examples except the support vectors are removed and training is repeated, the same separating hyperplane is found (illustrated in the sketch below)
• The number of support vectors found can be used to compute an (upper) bound on the expected error rate of the SVM classifier, which is independent of the data dimensionality
• An SVM with a small number of support vectors can have good generalization, even when the dimensionality of the data is high
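To see the "support vectors determine the hyperplane" property concretely, one can refit the model on its support vectors alone and compare hyperplanes. A sketch, again assuming scikit-learn (not specified in the slides) and the same toy data as in the earlier sketch:

# Sketch: retraining on only the support vectors recovers (essentially) the same hyperplane
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])
y = np.array([+1, +1, +1, -1, -1, -1])

full = SVC(kernel="linear", C=1e6).fit(X, y)

# Indices of the support vectors found on the full training set
sv_idx = full.support_
reduced = SVC(kernel="linear", C=1e6).fit(X[sv_idx], y[sv_idx])

print("full:    w =", full.coef_[0], " b =", full.intercept_[0])
print("reduced: w =", reduced.coef_[0], " b =", reduced.intercept_[0])
# The two weight vectors and biases agree up to numerical tolerance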
SVM—Linearly Inseparable
• Transform the original input data into a higher-dimensional space using a nonlinear mapping
• Search for a linear separating hyperplane in the new space
• The MMH found in the new space corresponds to a nonlinear separating hypersurface in the original space (see the kernel example below)
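As an illustration (again assuming scikit-learn, which is only one possible implementation; the dataset and gamma value are arbitrary choices), an RBF-kernel SVM separates concentric-circle data that no straight line can separate in the original 2-D space:

# Sketch: a kernel SVM on data that is not linearly separable in 2-D
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: inseparable by any line in the original space
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

linear = SVC(kernel="linear").fit(X_tr, y_tr)       # struggles: no separating line exists
rbf = SVC(kernel="rbf", gamma=2.0).fit(X_tr, y_tr)  # implicit nonlinear mapping via the RBF kernel

print("linear kernel accuracy:", linear.score(X_te, y_te))
print("RBF kernel accuracy:   ", rbf.score(X_te, y_te))
# The linear hyperplane found in the implicit feature space corresponds to a
# nonlinear decision boundary (roughly a circle) in the original 2-D space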
SVM—Linearly Inseparable
[Figure slides: illustration of the separating hyperplane in the transformed space]
SVM Related Links
• SVM Website: http://www.kernel-machines.org/
• Representative implementations
– LIBSVM: an efficient implementation of SVM supporting multi-class classification, nu-SVM, and one-class SVM, with interfaces for Java, Python, etc.
– SVM-light: simpler, but its performance is not better than LIBSVM; supports only binary classification and is written only in C
– SVM-torch: another implementation, also written in C
