Before starting:
1. Throughout the lecture, if you see underlined red-colored text, click on it for further information.
2. Let me introduce you to "Nodnikit": she is an outstanding student. She asks many questions, but some of them are key questions that help us understand the material in more depth. The notes she gives are also very helpful.
Hi!
Introduction to SVM: History and Motivation
[Figure: a separating hyperplane $w^T x + b = 0$ in the $(x_1, x_2)$ plane, separating points $x_i$ with $y_i = +1$ from points with $y_i = -1$. The decision function is $f(x) = \mathrm{sign}(w^T x + b)$. Annotation: good generalization!]
Choosing a separating hyperplane.
The SVM approach: the linearly separable case
Why is it the best?
- It is robust to outliers, as we saw, and thus has strong generalization ability.
- It has proved to give better performance on test data, both in practice and in theory.
[Figure: the margin M around the separating hyperplane, with the nearest points $x_i$ at distance d on each side.]
Choosing a separating hyperplane.
The SVM approach: the linearly separable case
These are support vectors $x_i$. We will see later that the optimal hyperplane is completely defined by the support vectors.
[Figure: the margin M, with the support vectors lying at distance d from the hyperplane.]
SVM: the linearly separable case.
Formula for the margin:
- The boundary is the set $\{x' : w^T x' + b = 0\}$ (we can rescale $w$ and $b$).
- For uniqueness, set $|w^T x_i + b| = 1$ for the points $x_i$ closest to the boundary, such that:
  For $y_i = +1$: $w^T x_i + b \geq +1$
  For $y_i = -1$: $w^T x_i + b \leq -1$
  Combined: $y_i (w^T x_i + b) \geq 1$.
- The margin is then $m = 2 / \|w\|$.
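For completeness, here is the standard derivation behind that margin formula (a sketch; the slide's algebra was lost in extraction):

```latex
% Standard margin derivation (reconstruction, not verbatim from the slide).
% The distance from a point x' to the hyperplane w^T x + b = 0 is
\[
  d(x') = \frac{|w^T x' + b|}{\|w\|}.
\]
% After rescaling so that |w^T x_i + b| = 1 for the closest points,
% each of them lies at distance 1/||w|| from the boundary, hence the margin
\[
  m = \frac{1}{\|w\|} + \frac{1}{\|w\|} = \frac{2}{\|w\|},
\]
% and maximizing m is the same as minimizing (1/2) ||w||^2.
```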
We have transformed the problem into a form that can be solved efficiently: an optimization problem with a convex quadratic objective and only linear constraints, which always has a single global minimum.
SVM: the linearly separable case.
The optimization problem:
Maximizing the margin $m = 2/\|w\|$ is equivalent to:
minimize $\frac{1}{2}\|w\|^2$ s.t. $y_i (w^T x_i + b) \geq 1$ for all $i$.
Introducing a Lagrange multiplier $\alpha_i$ for each constraint gives the primal Lagrangian
$L_P = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]$, s.t. $\alpha_i \geq 0$.
SVM: the linearly separable case.
The optimization problem, cont' (s.t. $\alpha_i \geq 0$):
We start solving this problem by setting the derivatives of $L_P$ to zero:
$\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i$
$\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0$
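Substituting both conditions back into $L_P$ eliminates $w$ and $b$ and yields the dual objective on the next slide (standard step, shown here as a sketch):

```latex
% Substituting w = sum_i alpha_i y_i x_i and sum_i alpha_i y_i = 0 into L_P
% (standard derivation; reconstructed, not verbatim from the slide):
\begin{aligned}
L_P &= \tfrac{1}{2} w^T w - \sum_i \alpha_i \big[ y_i (w^T x_i + b) - 1 \big] \\
    &= \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j
       - \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j
       - b \underbrace{\sum_i \alpha_i y_i}_{=\,0} + \sum_i \alpha_i \\
    &= \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j
     \;=\; L_D(\alpha).
\end{aligned}
```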
SVM: the linearly separable case.
Introducing the Lagrangian dual problem.
$\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ are now our variables, one for each sample point $x_i$.
SVM: the linearly separable case.
Finding $w$ and $b$ for the boundary $w^T x + b$:
maximize $L_D(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j$
s.t. $\alpha_i \geq 0$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$
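Nodnikit's note: here is a minimal sketch of solving this dual numerically with a generic QP solver; the cvxopt package and the toy data are my additions, not part of the lecture.

```python
# Sketch: solving the hard-margin SVM dual with a generic QP solver.
import numpy as np
from cvxopt import matrix, solvers

# Toy linearly separable data: two clusters, labels y_i in {+1, -1}.
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],     # class +1
              [0.0, 0.0], [0.5, -1.0], [-1.0, 0.5]])  # class -1
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
n = len(y)

# The dual in cvxopt's form: min (1/2) a^T Q a - 1^T a
# s.t. -a <= 0 (i.e. alpha_i >= 0) and y^T a = 0.
Q = matrix(np.outer(y, y) * (X @ X.T))
q = matrix(-np.ones(n))
G = matrix(-np.eye(n))
h = matrix(np.zeros(n))
A = matrix(y.reshape(1, -1))
b = matrix(0.0)

solvers.options['show_progress'] = False
alpha = np.ravel(solvers.qp(Q, q, G, h, A, b)['x'])

# Recover the boundary: w = sum_i alpha_i y_i x_i, and b from any
# support vector (alpha_i > 0), where y_i (w^T x_i + b) = 1 holds.
w = (alpha * y) @ X
sv = alpha > 1e-6                      # the support vectors
b0 = np.mean(y[sv] - X[sv] @ w)        # average for numerical stability
print("support vectors:", np.where(sv)[0], "w =", w, "b =", b0)
```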
Non-linear SVM:
Mapping the data to a higher dimension
[Figure: 1-D data on the $x$ axis, not separable by any single threshold.]
Mapping the data to two-dimensional space with $\Phi(x) = (x, x^2)$:
[Figure: the same data in the $(x, x^2)$ plane, now linearly separable.]
Wow! Now we can use the linear SVM we learned in this higher-dimensional space!
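To make this concrete, a tiny sketch (on made-up 1-D data) showing the $\Phi(x) = (x, x^2)$ mapping turning a non-separable problem into a separable one:

```python
# Sketch: the Phi(x) = (x, x^2) mapping from the slide, on made-up 1-D data.
import numpy as np

# 1-D data: class -1 sits between the two class +1 groups,
# so no single threshold on x separates them.
x = np.array([-3.0, -2.5, -1.0, 0.0, 1.0, 2.5, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])

# Map each point to (x, x^2); in this plane the classes are separable
# by the horizontal line x^2 = 2, i.e. w = (0, 1), b = -2.
phi = np.column_stack([x, x ** 2])
w, b = np.array([0.0, 1.0]), -2.0
print(np.sign(phi @ w + b) == y)   # all True: linearly separable after Phi
```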
Non-linear SVM:
Mapping the data to a higher dimension, cont'
There is no need to know this mapping $\Phi$ explicitly, nor do we need to know the dimension of the new space, because we only use the dot product of feature vectors, $\Phi(x_i)^T \Phi(x_j)$, in both training and testing.
The Kernel Trick:
Computing $\Phi(x)^T \Phi(z)$ explicitly costs $O(m^2)$ ($m$ is the dimension of the input), while computing $K(x, z) = (x^T z)^2$ gives the same value in $O(m)$ time.
To convince you that it really is the same dot product in the feature space:
$(x^T z)^2 = \left(\sum_{i=1}^{m} x_i z_i\right)\left(\sum_{j=1}^{m} x_j z_j\right) = \sum_{i=1}^{m}\sum_{j=1}^{m} (x_i x_j)(z_i z_j) = \Phi(x)^T \Phi(z)$,
where $\Phi(x)$ has one coordinate $x_i x_j$ for each pair $(i, j)$.
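A quick numerical check of this identity (illustrative sketch; the data is random):

```python
# Sketch: numerically checking that the quadratic kernel equals the
# explicit feature-space dot product (illustrative data, my addition).
import numpy as np

def phi_quadratic(x):
    """Explicit quadratic features: one coordinate x_i * x_j per pair (i, j)."""
    return np.outer(x, x).ravel()          # O(m^2) coordinates

rng = np.random.default_rng(0)
x, z = rng.normal(size=5), rng.normal(size=5)

explicit = phi_quadratic(x) @ phi_quadratic(z)   # O(m^2) work
kernel   = (x @ z) ** 2                          # O(m) work
print(np.isclose(explicit, kernel))              # True
```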
Higher-Order Polynomials (from Andrew Moore)
[Table from Andrew Moore: the cost of evaluating higher-order polynomial kernels via the explicit features vs. via the kernel $K$.]
Mercer's condition: $K$ is a valid kernel if, for every function $g$ such that $\int g(u)^2 \, du$ is finite,
$\int\!\!\int K(u, v)\, g(u)\, g(v)\, du\, dv \geq 0$.
Great! So now we can check whether $K$ is a kernel without needing to know $\Phi(x)$.
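A finite-sample analogue that is easy to try (my illustration, not from the lecture): on any finite data set, a valid kernel's Gram matrix must be positive semi-definite, and we can check that numerically.

```python
# Sketch: finite-sample analogue of Mercer's condition (my illustration).
# For a valid kernel, the Gram matrix K_ij = K(x_i, x_j) is positive
# semi-definite on any data set, i.e. all its eigenvalues are >= 0.
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
pts = rng.normal(size=(20, 3))
gram = np.array([[gaussian_kernel(a, b) for b in pts] for a in pts])

eigvals = np.linalg.eigvalsh(gram)       # symmetric matrix, so eigvalsh
print(eigvals.min() >= -1e-10)           # True: PSD up to round-off
```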
Examples of Kernels:
- Linear: $K(x, z) = x^T z$
- Polynomial: $K(x, z) = (x^T z + 1)^d$
- Gaussian radial basis: $K(x, z) = \exp(-\|x - z\|^2 / 2\sigma^2)$
Now we can make complex kernels from simple ones: modularity! Sums and products of kernels, and positive scalings of kernels, are again kernels.
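A sketch of this modularity in code (the helper names are my own):

```python
# Sketch: building complex kernels from simple ones (modularity).
# The closure rules (sum, product, positive scaling) are standard.
import numpy as np

def linear(x, z):
    return x @ z

def rbf(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def kernel_sum(k1, k2):
    return lambda x, z: k1(x, z) + k2(x, z)   # sum of kernels is a kernel

def kernel_prod(k1, k2):
    return lambda x, z: k1(x, z) * k2(x, z)   # product of kernels is a kernel

def kernel_scale(k, c):
    assert c > 0
    return lambda x, z: c * k(x, z)           # positive scaling is a kernel

# e.g. a "linear + RBF" kernel, still a valid kernel by the rules above:
combo = kernel_sum(kernel_scale(linear, 0.5), rbf)
x, z = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(combo(x, z))
```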
SVM: the soft margin.
[Figure: data that is linearly separable, but with a low margin!]
Allow some constraint violations with slack variables $\xi_i$ and minimize
$\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$
such that: $y_i (w^T x_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$.
The parameter $C$ controls the under-fitting / over-fitting trade-off.
Based on [12] and [3]
SVM: the nonlinear case
Recipe and model selection procedure:
- In most real-world applications of SVM we combine what we learned about the kernel trick and the soft margin and use them together:
maximize $L_D(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
constrained to $0 \leq \alpha_i \leq C \;\; \forall i$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$.
- The classification function will be: $g(x) = \mathrm{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right)$
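A minimal sketch of this recipe in practice, using scikit-learn's SVC, which wraps the LIBSVM package mentioned at the end of this lecture; the data and parameter values are illustrative assumptions.

```python
# Sketch: soft margin + kernel trick together, via scikit-learn's SVC.
# The data and the parameter values are illustrative, not from the slides.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(np.sum(X ** 2, axis=1) < 1.0, 1, -1)   # circular boundary: nonlinear

# Gaussian radial basis kernel with soft-margin parameter C.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
clf.fit(X, y)

# g(x) = sign(sum_i alpha_i y_i K(x_i, x) + b): predict() applies exactly
# this sign rule, and only the support vectors enter the sum.
print("n support vectors:", clf.support_vectors_.shape[0])
print("train accuracy:", clf.score(X, y))
```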
SVM: the nonlinear case
Model selection procedure
- We have to decide which kernel function and which value of $C$ to use.
- "In practice a Gaussian radial basis or a low degree polynomial kernel is a good start." [Andrew Moore]
- We then check which set of parameters (such as $C$, or the width $\sigma$ if we choose a Gaussian radial basis) is the most appropriate by K-fold cross-validation [8], as sketched in code after this list:
1) Randomly divide all the available training examples into K equal-sized subsets.
2) Use all but one subset to train the SVM with the chosen parameters.
3) Use the held-out subset to measure the classification error.
4) Repeat steps 2 and 3 for each subset.
5) Average the results to get an estimate of the generalization error of the SVM classifier.
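Here is the promised sketch of steps 1-5 (the parameter grid and data are my own illustrative choices):

```python
# Sketch of steps 1-5 above (illustrative; the grid is my choice).
import numpy as np
from sklearn.svm import SVC

def kfold_cv_error(X, y, C, gamma, K=5, seed=0):
    """Estimate generalization error of an RBF-kernel SVM by K-fold CV."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, K)                  # step 1: K equal-sized subsets
    errors = []
    for k in range(K):                              # step 4: repeat for each subset
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        clf = SVC(kernel="rbf", C=C, gamma=gamma)
        clf.fit(X[train], y[train])                 # step 2: train on K-1 subsets
        errors.append(1.0 - clf.score(X[test], y[test]))  # step 3: held-out error
    return np.mean(errors)                          # step 5: average

# Grid search over (C, gamma) using the CV estimate:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(np.sum(X ** 2, axis=1) < 1.0, 1, -1)
best = min(((C, g) for C in [0.1, 1, 10, 100] for g in [0.1, 0.5, 2.0]),
           key=lambda p: kfold_cv_error(X, y, *p))
print("best (C, gamma):", best)
```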
SVM: the nonlinear case
Model selection procedure, cont'
$C \in [2^5, 2^9]$ is a good choice.
SVM for multi-class classification (more than two classes):
SVM is inherently a binary classifier; the common extensions are one-versus-rest (train one SVM per class against all the others) and one-versus-one (train one SVM per pair of classes and take a vote).
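A sketch of the one-versus-rest construction from binary SVMs (illustrative data; the helper names are my own):

```python
# Sketch: one-versus-rest multi-class SVM built from binary SVMs.
import numpy as np
from sklearn.svm import SVC

def one_vs_rest_fit(X, y, **svm_params):
    """Train one binary SVM per class: class c vs. everything else."""
    return {c: SVC(**svm_params).fit(X, np.where(y == c, 1, -1))
            for c in np.unique(y)}

def one_vs_rest_predict(models, X):
    """Pick the class whose SVM is most confident (largest margin score)."""
    classes = sorted(models)
    scores = np.column_stack([models[c].decision_function(X) for c in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(50, 2)) for m in ([0, 0], [4, 0], [0, 4])])
y = np.repeat([0, 1, 2], 50)
models = one_vs_rest_fit(X, y, kernel="rbf", C=1.0)
print("accuracy:", np.mean(one_vs_rest_predict(models, X) == y))
```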
Wow! Many applications for SVM!
Applications of SVM:
Facial Expression Recognition
- We see that some particular combinations, such as fear vs. disgust, are harder to distinguish than others.
- They then moved on to constructing their classifier for streaming video rather than still images. Click here for a demo of facial expression recognition (from another source, but one that also used SVM).
The Advantages of SVM:
- A convex quadratic objective with a single global minimum (no local minima).
- Robust to outliers, with strong generalization ability, thanks to the maximum margin.
- The optimal hyperplane is completely defined by the support vectors.
- Nonlinear problems are handled through the kernel trick, without ever computing $\Phi(x)$ explicitly.
SVMlight: http://svmlight.joachims.org/
By Joachims; one of the most widely used SVM classification and regression packages. Distributed as C++ source and binaries for Linux, Windows, Cygwin, and Solaris. Kernels: polynomial, radial basis function, and neural (tanh).
LibSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
LIBSVM (Library for Support Vector Machines) is developed by Chang and Lin and is also widely used. Developed in C++ and Java, it also supports multi-class classification, weighted SVM for unbalanced data, cross-validation, and automatic model selection. It has interfaces for Python, R, Splus, MATLAB, Perl, Ruby, and LabVIEW. Kernels: linear, polynomial, radial basis function, and neural (tanh).
That's all, folks!!