Lecture 5: SVM II
Princeton University COS 495
Instructor: Yingyu Liang
SVM: objective
Let y_i \in \{+1, -1\}, f_{w,b}(x) = w^\top x + b. Margin:
\gamma = \min_i \frac{y_i (w^\top x_i + b)}{\|w\|}
Support Vector Machine:
\max_{w,b} \gamma = \max_{w,b} \min_i \frac{y_i (w^\top x_i + b)}{\|w\|}
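As a minimal sketch of the definition above, the margin of a fixed linear classifier can be computed directly on a toy dataset (the data and the choice of w, b here are made up for illustration):

```python
import numpy as np

# Toy linearly separable data: labels y_i in {+1, -1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1, 1, -1, -1])

# A fixed (not optimized) linear classifier w^T x + b
w = np.array([1.0, 1.0])
b = 0.0

# Margin: gamma = min_i y_i (w^T x_i + b) / ||w||
gamma = np.min(y * (X @ w + b)) / np.linalg.norm(w)
print(gamma)  # sqrt(2): the closest point sits at distance sqrt(2) from the hyperplane
```

SVM then searches over (w, b) to make this quantity as large as possible.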
SVM: optimization
Optimization (Quadratic Programming):
\min_{w,b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^\top x_i + b) \ge 1, \ \forall i
Solved by Lagrange multiplier method:
\mathcal{L}(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i(w^\top x_i + b) - 1 \right]
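The primal QP above can be handed to a generic constrained solver on a toy dataset. This is a sketch using `scipy.optimize.minimize` (the dataset is made up; a production SVM would use a dedicated QP or SMO solver):

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data (made up for illustration)
X = np.array([[2.0, 2.0], [0.0, 1.0], [-2.0, -2.0], [0.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Primal SVM: min (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1
def objective(v):                    # v = (w_1, w_2, b)
    return 0.5 * np.dot(v[:2], v[:2])

constraints = [{'type': 'ineq',
                'fun': lambda v, i=i: y[i] * (X[i] @ v[:2] + v[2]) - 1}
               for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b)  # every training point ends up with functional margin >= 1
```

For this symmetric dataset the minimizer is w = (0, 1), b = 0: the points (0, 1) and (0, -1) sit exactly on the margin boundaries.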
Lagrange multiplier
Lagrangian
Consider optimization problem:
\min_w f(w) \quad \text{s.t.} \quad h_i(w) = 0, \ 1 \le i \le l
Lagrangian:
\mathcal{L}(w, \beta) = f(w) + \sum_i \beta_i h_i(w)
Solved by setting derivatives of Lagrangian to 0:
\frac{\partial \mathcal{L}}{\partial w_i} = 0; \quad \frac{\partial \mathcal{L}}{\partial \beta_i} = 0
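As a worked example of this recipe (the problem instance is chosen for illustration): minimize f(x, y) = x^2 + y^2 subject to x + y = 1. Setting all partial derivatives of the Lagrangian to zero gives a linear system:

```python
import numpy as np

# min f(x, y) = x^2 + y^2   s.t.   h(x, y) = x + y - 1 = 0
# Lagrangian: L = x^2 + y^2 + beta * (x + y - 1)
# Stationarity conditions form a linear system:
#   dL/dx    = 2x + beta = 0
#   dL/dy    = 2y + beta = 0
#   dL/dbeta = x + y - 1 = 0
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x, y, beta = np.linalg.solve(A, rhs)
print(x, y, beta)  # 0.5 0.5 -1.0
```

The constrained minimizer is x = y = 1/2, the closest point on the line x + y = 1 to the origin.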
Generalized Lagrangian
Consider optimization problem:
\min_w f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \ 1 \le i \le k; \quad h_j(w) = 0, \ 1 \le j \le l
Generalized Lagrangian:
\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_i \alpha_i g_i(w) + \sum_j \beta_j h_j(w)
Generalized Lagrangian
Consider the quantity:
\theta_P(w) = \max_{\alpha, \beta : \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)
Why?
\theta_P(w) = f(w) if w satisfies all the constraints;
\theta_P(w) = +\infty if w does not satisfy the constraints.
So minimizing f(w) is the same as minimizing \theta_P(w).
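This dichotomy can be seen numerically in a tiny 1-D example (chosen for illustration): minimize w^2 subject to g(w) = 1 - w <= 0. For feasible w the inner maximum over alpha >= 0 is attained at alpha = 0 and equals f(w); for infeasible w the Lagrangian grows without bound in alpha:

```python
# theta_P(w) = max_{alpha >= 0} L(w, alpha) for:
#   min w^2   s.t.   g(w) = 1 - w <= 0
def f(w): return w ** 2
def g(w): return 1.0 - w
def L(w, alpha): return f(w) + alpha * g(w)

for w in (2.0, 0.0):
    vals = [L(w, alpha) for alpha in (0.0, 10.0, 1000.0)]
    print(w, vals)
# Feasible w = 2: g(w) < 0, so L decreases in alpha; the max is at alpha = 0, giving f(2) = 4
# Infeasible w = 0: g(w) > 0, so L increases without bound as alpha grows
```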
Lagrange duality
The primal problem:
p^* = \min_w \theta_P(w) = \min_w \max_{\alpha, \beta : \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)
The dual problem:
d^* = \max_{\alpha, \beta : \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta)
Always true (weak duality): d^* \le p^*
Lagrange duality
Theorem: under proper conditions, there exist w^*, \alpha^*, \beta^* such that
d^* = \mathcal{L}(w^*, \alpha^*, \beta^*) = p^*
Moreover, w^*, \alpha^*, \beta^* satisfy the Karush-Kuhn-Tucker (KKT) conditions:
\frac{\partial \mathcal{L}}{\partial w_i} = 0, \quad \alpha_i g_i(w) = 0 \ \text{(dual complementarity)},
g_i(w) \le 0 \ \text{(primal constraints)}, \quad h_j(w) = 0, \quad \alpha_i \ge 0 \ \text{(dual constraints)}
Lagrange duality
What are the proper conditions?
A set of conditions (Slater conditions):
f convex, g_i convex, h_j affine;
there exists w satisfying g_i(w) < 0 for all i.
SVM: optimization
Optimization (Quadratic Programming):
\min_{w,b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^\top x_i + b) \ge 1, \ \forall i
Generalized Lagrangian:
\mathcal{L}(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i(w^\top x_i + b) - 1 \right]
SVM: optimization
KKT conditions:
\frac{\partial \mathcal{L}}{\partial w} = 0 \ \Rightarrow \ w = \sum_i \alpha_i y_i x_i \quad (1)
\frac{\partial \mathcal{L}}{\partial b} = 0 \ \Rightarrow \ 0 = \sum_i \alpha_i y_i \quad (2)
Plug into \mathcal{L}:
\mathcal{L}(w, b, \alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^\top x_j \quad (3)
combined with 0 = \sum_i \alpha_i y_i, \ \alpha_i \ge 0
SVM: optimization
Reduces to dual problem:
\max_\alpha \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^\top x_j
\text{s.t.} \quad \sum_i \alpha_i y_i = 0, \ \alpha_i \ge 0
Since w = \sum_i \alpha_i y_i x_i, we have w^\top x + b = \sum_i \alpha_i y_i x_i^\top x + b
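The dual can also be solved with a generic constrained optimizer on toy data; this sketch (dataset made up, negating the dual so a minimizer can be used) recovers w from the optimal alpha via w = \sum_i \alpha_i y_i x_i:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [0.0, 1.0], [-2.0, -2.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

Q = (y[:, None] * y[None, :]) * (X @ X.T)   # Q_ij = y_i y_j x_i^T x_j

def neg_dual(a):                            # minimize the negated dual objective
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(neg_dual, x0=np.zeros(n),
               bounds=[(0, None)] * n,                       # alpha_i >= 0
               constraints=[{'type': 'eq', 'fun': lambda a: a @ y}])  # sum alpha_i y_i = 0
alpha = res.x
w = (alpha * y) @ X                         # w = sum_i alpha_i y_i x_i
print(alpha, w)
```

Only the two points on the margin get nonzero alpha (dual complementarity); b can then be recovered from any support vector k as b = y_k - w^T x_k.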
Kernel methods
Features
[Figure: color histogram feature extraction — extract Red/Green/Blue histogram features from an image]
Proper feature mapping can turn a non-linear problem into a linear one.
Using SVM on the feature space \{\phi(x)\}: only need \phi(x_i)^\top \phi(x_j)
Conclusion: no need to design \phi, only need to design the kernel
k(x_i, x_j) = \phi(x_i)^\top \phi(x_j)
Polynomial kernels
Fix degree d and constant c:
k(x, x') = (x^\top x' + c)^d
What is \phi(x)? Expand the expression to get \phi(x).
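For d = 2 and two-dimensional inputs the expansion can be checked numerically: (x^T x' + c)^2 equals the inner product of explicit six-dimensional feature vectors (the inputs below are arbitrary test points):

```python
import numpy as np

def poly_kernel(x, z, c=1.0, d=2):
    return (x @ z + c) ** d

def phi(x, c=1.0):
    # Explicit feature map for (x^T z + c)^2 with 2-D inputs:
    # expanding gives c^2 + 2c(x1 z1 + x2 z2) + x1^2 z1^2 + x2^2 z2^2 + 2 x1 x2 z1 z2
    x1, x2 = x
    return np.array([c,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(poly_kernel(x, z), phi(x) @ phi(z))  # both equal 4.0
```

Evaluating k costs one dot product in the input space, while the explicit map already needs 6 dimensions here (and grows combinatorially with d).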
Gaussian kernels
Fix bandwidth \sigma:
k(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)
Also called radial basis function (RBF) kernels
Via the Taylor expansion \exp(z) = \sum_{n \ge 0} z^n / n!, the Gaussian kernel can be written as
k(x, x') = \phi(x)^\top \phi(x')
for an infinite-dimensional feature mapping \phi.
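A minimal sketch of the Gaussian kernel (test points chosen for illustration); note it is maximal when the two points coincide and decays with squared distance:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])
print(gaussian_kernel(x, z))   # exp(-1) ~ 0.3679, since ||x - z||^2 = 2
print(gaussian_kernel(x, x))   # 1.0: maximal when the points coincide
```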
Features
[Figure: pipeline — extract color-histogram features (Red/Green/Blue) from an image, then build the hypothesis with a linear model]
Polynomial kernels
[Figure: network view — the input is mapped to fixed polynomial features such as x_1^2, x_2^2, \sqrt{2} x_1 x_2, followed by a linear output unit]
y = \text{sign}(w^\top \phi(x) + b)
First layer is fixed. If we also learn the first layer, it becomes a two-layer neural network.
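The remark above can be sketched directly: a polynomial-kernel classifier is a two-layer network whose first layer (the feature map) is fixed and only the second, linear layer is learned. The weights below are hypothetical, hand-picked to realize an XOR-like rule for illustration:

```python
import numpy as np

def first_layer(x):
    # Fixed "first layer": degree-2 polynomial features of a 2-D input
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def predict(x, w, b):
    # Second layer: a linear model on top of the fixed features
    return np.sign(w @ first_layer(x) + b)

# Hypothetical "learned" second-layer weights: sign of the x1*x2 feature
w = np.array([0.0, 0.0, 1.0])
b = 0.0
print(predict(np.array([1.0, 1.0]), w, b))   # +1: x1 * x2 > 0
print(predict(np.array([1.0, -1.0]), w, b))  # -1: x1 * x2 < 0
```

Making `first_layer` trainable instead of fixed is exactly the step from a kernel machine to a two-layer neural network.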