
Machine Learning Basics

Lecture 5: SVM II
Princeton University COS 495
Instructor: Yingyu Liang

Review: SVM objective

SVM: objective
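Let $y_i \in \{+1, -1\}$ and $f_{w,b}(x) = w^T x + b$. Margin:
$$\gamma = \min_i \frac{y_i f_{w,b}(x_i)}{\|w\|}$$
Support Vector Machine:
$$\max_{w,b} \gamma = \max_{w,b} \min_i \frac{y_i f_{w,b}(x_i)}{\|w\|}$$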
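As a minimal sketch of the margin definition, the snippet below evaluates $\gamma$ for a hand-picked $(w, b)$ on a toy data set. The data and the helper name `margin` are illustrative assumptions, not part of the lecture.

```python
import math

def margin(w, b, X, y):
    """Geometric margin: min_i y_i (w^T x_i + b) / ||w||."""
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    scores = [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
              for xi, yi in zip(X, y)]
    return min(scores) / norm_w

# Toy data (hypothetical): two points per class on the horizontal axis.
X = [(1.0, 0.0), (3.0, 0.0), (-1.0, 0.0), (-2.0, 0.0)]
y = [+1, +1, -1, -1]

gamma = margin((1.0, 0.0), 0.0, X, y)         # separator x_1 = 0
gamma_scaled = margin((5.0, 0.0), 0.0, X, y)  # same separator, w rescaled
```

Rescaling $(w, b)$ leaves the margin unchanged, which is why the constraint can be normalized to $y_i(w^T x_i + b) \ge 1$ in the optimization form.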

SVM: optimization
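Optimization (Quadratic Programming):
$$\min_{w,b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \ \forall i$$
Solved by the Lagrange multiplier method:
$$\mathcal{L}(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i [y_i(w^T x_i + b) - 1]$$
where $\alpha$ is the Lagrange multiplier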

Lagrange multiplier

Lagrangian
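Consider optimization problem:
$$\min_w f(w) \quad \text{s.t.} \quad h_i(w) = 0, \ \forall 1 \le i \le l$$
Lagrangian:
$$\mathcal{L}(w, \beta) = f(w) + \sum_i \beta_i h_i(w)$$
where the $\beta_i$'s are called Lagrange multipliers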

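Lagrangian
The problem $\min_w f(w)$ s.t. $h_i(w) = 0, \ \forall 1 \le i \le l$ is solved by setting the derivatives of the Lagrangian to 0:
$$\frac{\partial \mathcal{L}}{\partial w_i} = 0; \quad \frac{\partial \mathcal{L}}{\partial \beta_i} = 0$$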
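A small worked example of this recipe (the objective and constraint are chosen for illustration and are not from the lecture): minimize $f(w) = w_1^2 + w_2^2$ subject to $h(w) = w_1 + w_2 - 1 = 0$.

```latex
\begin{aligned}
\mathcal{L}(w, \beta) &= w_1^2 + w_2^2 + \beta (w_1 + w_2 - 1) \\
\frac{\partial \mathcal{L}}{\partial w_1} &= 2 w_1 + \beta = 0, \quad
\frac{\partial \mathcal{L}}{\partial w_2} = 2 w_2 + \beta = 0, \quad
\frac{\partial \mathcal{L}}{\partial \beta} = w_1 + w_2 - 1 = 0 \\
\Rightarrow \ w_1 &= w_2 = \tfrac{1}{2}, \quad \beta = -1, \quad f(w) = \tfrac{1}{2}
\end{aligned}
```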

Generalized Lagrangian
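Consider optimization problem:
$$\min_w f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \ \forall 1 \le i \le k; \quad h_j(w) = 0, \ \forall 1 \le j \le l$$
Generalized Lagrangian:
$$\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_i \alpha_i g_i(w) + \sum_j \beta_j h_j(w)$$
where the $\alpha_i$'s and $\beta_j$'s are called Lagrange multipliers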

Generalized Lagrangian
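Consider the quantity:
$$\theta_P(w) := \max_{\alpha, \beta: \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$$
Why?
$$\theta_P(w) = \begin{cases} f(w), & \text{if } w \text{ satisfies all the constraints} \\ +\infty, & \text{if } w \text{ does not satisfy the constraints} \end{cases}$$
So minimizing $f(w)$ is the same as minimizing $\theta_P(w)$:
$$\min_w f(w) = \min_w \theta_P(w) = \min_w \max_{\alpha, \beta: \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$$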
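The two cases can be seen numerically. The sketch below assumes a hypothetical toy problem, $f(w) = w^2$ with one constraint $g(w) = 1 - w \le 0$, and caps $\alpha$ at `alpha_max` because a computer cannot take the maximum over all $\alpha \ge 0$.

```python
def f(w):
    return w * w

def g(w):
    return 1.0 - w  # constraint g(w) <= 0, i.e. w >= 1

def lagrangian(w, alpha):
    return f(w) + alpha * g(w)

def theta_p_capped(w, alpha_max, steps=1000):
    """Max of L(w, alpha) over a grid of alpha in [0, alpha_max]."""
    grid = [alpha_max * k / steps for k in range(steps + 1)]
    return max(lagrangian(w, a) for a in grid)

caps = (10.0, 100.0, 1000.0)
feasible = [theta_p_capped(2.0, A) for A in caps]    # w = 2 satisfies g(w) <= 0
infeasible = [theta_p_capped(0.5, A) for A in caps]  # w = 0.5 violates it
```

For the feasible point the maximum is attained at $\alpha = 0$ and equals $f(w)$ regardless of the cap. For the infeasible point the value grows with the cap, which is the numerical counterpart of $+\infty$.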

Lagrange duality
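The primal problem
$$p^* := \min_w f(w) = \min_w \max_{\alpha, \beta: \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$$
The dual problem
$$d^* := \max_{\alpha, \beta: \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta)$$
Always true:
$$d^* \le p^*$$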
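A rough numerical check of the min-max versus max-min ordering, assuming a hypothetical toy problem ($f(w) = w^2$, $g(w) = 1 - w \le 0$) and finite grids in place of the true minimum and maximum:

```python
def lagrangian(w, alpha):
    # Toy problem (assumption): f(w) = w^2 with constraint g(w) = 1 - w <= 0.
    return w * w + alpha * (1.0 - w)

ws = [k / 100.0 for k in range(-300, 301)]   # grid over w in [-3, 3]
alphas = [k / 100.0 for k in range(0, 501)]  # grid over alpha in [0, 5]

# Primal: min over w of (max over alpha). Dual: max over alpha of (min over w).
p_star = min(max(lagrangian(w, a) for a in alphas) for w in ws)
d_star = max(min(lagrangian(w, a) for w in ws) for a in alphas)
```

On this problem both values come out as 1, attained at $w = 1$ and $\alpha = 2$, so the inequality holds with equality. The grids are only an approximation; they happen to contain the optimum here.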

Lagrange duality
Interesting case: for the primal value $p^*$ and the dual value $d^*$, when do we have
$$d^* = p^*?$$

Lagrange duality
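Theorem: under proper conditions, there exist $w^*, \alpha^*, \beta^*$ such that
$$d^* = \mathcal{L}(w^*, \alpha^*, \beta^*) = p^*$$
Moreover, $w^*, \alpha^*, \beta^*$ satisfy the Karush-Kuhn-Tucker (KKT) conditions:
$$\frac{\partial \mathcal{L}}{\partial w_i} = 0, \quad \alpha_i g_i(w) = 0$$
$$g_i(w) \le 0, \quad h_j(w) = 0, \quad \alpha_i \ge 0$$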

In the KKT conditions, $\alpha_i g_i(w) = 0$ is called dual complementarity (so $\alpha_i > 0$ only if the constraint is active, $g_i(w) = 0$), $g_i(w) \le 0$ and $h_j(w) = 0$ are the primal constraints, and $\alpha_i \ge 0$ are the dual constraints.
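To see the KKT conditions at work, consider a hypothetical toy problem (not from the lecture): minimize $f(w) = w^2$ subject to $g(w) = 1 - w \le 0$, with no equality constraints. Solving the conditions by hand:

```latex
\begin{aligned}
\mathcal{L}(w, \alpha) &= w^2 + \alpha (1 - w) \\
\frac{\partial \mathcal{L}}{\partial w} &= 2w - \alpha = 0 \ \Rightarrow \ w = \alpha / 2 \\
\alpha \, g(w) &= \alpha (1 - \alpha/2) = 0 \ \Rightarrow \ \alpha = 0 \ \text{or} \ \alpha = 2 \\
\alpha = 0 &: \ w = 0, \ g(w) = 1 > 0 \quad \text{(violates the primal constraint)} \\
\alpha = 2 &: \ w = 1, \ g(w) = 0 \le 0, \ \alpha \ge 0 \quad \text{(all conditions hold)} \\
\Rightarrow \ w^* &= 1, \quad \alpha^* = 2, \quad p^* = d^* = 1
\end{aligned}
```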

Lagrange duality
What are the proper conditions?
A set of conditions (Slater conditions):
$f$, $g_i$ convex, $h_j$ affine
There exists $w$ satisfying all $g_i(w) < 0$

There exist other sets of conditions
Search "Karush-Kuhn-Tucker conditions" on Wikipedia

SVM: optimization

SVM: optimization
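Optimization (Quadratic Programming):
$$\min_{w,b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \ \forall i$$
Generalized Lagrangian:
$$\mathcal{L}(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i [y_i(w^T x_i + b) - 1]$$
where $\alpha$ is the Lagrange multiplier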

SVM: optimization
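KKT conditions:
$$\frac{\partial \mathcal{L}}{\partial w} = 0 \ \Rightarrow \ w = \sum_i \alpha_i y_i x_i \quad (1)$$
$$\frac{\partial \mathcal{L}}{\partial b} = 0 \ \Rightarrow \ 0 = \sum_i \alpha_i y_i \quad (2)$$
Plug into $\mathcal{L}$:
$$\mathcal{L}(w, b, \alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad (3)$$
combined with $0 = \sum_i \alpha_i y_i$, $\alpha_i \ge 0$

Only depends on inner products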
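The substitution step can be spelled out as follows, starting from the SVM Lagrangian $\mathcal{L}(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i [y_i(w^T x_i + b) - 1]$ and using $w = \sum_i \alpha_i y_i x_i$ (1) and $\sum_i \alpha_i y_i = 0$ (2):

```latex
\begin{aligned}
\mathcal{L}(w, b, \alpha)
&= \frac{1}{2} w^T w - \sum_i \alpha_i y_i w^T x_i - b \sum_i \alpha_i y_i + \sum_i \alpha_i \\
&= \frac{1}{2} w^T w - w^T \Big( \sum_i \alpha_i y_i x_i \Big) - b \cdot 0 + \sum_i \alpha_i
&& \text{by (2)} \\
&= \frac{1}{2} w^T w - w^T w + \sum_i \alpha_i
&& \text{by (1)} \\
&= \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j
&& \text{by (1) again}
\end{aligned}
```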

SVM: optimization
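Reduces to dual problem:
$$\max_\alpha \mathcal{L}(w, b, \alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j$$
$$\text{s.t.} \quad \sum_i \alpha_i y_i = 0, \quad \alpha_i \ge 0$$
Since $w = \sum_i \alpha_i y_i x_i$, we have $w^T x + b = \sum_i \alpha_i y_i x_i^T x + b$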
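A minimal sketch of the dual on a two-point data set (the data are hypothetical). With one point per class, the constraint $\sum_i \alpha_i y_i = 0$ forces $\alpha_1 = \alpha_2$, so a one-dimensional grid search stands in for a real QP solver here; this is an illustration, not how the dual is solved in practice.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]

# alpha_1 = alpha_2 = a by the equality constraint; search a in [0, 2].
grid = [k / 1000.0 for k in range(2001)]
best = max(grid, key=lambda a: dual_objective([a, a], X, y))
alpha = [best, best]

# Recover w from alpha, and b from a support vector: y_i (w^T x_i + b) = 1.
w = [sum(alpha[i] * y[i] * X[i][d] for i in range(2)) for d in range(2)]
b = y[0] - dot(w, X[0])

def decision(x):
    """w^T x + b written with inner products x_i^T x only."""
    return sum(alpha[i] * y[i] * dot(X[i], x) for i in range(2)) + b
```

Note that `decision` never forms $w$ explicitly; it touches the training points only through inner products, which is what makes the kernel substitution possible.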

Kernel methods

Features


[Figure: extracting features from an input; the example feature is a color histogram over Red, Green, and Blue.]

Features
A proper feature mapping can turn a non-linear problem into a linear one
Using SVM on the feature space $\{\phi(x_i)\}$: only need $\phi(x_i)^T \phi(x_j)$
Conclusion: no need to design $\phi(\cdot)$, only need to design
$$k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$$
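A minimal illustration of a feature mapping turning a non-linear problem into a linear one. The 1-D data set and the mapping $\phi(x) = (x, x^2)$ are assumptions chosen for the example.

```python
# Hypothetical 1-D data: label +1 when |x| > 1, else -1.
X = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
y = [+1, +1, -1, -1, -1, +1, +1]

def sign(v):
    return 1 if v > 0 else -1

def separable_by_threshold(X, y):
    """Check whether any 1-D linear rule sign(s * (x - t)) fits all labels."""
    pts = sorted(X)
    thresholds = ([pts[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
                  + [pts[-1] + 1.0])
    return any(all(sign(s * (x - t)) == yi for x, yi in zip(X, y))
               for t in thresholds for s in (+1, -1))

def phi(x):
    """Feature mapping phi(x) = (x, x^2)."""
    return (x, x * x)

# In feature space the linear rule w = (0, 1), b = -1 computes sign(x^2 - 1).
w, b = (0.0, 1.0), -1.0
preds = [sign(w[0] * phi(x)[0] + w[1] * phi(x)[1] + b) for x in X]
linear_in_input_space = separable_by_threshold(X, y)
```

No threshold on $x$ fits the labels, while a linear rule on $\phi(x)$ fits them all.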

Polynomial kernels
Fix degree $d$ and constant $c$:
$$k(x, x') = (x^T x' + c)^d$$
What are $\phi(x)$?
Expand the expression to get $\phi(x)$
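For $d = 2$ and 2-D inputs the expansion gives six features. The sketch below checks numerically that the kernel value equals the inner product of these explicit features; the test points and $c = 1$ are arbitrary choices.

```python
import math

def poly_kernel(x, z, c=1.0, d=2):
    """Polynomial kernel k(x, z) = (x^T z + c)^d."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** d

def phi(x, c=1.0):
    """Explicit feature map for d = 2 and 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2,
            math.sqrt(2 * c) * x1, math.sqrt(2 * c) * x2, c)

x, z = (1.0, 2.0), (3.0, -1.0)
lhs = poly_kernel(x, z)                           # kernel evaluation
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))  # explicit inner product
```

The kernel needs one inner product in 2 dimensions, while the explicit route needs one in 6; the gap widens quickly with the input dimension and the degree.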

Polynomial kernels

Figure from Foundations of Machine Learning, by M. Mohri, A. Rostamizadeh, and A. Talwalkar


Gaussian kernels
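Fix bandwidth $\sigma$:
$$k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$$
Also called radial basis function (RBF) kernels

What are $\phi(x)$? Consider the un-normalized version
$$k'(x, x') = \exp(x^T x' / \sigma^2)$$
Power series expansion:
$$k'(x, x') = \sum_{i=0}^{+\infty} \frac{(x^T x')^i}{\sigma^{2i} \, i!}$$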
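A quick numerical check of the expansion, and of how the un-normalized kernel relates to the Gaussian kernel. The test points, the bandwidth, and the truncation at 30 terms are arbitrary choices for this sketch.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gaussian_kernel(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def unnormalized_kernel(x, z, sigma=1.0):
    return math.exp(dot(x, z) / sigma ** 2)

def truncated_series(x, z, sigma=1.0, terms=30):
    """First `terms` terms of sum_i (x^T z)^i / (sigma^(2i) i!)."""
    s = dot(x, z)
    return sum(s ** i / (sigma ** (2 * i) * math.factorial(i))
               for i in range(terms))

x, z, sigma = (0.5, -1.0), (1.5, 0.25), 1.5
series = truncated_series(x, z, sigma)
exact = unnormalized_kernel(x, z, sigma)

# The Gaussian kernel is k' rescaled by one factor depending only on x
# and one depending only on z.
rescaled = (exact
            * math.exp(-dot(x, x) / (2 * sigma ** 2))
            * math.exp(-dot(z, z) / (2 * sigma ** 2)))
```

Each term $(x^T x')^i$ is a polynomial kernel of degree $i$, so the Gaussian kernel corresponds to an infinite-dimensional feature map.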

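Mercer's condition for kernels

Theorem: $k(x, x')$ has expansion
$$k(x, x') = \sum_i^{+\infty} \phi_i(x) \phi_i(x')$$
if and only if for any function $c(x)$,
$$\int \int c(x) c(x') k(x, x') \, dx \, dx' \ge 0$$
(Omit some math conditions for $k$ and $c$)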
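On a finite sample the double integral becomes the double sum $\sum_{i,j} c_i c_j k(x_i, x_j)$. The sketch below evaluates this discrete analogue for a Gaussian kernel with random coefficients; it is a sanity check on sampled points, not a proof, and `not_a_kernel` is a made-up counterexample.

```python
import math
import random

def gaussian_kernel(x, z, sigma=1.0):
    return math.exp(-(x - z) ** 2 / (2 * sigma ** 2))

def quadratic_form(c, xs, kernel):
    """Discrete analogue of the double integral: sum_ij c_i c_j k(x_i, x_j)."""
    return sum(ci * cj * kernel(xi, xj)
               for ci, xi in zip(c, xs) for cj, xj in zip(c, xs))

rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(8)]
forms = []
for _ in range(200):
    c = [rng.uniform(-1, 1) for _ in xs]
    forms.append(quadratic_form(c, xs, gaussian_kernel))

def not_a_kernel(x, z):
    # Hypothetical symmetric function that fails the condition.
    return -1.0 if x == z else 0.0

bad = quadratic_form([1.0, 0.0], [0.0, 1.0], not_a_kernel)
```

Every sampled form for the Gaussian kernel is non-negative up to rounding, while the counterexample produces a negative value.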

Constructing new kernels


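Kernels are closed under positive scaling, sum, product, pointwise limit, and composition with a power series $\sum_i^{+\infty} a_i k^i(x, x')$ with coefficients $a_i \ge 0$
Example: if $k_1(x, x')$ and $k_2(x, x')$ are kernels, then so is
$$k(x, x') = 2k_1(x, x') + 3k_2(x, x')$$
Example: if $k_1(x, x')$ is a kernel, then so is
$$k(x, x') = \exp(k_1(x, x'))$$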
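One way to see why positive scaling and sums preserve kernels is to build the feature map of the combination directly. The sketch assumes $k_1$ is the linear kernel and $k_2$ the homogeneous quadratic kernel; both choices and the test points are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def k1(x, z):
    return dot(x, z)       # linear kernel, phi_1(x) = x

def k2(x, z):
    return dot(x, z) ** 2  # quadratic kernel, phi_2(x) = all products x_i x_j

def k_sum(x, z):
    return 2 * k1(x, z) + 3 * k2(x, z)

def phi_sum(x):
    """Feature map of 2*k1 + 3*k2: rescaled feature maps, concatenated."""
    part1 = [math.sqrt(2) * xi for xi in x]
    part2 = [math.sqrt(3) * xi * xj for xi in x for xj in x]
    return part1 + part2

def k_exp(x, z):
    return math.exp(k1(x, z))  # composition with the power series of exp

x, z = (1.0, -2.0), (0.5, 3.0)
via_kernel = k_sum(x, z)
via_features = dot(phi_sum(x), phi_sum(z))
```

Scaling a kernel by $a > 0$ scales its features by $\sqrt{a}$, and adding two kernels concatenates their feature vectors.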

Kernels v.s. Neural networks

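Features

[Figure: features are extracted from the input (example: a color histogram over Red, Green, and Blue), then a hypothesis is built on top of the features.]

Features: part of the model

[Figure: feature extraction followed by building the hypothesis forms a nonlinear model; building the hypothesis on top of the features is a linear model.]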

Polynomial kernels

Figure from Foundations of Machine Learning, by M. Mohri, A. Rostamizadeh, and A. Talwalkar

Polynomial kernel SVM as two layer neural network


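[Figure: a two-layer network whose first layer computes the features $x_1^2$, $x_2^2$, $\sqrt{2} x_1 x_2$, $\sqrt{2c} x_1$, $\sqrt{2c} x_2$, $c$ from the input $(x_1, x_2)$, and whose output is]
$$y = \text{sign}(w^T \phi(x) + b)$$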

The first layer is fixed. If the first layer is also learned, the model becomes a two-layer neural network.
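The two-layer view can be sketched as a forward pass. The first layer is the fixed degree-2 polynomial feature map for 2-D input; the weights `w` and `b` below are hypothetical values picked by hand, not learned.

```python
import math

def first_layer(x, c=1.0):
    """Fixed first layer: degree-2 polynomial features of a 2-D input."""
    x1, x2 = x
    return [x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2,
            math.sqrt(2 * c) * x1, math.sqrt(2 * c) * x2, c]

def second_layer(features, w, b):
    """Linear layer followed by sign: y = sign(w^T phi(x) + b)."""
    score = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1 if score > 0 else -1

# Hypothetical weights: score = x1^2 + x2^2 - 1, so the unit circle
# is the decision boundary.
w = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
b = -1.0

inside = second_layer(first_layer((0.2, -0.3)), w, b)
outside = second_layer(first_layer((1.5, 0.5)), w, b)
```

In an SVM only `w` and `b` are fit to data; a neural network would also replace `first_layer` with trainable parameters.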
