I2ml3e Chap13
INTRODUCTION TO MACHINE LEARNING, 3RD EDITION
ETHEM ALPAYDIN
The MIT Press, 2014
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 13: KERNEL MACHINES
Kernel Machines
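$$\mathcal{X} = \{\mathbf{x}^t, r^t\}_t \quad \text{where} \quad r^t = \begin{cases} +1 & \text{if } \mathbf{x}^t \in C_1 \\ -1 & \text{if } \mathbf{x}^t \in C_2 \end{cases}$$
find $\mathbf{w}$ and $w_0$ such that
$$\mathbf{w}^T \mathbf{x}^t + w_0 \ge +1 \quad \text{for } r^t = +1$$
$$\mathbf{w}^T \mathbf{x}^t + w_0 \le -1 \quad \text{for } r^t = -1$$
which can be rewritten as
$$r^t (\mathbf{w}^T \mathbf{x}^t + w_0) \ge +1$$
$$\min \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad r^t (\mathbf{w}^T \mathbf{x}^t + w_0) \ge +1, \ \forall t$$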
Margin
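$$\min \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad r^t (\mathbf{w}^T \mathbf{x}^t + w_0) \ge +1, \ \forall t$$
$$L_p = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{t=1}^{N} \alpha^t \left[ r^t (\mathbf{w}^T \mathbf{x}^t + w_0) - 1 \right]$$
$$= \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{t=1}^{N} \alpha^t r^t (\mathbf{w}^T \mathbf{x}^t + w_0) + \sum_{t=1}^{N} \alpha^t$$
$$\frac{\partial L_p}{\partial \mathbf{w}} = 0 \ \Rightarrow \ \mathbf{w} = \sum_{t=1}^{N} \alpha^t r^t \mathbf{x}^t$$
$$\frac{\partial L_p}{\partial w_0} = 0 \ \Rightarrow \ \sum_{t=1}^{N} \alpha^t r^t = 0$$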
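$$L_d = \frac{1}{2}\mathbf{w}^T \mathbf{w} - \mathbf{w}^T \sum_t \alpha^t r^t \mathbf{x}^t - w_0 \sum_t \alpha^t r^t + \sum_t \alpha^t$$
$$= -\frac{1}{2}\mathbf{w}^T \mathbf{w} + \sum_t \alpha^t$$
$$= -\frac{1}{2} \sum_t \sum_s \alpha^t \alpha^s r^t r^s (\mathbf{x}^t)^T \mathbf{x}^s + \sum_t \alpha^t$$
subject to $\sum_t \alpha^t r^t = 0$ and $\alpha^t \ge 0, \ \forall t$
Most $\alpha^t$ are 0 and only a small number have $\alpha^t > 0$; they are the support vectors.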
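As a rough numerical check of the dual and of the support-vector idea, the sketch below evaluates the dual objective and recovers the weight vector from the multipliers on a three-point toy problem. The data, the hand-derived multipliers, and the helper names (dot, weights_from_alpha, dual_objective) are assumptions made for illustration; they are not from the slides, and no optimizer is run.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def weights_from_alpha(alpha, X, r):
    """w = sum_t alpha^t r^t x^t (stationarity condition of the primal)."""
    d = len(X[0])
    return [sum(a * rt * x[j] for a, rt, x in zip(alpha, r, X)) for j in range(d)]

def dual_objective(alpha, X, r):
    """L_d = sum_t alpha^t - 1/2 sum_t sum_s alpha^t alpha^s r^t r^s (x^t)^T x^s."""
    n = len(X)
    quad = sum(alpha[t] * alpha[s] * r[t] * r[s] * dot(X[t], X[s])
               for t in range(n) for s in range(n))
    return sum(alpha) - 0.5 * quad

# Toy data (assumed): one point per class next to the boundary,
# plus a positive point far away from it.
X = [[1.0, 0.0], [-1.0, 0.0], [3.0, 0.0]]
r = [+1, -1, +1]
alpha = [0.5, 0.5, 0.0]   # hand-derived multipliers; the far point gets 0

w = weights_from_alpha(alpha, X, r)
support = [t for t, a in enumerate(alpha) if a > 0]
# For a support vector, r^t (w^T x^t + w0) = 1, so w0 = r^t - w^T x^t.
w0 = r[support[0]] - dot(w, X[support[0]])
margins = [rt * (dot(w, x) + w0) for x, rt in zip(X, r)]
print(w, w0, support, margins, dual_objective(alpha, X, r))
```

Only the two points closest to the boundary have nonzero multipliers, and removing the third point would leave w and w0 unchanged.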
Soft Margin Hyperplane
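$$r^t (\mathbf{w}^T \mathbf{x}^t + w_0) \ge 1 - \xi^t$$
Soft error
$$\sum_t \xi^t$$
New primal is
$$L_p = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_t \xi^t - \sum_t \alpha^t \left[ r^t (\mathbf{w}^T \mathbf{x}^t + w_0) - 1 + \xi^t \right] - \sum_t \mu^t \xi^t$$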
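To make the slack variables concrete, the following sketch computes the smallest slack that satisfies each soft-margin constraint for a fixed hyperplane, together with the penalized objective (half the squared norm of w plus C times the soft error). The one-dimensional data, the fixed w and w0, and the function names are assumptions chosen for illustration.

```python
def discriminant(w, w0, x):
    """g(x) = w^T x + w0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w0

def slacks(w, w0, X, r):
    """Smallest xi^t >= 0 satisfying r^t (w^T x^t + w0) >= 1 - xi^t."""
    return [max(0.0, 1.0 - rt * discriminant(w, w0, x)) for x, rt in zip(X, r)]

def soft_margin_objective(w, w0, X, r, C):
    """1/2 ||w||^2 + C * sum_t xi^t."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks(w, w0, X, r))

# Toy 1-D inputs (assumed); the third instance lies on the wrong side.
X = [[2.0], [0.5], [-0.5], [-2.0]]
r = [+1, +1, +1, -1]
w, w0 = [1.0], 0.0

xi = slacks(w, w0, X, r)
objective = soft_margin_objective(w, w0, X, r, C=1.0)
print(xi, objective)
```

A slack of 0 means the instance is on the correct side and outside the margin, a value between 0 and 1 means it is correctly classified but inside the margin, and a value of 1 or more means it is misclassified.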
Hinge Loss
$$L_{hinge}(y^t, r^t) = \begin{cases} 0 & \text{if } y^t r^t \ge 1 \\ 1 - y^t r^t & \text{otherwise} \end{cases}$$
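The hinge loss can be written in one line as max(0, 1 - y r). A minimal sketch, with the function name and sample outputs chosen here for illustration:

```python
def hinge_loss(y, r):
    """Hinge loss for model output y and label r in {-1, +1}."""
    return max(0.0, 1.0 - y * r)

# Outputs for a positive instance (r = +1), from confident-correct to wrong.
losses = [hinge_loss(y, +1) for y in (2.0, 1.0, 0.25, -1.0)]
print(losses)
```

The loss is zero once the output clears the margin, and grows linearly as the output moves into the margin or to the wrong side.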
ν-SVM
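$$\min \frac{1}{2}\|\mathbf{w}\|^2 - \nu\rho + \frac{1}{N} \sum_t \xi^t$$
subject to
$$r^t (\mathbf{w}^T \mathbf{x}^t + w_0) \ge \rho - \xi^t, \quad \xi^t \ge 0, \quad \rho \ge 0$$
$$L_d = -\frac{1}{2} \sum_{t=1}^{N} \sum_s \alpha^t \alpha^s r^t r^s (\mathbf{x}^t)^T \mathbf{x}^s$$
subject to
$$\sum_t \alpha^t r^t = 0, \quad 0 \le \alpha^t \le \frac{1}{N}, \quad \sum_t \alpha^t \ge \nu$$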
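$$g(\mathbf{x}) = \mathbf{w}^T \phi(\mathbf{x}) = \sum_t \alpha^t r^t \phi(\mathbf{x}^t)^T \phi(\mathbf{x})$$
$$g(\mathbf{x}) = \sum_t \alpha^t r^t K(\mathbf{x}^t, \mathbf{x})$$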
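The kernelized discriminant only needs kernel evaluations between the query and the training instances with nonzero multipliers. The sketch below is one way to write it; the function names, the optional bias argument w0 (defaulting to 0), and the toy multipliers are assumptions for illustration.

```python
def linear_kernel(u, v):
    """K(u, v) = u^T v."""
    return sum(a * b for a, b in zip(u, v))

def kernel_discriminant(x, X, r, alpha, kernel, w0=0.0):
    """g(x) = sum_t alpha^t r^t K(x^t, x) + w0, skipping instances with alpha^t = 0."""
    return w0 + sum(a * rt * kernel(xt, x)
                    for a, rt, xt in zip(alpha, r, X) if a > 0)

# Toy support vectors and multipliers (assumed, not fitted here).
X = [[1.0, 0.0], [-1.0, 0.0]]
r = [+1, -1]
alpha = [0.5, 0.5]

g_pos = kernel_discriminant([2.0, 5.0], X, r, alpha, linear_kernel)
g_neg = kernel_discriminant([-3.0, 1.0], X, r, alpha, linear_kernel)
print(g_pos, g_neg)
```

Swapping linear_kernel for any other kernel function changes the decision surface without touching the rest of the code.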
Vectorial Kernels
Polynomials of degree q:
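$$K(\mathbf{x}^t, \mathbf{x}) = (\mathbf{x}^T \mathbf{x}^t + 1)^q$$
For $q = 2$ and two-dimensional inputs:
$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^T \mathbf{y} + 1)^2$$
$$= (x_1 y_1 + x_2 y_2 + 1)^2$$
$$= 1 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 x_2 y_1 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2$$
$$\phi(\mathbf{x}) = \left[ 1, \sqrt{2}\, x_1, \sqrt{2}\, x_2, \sqrt{2}\, x_1 x_2, x_1^2, x_2^2 \right]^T$$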
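The identity between the degree-2 polynomial kernel and the inner product of the explicit six-dimensional features can be verified numerically. The sketch below does this for one pair of two-dimensional inputs; the function names and the sample vectors are assumptions for illustration.

```python
import math

def poly_kernel(x, y, q=2):
    """K(x, y) = (x^T y + 1)^q."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** q

def phi(x):
    """Explicit feature map matching the q = 2 kernel for 2-D input."""
    x1, x2 = x
    s = math.sqrt(2.0)
    return [1.0, s * x1, s * x2, s * x1 * x2, x1 ** 2, x2 ** 2]

x, y = [1.0, 2.0], [3.0, -1.0]
k_direct = poly_kernel(x, y)                             # one inner product in 2-D
k_mapped = sum(a * b for a, b in zip(phi(x), phi(y)))    # inner product in 6-D
print(k_direct, k_mapped)
```

Both routes give the same value up to floating-point rounding, but the kernel never builds the six-dimensional vectors.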
Vectorial Kernels
Radial-basis functions:
$$K(\mathbf{x}^t, \mathbf{x}) = \exp\left[ -\frac{\|\mathbf{x}^t - \mathbf{x}\|^2}{2 s^2} \right]$$
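A minimal sketch of the radial-basis kernel, where s is the spread and the function name and sample points are chosen here for illustration:

```python
import math

def rbf_kernel(xt, x, s=1.0):
    """K(x^t, x) = exp(-||x^t - x||^2 / (2 s^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xt, x))
    return math.exp(-sq_dist / (2.0 * s ** 2))

same = rbf_kernel([1.0, 2.0], [1.0, 2.0])          # identical points
near = rbf_kernel([0.0, 0.0], [1.0, 1.0], s=1.0)   # squared distance 2
wide = rbf_kernel([0.0, 0.0], [1.0, 1.0], s=3.0)   # same points, larger spread
print(same, near, wide)
```

The kernel equals 1 for identical points and decays with distance; a larger s makes the decay slower, so more distant instances still count as similar.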
Defining kernels
Kernel engineering
Defining good measures of similarity
String kernels, graph kernels, image kernels, ...
Empirical kernel map: Define a set of templates $m_i$ and score function $s(x, m_i)$
$$\phi(x^t) = [s(x^t, m_1), s(x^t, m_2), \ldots, s(x^t, m_M)]$$
and
$$K(x, x^t) = \phi(x)^T \phi(x^t)$$
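An empirical kernel map turns any score function into a kernel by scoring each input against a fixed set of templates. In the sketch below, the string templates and the position-matching score are hypothetical choices made only to show the mechanics; any application-specific score could be substituted.

```python
def empirical_map(x, templates, score):
    """phi(x) = [s(x, m_1), ..., s(x, m_M)]."""
    return [score(x, m) for m in templates]

def empirical_kernel(x, xt, templates, score):
    """K(x, x^t) = phi(x)^T phi(x^t)."""
    fx = empirical_map(x, templates, score)
    fxt = empirical_map(xt, templates, score)
    return sum(a * b for a, b in zip(fx, fxt))

def shared_positions(a, b):
    """Hypothetical score: number of positions where two strings agree."""
    return sum(1 for ca, cb in zip(a, b) if ca == cb)

templates = ["cat", "dog", "cow"]   # assumed template set
phi_cot = empirical_map("cot", templates, shared_positions)
phi_dot = empirical_map("dot", templates, shared_positions)
k = empirical_kernel("cot", "dot", templates, shared_positions)
print(phi_cot, phi_dot, k)
```

Because the kernel is an inner product of explicit feature vectors, it is a valid kernel whatever score function is used.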
Multiple Kernel Learning
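$$L_d = \sum_t \alpha^t - \frac{1}{2} \sum_t \sum_s \alpha^t \alpha^s r^t r^s \sum_i \eta_i K_i(\mathbf{x}^t, \mathbf{x}^s)$$
$$g(\mathbf{x}) = \sum_t \alpha^t r^t \sum_i \eta_i K_i(\mathbf{x}^t, \mathbf{x})$$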
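The combined kernel inside both formulas is a weighted sum of base kernels. The sketch below builds such a kernel from a list of base kernels and weights; in multiple kernel learning the weights would be learned together with the multipliers, whereas here they are fixed by hand, and all names are assumptions for illustration.

```python
def linear_kernel(u, v):
    """K_1(u, v) = u^T v."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(u, v):
    """K_2(u, v) = (u^T v + 1)^2."""
    return (linear_kernel(u, v) + 1.0) ** 2

def combine_kernels(kernels, eta):
    """Return the function K(u, v) = sum_i eta_i * K_i(u, v)."""
    def combined(u, v):
        return sum(e * k(u, v) for e, k in zip(eta, kernels))
    return combined

eta = [0.25, 0.75]   # hand-picked weights (assumed, not learned)
K = combine_kernels([linear_kernel, quadratic_kernel], eta)
value = K([1.0, 2.0], [3.0, -1.0])   # 0.25 * 1 + 0.75 * 4
print(value)
```

Keeping the weights nonnegative ensures the weighted sum of valid kernels is itself a valid kernel.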
1-vs-all
Pairwise separation
Error-Correcting Output Codes (section 17.5)
Single multiclass optimization
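$$\min \frac{1}{2} \sum_{i=1}^{K} \|\mathbf{w}_i\|^2 + C \sum_i \sum_t \xi_i^t$$
subject to
$$\mathbf{w}_{z^t}^T \mathbf{x}^t + w_{z^t 0} \ge \mathbf{w}_i^T \mathbf{x}^t + w_{i0} + 2 - \xi_i^t, \quad \forall i \ne z^t, \quad \xi_i^t \ge 0$$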
SVM for Regression
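$$\min \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_t (\xi_+^t + \xi_-^t)$$
subject to
$$r^t - (\mathbf{w}^T \mathbf{x}^t + w_0) \le \epsilon + \xi_+^t$$
$$(\mathbf{w}^T \mathbf{x}^t + w_0) - r^t \le \epsilon + \xi_-^t$$
$$\xi_+^t, \ \xi_-^t \ge 0$$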
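The two slack variables measure how far a prediction falls outside a tube of half-width epsilon around the target, one for each side. A minimal sketch, with the function name and sample values assumed for illustration:

```python
def regression_slacks(r, f, eps):
    """Smallest slacks satisfying r - f <= eps + xi_plus and f - r <= eps + xi_minus."""
    xi_plus = max(0.0, r - f - eps)    # prediction too low by more than eps
    xi_minus = max(0.0, f - r - eps)   # prediction too high by more than eps
    return xi_plus, xi_minus

below = regression_slacks(r=2.0, f=1.5, eps=0.25)     # under-prediction
inside = regression_slacks(r=1.0, f=1.125, eps=0.25)  # within the tube
above = regression_slacks(r=0.0, f=1.0, eps=0.25)     # over-prediction
print(below, inside, above)
```

Errors smaller than epsilon cost nothing, and at most one of the two slacks is nonzero for any instance.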
Kernel Regression
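For ranking, each preference pair $t$ with $r^u \prec r^v$ contributes one constraint:
$$\min \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_t \xi^t$$
subject to
$$\mathbf{w}^T \mathbf{x}^u \ge \mathbf{w}^T \mathbf{x}^v + 1 - \xi^t, \quad \forall t : r^u \prec r^v, \quad \xi^t \ge 0$$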
One-Class Kernel Machines
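$$\min R^2 + C \sum_t \xi^t$$
subject to
$$\|\mathbf{x}^t - \mathbf{a}\|^2 \le R^2 + \xi^t, \quad \xi^t \ge 0$$
$$L_d = \sum_t \alpha^t (\mathbf{x}^t)^T \mathbf{x}^t - \sum_{t=1}^{N} \sum_s \alpha^t \alpha^s (\mathbf{x}^t)^T \mathbf{x}^s$$
subject to
$$0 \le \alpha^t \le C, \quad \sum_t \alpha^t = 1$$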
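For the enclosing-sphere formulation, the center can be recovered from the multipliers as a weighted sum of the instances, and the dual value can be compared with the squared radius. The sketch below does this on three toy points with hand-derived multipliers; it assumes C is at least 0.5 so that no slack is needed, and the data and helper names are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def center(alpha, X):
    """a = sum_t alpha^t x^t."""
    d = len(X[0])
    return [sum(al * x[j] for al, x in zip(alpha, X)) for j in range(d)]

def one_class_dual(alpha, X):
    """L_d = sum_t alpha^t (x^t)^T x^t - sum_t sum_s alpha^t alpha^s (x^t)^T x^s."""
    n = len(X)
    first = sum(alpha[t] * dot(X[t], X[t]) for t in range(n))
    second = sum(alpha[t] * alpha[s] * dot(X[t], X[s])
                 for t in range(n) for s in range(n))
    return first - second

def sq_dist(x, a):
    """Squared Euclidean distance ||x - a||^2."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

# Toy data (assumed): two points on a diameter and one interior point.
X = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.5]]
alpha = [0.5, 0.5, 0.0]   # hand-derived; sums to 1, interior point gets 0

a = center(alpha, X)
R2 = sq_dist(X[0], a)     # a point with 0 < alpha^t < C lies on the sphere
inside = [sq_dist(x, a) <= R2 for x in X]
print(a, R2, inside, one_class_dual(alpha, X))
```

On this example the dual value equals the squared radius, and only the two points that define the sphere have nonzero multipliers.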
Large Margin Nearest Neighbor