LE THANH TAM

SCIENTIFIC ADVISORS: Assoc. Prof. Dr. NGUYEN DINH THUC, Dr. TRAN THAI SON
ACKNOWLEDGMENTS

First, I sincerely thank Assoc. Prof. Dr. Nguyen Dinh Thuc and Dr. Tran Thai Son for their dedicated guidance throughout the preparation of this thesis. I thank Prof. Akihiro Sugimoto and Dr. Yousun Kang (National Institute of Informatics, Tokyo, Japan) for their instruction and valuable comments on the content of the thesis during my six-month internship at the National Institute of Informatics. I thank Prof. Seiichi Mita (Toyota Technological Institute, Nagoya, Japan) for his enthusiastic support and guidance, and for the practical experience I gained during my three-month internship at the Toyota Technological Institute. I thank Prof. D. McAllister (Toyota Technological Institute, Chicago, USA) and Prof. L. El Ghaoui (University of California, Berkeley, USA) for teaching me the fundamentals of machine learning, optimization, and computer vision. I thank M.Sc. Tran Le Hong Du and Ph.D. student M. Kloft (University of California, Berkeley, USA) for the discussions and the valuable experience they shared during my experiments.

I also thank the lecturers, colleagues, and friends of the Faculty of Information Technology, University of Science, Ho Chi Minh City, who helped me and provided me with knowledge and experience. Finally, I thank my parents and family for their constant love and support throughout my studies, which gave me the confidence to carry out this work.

Sincerely,
Le Thanh Tam
TABLE OF CONTENTS

Acknowledgments
Table of contents
List of symbols and abbreviations
List of tables
List of figures
Introduction
Chapter 1. Introduction
  1.1 Objectives
  1.2 Contributions of the thesis
    1.2.1 Building a kernel for the SVM algorithm
    1.2.2 Applying the proposed kernel to image categorization
    1.2.3 Other related contributions
  1.3 Structure of the thesis
Chapter 2. SVM-based classification
  2.1 Learning with one kernel: Support Vector Machine (SVM)
    2.1.1 The SVM classification algorithm
    2.1.2 Kernels in the SVM classification algorithm
      2.1.2.1 Measuring similarity with kernels
      2.1.2.2 Positive definite kernels
      2.1.2.3 Constructing the reproducing kernel Hilbert space (RKHS)
  2.2 Learning with multiple kernels: Multiple Kernel Learning (MKL)
    2.2.1 SILP
    2.2.2 SimpleMKL
Chapter 3. Kernel methods
  3.1 The bag-of-features model (BoF)
  3.2 Improvements to the BoF model
  3.3 Sparse coding
Chapter 4. Hierarchical Spatial Matching Kernel
  4.1 Spatial Pyramid Matching Kernel (SPMK)
  4.2 The proposed kernel: Hierarchical Spatial Matching Kernel
Chapter 5. Experiments
  5.1 Image categorization
    5.1.1 The image categorization problem
    5.1.2 Applications of image categorization
    5.1.3 Challenges of image categorization
    5.1.4 Approaches
      5.1.4.1 Feature-based approaches
      5.1.4.2 Learning-based approaches
  5.2 Experiments
    5.2.1 Object categorization
      5.2.1.1 The Oxford Flowers dataset
      5.2.1.2 The CALTECH datasets
    5.2.2 Scene categorization
    5.2.3 Sparse coding for the Hierarchical Spatial Matching Kernel (ScHSMK)
      5.2.3.1 ScHSMK on the Oxford Flower dataset
      5.2.3.2 ScHSMK on the CALTECH-101 dataset
Conclusions and recommendations
List of the author's publications
References
LIST OF TABLES

Table 5.1: Classification accuracy (%) with a single feature on the Oxford Flower dataset (NN denotes the nearest-neighbour classifier)
Table 5.2: Classification accuracy (%) of HSMK versus SPMK on the Oxford Flower dataset
Table 5.3: Classification results on the CALTECH-101 dataset
Table 5.4: Classification accuracy of HSMK versus SPMK on the CALTECH-101 dataset
Table 5.5: Classification results on the CALTECH-256 dataset
Table 5.6: Classification results on the MIT Scene dataset (8 classes)
Table 5.7: Classification results on the MIT Scene dataset
Table 5.8: Classification results using sparse coding versus vector quantization (K-means) on Oxford Flower
Table 5.9: Classification results using sparse coding versus vector quantization (K-means) on CALTECH-101
LIST OF FIGURES

Figure 1: General model of the kernel method
Figure 2: Illustration of the HSMK kernel applied to images X and Y with L = 2 and R = 2 (a). HSMK first divides the image into 2^l x 2^l subregions for l = 0, 1, 2, as in SPMK (b). However, HSMK applies a coarse-to-fine model to each subregion, computing similarity over a sequence of resolutions 2^-r x 2^-r for r = 0, 1, 2 (c). Equation (4.8), whose weight vector is obtained from MKL with uniformly distributed base kernels, is used to approximate the optimal matching between subregions instead of the BoF model used in SPMK
Figure 3: Pictorial model of the relations between parts
Figure 4: Sample images from the Oxford Flower dataset (17 classes)
Figure 5: Sample images from the CALTECH-101 dataset
Figure 6: Sample images from the CALTECH-256 dataset
Figure 7: Sample images from the MIT-Scene dataset (8 classes)
INTRODUCTION

With the explosion of image data, classifying images into semantic classes is one of the basic requirements for managing and querying images by their content. Image categorization is also one of the fundamental problems of computer vision and applied machine learning, and it has attracted the attention of many researchers around the world. The problem poses many challenges: images are captured from different viewpoints and under different lighting conditions, a single semantic class has many visual appearances, and the background information in an image adds further complexity.

Two main approaches address the problem: feature-based and learning-based. Among learning-based approaches, kernel methods in particular are widely applied and achieve strong results, both for image categorization and for computer vision in general, thanks to their flexibility in describing images under the complex conditions above. In this thesis I therefore propose a new kernel, named the Hierarchical Spatial Matching Kernel (HSMK), and apply it to image categorization. HSMK improves on the Spatial Pyramid Matching (SPM) model: instead of applying the bag-of-features (BoF) model to subregions, HSMK applies a coarse-to-fine (C2F) model to each subregion, realized through multiresolution analysis, i.e. the subregion is examined over a sequence of different resolutions. The kernel can therefore capture both the overall information of a subregion at coarse resolutions and its details at finer resolutions, much like examining a region on a map, and thus obtain a better similarity measure on these subregions. Experiments show that the proposed kernel, HSMK, is highly effective for image categorization and achieves state-of-the-art results on several standard benchmark datasets for the problem.
Chapter 1

Introduction

1.1 Objectives

In this thesis I study the construction of kernels for classification algorithms in machine learning, specifically for the Support Vector Machine (SVM). SVM classifies by finding the hyperplane that maximizes the margin between classes, while its kernel measures the similarity between training samples, which contributes substantially to the classification performance of the algorithm. SVM is an effective classifier that is widely used in many fields, especially computer vision. When SVM was first proposed, the linear kernel, which uses correlation, i.e. the inner product, was used to measure similarity for class separation. Researchers then observed that data were becoming ever richer and more diverse, which called for non-linear kernels in order to find more effective hyperplanes. Building kernels is therefore a topic of interest to many researchers worldwide. To evaluate the effectiveness of the proposed kernel, I apply it to image categorization in computer vision; specifically, object categorization and scene categorization, two concrete instances of image categorization, serve as testbeds for classification with the proposed kernel.
1.2 Contributions of the thesis

1.2.1 Building a kernel for the SVM algorithm

The thesis proposes the Hierarchical Spatial Matching Kernel (HSMK), a hierarchical, spatially-aware matching kernel. HSMK improves on the Spatial Pyramid Matching Kernel (SPMK) through a coarse-to-fine (C2F) model. SPMK, proposed by Lazebnik et al. [19], divides the image on a sequence of grids of different sizes into subregions, and then applies the bag-of-features (BoF) model [6] to each of these subregions. The proposed kernel, HSMK, likewise divides the image on a sequence of grids of different sizes as in SPMK, but instead of the BoF model, which is limited in how well it can model a region for an optimal similarity measure, HSMK uses the C2F model to examine each region at multiple sizes, allowing it to achieve a better approximation of the optimal similarity than BoF as used in SPMK. HSMK was published by the author and colleagues in the paper "Hierarchical Spatial Matching Kernel for Image Categorization" at the International Conference on Image Analysis and Recognition (ICIAR), Burnaby, British Columbia, Canada, 2011.
... multiple kernel learning (MKL), and linear programming boosting (LP-B).
Chapter 5 presents the application of HSMK to image categorization, specifically object categorization and scene categorization, on standard datasets: Oxford Flower, CALTECH-101, CALTECH-256, MIT Scene, and UIUC Scene.
Chapter 2

SVM-based classification

In this chapter I give an overview of the classification theory of the Support Vector Machine (SVM) and review kernel theory as applied to SVM. I close with a research direction of great current interest to the machine learning community: learning with multiple kernels for SVM, known as the Multiple Kernel Learning (MKL) problem.
Let $X = \{x_1, \dots, x_N\} \subset \mathbb{R}^n$ be a set of training samples with labels $y_i \in \{-1, +1\}$; $D_{train} = (X, Y)$ is called the training set for the SVM classifier. The linear classifier is modeled as

$y(x) = \mathrm{sign}(w^T x + b)$   (2.1)

where $w \in \mathbb{R}^n$ is the weight vector and $b \in \mathbb{R}$ the bias. The classification conditions are

$w^T x_k + b \geq +1$ if $y_k = +1$
$w^T x_k + b \leq -1$ if $y_k = -1$   (2.2)

The pair of conditions (2.2) can be combined into

$y_k (w^T x_k + b) \geq 1, \quad k = 1, \dots, N$   (2.3)
With the constraint (2.3), for datasets that cannot be separated on all training samples the SVM classification problem has no solution; this happens easily in practice, since training data always contain noise. To handle this case, Cortes and Vapnik [3] relaxed (2.3) to

$y_k (w^T x_k + b) \geq 1 - \xi_k, \quad k = 1, \dots, N$   (2.4)

with slack variables $\xi_k \geq 0$ that allow some samples in the training set to violate the classification condition. Samples with $\xi_k > 1$ are those that violate the classification condition with respect to the constraint in (2.3). The primal optimization problem of SVM in the weight space is

$\min_{w, b, \xi} J_P(w, \xi) = \frac{1}{2} w^T w + C \sum_{k=1}^{N} \xi_k$
s.t. $y_k (w^T x_k + b) \geq 1 - \xi_k, \quad k = 1, \dots, N$
  $\xi_k \geq 0, \quad k = 1, \dots, N$   (2.5)

where $C > 0$ is a constant that controls the trade-off between minimizing the objective and the samples violating the SVM classification constraint in (2.3). From (2.5), the corresponding Lagrangian is

$L(w, b, \xi; \alpha, \nu) = J_P(w, \xi) - \sum_{k=1}^{N} \alpha_k \left( y_k (w^T x_k + b) - 1 + \xi_k \right) - \sum_{k=1}^{N} \nu_k \xi_k$   (2.6)

with Lagrange multipliers $\alpha_k \geq 0$ and $\nu_k \geq 0$. The solution is the saddle point

$\max_{\alpha, \nu}\ \min_{w, b, \xi}\ L(w, b, \xi; \alpha, \nu)$   (2.7)
Setting the partial derivatives of the Lagrangian to zero gives

$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{k=1}^{N} \alpha_k y_k x_k$
$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{k=1}^{N} \alpha_k y_k = 0$
$\frac{\partial L}{\partial \xi_k} = 0 \;\Rightarrow\; 0 \leq \alpha_k \leq C, \quad k = 1, \dots, N$   (2.8)

Substituting (2.8) into (2.6) yields the dual quadratic programming problem of SVM:

$\max_{\alpha}\ J_D(\alpha) = -\frac{1}{2} \sum_{k,l=1}^{N} y_k y_l\, x_k^T x_l\, \alpha_k \alpha_l + \sum_{k=1}^{N} \alpha_k$
s.t. $\sum_{k=1}^{N} \alpha_k y_k = 0, \quad 0 \leq \alpha_k \leq C, \quad k = 1, \dots, N$   (2.9)

Since (2.9) is a quadratic program, standard optimization solvers can be used to find the solution.
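As an illustration, the following sketch (not part of the thesis; the toy 2-D dataset and the $\alpha$ values are invented) evaluates the dual objective $J_D(\alpha)$ of (2.9) with the linear kernel $x_k^T x_l$ and checks the dual feasibility constraints:

```python
# Illustrative sketch: evaluating the SVM dual objective of Eq. (2.9)
# on a toy 2-D dataset with the linear kernel. Data and alphas invented.

def dual_objective(alphas, X, y):
    """J_D(alpha) = -1/2 sum_{k,l} y_k y_l (x_k . x_l) a_k a_l + sum_k a_k."""
    N = len(X)
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    quad = sum(alphas[k] * alphas[l] * y[k] * y[l] * dot(X[k], X[l])
               for k in range(N) for l in range(N))
    return -0.5 * quad + sum(alphas)

def feasible(alphas, y, C):
    """Dual constraints: 0 <= a_k <= C and sum_k a_k y_k = 0."""
    return (all(0 <= a <= C for a in alphas)
            and abs(sum(a * yk for a, yk in zip(alphas, y))) < 1e-9)

# Two points per class on either side of the line x1 = 0.
X = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.0), (-2.0, -0.5)]
y = [+1, +1, -1, -1]
alphas = [0.25, 0.0, 0.25, 0.0]   # symmetric, so sum a_k y_k = 0

print(feasible(alphas, y, C=1.0))               # True
print(round(dual_objective(alphas, X, y), 4))   # 0.25
```

A real solver would search over all feasible $\alpha$ for the maximum of this objective; the sketch only evaluates it at one feasible point.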
To extend the classification power of SVM, instead of using the inner product to measure the similarity of two samples $x_i, x_j$ in the input space, the notion of a kernel was introduced. The input data are first mapped into a space $H$ as follows:

$\varphi: X \to H, \quad x \mapsto \varphi(x)$   (2.10)

To measure the similarity between samples in $H$, we can use the inner product of $H$, denoted $\langle \cdot, \cdot \rangle_H$. For convenience, we define the similarity function

$k: X \times X \to \mathbb{R}, \quad (x, x') \mapsto k(x, x')$   (2.11)

satisfying the condition

$k(x, x') = \langle \varphi(x), \varphi(x') \rangle_H, \quad \forall x, x' \in X$   (2.12)

A function defined as in (2.12) belongs to the class of positive definite kernels. This allows the SVM algorithm, when computing inner products, to substitute any positive definite kernel $k(x, x')$ for $\langle \varphi(x), \varphi(x') \rangle_H$, a technique known as the kernel trick. Consequently, with a positive definite kernel we need not know the explicit form of the map from the input space into $H$; it is defined implicitly through the kernel. To make positive definite kernels precise, I recall the following definitions.

Definition 1 (Gram matrix). Given a kernel $k: X \times X \to \mathbb{R}$ and data $x_1, \dots, x_n \in X$, the matrix

$K_{ij} = k(x_i, x_j)$   (2.13)

is called the Gram matrix (or kernel matrix) of $k$ for the data $x_1, \dots, x_n$.

Definition 2 (positive definite matrix). A symmetric $n \times n$ matrix $K$ is positive definite if for all $c_1, \dots, c_n \in \mathbb{R}$

$\sum_{i,j=1}^{n} c_i c_j K_{ij} \geq 0$   (2.14)

If equality in (2.14) holds only when $c_1 = \dots = c_n = 0$, the matrix is called strictly positive definite.

Definition 3 (positive definite kernel). If for every $n \in \mathbb{N}$ and all $x_1, \dots, x_n \in X$ the Gram matrix $K_{ij} = k(x_i, x_j)$ is positive definite, then $k$ is called a positive definite kernel.
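A small numeric check (values invented, not from the thesis) makes Definition 2 concrete for the linear kernel $k(x, x') = x^T x'$: the quadratic form $\sum_{i,j} c_i c_j K_{ij}$ of the Gram matrix equals $\|\sum_i c_i x_i\|^2$ and so can never be negative.

```python
# Sketch: for the linear kernel, the Gram-matrix quadratic form of
# Definition 2 equals a squared norm in input space, hence it is >= 0.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def gram_quadratic_form(c, xs):
    """sum_{i,j} c_i c_j k(x_i, x_j) with the linear kernel k(x,x') = x.x'."""
    return sum(c[i] * c[j] * dot(xs[i], xs[j])
               for i in range(len(xs)) for j in range(len(xs)))

def norm_sq_of_combination(c, xs):
    """||sum_i c_i x_i||^2 computed directly in input space."""
    combo = [sum(ci * xi[d] for ci, xi in zip(c, xs))
             for d in range(len(xs[0]))]
    return dot(combo, combo)

xs = [(1.0, 2.0), (3.0, -1.0), (-2.0, 0.5)]   # invented sample points
c = [0.7, -1.2, 0.4]                           # arbitrary coefficients
q = gram_quadratic_form(c, xs)
print(q >= 0, abs(q - norm_sq_of_combination(c, xs)) < 1e-9)  # True True
```

The same argument, with $\varphi(x_i)$ in place of $x_i$, is exactly the forward direction of the kernel theorem proved below.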
In SVM learning with kernels, we have the following important theorem.

Kernel theorem. A function $k: X \times X \to \mathbb{R}$ is a positive definite kernel if and only if there exist a Hilbert space $H$ and a map $\varphi: X \to H$ such that for all $x, x' \in X$ we have $k(x, x') = \langle \varphi(x), \varphi(x') \rangle_H$.

Proof. Suppose the kernel can be written in the form (2.12). Then

$\sum_{i,j=1}^{n} c_i c_j k(x_i, x_j) = \sum_{i,j=1}^{n} c_i c_j \langle \varphi(x_i), \varphi(x_j) \rangle_H = \left\langle \sum_{i=1}^{n} c_i \varphi(x_i), \sum_{j=1}^{n} c_j \varphi(x_j) \right\rangle_H = \left\| \sum_{i=1}^{n} c_i \varphi(x_i) \right\|_H^2 \geq 0$   (2.15)

so $k$ is positive definite.
The converse direction is presented in Section 2.1.2.3, i.e. the construction of the Hilbert space and the map, together with the desired properties, from a positive definite kernel.

2.1.2.3 Constructing the reproducing kernel Hilbert space (RKHS)

In this section I present the construction of a Hilbert space whose elements are functions built from a positive definite kernel. Given a kernel $k$, form the set of functions

$F = \left\{ f(\cdot) = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i) \;:\; n \in \mathbb{N},\ \alpha_i \in \mathbb{R},\ x_i \in X \right\}$   (2.16)

where $k(\cdot, x): X \to \mathbb{R}$. For

$f(\cdot) = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i)$   (2.17)

and $g(\cdot) = \sum_{j=1}^{n'} \beta_j k(\cdot, x'_j)$, define the inner product

$\langle f, g \rangle := \sum_{i=1}^{n} \sum_{j=1}^{n'} \alpha_i \beta_j\, k(x_i, x'_j)$   (2.18)
Note that, using the symmetry of the kernel, we can rewrite this as

$\sum_{j=1}^{n'} \beta_j f(x'_j) = \langle f, g \rangle = \sum_{i=1}^{n} \alpha_i g(x_i)$   (2.19)

Since $k$ is positive definite,

$\langle f, f \rangle = \sum_{i,j=1}^{n} \alpha_i \alpha_j\, k(x_i, x_j) \geq 0$   (2.20)

More generally, for functions $f_1, \dots, f_p \in F$ and coefficients $c_1, \dots, c_p \in \mathbb{R}$, we have

$\sum_{i,j=1}^{p} c_i c_j \langle f_i, f_j \rangle = \left\langle \sum_{i=1}^{p} c_i f_i, \sum_{j=1}^{p} c_j f_j \right\rangle_F \geq 0$   (2.21)

Hence $\langle \cdot, \cdot \rangle$ is itself a positive definite form on $F$. Furthermore, when $g(\cdot) = k(\cdot, x)$, the definition of the inner product in (2.18) gives

$\langle f, k(\cdot, x) \rangle = \sum_{i=1}^{n} \alpha_i k(x_i, x) = f(x), \quad \forall x \in X$   (2.22)

and in particular

$\langle k(\cdot, x), k(\cdot, x') \rangle = k(x, x')$   (2.23)
This property is known as the reproducing property of the kernel: a function $f$ can be represented as a linear functional defined by the inner product on the vector space of functions $F$, as in (2.22). To prove the definiteness property of the inner product, I recall the Cauchy–Schwarz inequality.

Theorem (Cauchy–Schwarz inequality). If $k$ is a positive definite kernel and $x_1, x_2 \in X$, then

$k(x_1, x_2)^2 \leq k(x_1, x_1)\, k(x_2, x_2)$   (2.24)

Proof. The $2 \times 2$ Gram matrix $K$ of $k$ at $x_1, x_2$ is positive definite, so

$0 \leq \det(K) = k(x_1, x_1)\, k(x_2, x_2) - k(x_1, x_2)^2$   (2.25)

Applying the Cauchy–Schwarz inequality together with the reproducing property (2.22), we have $f(x) = \langle k(\cdot, x), f \rangle$ and hence

$|f(x)|^2 \leq k(x, x)\, \langle f, f \rangle$   (2.26)

which proves that $\langle f, f \rangle = 0 \Rightarrow f = 0$.

Because of property (2.22), the vector space built on the set $F$ is called the reproducing kernel Hilbert space (RKHS) of the positive definite kernel $k$. Moreover, the RKHS determines the positive definite kernel $k$ uniquely and conversely, as stated in the Moore–Aronszajn theorem [1].

Theorem (Moore–Aronszajn [1]). For every positive definite kernel $k$ there exists a unique reproducing kernel Hilbert space $H$ whose reproducing kernel is $k$, and conversely. For further details of the construction of the reproducing kernel Hilbert space, see [1].
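The identities (2.18) and (2.19) can be checked numerically. The following sketch (toy expansion coefficients and points, not from the thesis) uses a 1-D Gaussian kernel and verifies that $\sum_j \beta_j f(x'_j)$ and $\sum_i \alpha_i g(x_i)$ both equal the inner product $\langle f, g \rangle$:

```python
# Numeric check of Eq. (2.19): for f = sum_i a_i k(., x_i) and
# g = sum_j b_j k(., x'_j), both symmetric evaluation formulas agree
# with the inner product of Eq. (2.18). Gaussian kernel, toy values.
import math

def k(x, xp, sigma=1.0):
    return math.exp(-(x - xp) ** 2 / (2 * sigma ** 2))

xs, alphas = [0.0, 1.0], [0.5, -0.3]    # expansion of f
xps, betas = [0.5, 2.0], [1.0, 0.25]    # expansion of g

f = lambda x: sum(a * k(xi, x) for a, xi in zip(alphas, xs))
g = lambda x: sum(b * k(xj, x) for b, xj in zip(betas, xps))

inner = sum(a * b * k(xi, xj)           # Eq. (2.18)
            for a, xi in zip(alphas, xs)
            for b, xj in zip(betas, xps))
via_f = sum(b * f(xj) for b, xj in zip(betas, xps))   # left side of (2.19)
via_g = sum(a * g(xi) for a, xi in zip(alphas, xs))   # right side of (2.19)

print(abs(inner - via_f) < 1e-12, abs(inner - via_g) < 1e-12)  # True True
```

The equality holds for any positive definite kernel, since both sides expand to the same double sum.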
2.2 Learning with multiple kernels: Multiple Kernel Learning (MKL)

Let $\ell$ be the number of kernels used for SVM learning, with positive definite kernels $k_1, k_2, \dots, k_\ell$. The linear kernel combination used for SVM learning is expressed as

$k(\cdot, \cdot) = \sum_{\eta=1}^{\ell} \beta_\eta\, k_\eta(\cdot, \cdot)$
s.t. $\sum_{\eta=1}^{\ell} \beta_\eta = 1, \quad \beta_\eta \geq 0$   (2.27)

where $\beta_\eta$ is the linear-combination coefficient of the positive definite kernel $k_\eta(\cdot, \cdot)$. The primal form of the MKL learning problem is

$\min_{w'_\eta, w_0, \beta, \xi}\ \frac{1}{2} \sum_{\eta=1}^{\ell} \beta_\eta \|w'_\eta\|_2^2 + C \sum_{i=1}^{N} \xi_i$
s.t. $y_i \left( \sum_{\eta=1}^{\ell} \beta_\eta \langle w'_\eta, \varphi_\eta(x_i) \rangle + w_0 \right) \geq 1 - \xi_i$
  $\sum_{\eta=1}^{\ell} \beta_\eta = 1, \quad \beta \geq 0, \quad \xi \geq 0$   (2.28)

where $\varphi_\eta$ maps the input into the RKHS corresponding to $k_\eta(\cdot, \cdot)$. In formulation (2.28), since the two primal variables $\beta$ and $w'$ appear as a product, the formulation is non-convex; by a change of variables ($w_\eta = \beta_\eta w'_\eta$) it can be rewritten in convex form.
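The combination (2.27) acts entrywise on Gram matrices. The following sketch (the two base Gram matrices are made up) combines them with simplex-constrained weights:

```python
# Sketch of Eq. (2.27): k = sum_eta beta_eta * k_eta, with the weights
# beta on the simplex, applied entrywise to base Gram matrices.

def combine_grams(betas, grams):
    assert abs(sum(betas) - 1.0) < 1e-9 and all(b >= 0 for b in betas)
    n = len(grams[0])
    return [[sum(b * G[i][j] for b, G in zip(betas, grams))
             for j in range(n)] for i in range(n)]

K1 = [[1.0, 0.2], [0.2, 1.0]]      # e.g. Gram matrix of one base kernel
K2 = [[1.0, 0.8], [0.8, 1.0]]      # e.g. Gram matrix of another
K = combine_grams([0.75, 0.25], [K1, K2])
print(round(K[0][1], 6))           # 0.75*0.2 + 0.25*0.8 = 0.35
```

Any convex combination of positive definite Gram matrices is again positive definite, which is why (2.27) yields a valid SVM kernel.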
2.2.1 SILP

In [30], Sonnenburg transforms (2.28) into a semi-infinite linear program (SILP), i.e. a problem with a linear objective and an infinite number of linear constraints. The SILP form of MKL given by Sonnenburg is

$\max_{\theta, \beta}\ \theta$
s.t. $\sum_{p=1}^{\ell} \beta_p = 1, \quad \beta_p \geq 0$
  $\sum_{p=1}^{\ell} \beta_p S_p(\alpha) \geq \theta, \quad \forall \alpha \in Z$   (2.29)

where $Z = \left\{ \alpha \in \mathbb{R}^n : 0 \leq \alpha_i \leq C,\ \sum_{i=1}^{n} \alpha_i y_i = 0 \right\}$ and the function $S_p(\alpha)$ is defined as

$S_p(\alpha) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, k_p(x_i, x_j) - \sum_{i=1}^{n} \alpha_i$   (2.30)

This is a linear program in $\theta$ and $\beta$, which are bound only by linear conditions, but the constraint set indexed by $\alpha \in Z$ is infinite, so constraints are satisfied iteratively while searching for the optimal solution.

SILP algorithm for MKL
Input: regularization parameter C > 0, the set of kernels, the training data.
Output: the parameters $\alpha$, b, $\beta$.
1: Initialize the kernel weights: $\beta_p \leftarrow 1/\ell$, p = 1, …, $\ell$
2: $(\alpha^0, b^0) \leftarrow$ solve the SVM with the combined kernel $\sum_p \beta_p k_p$
3: t ← 0 (t counts the iterations of the SILP algorithm)
4: while (the stopping condition is not met) do
5:   $\alpha^{t+1} \leftarrow$ solve the SVM with the current combined kernel $\sum_p \beta_p^t k_p$
6:   if $\sum_p \beta_p^t S_p(\alpha^{t+1}) \geq \theta^t$ then break end if
7:   $(\beta^{t+1}, \theta^{t+1}) \leftarrow$ solve the linear program (2.29) restricted to the constraints generated so far
8:   t ← t + 1
9: end while

For details of the SILP algorithm, see [30]. SILP is implemented and released in the Shogun toolbox at http://www.shoguntoolbox.org
2.2.2 SimpleMKL

In [28], Rakotomamonjy transforms (2.28) into the following optimization problem:

$\min_{\beta}\ g(\beta)$ s.t. $\sum_{\eta=1}^{\ell} \beta_\eta = 1, \quad \beta_\eta \geq 0$   (2.31)

where

$g(\beta) = \min_{\{v\,:\,v_p = \beta_p w_p\},\, b} \left\{ \frac{1}{2} \sum_{p=1}^{\ell} \frac{1}{\beta_p} \|v_p\|_{H_p}^2 + C \sum_{i=1}^{N} L\!\left(y_i,\ \sum_{p=1}^{\ell} \langle v_p, \varphi_p(x_i) \rangle + b\right) \right\}$

Rakotomamonjy uses gradient descent (descending along the direction given by the derivative of the objective with respect to the optimization variable) to solve the MKL form described in (2.31). To make this possible, he gives a formula for the derivative with respect to the kernel weight vector in terms of the SVM parameters:

$\frac{\partial g}{\partial \beta_p} = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i^* \alpha_j^*\, y_i y_j\, k_p(x_i, x_j)$   (2.32)

valid at points where the SVM solution $(\alpha^*, b^*)$ is optimal for the current $\beta$, i.e. where

$\frac{\partial g}{\partial \alpha}(\alpha^*) = 0, \quad \frac{\partial g}{\partial b}(b^*) = 0$   (2.33)

SimpleMKL algorithm for MKL
Input: regularization parameter C > 0, the set of kernels, the training data.
Output: the parameters $\alpha$, b, $\beta$.
1: Initialize the kernel weights: $\beta_p \leftarrow 1/\ell$, p = 1, …, $\ell$
2: while (the stopping condition is not met) do
3:   $(\alpha, b) \leftarrow$ solve the SVM with the combined kernel $\sum_p \beta_p k_p$
4:   update $\beta$ by gradient descent using (2.32)
5: end while

For details of the SimpleMKL algorithm, see [28]. SimpleMKL is implemented and released in the SimpleMKL toolbox at http://asi.insa-rouen.fr/enseignants/~arakotom/code/mklindex.html.
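The gradient formula (2.32) is cheap once $\alpha^*$ is known. The following sketch (toy $\alpha$, labels, and Gram values; a real implementation would first solve the SVM for $\alpha^*$) computes it for one base kernel:

```python
# Sketch of the SimpleMKL gradient, Eq. (2.32):
# dg/dbeta_p = -1/2 * sum_{i,j} a_i a_j y_i y_j k_p(x_i, x_j).
# All inputs here are invented placeholders, not a solved SVM.

def mkl_gradient(alpha, y, K_p):
    n = len(alpha)
    return -0.5 * sum(alpha[i] * alpha[j] * y[i] * y[j] * K_p[i][j]
                      for i in range(n) for j in range(n))

alpha = [0.5, 0.5]                 # stand-in for the optimal alpha*
y = [+1, -1]
K_p = [[1.0, 0.1], [0.1, 1.0]]     # Gram matrix of base kernel k_p
print(mkl_gradient(alpha, y, K_p))  # ~ -0.225
```

Since the quadratic form is non-negative for a positive definite $k_p$, this gradient is always non-positive; the simplex constraint on $\beta$ is what keeps the descent step well-defined.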
Chapter 3

Kernel methods

Among the approaches to image categorization, kernel methods are among the most effective and widely applied. A kernel method uses a non-linear function, instead of the linear inner product, to measure the similarity of two samples, and maximizes the margin by solving a quadratic optimization problem; the representative algorithm is the supervised Support Vector Machine (SVM). A kernel method typically proceeds in the following steps: from the input image, extract features (local or global) whose collection represents the image; next, represent this feature set with a suitable image representation so that a similarity function can be applied; finally, feed the result into an SVM classifier that determines which class the sample belongs to, as summarized in Figure 1.

(Figure 1: input image → feature extraction → image representation → SVM classifier.)

Within this pipeline, the thesis focuses on the image representation step (kernel construction) for image categorization.
Thus, from the image's feature set we obtain a set of visual words representing the image, and then take the histogram of the words, recording how many times each visual word occurs in the image. Experiments show that to obtain a good similarity between two images, a non-linear kernel such as the intersection kernel must be applied, with classification performed by the SVM algorithm.
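The two steps just described can be sketched as follows (the visual-word ids and vocabulary size are invented for illustration): build a bag-of-features histogram over a vocabulary of V visual words, then compare two histograms with the non-linear histogram intersection kernel.

```python
# Toy illustration: bag-of-features histograms and the histogram
# intersection kernel I(X, Y) = sum_j min(H_X(j), H_Y(j)).

def bof_histogram(word_ids, V):
    h = [0] * V
    for w in word_ids:
        h[w] += 1
    return h

def intersection_kernel(hx, hy):
    return sum(min(a, b) for a, b in zip(hx, hy))

V = 4
img_x = [0, 0, 1, 3, 3, 3]   # quantized local features of image X
img_y = [0, 1, 1, 2, 3]      # quantized local features of image Y
hx, hy = bof_histogram(img_x, V), bof_histogram(img_y, V)
print(hx, hy)                        # [2, 1, 0, 3] [1, 2, 1, 1]
print(intersection_kernel(hx, hy))   # 1 + 1 + 0 + 1 = 3
```

The intersection kernel counts the number of co-occurring visual words, which is why it behaves as a matching score between the two unordered feature sets.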
Among the improvements to the BoF model, SPM achieves high performance and is simple to implement; it is therefore used as a main component of many systems achieving state-of-the-art results in image categorization [12]. Like the BoF model, SPM performs best when used with a non-linear kernel such as the intersection kernel, but such non-linear kernels have high computational complexity and large storage requirements compared with linear methods. To address this, Maji et al. [23] proposed an approximate computation that improves the efficiency of building the histogram intersection kernel and reduces its computational complexity, although the speedup is obtained through precomputed lookup tables, which can be seen as a form of precomputation for training a non-linear SVM. To handle large-scale data, Yang et al. [36] proposed the linear SPM model with sparse coding (ScSPM), in which the inner product (a linear kernel) is used instead of a non-linear kernel, exploiting the linearity of sparse data. Wang & Wang [34] proposed multi-scale learning (MSL), using multiple kernel learning (MKL) to determine the coefficients of the SPM model instead of SPM's original predefined coefficients.

In this thesis I propose a new kernel based on the coarse-to-fine (C2F) approach for the subregions of the SPM model, named the Hierarchical Spatial Matching Kernel (HSMK). The C2F model lets each subregion be examined at several levels, much like reading a map: the coarse level gives the overall view, while the fine level reveals the details. HSMK therefore not only describes the spatial ordering of local features but can also accurately measure the similarity between the unordered sets of local features drawn from the subregions. In HSMK, the C2F model on subregions is realized through multiple resolutions: local features can describe detailed information of the image or object in a subregion at fine resolutions, and global information of the subregion at coarser resolutions. Moreover, matching under the C2F model is a hierarchical process, meaning that a feature with no match at a fine resolution can still be matched at a coarser one. The proposed kernel can therefore achieve a better approximation of the optimal matching between subregions than SPM.

In short, HSMK concentrates on improving the similarity measure between subregions by using the C2F model, realized through multiple resolutions, instead of the BoF model on subregions as in SPM. Examining a subregion through a sequence of different resolutions is similar to the pyramid matching kernel (PMK) [13], but instead of using a predefined weight vector for the base intersection kernels on the subregions when combining across resolutions, I cast the problem as uniform multiple kernel learning (uniform MKL) to find a more effective weight vector. An advantage of HSMK is that it can be used on unordered feature sets of different cardinalities, by applying square-root diagonal normalization [28] to the base intersection kernels on the subregions, which is not considered in PMK [13].
$\min_{V} \sum_{m=1}^{M} \min_{k=1..K} \| x_m - v_k \|^2$   (3.2)

where $V = \{v_1, v_2, \dots, v_K\}$ is the set of K cluster centres found by K-means (also called the codebook or dictionary) and $\|\cdot\|$ is the L2-norm of a vector. In other words, for each vector $x_m$ we find the corresponding codeword $v_k$ (visual word) in the dictionary by choosing the $v_k$ closest to $x_m$ (usually the Euclidean distance is used). We can convert the optimization (3.2) into a matrix factorization problem as follows. Let $u_m$ be the vector identifying the visual word of $x_m$ in the codebook; then

$\min_{U, V} \sum_{m=1}^{M} \| x_m - u_m V \|^2$
s.t. $\mathrm{Card}(u_m) = 1, \quad |u_m| = 1, \quad u_m \geq 0, \quad \forall m$   (3.3)

where $|\cdot|$ is the L1-norm of a vector and $u_m \geq 0$ means every component of $u_m$ is non-negative. The constraint $\mathrm{Card}(u_m) = 1$ is a hard constraint; we relax it by replacing it with an L1-norm regularization of $u_m$, which drives the optimal $u_m$ to have very few non-zero components. The optimization (3.3) becomes

$\min_{U, V} \sum_{m=1}^{M} \| x_m - u_m V \|^2 + \lambda |u_m|$
s.t. $\| v_k \| \leq 1, \quad \forall k$   (3.4)

The L2-norm condition on $v_k$ prevents degenerate solutions: since the objective function in (3.4) involves the product $u_m V$, one could otherwise shrink $u_m$ arbitrarily while scaling $V$ up arbitrarily to decrease the objective. The optimization problem in (3.4) is known as sparse coding.
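The vector-quantization step of (3.2) can be sketched directly (the codebook centres and feature vectors below are invented): each descriptor $x_m$ is assigned the codeword $v_k$ minimizing the Euclidean distance.

```python
# Sketch of the hard-assignment step in Eq. (3.2): each feature is
# mapped to its nearest codeword in the K-means codebook.

def quantize(x, codebook):
    d2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: d2(x, codebook[k]))

codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]   # K = 3 cluster centres
features = [(0.1, -0.2), (0.9, 1.2), (0.2, 1.8)]  # local descriptors
print([quantize(x, codebook) for x in features])  # [0, 1, 2]
```

Sparse coding in (3.4) replaces this single-index assignment by a sparse coefficient vector $u_m$, trading the hard $\mathrm{Card}(u_m)=1$ constraint for an L1 penalty.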
Chapter 4

Hierarchical Spatial Matching Kernel

In this chapter I first describe the original SPM formulation, as the basis for introducing the new kernel, HSMK, which uses the coarse-to-fine (C2F) model on subregions as an effective improvement over the SPM kernel.

4.1 Spatial Pyramid Matching Kernel (SPMK)

Each image is represented by a set of feature vectors in a d-dimensional space. The features are quantized into discrete elements called visual words, using the unsupervised K-means clustering algorithm or sparse coding. Matching local features then becomes matching over the discrete domain of the corresponding visual words: visual words match when they are identical and do not match otherwise. SPM operates on a sequence of scales l = 0, 1, 2, …, L of the input image. At each scale, it divides the image into $2^l \times 2^l$ subregions along the vertical and horizontal directions, and applies the BoF model to measure similarity between these subregions. Let X, Y be two sets of vectors in a D-dimensional space. The similarity between the two sets at scale l is the sum of the similarities over all corresponding subregions of the images at that scale:

$K^l(X, Y) = \sum_{i=1}^{2^{2l}} I(X_i^l, Y_i^l)$   (4.1)

where $X_i^l$ and $Y_i^l$ denote the subsets of X and Y falling in the i-th subregion at scale l, and

$I(X_i^l, Y_i^l) = \sum_{j=1}^{V} \min\left( H_{X_i^l}(j),\ H_{Y_i^l}(j) \right)$   (4.2)

Here V is the total number of visual words in the constructed dictionary, and $H_\Omega(j)$ is the number of occurrences of the j-th visual word obtained from quantizing the local features in the set $\Omega$. Finally, the SPM kernel is computed as a sum with predefined weights of the similarities over the sequence of applied scales:

$K(X, Y) = \frac{1}{2^L} K^0(X, Y) + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} K^l(X, Y)$   (4.3)

The weight $\frac{1}{2^{L-l+1}}$ of the subregions is defined by the corresponding scale l. It compensates for matching over regions of different scales, since local features are more easily matched in larger regions; moreover, a pair of features matched at a fine scale is also matched at every coarser scale, so the weights discount matches found in larger cells.
4.2 The proposed kernel: Hierarchical Spatial Matching Kernel

Let $X_i^l(r)$ and $Y_i^l(r)$ denote the i-th subregion at scale l of the vector sets of images X and Y, considered at resolution r. At each resolution r, I apply an intersection kernel $F^r$ normalized by square-root diagonal normalization to measure the similarity between them:

$F^r(X_i^l, Y_i^l) = \frac{I(X_i^l(r), Y_i^l(r))}{\sqrt{I(X_i^l(r), X_i^l(r))\ I(Y_i^l(r), Y_i^l(r))}}$   (4.4)

Moreover, the histogram intersection of any set with itself equals the number of elements of the set (its cardinality). Hence, letting $|X_i^l(r)|$ and $|Y_i^l(r)|$ denote the cardinalities of the two sets,

$F^r(X_i^l, Y_i^l) = \frac{I(X_i^l(r), Y_i^l(r))}{\sqrt{|X_i^l(r)|\ |Y_i^l(r)|}}$   (4.5)
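The normalization of (4.5) can be sketched as follows (the two visual-word histograms are invented). Note how the intersection of a histogram with itself is simply the feature count, so a set compared with itself scores exactly 1:

```python
# Sketch of Eq. (4.5): square-root diagonal normalization of the
# histogram intersection kernel, F^r = I(X, Y) / sqrt(|X| * |Y|).
import math

def intersection(hx, hy):
    return sum(min(a, b) for a, b in zip(hx, hy))

def normalized_intersection(hx, hy):
    return intersection(hx, hy) / math.sqrt(intersection(hx, hx)
                                            * intersection(hy, hy))

hx = [2, 1, 0, 3]    # histogram of a subregion with |X| = 6 features
hy = [1, 2, 1, 1]    # histogram of a subregion with |Y| = 5 features
print(round(normalized_intersection(hx, hy), 4))  # 3 / sqrt(30) ~ 0.5477
```

The score is bounded by 1 regardless of the cardinalities, which is exactly the compensation for differently sized feature sets discussed next.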
Square-root diagonal normalization of the intersection kernel not only preserves the Mercer condition on the kernel but also compensates for the different numbers of elements in the sets, as shown in formula (4.5). To obtain the aggregate similarity of the C2F model, I define a linear combination over a sequence of local kernels, in which each local kernel component is computed by (4.5) at one resolution. From this, the kernel F between the sets $X_i^l$ and $Y_i^l$ is

$F(X_i^l, Y_i^l) = \sum_{r=0}^{R} \mu_r\, F^r(X_i^l, Y_i^l)$   (4.6)

with

$\sum_{r=0}^{R} \mu_r = 1, \quad \mu_r \geq 0, \quad r = 0, 1, 2, \dots, R$
Figure 2: Illustration of the HSMK kernel applied to images X and Y with L = 2 and R = 2 (a). HSMK first divides the image into 2^l x 2^l subregions for l = 0, 1, 2, as in SPMK (b). However, HSMK applies a coarse-to-fine model to each subregion, computing similarity over a sequence of resolutions 2^-r x 2^-r for r = 0, 1, 2 (c). Equation (4.8), whose weight vector is obtained from MKL with uniformly distributed base kernels, is used to approximate the optimal matching between subregions instead of the BoF model as in SPMK.
Furthermore, when the linearly combined local kernels are fed into the SVM algorithm, the problem can be cast as MKL, in which the base kernels are defined as in formula (4.5) on the different resolutions of the subregions:

$\min_{w_\eta, w_0, \beta, \xi}\ \frac{1}{2} \sum_{\eta=1}^{\ell} \frac{1}{\beta_\eta} \|w_\eta\|_2^2 + C \sum_{i=1}^{N} \xi_i$
s.t. $y_i \left( \sum_{\eta=1}^{\ell} \langle w_\eta, \varphi_\eta(x_i) \rangle + w_0 \right) \geq 1 - \xi_i$
  $\sum_{\eta=1}^{\ell} \beta_\eta = 1, \quad \beta \geq 0, \quad \xi \geq 0$   (4.7)

where $x_i$ is a training image sample, $y_i$ the corresponding class label of $x_i$, N the number of training samples, $(w_\eta, w_0, \xi)$ the parameters of the SVM, C the predefined soft-margin parameter compensating for errors on training samples in the SVM algorithm, $\beta$ the weight vector of the base local kernels, $\ell$ the number of base local kernels of the subregions over the sequence of resolutions, every component of $\beta$ non-negative (written $\beta \geq 0$, and $\beta$ is then called a non-negative vector), $\varphi_\eta(x)$ the map taking x into the corresponding reproducing kernel Hilbert space (RKHS), and $\langle \cdot, \cdot \rangle$ the inner product. MKL finds the parameters of the SVM and simultaneously determines the weight vector of the base local kernels. These base local kernels are defined on multiple resolutions of the same subregion, so the information overlap between them is high. The experiments of Gehler and Nowozin [12], and especially of Kloft et al. [16], show that MKL with uniformly distributed base kernels, an approximation that reduces the MKL problem to a traditional SVM with a non-linear kernel, is the most effective approach in terms of both accuracy and computation time. Hence formula (4.6), with the linear-combination coefficients obtained from MKL with uniformly distributed base kernels (uniform MKL), becomes

$F(X_i^l, Y_i^l) = \frac{1}{R+1} \sum_{r=0}^{R} F^r(X_i^l, Y_i^l)$   (4.8)
Hnh 2 minh ha cch thc hin ca HSMK vi L=2 v R=2. HSMK cng thc hin vic xem xt vng con trn mt chui cc phn gii khc nhau nh trong PMK c th t c o tng ng tt hn. Tuy nhin, HSMK tnh ton vector trng s da trn MKL vi kernel c bn phn b ng nht, do vy t c hiu qu hn, cng nh c th gii thch v mt l thuyt thay v dng vector trng s nh ngha trc nh trong PMK. Thm na vic p dng chun ha kernel c bn bng phng php chun ha cn bc hai theo ng cho gip cho HSMK thc hin tt trn cc tp vector c s phn t khc nhau m khng c xem xt trong PMK. HSMK c nh ngha da trn vic tnh ton SPM trong m hnh C2F, iu ny mang n s hiu qu khi thc hin trn tp hp vector khng th t, thm ch cc tp vector ny c s phn t khc nhau. V mt ton hc, cng thc ca HSMK c nh ngha nh sau:
L 1 1 K ( X , Y ) = L F0 ( X , Y ) + L l +1 Fl ( X , Y ) 2 l =1 2
(4.9)
Vi:
\[
F_l(X, Y) = \sum_{i=1}^{2^{2l}} F^l(X_i, Y_i) = \frac{1}{R+1} \sum_{i=1}^{2^{2l}} \sum_{r=0}^{R} F^{l,r}(X_i, Y_i)
\]
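The pyramid combination in (4.9), with each level summing its subregions and averaging over resolutions as in (4.8), can be sketched as a toy computation (illustrative only; the nested structure F[l][r][i] holding the base similarities F^{l,r}(X_i, Y_i) and the function name are hypothetical):

```python
# HSMK combination sketch: F[l][r][i] holds the base similarity
# F^{l,r}(X_i, Y_i) of subregion i at level l and resolution r.
def hsmk(F, L, R):
    def level_term(l):
        # average over the R+1 resolutions, sum over the 2^l x 2^l subregions
        return sum(
            sum(F[l][r][i] for r in range(R + 1)) / (R + 1)
            for i in range(4 ** l)
        )
    # SPM-style weights: 1/2^L for level 0, 1/2^(L-l+1) for level l >= 1
    return level_term(0) / 2 ** L + sum(
        level_term(l) / 2 ** (L - l + 1) for l in range(1, L + 1)
    )

# Toy check with L = 1, R = 0 and all base similarities equal to 1:
# K = 1/2 * 1 + 1/2 * 4 = 2.5
F = {0: {0: [1.0]}, 1: {0: [1.0, 1.0, 1.0, 1.0]}}
K = hsmk(F, L=1, R=0)
```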
In summary, HSMK uses the kd-tree algorithm to map local features to discrete visual words, and the intersection kernel, normalized by diagonal square-root normalization, is then used as the similarity measure on histograms over the V visual words. Each image is a set of M local features in D-dimensional space, and the kd-tree algorithm takes O(log V) time to map each local feature. Hence the complexity of HSMK is O(DM log V).
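The two steps just described (kd-tree quantization into V visual words, then the square-root-normalized intersection kernel on the resulting histograms) can be sketched like this, assuming SciPy's cKDTree as the kd-tree; the function names and toy data are hypothetical:

```python
import numpy as np
from scipy.spatial import cKDTree

def to_histogram(feats, tree, V):
    """Map M local features (an M x D array) to their nearest of V visual
    words; each cKDTree query costs about O(log V)."""
    _, idx = tree.query(feats)
    return np.bincount(idx, minlength=V).astype(float)

def intersection_kernel(hx, hy):
    """Histogram intersection with diagonal square-root normalization:
    k(x, y) / sqrt(k(x, x) * k(y, y)); note k(x, x) = sum(hx)."""
    raw = np.minimum(hx, hy).sum()
    return raw / np.sqrt(hx.sum() * hy.sum())

# Toy vocabulary of V = 3 visual words in D = 2 dimensions.
vocab = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
tree = cKDTree(vocab)
hx = to_histogram(np.array([[0.1, 0.0], [0.9, 0.1]]), tree, 3)
hy = to_histogram(np.array([[0.0, 0.2], [0.8, 0.0]]), tree, 3)
sim = intersection_kernel(hx, hy)
```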
Chapter 5
Experiments
In this chapter, I present the application of HSMK to the image classification problem. HSMK is evaluated on two concrete image classification tasks, object classification and scene classification, on standard datasets: Oxford Flower, CALTECH-101, CALTECH-256, MIT Scene, and UIUC Scene.
The mathematical model of the image classification problem can be stated as follows: given a dataset D containing M images X = {X1, X2, …, XM} defined over N semantic classes Y = {Y1, Y2, …, YN}, each image Xi in X is assigned to exactly one class Yi in Y; we say that image Xi is labeled Yi. Randomly select k images from each of the N given semantic classes to form the training set DTrain. Note that a different number of images per class could be chosen for training, but usually the same number is chosen for every class to avoid unnecessary data bias (one class dominating the others in sample count). The remaining images form the test set DTest. The goal is to learn, from DTrain, a classifier F that takes Xi as input and returns the corresponding Yi, such that F achieves the highest possible accuracy when evaluated on DTest. Training phase: from DTrain = {(Xi, Yi) | Xi is a selected image, Yi is the corresponding label of Xi}, learn the classifier F: X → Y. Test phase: on DTest = {(Xj, Yj) | Xj is an image not selected for training, Yj is the corresponding label of Xj}, apply the learned classifier F to the test image Xj to obtain the classification result Zj = F(Xj). If Zj = Yj, the classification of Xj is correct; otherwise (Zj ≠ Yj) it is wrong.
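The selection protocol above (k random images per class into DTrain, the rest into DTest) can be sketched as follows; the helper name and data layout are hypothetical:

```python
import random
from collections import defaultdict

def split_per_class(samples, k, seed=0):
    """samples: list of (image_id, label) pairs. Choose k random images
    from every class for DTrain; all remaining images form DTest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for img, label in samples:
        by_class[label].append(img)
    d_train, d_test = [], []
    for label, imgs in by_class.items():
        chosen = set(rng.sample(imgs, k))  # k random images of this class
        d_train += [(img, label) for img in imgs if img in chosen]
        d_test += [(img, label) for img in imgs if img not in chosen]
    return d_train, d_test

# Toy dataset: 2 classes with 5 images each, k = 3 per class.
data = [(f"img{c}{i}", f"class{c}") for c in range(2) for i in range(5)]
train, test = split_per_class(data, k=3)
```

Using the same k for every class keeps the training set balanced, which is exactly the bias-avoidance point made above.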
The number of images grows ever larger, exponentially, as can be seen on photo-sharing websites such as Flickr, Picasa, etc. Meanwhile, at the time this thesis was written, image search engines such as Google, Bing, and Yahoo could rely only on textual information: query support for images is in fact implemented as text queries over the semantic tags that users attach to images, or over image file names, rather than over the content the image actually contains. Moreover, solving image classification means solving the problems of how to represent the objects in an image well and how to find an appropriate similarity measure between objects; it can therefore be extended to other important computer vision problems such as object detection, object recognition, etc.
5.1.4.1 Feature-based approaches
In [26], Nilsback et al. surveyed many feature types for object classification on the Oxford Flower dataset, from color to flower shape, and also built features based on image segmentation: extracting the objects and removing the background regions that may add noise to classification, then building features on the segmented object regions, e.g., SIFT-Internal and SIFT-Boundary (both based on the SIFT feature published by Lowe in [20][21]). SIFT-Internal in particular is known as the best feature for object classification on the Oxford Flower dataset. In [27], Oliva and Torralba proposed the global GIST feature to model scene images for scene classification on datasets such as MIT Scene and UIUC Scene (UIUC Scene is an extension of MIT Scene); GIST is known as the best feature for scene classification. In addition, [4][12][15] present experiments on combining multiple feature types to describe an image better. However, combining multiple feature types requires choosing a weight for each feature so that the combination performs best. This leads to problems of designing learning methods: Gehler and Nowozin [12] proposed using MKL or LP-B (Linear Programming Boosting) to determine the feature weights during learning, and MKL has also been used to combine features as in [33][37]. A survey of the published work on image classification, e.g., [7][8][13][14][18][34][36], shows that SIFT is the most widely used feature and achieves high performance for image classification.
5.1.4.2 Learning-based approaches
Fergus et al. [10] used a set of distinct object parts and built a graph modeling the object from the spatial relations among the parts as well as the presence of the parts in the image. The idea developed from the pictorial-structures model of relations among parts used in the work of Fischler and Elschlager [11], illustrated in Figure 3; the difference is that Fergus's approach keeps only those parts that are necessary to discriminate between object classes, so it can avoid modeling differences that are local in nature (that is, objects of the same class may still have many different appearances, i.e., high intra-class appearance diversity). This model is also known as the constellation model.
Felzenszwalb et al. [9] proposed the latent-SVM method to build a discriminative model of objects using a set of object parts at multiple resolutions based on HoG [5] (the deformable part model). In addition, building kernel-based learners that measure the similarity of objects to support classification is an approach pursued by many researchers [13][18][34][36]. Prominent in this direction are the BoF and SPM models presented in Chapter 3. The proposed kernel, HSMK, follows the kernel approach and is applied to image classification. HSMK is an improvement of SPMK that computes the similarity between images better under the many challenges of image classification.
CALTECH-256 [14]. For scene classification, I evaluate the effectiveness of HSMK on the MIT Scene [27] and UIUC Scene [18] datasets.
This dataset contains 17 classes of common flowers of the United Kingdom, collected by Nilsback et al. [25]. Each class contains only 80 flower images, taken at different scales, from different viewpoints, and under different lighting conditions. Moreover, flowers within the same class, such as Irises, Fritillaries, and Pansies, vary greatly in color and shape, and in some cases the similarity between different flower classes is very high, e.g., between Dandelion and ColtsFoot. Figure 4 shows some flower samples from the Oxford Flower dataset. For the experiments, I follow the setup of Gehler and Nowozin [12]: randomly select 40 samples from each class for training and use the rest for testing, without using a validation set as in [25][26] to tune the system parameters.
Table 5.1: Classification accuracy (%) using a single feature on the Oxford Flower dataset (NN denotes the nearest-neighbour classifier)
Method                          Accuracy (%)
HSV (NN) [26]                   43.0
SIFT-Internal (NN) [26]         55.1
SIFT-Boundary (NN) [26]         32.0
HOG [26]                        49.6
HSV (SVM) [12]                  61.3
SIFT-Internal (SVM) [12]        70.6
SIFT-Boundary (SVM) [12]        59.4
HOG (SVM) [12]                  58.5
SIFT (MSL) [34]                 65.3
Dense SIFT (HSMK)               72.9
Table 5.1 shows that HSMK achieves the best (state-of-the-art) result when a single feature type is used, compared with previous approaches. It not only outperforms SIFT-Internal [26], known as the best feature type for this dataset (note that SIFT-Internal is computed on segmented images), but also outperforms SPM with coefficients optimized by the MSL learning framework [34]. Furthermore, Table 5.2 shows that HSMK also outperforms SPMK.
Table 5.2: Classification accuracy (%) of HSMK vs. SPMK on the Oxford Flower dataset
The CALTECH datasets:
To demonstrate both effectiveness and scalability, I also evaluate HSMK on the large CALTECH-101 and CALTECH-256 datasets. These datasets exhibit very large intra-class appearance diversity, a wide range of viewpoints, and complex backgrounds. Figure 5 illustrates some samples from CALTECH-101, one class per row; the fourth row (class chair) and the fifth row (class Windsor_chair) are very similar in shape, while each is also diverse within its own class. On CALTECH-101, I run experiments with 5, 10, 15, 20, 25, and 30 training samples per class, including the background class, and use up to 50 samples per class for testing. Table 5.3 compares the classification results of HSMK and other approaches. HSMK achieves results comparable to the state of the art even when using only a single feature type, whereas the other approaches use combinations of many feature types and complex learning algorithms such as MKL and linear programming boosting (LP-B) [12].
Table 5.4 shows that HSMK also outperforms SPMK. Furthermore, when the background class is excluded, HSMK reaches 78.4% accuracy with 30 training samples per class. This demonstrates that the proposed approach is highly effective although it is simpler than the other high-performing approaches.
Table 5.3: Classification results on the CALTECH-101 dataset
(Table body garbled in extraction. Columns: number of training samples per class (5, 10, 15, 20, 25, 30); rows: Grauman & Darrell [13], Wang & Wang [34], Lazebnik et al. [18], Yang et al. [36], Boiman et al. [4], Gehler & Nowozin (MKL) [12], Gehler & Nowozin (LP-Beta) [12], Gehler & Nowozin (LP-B) [12], and the proposed method (HSMK); of the numeric entries only a 34.8% value survives.)
Table 5.4: Classification accuracy of HSMK vs. SPMK on the CALTECH-101 dataset
Figure 6 illustrates the appearance diversity of the objects in the CALTECH-256 dataset, one class per row. CALTECH-256 is an extended version of CALTECH-101, but it is not normalized as CALTECH-101 is, so its background complexity is very high. On CALTECH-256, I run experiments with HSMK using 15 and 30 samples per class for training, including the background (clutter) class, and 25 samples per class for testing, all randomly drawn from the dataset. I also re-implemented the SPMK algorithm [14], but using densely sampled SIFT features, so that my experiments can fairly compare the effectiveness of HSMK and SPMK. As Table 5.5 shows, HSMK improves classification accuracy by more than 3 percentage points over SPMK.
Figure 6: Samples from the CALTECH-256 dataset
Table 5.5: Classification results on the CALTECH-256 dataset
(Table body lost in extraction; methods compared: Griffin et al. (SPMK) [14], Yang et al. (ScSPM) [36], Gehler & Nowozin (MKL) [12], SPMK (with Dense SIFT), and the proposed method (HSMK).)
(Table body lost in extraction; methods compared: GIST [27], local features [15], Dense SIFT (SPMK), and Dense SIFT (HSMK).)
On the UIUC Scene dataset, I use the experimental setup described in the work of Lazebnik et al. [18]: randomly select 100 samples from each class for training and test on all remaining samples in the dataset. Table 5.7 shows that the results of HSMK are again better than those of SPMK [18] and of SPM based on sparse coding [36].
Table 5.7: Classification results on the MIT Scene dataset
(Table body lost in extraction; methods compared: Lazebnik et al. [18], Yang et al. [36], SPMK, and the proposed method (HSMK).)
5.2.3 Sparse Coding experiments for the Hierarchical Spatial Matching Kernel (ScHSMK)
To further improve classification performance, instead of using K-means to build the dictionary and computing word histograms, I experiment with combining a sparse-coding representation with HSMK on the two object classification datasets, Oxford Flower and CALTECH-101.
5.2.3.1 ScHSMK on the Oxford Flower dataset
On the Oxford Flower dataset, I use a dictionary of size M = 800 and, within the HSMK computation, I also experiment with replacing the intersection kernel of formula (4.2) by the linear kernel (inner product), giving the Linear Hierarchical Spatial Matching Kernel (HSMK-L); the other experimental parameters are as in Section 5.2.1.1. Table 5.8 shows that classification using a sparse representation
4 The MIT Scene dataset is available at: http://people.csail.mit.edu/torralba/code/spatialenvelope/spatial_envelope_256x256_static_8outdoorcategories.zip
5 The UIUC Scene dataset is available at: http://www-cvr.ai.uiuc.edu/ponce_grp/data/scene_categories/scene_categories.zip
(sparse coding) always gives better results than vector quantization (e.g., using the K-means algorithm). With sparse coding, HSMK is also about 2 percentage points better than SPMK, whether the linear kernel or the intersection kernel is used for the base computation on subregions as in formula (4.2). Furthermore, Table 5.8 suggests that sparse coding tends to make the obtained features more linear than vector quantization does, so replacing the intersection kernel with the linear kernel changes the results only insignificantly.
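The contrast between the two encodings can be illustrated with a toy sketch (not the thesis code): vector quantization assigns each descriptor to exactly one codeword, while sparse coding approximates it with a few codewords. Here greedy matching pursuit stands in for the L1 sparse-coding solver, and unit-norm dictionary atoms are assumed; all names are hypothetical:

```python
import numpy as np

def vq_encode(x, dictionary):
    """Vector quantization: one-hot code on the nearest codeword."""
    code = np.zeros(len(dictionary))
    code[np.argmin(((dictionary - x) ** 2).sum(axis=1))] = 1.0
    return code

def sparse_encode(x, dictionary, n_atoms=2):
    """Greedy matching pursuit: approximate x with n_atoms codewords
    (a stand-in for an L1 sparse-coding solver; atoms assumed unit-norm)."""
    code = np.zeros(len(dictionary))
    residual = np.asarray(x, dtype=float).copy()
    for _ in range(n_atoms):
        scores = dictionary @ residual        # correlations with residual
        k = int(np.argmax(np.abs(scores)))
        code[k] += scores[k]
        residual -= scores[k] * dictionary[k]
    return code

# Toy orthonormal dictionary (identity): sparse coding recovers two
# graded coefficients, vector quantization keeps one hard assignment.
D = np.eye(3)
x = np.array([3.0, 0.0, 2.0])
sc = sparse_encode(x, D)   # -> [3.0, 0.0, 2.0]
vq = vq_encode(x, D)       # one-hot
```

Because the sparse code carries several graded coefficients rather than a single indicator, a linear kernel on such codes already captures much of what the intersection kernel adds on hard-assignment histograms, consistent with the observation above.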
Table 5.8: Classification results using sparse coding vs. vector quantization (K-means) on the Oxford Flower dataset
(Table body lost in extraction; methods compared: SPMK; Sparse Code + SPMK + linear kernel (ScSPMK-L); Sparse Code + SPMK + intersection kernel (ScSPMK); HSMK; Sparse Code + HSMK + linear kernel (ScHSMK-L); Sparse Code + HSMK + intersection kernel (ScHSMK).)
5.2.3.2 ScHSMK on the CALTECH-101 dataset
On the CALTECH-101 dataset, I use two dictionary sizes, M = 400 and M = 800, and run experiments with 15 and 30 training samples per class; the other parameters are as in the CALTECH-101 experiment of Section 5.2.1.2. As in the experiment of Section 5.2.3.1, I also test replacing the intersection kernel of formula (4.2) by the linear kernel (inner product). Table 5.9 shows that HSMK with sparse coding achieves state-of-the-art results on CALTECH-101. HSMK is consistently about 2 to 4 percentage points better than SPMK under the same vocabulary-building method (vector quantization or sparse coding) and the same kernel (linear or intersection). Using sparse coding improves the results over vector quantization alone, which can be explained by the optimization formulation presented in Section 0. Sparse coding also makes the features more linear: as can be seen from the classification results, sparse coding with the linear kernel is better than, or on par with, vector quantization combined with the intersection kernel.
Table 5.9: Classification results using sparse coding vs. vector quantization (K-means) on CALTECH-101
(Row-group headers, e.g., dictionary size and kernel, and the 15-sample column were lost in extraction; the surviving 30-sample column reads:)

Method                                   30 training samples (%)
Vector quantization                      73.46
Sparse coding + linear kernel            73.54
Sparse coding + intersection kernel      75.68
Vector quantization                      75.59
Sparse coding + linear kernel            77.15
Sparse coding + intersection kernel      79.02
Vector quantization                      75.13
Sparse coding + linear kernel            75.52
Sparse coding + intersection kernel      76.96
Vector quantization                      77.33
Sparse coding + linear kernel            78.93
Sparse coding + intersection kernel      80.60
Boiman et al. [4]                        79.1
state-of-the-art results on the CALTECH-101 dataset (an important benchmark for evaluating image classification).
Recommendations
Theoretically investigate the influence of the coarse-to-fine (C2F) model on image representation and kernel construction.
Theoretically investigate the effect of sparse coding over multiple resolutions (multi-resolution) in HSMK.
6 The papers above are archived on my personal research web page: http://sites.google.com/site/lttamvn
[10] R. Fergus, P. Perona, and A. Zisserman (2003), "Object class recognition by unsupervised scale-invariant learning," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2.
[11] M. Fischler and R. Elschlager (1973), "The representation and matching of pictorial structures," IEEE Transactions on Computers, pp. 67-92.
[12] P. Gehler and S. Nowozin (2009), "On feature combination for multiclass object classification," in ICCV, pp. 221-228.
[13] K. Grauman and T. Darrell (2005), "The pyramid match kernel: discriminative classification with sets of image features," in ICCV, pp. 1458-1465.
[14] G. Griffin, A. Holub, and P. Perona (2007), "Caltech-256 object category dataset," Technical Report 7694, California Institute of Technology, USA.
[15] M. Johnson (2008), "Semantic Segmentation and Image Search," PhD Thesis, University of Cambridge, UK.
[16] M. Kloft, U. Brefeld, P. Laskov, and S. Sonnenburg (2008), "Non-sparse multiple kernel learning," in NIPS Workshop on Kernel Learning: Automatic Selection of Kernels.
[17] R. I. Kondor and T. Jebara (2003), "A kernel between sets of vectors," in ICML, pp. 361-368.
[18] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. Jordan (2004), "Learning the kernel matrix with semidefinite programming," Journal of Machine Learning Research, vol. 5, pp. 27-72.
[19] S. Lazebnik, C. Schmid, and J. Ponce (2006), "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in CVPR, vol. 2.
[20] David G. Lowe (2004), "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60(2), pp. 91-110.
[21] David G. Lowe (1999), "Object recognition from local scale-invariant features," in International Conference on Computer Vision, Corfu, Greece.
[22] J. Mairal, F. Bach, J. Ponce, and G. Sapiro (2009), "Online dictionary learning for sparse coding," in ICML, pp. 689-696.
[23] S. Maji, A. Berg, and J. Malik (2008), "Classification using intersection kernel support vector machines is efficient," in CVPR, pp. 1-8.
[24] F. Moosmann, B. Triggs, and F. Jurie (2008), "Randomized clustering forests for building fast and discriminative visual vocabularies," in NIPS Workshop on Kernel Learning: Automatic Selection of Kernels.
[25] M. E. Nilsback and A. Zisserman (2006), "A visual vocabulary for flower classification," in CVPR, vol. 2, pp. 1447-1454.
[26] M. E. Nilsback and A. Zisserman (2008), "Automated flower classification over a large number of classes," in ICVGIP.
[27] A. Oliva and A. Torralba (2001), "Modeling the shape of the scene: A holistic representation of the spatial envelope," IJCV, pp. 145-175.
[28] A. Rakotomamonjy, F. Bach, Y. Grandvalet, and S. Canu (2008), "SimpleMKL," Journal of Machine Learning Research, vol. 9, pp. 2491-2521.
[29] B. Scholkopf and A. J. Smola (2002), Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, USA.
[30] S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf (2006), "Large scale multiple kernel learning," Journal of Machine Learning Research.
[31] V. Vapnik and A. Lerner (1963), "Pattern recognition using generalized portrait method," Automation and Remote Control, 24, pp. 774-780.
[32] V. Vapnik and A. Chervonenkis (1964), "A note on one class of perceptrons," Automation and Remote Control, 25.
[33] M. Varma and D. Ray (2007), "Learning the discriminative power-invariance trade-off," in IEEE 11th International Conference on Computer Vision.
[34] S. C. Wang and Y. C. F. Wang (2010), "A multi-scale learning framework for visual categorization," in ACCV.
[35] L. Yang, R. Jin, R. Sukthankar, and F. Jurie (2008), "Unifying discriminative visual codebook generation with classifier training for object category recognition," in CVPR, Los Alamitos, CA, USA, pp. 1-8.
[36] J. Yang, K. Yu, Y. Gong, and T. Huang (2009), "Linear spatial pyramid matching using sparse coding for image classification," in CVPR, pp. 1794-1801.
[37] Q. Yuan, A. Thangali, V. Ablavsky, and S. Sclaroff (2008), "Multiplicative kernels: Object detection, segmentation and pose estimation," in Computer Vision and Pattern Recognition.