
VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY

UNIVERSITY OF SCIENCE

LE THANH TAM

IMAGE CATEGORIZATION BASED ON THE KERNEL APPROACH

MASTER'S THESIS IN COMPUTER SCIENCE

Ho Chi Minh City - 2011

VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY

UNIVERSITY OF SCIENCE

LE THANH TAM

IMAGE CATEGORIZATION BASED ON THE KERNEL APPROACH

Major: COMPUTER SCIENCE    Code: 60.48.01

MASTER'S THESIS (Specialization: Informatics)

SCIENTIFIC SUPERVISORS: Assoc. Prof. Dr. NGUYEN DINH THUC, Dr. TRAN THAI SON

Ho Chi Minh City - 2011

ACKNOWLEDGEMENTS
First of all, I would like to sincerely thank Assoc. Prof. Dr. Nguyen Dinh Thuc and Dr. Tran Thai Son for their dedicated guidance throughout the time I worked on this thesis. I thank Prof. Akihiro Sugimoto and Dr. Yousun Kang (National Institute of Informatics, Tokyo, Japan) for their guidance and valuable comments on the content of the thesis during my six-month internship at the National Institute of Informatics, Tokyo, Japan. I thank Prof. Seiichi Mita (Toyota Technological Institute, Nagoya, Japan) for his dedicated support and guidance and for helping me gain practical experience during my three-month internship at the Toyota Technological Institute, Nagoya, Japan. I thank Prof. D. McAllister (Toyota Technological Institute, Chicago, USA) and Prof. L. El Ghaoui (University of California, Berkeley, USA) for teaching me the fundamentals of machine learning, optimization and computer vision. I thank M.Sc. Tran Le Hong Du and Ph.D. student M. Kloft (University of California, Berkeley, USA) for the discussions and the valuable experience they shared with me during my experiments. I also thank the lecturers, colleagues and friends at the Faculty of Information Technology, University of Science, Ho Chi Minh City, who helped me and provided me with knowledge and experience. I thank my parents and my family for their constant love and support throughout my studies, which gave me the confidence to carry out this work well. Sincere thanks!
Le Thanh Tam

TABLE OF CONTENTS
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
List of symbols and abbreviations
List of tables
List of figures
INTRODUCTION
Chapter 1. Introduction
  1.1 Objectives
  1.2 Contributions of the thesis
    1.2.1 Constructing a kernel for the SVM algorithm
    1.2.2 Applying the constructed kernel to the image categorization problem
  1.3 Other related contributions
  1.4 Structure of the thesis
Chapter 2. SVM-based classification
  2.1 Learning with a single kernel: Support Vector Machine (SVM)
    2.1.1 The SVM classification algorithm
    2.1.2 Kernels in the SVM classification algorithm
      2.1.2.1 Measuring similarity with kernels
      2.1.2.2 Positive definite kernels
      2.1.2.3 Constructing the Reproducing Kernel Hilbert Space (RKHS)
  2.2 Learning with multiple kernels: Multiple Kernel Learning (MKL)
    2.2.1 SILP
    2.2.2 SimpleMKL
Chapter 3. Kernel methods
  3.1 The bag-of-features model (BoF)
  3.2 Improvements of the BoF model
  3.3 Sparse coding
Chapter 4. Hierarchical Spatial Matching Kernel
  4.1 Spatial Pyramid Matching Kernel (SPMK)
  4.2 The proposed kernel: Hierarchical Spatial Matching Kernel
Chapter 5. Experiments
  5.1 Image categorization
    5.1.1 Introduction to the image categorization problem
    5.1.2 Applications of image categorization
    5.1.3 Challenges of image categorization
    5.1.4 Approaches
      5.1.4.1 Feature-based approaches
      5.1.4.2 Learning-based approaches
  5.2 Experiments
    5.2.1 Object categorization
      5.2.1.1 The Oxford Flowers dataset
      5.2.1.2 The CALTECH datasets
    5.2.2 Scene categorization
    5.2.3 Experiments with Sparse Coding for the Hierarchical Spatial Matching Kernel (ScHSMK)
      5.2.3.1 ScHSMK on the Oxford Flower dataset
      5.2.3.2 ScHSMK on the CALTECH-101 dataset
Conclusions and future work
  Conclusions
  Future work
List of the author's publications
References

List of symbols and abbreviations

BoF    Bag of Features
C2F    Coarse to Fine
MKL    Multiple Kernel Learning
HSMK   Hierarchical Spatial Matching Kernel
PMK    Pyramid Matching Kernel
SPM    Spatial Pyramid Matching
SPMK   Spatial Pyramid Matching Kernel
SVM    Support Vector Machine

List of tables
Table 5.1: Classification accuracy (%) using a single feature on the Oxford Flower dataset (NN denotes the nearest-neighbour classifier)
Table 5.2: Classification accuracy (%) of HSMK and SPMK on the Oxford Flower dataset
Table 5.3: Classification results on the CALTECH-101 dataset
Table 5.4: Classification accuracy of HSMK and SPMK on the CALTECH-101 dataset
Table 5.5: Classification results on the CALTECH-256 dataset
Table 5.6: Classification results on the MIT Scene dataset (8 classes)
Table 5.7: Classification results on the UIUC Scene dataset
Table 5.8: Classification results using Sparse Coding compared with vector quantization (K-means) on Oxford Flower
Table 5.9: Classification results using Sparse Coding compared with vector quantization (K-means) on CALTECH-101

List of figures
Figure 1: General model of the kernel method
Figure 2: Illustration of the HSMK kernel applied to images X and Y with L=2 and R=2 (a). HSMK first divides the image into 2^l x 2^l subregions with l = 0, 1, 2 as in SPMK (b). However, HSMK uses the coarse-to-fine model for each subregion by computing the similarity over a sequence of resolutions 2^-r x 2^-r with r = 0, 1, 2 (c). Equation (4.8), whose weight vector is obtained from MKL with uniformly distributed base kernels, is used to approximate the optimal matching between subregions instead of the BoF model as in SPMK
Figure 3: Pictorial structure model (relationships between parts)
Figure 4: Sample images from the Oxford Flower dataset (17 classes)
Figure 5: Sample images from the CALTECH-101 dataset
Figure 6: Sample images from the CALTECH-256 dataset
Figure 7: Sample images from the MIT-Scene dataset (8 classes)

INTRODUCTION
With the explosion of image data, classifying images into semantic classes is one of the basic requirements for content-based image management and retrieval. Moreover, image categorization is one of the fundamental problems in computer vision and applied machine learning, and it has attracted the attention of many researchers worldwide. The problem is challenging because images are captured from different viewpoints and under different illumination conditions, because instances of the same semantic class are highly diverse, and because of the complexity of background information in images. There are two main approaches to image categorization: feature-based and learning-based. Among the learning-based methods, the kernel approach in particular is widely applied and achieves strong results in image categorization and in computer vision in general, thanks to its flexibility in describing images under the complex conditions mentioned above. In this thesis, I therefore propose a new kernel, named the Hierarchical Spatial Matching Kernel (HSMK), and apply it to image categorization. HSMK improves on the Spatial Pyramid Matching (SPM) model: instead of using the Bag-of-Features (BoF) model on subregions, HSMK applies a coarse-to-fine (C2F) model to each subregion, realized through multi-resolution analysis, i.e. each subregion is considered over a sequence of different resolutions. It can therefore capture the general information of a subregion at coarse resolutions as well as its detailed information at finer resolutions, much like examining a region on a map, and thus obtain a better similarity measure on these subregions. Experiments show that the proposed kernel, HSMK, is highly effective for image categorization and achieves state-of-the-art results on several standard benchmark datasets for this problem.

Chapter 1

Introduction

1.1 Objectives
In this thesis, I study the construction of kernels for classification algorithms in machine learning, specifically for the Support Vector Machine (SVM) classifier. SVM performs classification by finding the hyperplane that maximizes the margin. The kernel of the SVM measures the similarity between training samples and contributes substantially to the classification performance of the SVM algorithm. Furthermore, SVM is an effective classifier that is widely used in many fields, especially in computer vision. When SVM was first proposed, the linear kernel, which uses the correlation function, i.e. the inner product, was used to compute the similarity for separating the classes. Researchers then observed that, as data became increasingly rich and diverse, non-linear kernels were needed to find more effective hyperplanes. Research on kernel construction is therefore a topic of interest for many researchers worldwide. To evaluate the effectiveness of the proposed kernel, I apply it to the image categorization problem in computer vision; object categorization and scene categorization, two concrete instances of image categorization, are used as the experimental test beds for classification with the proposed kernel.

1.2 Contributions of the thesis
1.2.1 Constructing a kernel for the SVM algorithm
The thesis proposes the Hierarchical Spatial Matching Kernel (HSMK). HSMK improves on the Spatial Pyramid Matching Kernel (SPMK) through the coarse-to-fine (C2F) model. SPMK, proposed by Lazebnik et al. [19], divides the image on a sequence of grids of different sizes into subregions and then applies the bag-of-features (BoF) model [6] to model these subregions. The proposed kernel, HSMK, also divides the image on a sequence of grids of different sizes as in SPMK, but instead of the BoF model, which is rather limited in modelling a region so that an optimal similarity can be measured, HSMK uses the C2F model to consider each region at multiple sizes; this allows HSMK to obtain a better approximation of the optimal matching similarity than the BoF model used in SPMK. HSMK was published by the author and colleagues in the paper "Hierarchical Spatial Matching Kernel for Image Categorization" at the International Conference on Image Analysis and Recognition (ICIAR), Burnaby, British Columbia, Canada, 2011.

1.2.2 Applying the constructed kernel to the image categorization problem
To demonstrate the effectiveness of the proposed kernel, HSMK, I apply it to image categorization through its two concrete instances: object categorization and scene categorization. Experiments on several benchmark datasets for object categorization, such as Oxford Flower, CALTECH-101 and CALTECH-256, as well as for scene categorization, such as MIT Scene and UIUC Scene, show that HSMK clearly outperforms SPMK; note that SPMK is known as one of the best kernels for modelling objects when computing similarity in many computer vision problems, especially image categorization. Moreover, the proposed kernel, HSMK, also achieves state-of-the-art results or results comparable with other approaches on these benchmark datasets. On the other hand, the HSMK approach uses only a single non-linear kernel with SVM on a single feature type, while the other state-of-the-art methods on these benchmark datasets typically use several feature types as well as complex learning methods such as multiple kernel learning (MKL) or linear programming boosting (LP-B).

1.3 Other related contributions
The thesis does not present all of the author's published contributions during the master's programme. In this section, I briefly summarize other contributions related to the direction of the thesis in machine learning and computer vision. I proposed a colour segmentation algorithm for traffic sign images based on the SVM classifier. Instead of processing each pixel individually as in the traditional approach, the proposed algorithm processes a region of pixels so that neighbourhood information can be used, improving colour segmentation in traffic images. The algorithm was applied to traffic sign detection for the autonomous driving system project of the Toyota Technological Institute, Nagoya, Japan. This work was published in the paper "Realtime Traffic Sign Detection Using Color and Shape-Based Features" at the second Asian Conference on Intelligent Information and Database Systems (ACIIDS), Hue, Vietnam, 2010.

1.4 Structure of the thesis
In Chapter 2, I give an overview of the theoretical foundation of SVM-based classification, from the traditional single-kernel SVM to learning with multiple kernels for SVM, known as the Multiple Kernel Learning (MKL) problem, as well as the kernel theory used in SVM and in MKL. Next, in Chapter 3, I present kernel-based learning, regarded as one of the main and most effective approaches to image categorization, and in Chapter 4 I present the kernel proposed in this thesis, the Hierarchical Spatial Matching Kernel (HSMK). Finally, Chapter 5 presents the application of HSMK to image categorization, specifically to object categorization and scene categorization, on the benchmark datasets Oxford Flower, CALTECH-101, CALTECH-256, MIT Scene and UIUC Scene.


Chapter 2

SVM-based classification

In this chapter, I give an overview of the classification theory of the Support Vector Machine (SVM) algorithm. I also review the kernel theory applied to the SVM algorithm. Finally, I present a research direction that currently receives much attention from the machine learning community: learning with multiple kernels for SVM, known as the Multiple Kernel Learning (MKL) problem.

2.1 Learning with a single kernel: Support Vector Machine (SVM)

2.1.1 The SVM classification algorithm
The SVM classifier was proposed by Cortes and Vapnik in 1995 [3], but its main ideas originate from two earlier works by Vapnik and Lerner in 1963 [31] and by Vapnik and Chervonenkis in 1964 [32]. SVM is a binary classifier built from a training dataset as follows. Let $X = \{x_1, x_2, \ldots, x_N\}$ with $x_i \in \mathbb{R}^n$ be the input data and $Y = \{y_1, y_2, \ldots, y_N\}$ the corresponding output data, i.e. the labels of the input samples, with $y_i \in \{-1, +1\}$. $D_{train} = (X, Y)$ is called the training set of the SVM classifier. The linear classifier is modelled as:

$$y(x) = \operatorname{sign}(w^T x + b) \qquad (2.1)$$

where $w \in \mathbb{R}^n$ is the weight vector and $b \in \mathbb{R}$. The data constraints for SVM learning are then:

$$\begin{cases} w^T x_k + b \ge +1 & \text{if } y_k = +1 \\ w^T x_k + b \le -1 & \text{if } y_k = -1 \end{cases} \qquad (2.2)$$

The two conditions in (2.2) can be combined into:

$$y_k (w^T x_k + b) \ge 1, \qquad k = 1, \ldots, N \qquad (2.3)$$

With the constraint in (2.3), for datasets that cannot be separated on all training samples the SVM problem has no solution; this easily happens in practice, since training data always contain noise. To handle this case, Cortes and Vapnik [3] changed (2.3) into:

$$y_k (w^T x_k + b) \ge 1 - \xi_k, \qquad k = 1, \ldots, N \qquad (2.4)$$

where the slack variables $\xi_k \ge 0$ handle the case where some training samples violate the classification condition. Samples with $\xi_k > 1$ are the samples that violate the classification condition with respect to the constraint in (2.3). The primal optimization problem of SVM in the weight space has the form:

$$\min_{w, b, \xi} J_P(w, \xi) = \frac{1}{2} w^T w + C \sum_{k=1}^{N} \xi_k$$
$$\text{s.t.} \quad y_k (w^T x_k + b) \ge 1 - \xi_k, \quad k = 1, \ldots, N; \qquad \xi_k \ge 0, \quad k = 1, \ldots, N \qquad (2.5)$$

where $C$ is a positive constant used to control the trade-off between optimizing the objective and the samples violating the SVM classification constraint in (2.3). From (2.5), the corresponding Lagrangian is:

$$L(w, b, \xi; \alpha, \nu) = J_P(w, \xi) - \sum_{k=1}^{N} \alpha_k \big( y_k (w^T x_k + b) - 1 + \xi_k \big) - \sum_{k=1}^{N} \nu_k \xi_k \qquad (2.6)$$

with Lagrange multipliers $\alpha_k \ge 0$, $\nu_k \ge 0$ for $k = 1, \ldots, N$. From (2.6), the solution of the problem is obtained by solving:

$$\max_{\alpha, \nu} \; \min_{w, b, \xi} \; L(w, b, \xi; \alpha, \nu) \qquad (2.7)$$

Taking the partial derivative of the Lagrangian $L$ in (2.6) with respect to each variable gives:

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{k=1}^{N} \alpha_k y_k x_k, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{k=1}^{N} \alpha_k y_k = 0, \qquad \frac{\partial L}{\partial \xi_k} = 0 \;\Rightarrow\; 0 \le \alpha_k \le C, \;\; k = 1, \ldots, N \qquad (2.8)$$

Substituting (2.8) into (2.6) yields the dual quadratic programming problem of SVM:

$$\max_{\alpha} \; J_D(\alpha) = -\frac{1}{2} \sum_{k,l=1}^{N} y_k y_l \, x_k^T x_l \, \alpha_k \alpha_l + \sum_{k=1}^{N} \alpha_k$$
$$\text{s.t.} \quad \sum_{k=1}^{N} \alpha_k y_k = 0, \qquad 0 \le \alpha_k \le C, \quad k = 1, \ldots, N \qquad (2.9)$$

Since (2.9) is a quadratic program, standard optimization solvers can be used to find the solution.
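A minimal sketch of how the dual problem (2.9) can be solved in practice, assuming a precomputed linear Gram matrix, toy two-class data and scikit-learn's SVC as the quadratic programming solver (none of these choices come from the thesis experiments):

```python
# Minimal sketch: solving the soft-margin SVM of (2.5)/(2.9) with a precomputed
# linear Gram matrix. scikit-learn's SVC solves the dual QP internally.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)),      # class -1
               rng.normal(+1.0, 0.5, (20, 2))])     # class +1
y = np.array([-1] * 20 + [+1] * 20)

K = X @ X.T                                          # linear kernel k(x, x') = x^T x'
clf = SVC(kernel="precomputed", C=1.0)               # C plays the role of C in (2.5)
clf.fit(K, y)

# clf.dual_coef_ holds alpha_k * y_k for the support vectors (dual variables of (2.9))
print("number of support vectors:", clf.support_.shape[0])
print("training accuracy:", clf.score(K, y))
```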

2.1.2 Kernels in the SVM classification algorithm

2.1.2.1 Measuring similarity with kernels
To extend the classification ability of the SVM algorithm, instead of using the inner product to measure the similarity between two samples $x_i, x_j$ in the input training space, the notion of a kernel is introduced. First, the input data are mapped into a space $H$ by the mapping:

$$\Phi: X \to H, \qquad x \mapsto \Phi(x) \qquad (2.10)$$

To compute the similarity between training samples in $H$, we can use the inner product in $H$, denoted $\langle \cdot, \cdot \rangle_H$. For convenience, we define a similarity function:

$$k: X \times X \to \mathbb{R}, \qquad (x, x') \mapsto k(x, x') \qquad (2.11)$$

satisfying the condition:

$$k(x, x') = \langle \Phi(x), \Phi(x') \rangle_H \qquad \forall x, x' \in X \qquad (2.12)$$

A function as in (2.12) is called a kernel function.

2.1.2.2 Positive definite kernels

A function defined as in (2.12) belongs to the class of positive definite kernels. This allows the SVM algorithm, when computing inner products, to use any positive definite kernel function in place of $\langle \Phi(x), \Phi(x') \rangle_H$ when evaluating $k(x, x')$. This technique is known as the kernel trick. Consequently, with a positive definite kernel we do not need to know the explicit form of the mapping from the input data space into $H$; it is defined implicitly through the kernel function. To clarify positive definite kernels, I recall the following definitions.

Definition 1 (Gram matrix). Given a kernel $k: X \times X \to \mathbb{R}$ and a data sequence $x_1, \ldots, x_n \in X$, the $n \times n$ matrix $K$ with entries

$$K_{ij} = k(x_i, x_j) \qquad (2.13)$$

is called the Gram matrix (or kernel matrix) of the kernel $k$ for the data sequence $x_1, \ldots, x_n$.

Definition 2 (positive definite matrix). A real symmetric $n \times n$ matrix $K$ is called positive definite if and only if for all $c_1, \ldots, c_n \in \mathbb{R}$:

$$\sum_{i,j=1}^{n} c_i c_j K_{ij} \ge 0 \qquad (2.14)$$

If equality in (2.14) holds only when $c_1 = \cdots = c_n = 0$, the matrix is called strictly positive definite.

Definition 3 (positive definite kernel). If for all $n$ and all $x_1, \ldots, x_n \in X$ the Gram matrix $K_{ij} = k(x_i, x_j)$ is positive definite, then $k$ is called a positive definite kernel.


In SVM learning with kernels, we have the following important theorem.

Kernel theorem. A function $k: X \times X \to \mathbb{R}$ is a positive definite kernel if and only if there exist a Hilbert space $H$ and a mapping $\Phi: X \to H$ such that for all $x, x' \in X$ we have $k(x, x') = \langle \Phi(x), \Phi(x') \rangle_H$.

Proof. Suppose the kernel can be written in the form (2.12). Then

$$\sum_{i,j=1}^{n} c_i c_j \langle \Phi(x_i), \Phi(x_j) \rangle_H = \left\langle \sum_{i=1}^{n} c_i \Phi(x_i), \sum_{j=1}^{n} c_j \Phi(x_j) \right\rangle_H = \left\| \sum_{i=1}^{n} c_i \Phi(x_i) \right\|_H^2 \ge 0. \qquad (2.15)$$

The converse is shown in Section 2.1.2.3, where the Hilbert space and the mapping, together with the desired properties, are constructed from a positive definite kernel.

2.1.2.3 Constructing the Reproducing Kernel Hilbert Space (RKHS)

In this section, I present the construction of a Hilbert space whose elements are functions built from a positive definite kernel. Given a kernel $k$, form the set $F$ as follows:

$$F = \left\{ f(\cdot) = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i) \;:\; n \in \mathbb{N},\; \alpha_i \in \mathbb{R},\; x_i \in X \right\} \qquad (2.16)$$

where $k(\cdot, x): X \to \mathbb{R}$ is a function and is also an element of $F$. The set $F$ forms a vector space when equipped with the addition $(f + g)(x) = f(x) + g(x)$ and the scalar multiplication $(\lambda f)(x) = \lambda f(x)$, $\lambda \in \mathbb{R}$.

We define the inner product of two elements of this space. For

$$f(\cdot) = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i), \qquad g(\cdot) = \sum_{j=1}^{n'} \beta_j k(\cdot, x'_j) \qquad (2.17)$$

with $n, n' \in \mathbb{N}$, $\alpha_i, \beta_j \in \mathbb{R}$, $x_i, x'_j \in X$, the inner product has the form:

$$\langle f, g \rangle := \sum_{i=1}^{n} \sum_{j=1}^{n'} \alpha_i \beta_j k(x_i, x'_j) \qquad (2.18)$$

Note that, using the symmetry of the kernel, this can be rewritten as:

$$\langle f, g \rangle = \sum_{j=1}^{n'} \beta_j f(x'_j) = \sum_{i=1}^{n} \alpha_i g(x_i) \qquad (2.19)$$

From the positive definiteness of the kernel $k$ we have:

$$\langle f, f \rangle = \sum_{i,j=1}^{n} \alpha_i \alpha_j k(x_i, x_j) \ge 0 \qquad (2.20)$$

From (2.20), it follows that for all functions $f_1, \ldots, f_p \in F$ and all coefficients $c_1, \ldots, c_p \in \mathbb{R}$:

$$\sum_{i,j=1}^{p} c_i c_j \langle f_i, f_j \rangle = \left\langle \sum_{i=1}^{p} c_i f_i, \sum_{j=1}^{p} c_j f_j \right\rangle \ge 0 \qquad (2.21)$$

Therefore $\langle \cdot, \cdot \rangle$ is a positive definite kernel on the vector space of the set of functions $F$. Moreover, when $g(\cdot) = k(\cdot, x)$, by the definition of the inner product in (2.18) we have:

$$\langle f, k(\cdot, x) \rangle = \sum_{i=1}^{n} \alpha_i k(x_i, x) = f(x), \qquad \forall x \in X \qquad (2.22)$$

Similarly to (2.22), we have the special case:

$$\langle k(\cdot, x), k(\cdot, x') \rangle = k(x, x') \qquad (2.23)$$

This property is known as the reproducing property of the kernel: a function $f$ can be represented as a linear functional defined through the inner product in the vector space of the set of functions $F$ (as in (2.22)). To prove the definiteness property of the inner product, I recall the Cauchy-Schwarz inequality.

Theorem (Cauchy-Schwarz inequality). If $k$ is a positive definite kernel and $x_1, x_2 \in X$, then:

$$k(x_1, x_2)^2 \le k(x_1, x_1) \, k(x_2, x_2) \qquad (2.24)$$

Proof. Since $k$ is a positive definite kernel, the $2 \times 2$ Gram matrix $K_{ij} = k(x_i, x_j)$ is also positive definite, so the eigenvalues of the Gram matrix are non-negative and hence $\det(K) \ge 0$. Expanding $\det(K)$ gives the claim:

$$0 \le \det(K) = k(x_1, x_1) \, k(x_2, x_2) - k(x_1, x_2)^2 \qquad (2.25)$$

From the Cauchy-Schwarz inequality in (2.24) and the reproducing property in (2.22), we have:

$$|f(x)|^2 = |\langle k(\cdot, x), f \rangle|^2 \le k(x, x) \cdot \langle f, f \rangle \qquad (2.26)$$

which proves that $\langle f, f \rangle = 0 \Rightarrow f = 0$.

Because of property (2.22), the vector space formed on the set $F$ is called the reproducing kernel Hilbert space (RKHS) of the positive definite kernel $k$. Moreover, the RKHS uniquely determines the positive definite kernel $k$ and vice versa, as stated in the Moore-Aronszajn theorem [1].

Theorem (Moore-Aronszajn [1]). For every positive definite kernel $k$ there exists a unique reproducing kernel Hilbert space $H$ whose kernel is $k$, and vice versa.

For more details on the construction of the reproducing kernel Hilbert space, see [1].

2.2 Learning with multiple kernels: Multiple Kernel Learning (MKL)

To address the question of how to choose a kernel for a specific practical problem, Lanckriet et al. [18] proposed learning the SVM with multiple kernels. The two main ideas of the Multiple Kernel Learning (MKL) problem are: (i) parameterize the kernel function as a linear combination of several positive definite kernels, and (ii) optimize the coefficients of this linear combination during training using an extension of the SVM algorithm.


Let $P$ be the number of kernels used for SVM learning and let $k_1, k_2, \ldots, k_P$ be the corresponding positive definite kernels. The linear combination of kernels used for SVM learning is expressed as:

$$k(\cdot, \cdot) = \sum_{p=1}^{P} \beta_p k_p(\cdot, \cdot), \qquad \text{s.t. } \sum_{p=1}^{P} \beta_p = 1, \;\; \beta_p \ge 0, \;\; p = 1, \ldots, P \qquad (2.27)$$

where $\beta_p$ are the coefficients of the linear combination of the positive definite kernels $k_p(\cdot, \cdot)$. The primal form of the MKL learning problem is:

$$\min_{w'_p, w_0, \beta, \xi} \; \frac{1}{2} \left( \sum_{p=1}^{P} \| w'_p \|_2 \right)^2 + C \sum_{i=1}^{N} \xi_i$$
$$\text{s.t.} \quad y_i \left( \sum_{p=1}^{P} \beta_p \langle w'_p, \Phi_p(x_i) \rangle + w_0 \right) \ge 1 - \xi_i, \qquad \sum_{p=1}^{P} \beta_p = 1, \quad \beta \ge 0, \quad \xi \ge 0 \qquad (2.28)$$

where $(x_i, y_i)$ are the training data and $\Phi_p(x_i)$ is the mapping of the input $x_i$ into the RKHS associated with the kernel $k_p(\cdot, \cdot)$. In formulation (2.28), because of the product between the two primal variables $\beta_p$ and $w'_p$, the formulation is non-convex. By setting $w_p = \beta_p w'_p$, it can be transformed into a convex minimization problem [2]. To solve the MKL problem, machine learning researchers have modelled it in several different forms, of which the two solvers SILP by Sonnenburg [30] and SimpleMKL by Rakotomamonjy [28], which use the conventional SVM as a subroutine, are the most widely used.
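A minimal sketch of the parameterization in (2.27), assuming three hand-picked base kernels, toy data and fixed uniform weights as placeholders: the weighted Gram matrices are summed and the combined kernel is passed to a standard SVM.

```python
# Sketch of (2.27): a convex combination of base Gram matrices fed to an SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (30, 4)), rng.normal(1, 1, (30, 4))])
y = np.array([-1] * 30 + [1] * 30)

def linear(X):
    return X @ X.T

def rbf(X, gamma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

base_kernels = [linear(X), rbf(X, 0.1), rbf(X, 1.0)]          # k_1, ..., k_P
beta = np.full(len(base_kernels), 1.0 / len(base_kernels))    # sum(beta) = 1, beta >= 0

K = sum(b * Kp for b, Kp in zip(beta, base_kernels))          # k = sum_p beta_p * k_p
clf = SVC(kernel="precomputed", C=1.0).fit(K, y)
print("training accuracy with combined kernel:", clf.score(K, y))
```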

2.2.1 SILP
In [30], Sonnenburg transforms (2.28) into a semi-infinite linear program (SILP), i.e. a linear program whose objective is optimized subject to an infinite number of constraints. The SILP form of the MKL problem given by Sonnenburg is:

$$\max_{\theta, \beta} \; \theta$$
$$\text{s.t.} \quad \sum_{p=1}^{P} \beta_p = 1, \quad \beta_p \ge 0, \quad \sum_{p=1}^{P} \beta_p S_p(\alpha) \ge \theta \;\; \forall \alpha \in Z \qquad (2.29)$$
$$Z = \left\{ \alpha \in \mathbb{R}^n \;:\; 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0 \right\}$$

where the function $S_p(\alpha)$ is defined as:

$$S_p(\alpha) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j k_p(x_i, x_j) - \sum_{i=1}^{n} \alpha_i \qquad (2.30)$$

This is a linear program, since $\theta$ and $\beta$ are constrained only by linear conditions; however, because $\alpha$ ranges over $Z$, there are infinitely many constraints that must be satisfied when searching for the optimal solution.

SILP algorithm for MKL
Input: parameter C > 0 (regularization parameter), the set of kernels, the training data.
Output: the parameters $\alpha$, $b$, $\beta$.
1: Initialize the kernel weights: $\beta_p \leftarrow 1/P$, $p = 1, \ldots, P$
2: $(\alpha^0, b^0) \leftarrow$ solve the SVM with $\beta$
3: $t \leftarrow 0$ (t counts the iterations of the SILP algorithm)
4: while (stopping condition not met) do
5:    $(\beta^t, \theta^t) \leftarrow$ solve (2.29) with the constraint set $\{\alpha^0, \ldots, \alpha^t\}$
6:    $\alpha^{t+1} \leftarrow$ solve $\alpha^* = \arg\max_{\alpha \in Z} \sum_{p=1}^{P} \beta_p^t S_p(\alpha)$
7:    if $\sum_{p=1}^{P} \beta_p^t S_p(\alpha^{t+1}) \ge \theta^t$ then
8:       break
9:    end if
10:   $t \leftarrow t + 1$
11: end while
For details of the SILP algorithm, see [30]. SILP is implemented and released in the Shogun toolbox at http://www.shoguntoolbox.org

2.2.2 SimpleMKL
In [28], Rakotomamonjy transforms (2.28) into the following optimization form:

$$\min_{\beta} \; g(\beta) = \min_{\{v \,|\, v_p = \beta_p w_p\},\, b} \left\{ \frac{1}{2} \sum_{p=1}^{P} \frac{1}{\beta_p} \| v_p \|_{H_p}^2 + C \sum_{i=1}^{N} L\!\left( y_i, \; \sum_{p=1}^{P} \langle v_p, \Phi_p(x_i) \rangle + b \right) \right\}$$
$$\text{s.t.} \quad \sum_{p=1}^{P} \beta_p = 1, \quad \beta_p \ge 0 \qquad (2.31)$$

Rakotomamonjy uses gradient descent (descent along the direction given by the gradient of the objective with respect to the optimization variable) to solve the MKL form described in (2.31). To make this possible, Rakotomamonjy gives the derivative of the SVM objective with respect to the kernel weight vector as follows:

$$\frac{\partial g}{\partial \beta_p} = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i^* \alpha_j^* y_i y_j k_p(x_i, x_j) \qquad (2.32)$$

where $(\alpha^*, b^*)$ is the maximizer of the dual of $g$; it can also be shown that these points satisfy:

$$\frac{\partial \alpha^*}{\partial \beta_p} = 0, \qquad \frac{\partial b^*}{\partial \beta_p} = 0 \qquad (2.33)$$

SimpleMKL algorithm for MKL
Input: parameter C > 0 (regularization parameter), the set of kernels, the training data.
Output: the parameters $\alpha$, $b$, $\beta$.
1: Initialize the kernel weights: $\beta_p \leftarrow 1/P$, $p = 1, \ldots, P$
2: while (stopping condition not met) do
3:    $(\alpha, b) \leftarrow$ solve the SVM with the kernel $k(\cdot,\cdot) = \sum_{p=1}^{P} \beta_p k_p(\cdot,\cdot)$
4:    Update the weights $\beta$ using a gradient step of the gradient descent algorithm.
5: end while
For details of the SimpleMKL algorithm, see [28]. SimpleMKL is implemented and released in the SimpleMKL toolbox at http://asi.insa-rouen.fr/enseignants/~arakotom/code/mklindex.html.
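A minimal sketch of step 4 above, assuming scikit-learn's SVC for the inner SVM and a crude clip-and-renormalize projection back onto the simplex (the actual SimpleMKL solver uses a reduced gradient step with a line search instead): it evaluates the gradient (2.32) from the dual coefficients of the current SVM and takes one step on the weights.

```python
# Sketch: one gradient step of SimpleMKL using (2.32).
import numpy as np
from sklearn.svm import SVC

def mkl_gradient_step(base_kernels, y, beta, C=1.0, step=0.1):
    """One update of the kernel weights beta following (2.32)."""
    K = sum(b * Kp for b, Kp in zip(beta, base_kernels))
    clf = SVC(kernel="precomputed", C=C).fit(K, y)

    sv = clf.support_                      # indices of the support vectors
    ay = clf.dual_coef_.ravel()            # alpha_i^* * y_i for the support vectors
    grad = np.array([-0.5 * ay @ Kp[np.ix_(sv, sv)] @ ay for Kp in base_kernels])

    beta = beta - step * grad              # gradient descent step on g(beta)
    beta = np.clip(beta, 0.0, None)        # crude projection back onto the simplex
    return beta / beta.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 1, (20, 3)), rng.normal(1, 1, (20, 3))])
    y = np.array([-1] * 20 + [1] * 20)
    Ks = [X @ X.T, np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))]
    print("updated beta:", mkl_gradient_step(Ks, y, np.array([0.5, 0.5])))
```

Iterating this update together with re-training the SVM on the combined kernel reproduces the outer loop of the listing above.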


Chapter 3

Kernel methods

Among the approaches to image categorization, kernel methods are among the most effective and most widely applied. A kernel method uses a non-linear function, instead of the linear inner product, to compute the similarity of two samples, and maximizes the margin by solving a quadratic optimization problem; the typical instance is the supervised learning algorithm Support Vector Machines (SVM). A kernel method usually consists of the following steps: from the input image, features are extracted, either local or global; this feature set represents the image; the feature set is then encoded with a suitable image representation so that a similarity function can be applied; and the result is fed into the SVM classifier to determine which class the sample belongs to, as summarized in Figure 1.

[Figure 1 components: input image, feature extraction, image representation (kernel construction), SVM classifier.]

Figure 1: General model of the kernel method

Among these stages, the thesis focuses on the image representation (kernel construction) stage for the image categorization problem.


3.1 The bag-of-features model (BoF)

To build a function that computes the similarity between two images represented as sets of features, the bag-of-features (BoF) model [6] borrows the idea of the bag-of-words model from natural language processing, where it is used for topic classification of documents. Each image corresponds to a document, the features in the image correspond to the words in the document, and the topic of the document corresponds to the semantic class, i.e. the label, of the image to be classified. The BoF model consists of two main phases: building a dictionary from the features, and using the dictionary to model the image for similarity computation.

Building the codeword dictionary: the main idea is to quantize the features into a smaller space to reduce noise. In this step, all features of all images in the training set are collected; we call this the feature set of the training set D_train (for time efficiency and to fit the memory, a subset D_set randomly sampled from D_train is usually chosen instead). The unsupervised K-means clustering algorithm is then applied to partition the feature set into N_w clusters; each codeword of the codebook (dictionary) is defined as the centroid of a cluster obtained from K-means. The set of all codewords forms the dictionary of the BoF model.

From the feature set representing each image, each feature is mapped to a corresponding codeword in the dictionary. In this step, the distance from each feature to every codeword is computed, and the feature is assigned to the codeword with the smallest distance; the Euclidean distance is typically used. Formally, let D be the dictionary of N_w codewords, D = {W_1, W_2, ..., W_Nw}, and let $\Delta$ be a non-negative distance function. Each feature $x_f$ is mapped to the codeword with index:

$$\operatorname{id}_{x_f} = \arg\min_{W_i \in D} \Delta(x_f, W_i) \qquad (3.1)$$

Thus, the feature set of an image is converted into a set of codewords representing the image. A histogram over the codewords is then taken, recording the number of occurrences of each codeword in the image. Experiments show that, to obtain a good similarity between two images, a non-linear kernel such as the intersection kernel or the $\chi^2$ (chi-square) kernel must be used when classifying with the SVM algorithm.
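A minimal sketch of the two BoF phases above, assuming random stand-ins for SIFT-like descriptors, a dictionary of N_w = 100 codewords and scikit-learn's KMeans as placeholder choices:

```python
# Sketch of the BoF pipeline: K-means codebook + codeword histogram per image.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
train_features = rng.normal(size=(5000, 128))     # pooled local descriptors (SIFT-like)
n_words = 100                                     # N_w codewords

kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(train_features)

def bof_histogram(image_features, kmeans):
    """Map each feature to its nearest codeword, as in (3.1), and count occurrences."""
    word_ids = kmeans.predict(image_features)     # argmin Euclidean distance to centroids
    hist = np.bincount(word_ids, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)            # normalized histogram of visual words

image_features = rng.normal(size=(300, 128))      # descriptors of one image
print(bof_histogram(image_features, kmeans)[:10])
```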

3.2 Improvements of the BoF model

Many recent methods have been proposed to improve the traditional BoF model. Approaches based on generative models [4][7] model the co-occurrence of codewords; alternatively, instead of quantizing with K-means to build the dictionary, the authors of [24][35] learn a codebook that is highly discriminative between semantic classes in order to improve classification performance. On the other hand, approaches that use sparse coding instead of K-means to build the dictionary have also been very successful, as in [22][36]. Sparse coding builds the dictionary in a way similar to K-means, both solving an optimization problem, but sparse coding uses a softer constraint than K-means, so it obtains a smaller error when reconstructing the original features, as well as a better codebook; this is presented in detail in Section 3.3.

One of the main weaknesses of the BoF model is that it ignores the spatial information of the local features in the image. To overcome this, Lazebnik et al. [18] proposed the spatial pyramid matching model, an extension of BoF: SPM uses a sequence of grids of different sizes to divide the image into subregions and then uses the BoF model to compute aggregated statistics of the local features on these fixed subregions, instead of on the whole image as in the classical BoF model; finally, the BoF models of the subregions are concatenated in a predefined order to model the image.

Among the improvements of the BoF model, SPM is highly effective and simple to implement. It is therefore used as a main component in many state-of-the-art image classification systems [12]. Like the BoF model, SPM performs best when used with a non-linear kernel such as the intersection kernel or the $\chi^2$ (chi-square) kernel. These non-linear kernels, however, have high computational complexity and large storage requirements compared with linear methods. To address this, Maji et al. [23] introduced an approximate computation that improves the efficiency of the histogram intersection kernel and reduces its computational complexity, but this efficiency is obtained only through precomputed lookup tables, which can be seen as a form of precomputation for training a non-linear SVM. To handle large data, Yang et al. [36] introduced the linear SPM with sparse coding (ScSPM), in which the inner product (a linear kernel) is used instead of a non-linear kernel, exploiting the linear property of sparse data. Wang & Wang [34] proposed multi-scale learning (MSL), which uses multiple kernel learning (MKL) to determine the coefficients of the SPM model instead of the predefined coefficients of the original SPM.

In this thesis, I propose a new kernel based on the coarse-to-fine (C2F) model applied to the subregions of the SPM model, named the Hierarchical Spatial Matching Kernel (HSMK). The C2F model allows a subregion to be considered at several levels, much like reading a map: the coarse level gives an overview, while the finer levels reveal the details. HSMK therefore not only describes the spatial order information of local features but can also accurately measure the similarity between the unordered sets of local features extracted from the subregions. In HSMK, applying the C2F model to the subregions is realized through multiple resolutions. Local features can thus describe the detailed information of the image or object in a subregion at fine resolutions and the global information of the subregion at coarser resolutions. Moreover, matching based on the C2F model is a hierarchical process, meaning that a feature that finds no match at a fine resolution can still be matched at a coarser resolution. The proposed kernel can therefore achieve a better approximation of the optimal matching between subregions than SPM. In summary, HSMK focuses on improving the similarity measure between subregions by using the C2F model, realized through multiple resolutions, instead of the BoF model on the subregions as in SPM. Considering a subregion over a sequence of different resolutions is similar to the pyramid matching kernel [13], but instead of using a predefined weight vector for the base intersection kernels on the subregions when combining across resolutions, I cast the problem as uniform multiple kernel learning (uniform MKL) to find a more effective weight vector. An advantage of HSMK is that it can be used on unordered feature sets with different cardinalities by applying square root diagonal normalization [28] to the base intersection kernels on the subregions, something that is not considered in PMK [13].

3.3 Sparse coding

Let X = {x_1, x_2, ..., x_M} be a set of features with $x_i \in \mathbb{R}^d$. Quantization by applying K-means can be modelled as:

$$\min_{V} \sum_{m=1}^{M} \min_{k=1..K} \| x_m - v_k \|^2 \qquad (3.2)$$

where V = {v_1, v_2, ..., v_K} is the set of K cluster centres found by K-means (also called the codebook or dictionary) and $\|\cdot\|$ is the L2 norm of a vector. In other words, for a vector $x_m$ we find the corresponding codeword $v_k$ in the dictionary such that the distance from $x_m$ to $v_k$ is smallest (the Euclidean distance is usually used). The optimization formulation (3.2) can be converted into a matrix factorization problem as follows. Let $u_j$ be the codeword indicator vector of $x_j$ in the codebook, with $\operatorname{Card}(u_j) = 1$, i.e. if $x_j$ belongs to the p-th codeword then the p-th component of $u_j$ equals 1 and the remaining components are 0, and let U = {u_1, u_2, ..., u_M} be the set of codeword indicators of the feature vectors corresponding to X. The K-means formulation can then be written as:

$$\min_{U, V} \sum_{m=1}^{M} \| x_m - u_m V \|^2 \qquad (3.3)$$
$$\text{s.t.} \quad \operatorname{Card}(u_m) = 1, \quad |u_m| = 1, \quad u_m \ge 0, \quad \forall m$$

where $|\cdot|$ is the L1 norm of a vector and $u \ge 0$ denotes a non-negative vector, i.e. all components of u are non-negative. The constraint Card(u_m) = 1 is a strong constraint; we relax it by replacing it with an L1-norm regularization on $u_m$. As a result, the optimal $u_m$ will have very few non-zero components. The optimization formulation (3.3) becomes:

$$\min_{U, V} \sum_{m=1}^{M} \| x_m - u_m V \|^2 + \lambda |u_m| \qquad (3.4)$$
$$\text{s.t.} \quad \| v_k \| \le 1, \quad \forall k.$$

The L2-norm condition on $v_k$ avoids degenerate solutions: since the objective function in (3.4) involves the product $u_m V$, one could scale $u_m$ down arbitrarily and scale V up arbitrarily to decrease the objective. The optimization problem in (3.4) is known as sparse coding.
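A minimal sketch of the coding step in (3.4) for a single feature x with a fixed dictionary V, assuming an iterative soft-thresholding (ISTA) solver, random data and a hand-picked lambda; this is an illustration, not the solver used in the thesis experiments:

```python
# Sketch: sparse code u for one feature x with dictionary V, objective (3.4)
#   min_u ||x - u V||^2 + lam * |u|_1     (rows of V are codewords, ||v_k|| <= 1)
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(x, V, lam=0.2, n_iter=200):
    u = np.zeros(V.shape[0])
    L = 2.0 * np.linalg.norm(V @ V.T, 2)           # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * (u @ V - x) @ V.T             # gradient of ||x - uV||^2 w.r.t. u
        u = soft_threshold(u - grad / L, lam / L)  # ISTA step (gradient + shrinkage)
    return u

rng = np.random.default_rng(4)
V = rng.normal(size=(64, 16))
V /= np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1.0)   # enforce ||v_k|| <= 1
x = rng.normal(size=16)
u = sparse_code(x, V)
print("non-zero coefficients:", np.count_nonzero(u), "of", u.size)
```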


Chapter 4

Hierarchical Spatial Matching Kernel

In this chapter, I first describe the original formulation of SPM as the basis for introducing the new kernel, HSMK, which uses the coarse-to-fine (C2F) model on the subregions as an effective improvement over the SPM kernel.

4.1 Spatial Pyramid Matching Kernel (SPMK)
Each image is represented by a set of feature vectors in a d-dimensional space. The features are quantized into discrete elements called visual words using the unsupervised K-means clustering algorithm or sparse coding. Matching between local features then becomes matching over the discrete domain of the corresponding visual words: visual words are matched when they are identical and unmatched otherwise. SPM operates on a sequence of scales l = 0, 1, 2, ..., L of the input image. At each scale, it divides the image into $2^l \times 2^l$ subregions along the vertical and horizontal directions of the image and applies the BoF model to measure the similarity between these subregions. Let X, Y be two sets of vectors in a D-dimensional space. The similarity between the two sets at scale l is the sum of the similarities over all corresponding subregions of the images at that scale:

$$K_l(X, Y) = \sum_{i=1}^{2^{2l}} I(X_i^l, Y_i^l), \qquad (4.1)$$

where $X_i^l$ is the set of features of the i-th subregion at scale l of the image feature set X. The intersection kernel I between the subregions $X_i^l$ and $Y_i^l$ is defined as:

$$I(X_i^l, Y_i^l) = \sum_{j=1}^{V} \min\left( H_{X_i^l}(j), \; H_{Y_i^l}(j) \right), \qquad (4.2)$$

where V is the total number of visual words in the constructed dictionary and $H_{\Omega}(j)$ is the number of occurrences of the j-th visual word obtained by quantizing the local features in the set $\Omega$. Finally, the SPM kernel is computed as a sum, with predefined weights, of the similarities over the sequence of scales:

$$K(X, Y) = \frac{1}{2^L} K_0(X, Y) + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} K_l(X, Y). \qquad (4.3)$$

The weight $\frac{1}{2^{L-l+1}}$ at scale l is inversely proportional to the subregion width defined at that scale. This weight compensates for matching over regions at different scales, because local features are more easily matched in larger regions. Furthermore, feature pairs matched at scale l also appear matched at the coarser scales $(l - \Delta)$ with $\Delta > 0$.
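A minimal sketch of the histogram intersection (4.2) and the level weighting (4.3), assuming random histograms as stand-ins for the quantized local features of the subregions:

```python
# Sketch: histogram intersection (4.2) and the level weighting of (4.3).
import numpy as np

def intersection(h_x, h_y):
    """I(X, Y) = sum_j min(H_X(j), H_Y(j)) over V visual words."""
    return np.minimum(h_x, h_y).sum()

def spm_kernel(levels_x, levels_y, L):
    """levels_x[l] is a list of per-subregion histograms at scale l (2^l x 2^l cells)."""
    k = 0.0
    for l in range(L + 1):
        k_l = sum(intersection(hx, hy) for hx, hy in zip(levels_x[l], levels_y[l]))
        weight = 1.0 / 2 ** L if l == 0 else 1.0 / 2 ** (L - l + 1)   # weights of (4.3)
        k += weight * k_l
    return k

rng = np.random.default_rng(5)
V, L = 50, 2
levels_x = [[rng.random(V) for _ in range(4 ** l)] for l in range(L + 1)]
levels_y = [[rng.random(V) for _ in range(4 ** l)] for l in range(L + 1)]
print("SPMK similarity:", spm_kernel(levels_x, levels_y, L))
```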

4.2 The proposed kernel: Hierarchical Spatial Matching Kernel

To improve the effectiveness of the similarity computation between subregions, I propose using the coarse-to-fine (C2F) model on the subregions by operating on a sequence of different resolutions ($2^{-r} \times 2^{-r}$) with r = 0, 1, 2, ..., R, as in PMK [13].

Let $X_i^l$ and $Y_i^l$ be the sets of local features of the i-th subregions at scale l of the feature sets of the images X and Y, respectively. At each resolution r, I apply an intersection kernel $F^r$ normalized by square root diagonal normalization to measure the similarity between them:

$$F^r(X_i^l, Y_i^l) = \frac{I\big(X_i^l(r), Y_i^l(r)\big)}{\sqrt{I\big(X_i^l(r), X_i^l(r)\big)\, I\big(Y_i^l(r), Y_i^l(r)\big)}} \qquad (4.4)$$

where $X_i^l(r)$ and $Y_i^l(r)$ are the sets $X_i^l$ and $Y_i^l$ at resolution r. Moreover, the histogram intersection of any set with itself equals the cardinality of that set. Therefore, letting $|X_i^l(r)|$ and $|Y_i^l(r)|$ denote the cardinalities of $X_i^l(r)$ and $Y_i^l(r)$, equation (4.4) becomes:

$$F^r(X_i^l, Y_i^l) = \frac{I\big(X_i^l(r), Y_i^l(r)\big)}{\sqrt{|X_i^l(r)| \, |Y_i^l(r)|}}. \qquad (4.5)$$

Square root diagonal normalization of the intersection kernel not only preserves the Mercer condition of the kernel but also compensates for the difference in the number of elements between the sets, as shown in (4.5). To obtain the aggregated similarity of the C2F model, I define a linear combination over a sequence of local kernels, with each local kernel component computed by (4.5) at one resolution. The kernel function F between the sets $X_i^l$ and $Y_i^l$ in the C2F model is then defined as:

$$F(X_i^l, Y_i^l) = \sum_{r=0}^{R} \beta_r F^r(X_i^l, Y_i^l), \qquad \text{with } \sum_{r=0}^{R} \beta_r = 1, \;\; \beta_r \ge 0, \;\; r = 0, 1, 2, \ldots, R. \qquad (4.6)$$

Figure 2: Illustration of the HSMK kernel applied to images X and Y with L=2 and R=2 (a). HSMK first divides the image into $2^l \times 2^l$ subregions with l = 0, 1, 2 as in SPMK (b). However, HSMK uses the coarse-to-fine model for each subregion by computing the similarity over a sequence of resolutions $2^{-r} \times 2^{-r}$ with r = 0, 1, 2 (c). Equation (4.8), whose weight vector is obtained from MKL with uniformly distributed base kernels, is used to approximate the optimal matching between subregions instead of the BoF model as in SPMK.

Furthermore, when the linearly combined local kernels are fed into the SVM algorithm, the problem can be cast as MKL, in which the base kernels are defined as in (4.5) over the different resolutions of the subregions:

$$\min_{w_p, w_0, \beta, \xi} \; \frac{1}{2} \left( \sum_{p=1}^{P} \| w_p \|_2 \right)^2 + C \sum_{i=1}^{N} \xi_i$$
$$\text{s.t.} \quad y_i \left( \sum_{p=1}^{P} \langle w_p, \Phi_p(x_i) \rangle + w_0 \right) \ge 1 - \xi_i, \qquad \sum_{p=1}^{P} \beta_p = 1, \quad \beta \ge 0, \quad \xi \ge 0, \qquad (4.7)$$

where $x_i$ is a training image sample, $y_i$ is the corresponding class label of $x_i$, N is the number of training samples, $(w_p, w_0, \xi)$ are the SVM parameters, C is the predefined soft-margin parameter that compensates for the errors of the training samples in the SVM algorithm, $\beta$ is the weight vector of the base local kernels, P is the number of base local kernels of the subregions over the sequence of resolutions, every component of the vector $\beta$ is non-negative ($\beta \ge 0$, i.e. $\beta$ is a non-negative vector), $\Phi_p(x)$ maps the vector x into the reproducing kernel Hilbert space (RKHS) of the p-th base kernel, and $\langle \cdot, \cdot \rangle$ denotes the inner product. MKL finds the SVM parameters and at the same time determines the weight vector of the base local kernels. Moreover, these base local kernels are defined over several resolutions of the same subregion, so the information overlap between them is high. The experiments of Gehler and Nowozin [12] and especially of Kloft et al. [16] show that MKL with uniformly distributed base kernels, an approximation that reduces the MKL problem to a conventional SVM with a non-linear kernel, is the most effective method in terms of both accuracy and computation time. Therefore, (4.6) with the linear combination coefficients obtained from MKL with uniformly distributed base kernels becomes (uniform MKL):

$$F(X_i^l, Y_i^l) = \frac{1}{R+1} \sum_{r=0}^{R} F^r(X_i^l, Y_i^l). \qquad (4.8)$$

Figure 2 illustrates how HSMK operates with L=2 and R=2. Like PMK, HSMK considers each subregion over a sequence of different resolutions in order to obtain a better similarity measure. However, HSMK computes the weight vector based on MKL with uniformly distributed base kernels, which is both more effective and theoretically justified, instead of using a predefined weight vector as in PMK. In addition, normalizing the base kernels with square root diagonal normalization allows HSMK to perform well on sets of vectors with different cardinalities, which is not considered in PMK. HSMK is defined on the SPM computation within the C2F model, which makes it effective on unordered sets of vectors, even when those sets have different cardinalities. Mathematically, HSMK is defined as follows:

$$K(X, Y) = \frac{1}{2^L} F_0(X, Y) + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} F_l(X, Y) \qquad (4.9)$$
$$\text{with} \quad F_l(X, Y) = \sum_{i=1}^{2^{2l}} F(X_i^l, Y_i^l) = \frac{1}{R+1} \sum_{i=1}^{2^{2l}} \sum_{r=0}^{R} F^r(X_i^l, Y_i^l).$$

In summary, HSMK uses the kd-tree algorithm to convert local features into discrete visual words, and then the intersection kernel normalized by square root diagonal normalization is used to measure the similarity on histograms of V visual words. Each image is a set of local features in a D-dimensional space, and the kd-tree has complexity log(V) for converting a local feature. The complexity of HSMK is therefore O(D M log(V)) with M = max(|X|, |Y|). Note that the complexity of the optimal matching algorithm [17] is O(D M^3).
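A minimal sketch of the subregion-level part of HSMK, equations (4.5) and (4.8), assuming one random codeword histogram per resolution as a stand-in for the sets $X_i^l(r)$ and $Y_i^l(r)$; the image-level sum of (4.9) is not repeated here:

```python
# Sketch: the coarse-to-fine subregion kernel of HSMK, equations (4.5) and (4.8).
import numpy as np

def normalized_intersection(h_x, h_y):
    """F^r of (4.5): intersection divided by sqrt(|X| * |Y|) (set cardinalities)."""
    inter = np.minimum(h_x, h_y).sum()
    card_x, card_y = h_x.sum(), h_y.sum()          # histogram of a set with itself = cardinality
    return inter / np.sqrt(max(card_x * card_y, 1e-12))

def c2f_subregion_kernel(hists_x, hists_y):
    """F of (4.8): uniform average of F^r over resolutions r = 0..R."""
    R = len(hists_x) - 1
    return sum(normalized_intersection(hx, hy)
               for hx, hy in zip(hists_x, hists_y)) / (R + 1)

rng = np.random.default_rng(6)
V, R = 50, 2
hists_x = [rng.integers(0, 5, V).astype(float) for _ in range(R + 1)]   # one histogram per resolution
hists_y = [rng.integers(0, 5, V).astype(float) for _ in range(R + 1)]
print("HSMK subregion similarity:", c2f_subregion_kernel(hists_x, hists_y))
```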


Chapter 5

Experiments

In this chapter, I present the application of HSMK to the image categorization problem. HSMK is evaluated on two concrete instances of image categorization, object categorization and scene categorization, on the benchmark datasets Oxford Flower, CALTECH-101, CALTECH-256, MIT Scene and UIUC Scene.

5.1 Image categorization

5.1.1 Introduction to the image categorization problem
Image categorization is the problem of assigning each image to a specific semantic class. A semantic class can be defined by the kind of scenery the image depicts, for example mountain, beach or building; in this case the sub-problem is known as scene categorization. Alternatively, the semantic class can be defined by the object of interest that the image contains, for example chair, sailing boat or panda; in this case the problem is called object categorization. Note that each image belongs to exactly one predefined class. Image categorization is one of the computer vision problems that receive the most attention in the research community and is one of the main topics of leading conferences such as CVPR, ICCV and ECCV. The task is the following: given a training dataset with a fixed number of classes, in which each image is labelled as belonging to a particular class, build an image model and a learning method that can accurately determine the labels of the images in a test dataset. The training set and the test set are two disjoint datasets with no common elements. During training, each training image comes with its corresponding label. During testing, the program determines the label of each unlabelled image; the result is compared with the ground-truth label of that image, and the prediction is correct if they match and wrong otherwise.

The problem can be formalized as follows: given a dataset D containing M images X = {X_1, X_2, ..., X_M} defined over N semantic classes Y = {Y_1, Y_2, ..., Y_N}, each image X_i in X is classified into exactly one class Y_i in Y, i.e. image X_i is labelled Y_i. Randomly select k images from each of the N given classes to form the training set D_Train. Note that a different number of images could be chosen per class, but usually the same number is chosen for every class to avoid unnecessary data bias (one class dominating the others in size). The remaining images form the test set D_Test. The goal is to learn, from the training set D_Train, a classifier F that takes X_i as input and returns the corresponding Y_i, such that F achieves the highest possible accuracy on D_Test.
Training phase: from the dataset D_Train = {(X_i, Y_i) | X_i is a selected image, Y_i is the corresponding label of X_i}, learn the classifier F: X -> Y.
Testing phase: on the dataset D_Test = {(X_j, Y_j) | X_j is an image not selected for training, Y_j is the corresponding label of X_j}, use the classifier F learned from the training set; for a test image X_j, the classification result is Z_j = F(X_j). If Z_j = Y_j, the classification of image X_j is correct; if Z_j differs from Y_j, it is wrong.
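A minimal sketch of this evaluation protocol, assuming precomputed per-image feature vectors, a linear SVM and k = 30 training images per class as placeholder choices:

```python
# Sketch of the evaluation protocol: random k-per-class split and accuracy.
import numpy as np
from sklearn.svm import SVC

def split_per_class(labels, k, rng):
    """Return a boolean mask of training samples: k random images per class."""
    train = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train[rng.choice(idx, size=k, replace=False)] = True
    return train

rng = np.random.default_rng(7)
features = rng.normal(size=(200, 64))              # placeholder image representations
labels = np.repeat(np.arange(4), 50)               # 4 classes, 50 images each

train = split_per_class(labels, k=30, rng=rng)
clf = SVC(kernel="linear", C=1.0).fit(features[train], labels[train])
pred = clf.predict(features[~train])
print("test accuracy:", (pred == labels[~train]).mean())   # fraction with Z_j == Y_j
```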

5.1.2 Applications of image categorization

From a training set of labelled images built by humans, the system can learn to classify an unlimited number of new images that need to be labelled. In other words, manual labelling is performed only on a small number of images (building the training set); this step can be seen as defining the semantics for the system to operate later, after which the system labels any number of new images automatically. Image categorization is an important component in building systems for image retrieval based on image content. This is an important application, since the number of images grows exponentially, as can be seen on photo-sharing websites such as Flickr and Picasa, while at the time this thesis was written, image search engines such as Google, Bing and Yahoo could only rely on textual information: their image search support is actually implemented by text queries over the textual tags users attach to images or over the image file names, rather than over the content of the images themselves. Furthermore, solving the image categorization problem means solving the problem of how to represent objects in images well and how to find an appropriate similarity measure between objects, so it can be extended to solve other important computer vision problems such as object detection and object recognition.

5.1.3 Challenges of image categorization

Image categorization is a very challenging problem. Each image is labelled as belonging to exactly one class, and in nature objects as well as natural scenes are extremely diverse, so the label provides only a small amount of general information about the image, in other words its semantics, while each image is a specific instance of the semantic class stored pixel by pixel. This leads to the following challenges:
- Viewpoint variation
- Illumination variation
- Partial occlusion of the object
- Scale variation of the object
- Object deformation
- Background clutter affecting the object
- Diversity of object appearance within a class

5.1.4 Approaches

There are two main approaches to image categorization: (i) feature-based approaches and (ii) learning-based approaches for classifying the objects.


5.1.4.1 Feature-based approaches

In [26], Nilsback et al. survey many feature types for object categorization on the Oxford Flower dataset, from colour to flower shape, and also build features based on image segmentation: the objects are extracted and the background regions, which can add noise to the classification, are removed, and features are computed on the segmented object region, such as SIFT-Internal and SIFT-Boundary (both based on the SIFT feature published by Lowe in [20][21]). SIFT-Internal is known as the best feature for object categorization on the Oxford Flower dataset. In [27], Oliva and Torralba propose the global feature GIST, which models scene images, to solve scene categorization problems such as on the MIT Scene or UIUC Scene datasets (UIUC Scene is an extension of the MIT Scene dataset). GIST is known as the best feature for scene categorization. In addition, the works [4][12][15] present experiments that combine several feature types in order to describe images better. However, combining several feature types requires choosing a weight for each feature so that the combination is most effective. This leads to problems related to the design of the learning method: Gehler and Nowozin [12] propose using MKL or LP-B (linear programming boosting) to determine the feature weights during learning; MKL is also used to combine features in [33][37]. Based on a survey of the published work on image categorization, for example [7][8][13][14][18][34][36], SIFT is the most widely used feature and achieves high performance for image categorization.
5.1.4.2 Learning-based approaches

Fergus et al. [10] use a set of different object parts and build a graph to model the object based on the positional relationships between the parts as well as the presence of the object parts in the image. This idea develops from the pictorial structure model used in the work of Fischler and Elschlager [11], illustrated in Figure 3; the difference is that Fergus's approach keeps only those parts needed to discriminate between object classes, so it can avoid modelling parts that differ only locally (i.e. the many different appearances of objects within the same class, the appearance diversity of a single object class); this model is also called the constellation model.

Figure 3: Pictorial structure model (relationships between parts)

Felzenszwalb et al. [9] propose latent-SVM to build a discriminative model of the object that uses a set of object parts at several resolutions based on HoG [5] (the deformable part model). In addition, building kernel-based learners that measure the similarity between objects to support classification is an approach that has attracted many researchers [13][18][34][36]. Prominent in this direction are the BoF and SPM models presented in Chapter 3. The proposed kernel, HSMK, follows the kernel approach and is applied to the image categorization problem. HSMK improves on SPMK so that the similarity between images can be computed better under the many challenges of image categorization.


5.2 Experiments

Most recent approaches to image categorization use invariant local features to describe the image, because these features are easy to describe and match when the object or scene in the image undergoes large changes in viewpoint, different illumination conditions (images taken at different times of day), or complex background clutter. Among local invariant features, SIFT [20] is one of the most effective and most widely used. To obtain high discriminative power, I use densely sampled SIFT (dense SIFT): the SIFT descriptor is computed on 16 x 16 sub-rectangles at every pixel, instead of only at key points as in [20] or on a grid of points as in [18]. Furthermore, to increase scalability, dense SIFT is computed on grayscale images. Densely distributed local features can describe the information of homogeneous regions such as sky, water or grass, where key points do not exist. Moreover, the combination of dense features and the C2F model allows the image to be described more precisely, because the local features can more fully capture neighbourhood information at several levels of image resolution. The unsupervised K-means clustering algorithm is then applied to a random subset of the SIFT features to build the set of visual words, also called the dictionary. Following the results of published work, I experiment with two different dictionary sizes, M = 400 and M = 800. To demonstrate the effectiveness of HSMK, I evaluate it on two instances of the image categorization problem: object categorization and scene categorization. For object categorization, we use the Oxford Flower dataset, and to demonstrate the scalability and effectiveness of HSMK on large datasets, I also run experiments on two large object categorization datasets: CALTECH-101 [7] and CALTECH-256 [14]. For scene categorization, I evaluate the effectiveness of HSMK on the MIT Scene [27] and UIUC Scene [18] datasets.
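A minimal sketch of the quantization step mentioned above, assuming random stand-ins for the dense descriptors and codebook and scipy's cKDTree in place of the thesis's kd-tree implementation: dense local descriptors are assigned to their nearest visual words in roughly O(log V) time per descriptor.

```python
# Sketch: assigning dense local descriptors to visual words with a kd-tree.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(8)
codebook = rng.normal(size=(400, 128))            # M = 400 visual words (K-means centroids)
descriptors = rng.normal(size=(3000, 128))        # dense SIFT-like descriptors of one image

tree = cKDTree(codebook)                          # build once per dictionary
_, word_ids = tree.query(descriptors, k=1)        # nearest codeword for every descriptor
hist = np.bincount(word_ids, minlength=len(codebook))
print("most frequent visual word:", hist.argmax())
```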

5.2.1 Object categorization

5.2.1.1 The Oxford Flowers dataset

This dataset contains 17 classes of common flowers of the United Kingdom, collected by Nilsback et al. [25]. Each class contains 80 flower images taken at different sizes, from different viewpoints and under different illumination conditions. Moreover, flowers within the same class, such as Irises, Fritillaries and Pansies, vary greatly in colour as well as in shape, and in some cases the similarity between different flower classes is very high, for example between Dandelion and ColtsFoot. Figure 4 shows some flower samples from the Oxford Flower dataset. For the experiments, I use the setup of Gehler and Nowozin [12]: 40 samples are randomly chosen from each class for the training set and the rest are used for the testing set, and no validation set is used as in [25][26] to select optimal parameters for the system.

Table 5.1: Classification accuracy (%) using a single feature on the Oxford Flower dataset (NN denotes the nearest-neighbour classifier)

Method                        Accuracy (%)
HSV (NN) [26]                 43.0
SIFT-Internal (NN) [26]       55.1
SIFT-Boundary (NN) [26]       32.0
HOG [26]                      49.6
HSV (SVM) [12]                61.3
SIFT-Internal (SVM) [12]      70.6
SIFT-Boundary (SVM) [12]      59.4
HOG (SVM) [12]                58.5
SIFT (MSL) [34]               65.3
Dense SIFT (HSMK)             72.9


Figure 4: Sample images from the Oxford Flower dataset (17 classes) [1]

[1] The Oxford Flower dataset is provided for research at http://www.robots.ox.ac.uk/%7Evgg/data/flowers/17/17flowers.tgz

Table 5.1 shows that HSMK achieves a state-of-the-art result when using a single feature compared with the previous approaches. It not only outperforms SIFT-Internal [26], which is known as the best feature for this dataset (note that SIFT-Internal is computed on segmented images), but also outperforms SPM with optimal coefficients learned by the MSL system [34]. Furthermore, Table 5.2 shows that the result obtained with HSMK is also better than that of SPMK.

Table 5.2: Classification accuracy (%) of HSMK and SPMK on the Oxford Flower dataset

Kernel    M = 400    M = 800
SPMK      68.09%     69.12%
HSMK      71.76%     72.94%

5.2.1.2 The CALTECH datasets

To show the effectiveness and the scalability of HSMK, I also evaluate it on the large CALTECH-101 and CALTECH-256 datasets. These datasets exhibit very large intra-class variation in appearance, as well as variation in viewpoint and complex background clutter. Moreover, as Figure 5 shows with some samples of CALTECH-101, where each row illustrates one class, the fourth row (the "chair" class) and the fifth row (the "Windsor_chair" class) are very similar in shape while also being diverse within each class. For CALTECH-101, I run experiments using 5, 10, 15, 20, 25 and 30 training samples per class, including the background class, and up to 50 samples per class for testing. Table 5.3 compares the classification results based on HSMK and other approaches. It can be seen that HSMK achieves results comparable to the state of the art even when using only a single feature type, while the other approaches use combinations of several feature types and complex learning algorithms such as MKL and linear programming boosting (LP-B) [12].

Figure 5: Sample images from the CALTECH-101 dataset [2]

[2] The CALTECH-101 dataset is available at http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz

Table 5.3: Classification results on the CALTECH-101 dataset (training samples per class)

Method                               5        10       15       20       25       30
Grauman & Darrell [13]               34.8%    44.0%    50.0%    53.5%    55.5%    58.2%
Wang & Wang [34]                     -        -        61.4%    -        -        -
Lazebnik et al. [18]                 -        -        56.4%    -        -        64.6%
Yang et al. [36]                     -        -        67.0%    -        -        73.2%
Boimann et al. [4]                   56.9%    -        72.8%    -        -        79.1%
Gehler & Nowozin (MKL) [12]          42.1%    55.1%    62.3%    67.1%    70.5%    73.7%
Gehler & Nowozin (LP-Beta) [12]      54.2%    65.0%    70.4%    73.6%    75.5%    77.8%
Gehler & Nowozin (LP-B) [12]         46.5%    59.7%    66.7%    71.1%    73.8%    77.2%
Proposed method (HSMK)               50.5%    62.2%    69.0%    72.3%    74.4%    77.3%

Table 5.4 shows that the results obtained with HSMK are also better than those of SPMK. Moreover, when the background class is excluded from the experiment, HSMK reaches 78.4% accuracy for the case of 30 training samples per class. This demonstrates that the proposed approach is highly effective despite being simpler to implement than the other high-performing approaches.

Table 5.4: Classification accuracy of HSMK and SPMK on the CALTECH-101 dataset (training samples per class)

Kernel            5        10       15       20       25       30
SPMK (M = 400)    48.18%   58.86%   65.34%   69.35%   71.95%   73.46%
HSMK (M = 400)    50.68%   61.97%   67.91%   71.35%   73.92%   75.59%
SPMK (M = 800)    48.11%   59.70%   66.84%   69.98%   72.62%   75.13%
HSMK (M = 800)    50.48%   62.17%   68.95%   72.32%   74.36%   77.33%

Hnh 6 minh ha s a dng v th hin ca cc i tng trong c s d liu CALTECH-256, mi hng l mt lp trong c s d liu, CALTECH-256 l phin bn m rng ca CALTECH-101, nhng khng c chun ha nh trong CALTECH-101 nn s phc tp v nn l rt ln. V trn c s d liu CALTECH-256, ti thc hin th nghim vi HSMK khi s dng 15 v 30 mu t mi lp cho vic hc, bao gm c lp nn (clutter class) v 25 mu cho mi lp cho vic kim tra, cc mu u c chn ngu nhin t c s d liu CALTECH-256. Ti cng lp trnh li thut ton SPMK [14] nhng s dng c trng SIFT vi phn b dy t th nghim ca ti c th so snh cng bng v s hiu qu ca HSMK v SPMK. Nh trong Bng 5.5, HSMK cho chnh xc phn lp hn 3 phn trm so vi chnh xc ca SPMK.


Figure 6: Illustration of the CALTECH-256 database

Table 5.5: Comparison of classification results on the CALTECH-256 database

Method                           15 training samples   30 training samples
Griffin et al. (SPMK) [14]       28.4%                 34.2%
Yang et al. (ScSPM) [36]         27.7%                 34.0%
Gehler & Nowozin (MKL) [12]      30.6%                 35.6%
SPMK (with dense SIFT)           25.3%                 31.3%
Proposed method (HSMK)           27.2%                 34.1%

5.2.2 Scene categorization


HSMK is also evaluated on the MIT Scene database (8 classes) and the UIUC Scene database (15 classes). On these databases I set the codebook size to M = 400. Figure 7 shows several samples from the MIT Scene database, one class per row. The UIUC Scene database is an extension of MIT Scene: it contains the 8 MIT Scene classes plus 7 additional classes, but its images are grayscale, whereas MIT Scene contains color images. On the MIT Scene database, I randomly select 100 samples per class for training and another 100 samples per class for testing. As shown in Table 5.6, the classification rate of HSMK is 2.5 percentage points higher than that of SPMK. The proposed approach also outperforms other approaches that use local features [15], as well as combinations of multiple local features [15], by more than 10 percentage points, and it is better than the global GIST descriptor [27], which is known as the best feature for scene categorization.
Table 5.6: Comparison of classification results on the MIT Scene database (8 classes)

Method                  Accuracy (%)
GIST [27]               83.7
Local features [15]     77.2
Dense SIFT (SPMK)       85.8
Dense SIFT (HSMK)       88.3

The CALTECH-256 database is available at: http://www.vision.caltech.edu/Image_Datasets/Caltech256/


Figure 7: Illustration of the MIT Scene database (8 classes)

On the UIUC Scene database, I follow the experimental setup described by Lazebnik et al. [18]: 100 samples per class are selected at random for training, and all remaining samples in the database are used for testing. As Table 5.7 shows, the result obtained with HSMK is again better than those of SPMK [18] and of SPM based on sparse coding [36].
Table 5.7: Comparison of classification results on the UIUC Scene database

Method                      Accuracy (%)
Lazebnik et al. [18]        81.4
Yang et al. [36]            80.3
SPMK                        79.9
Proposed method (HSMK)      82.2

5.2.3 Experiments with sparse coding for the Hierarchical Spatial Matching Kernel (ScHSMK)
To further improve classification performance, instead of using K-means to build the dictionary and to compute the codeword statistics, I experiment with a sparse coding model combined with HSMK on two object categorization databases, Oxford Flower and CALTECH-101.
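The difference between the two coding schemes compared in this section can be sketched as follows, using scikit-learn both for K-means vector quantization and for dictionary learning with sparse coding. The descriptor matrices passed to these functions are assumed inputs, and the max-pooling of sparse codes is only one common aggregation choice in ScSPM-style pipelines, not necessarily the exact pooling used in the thesis.

import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

def learn_codebooks(train_descriptors, M=800, alpha=1.0, seed=0):
    """Learn both codebooks once from a pool of training descriptors."""
    kmeans = MiniBatchKMeans(n_clusters=M, random_state=seed).fit(train_descriptors)
    dico = MiniBatchDictionaryLearning(n_components=M, alpha=alpha,
                                       random_state=seed).fit(train_descriptors)
    return kmeans, dico.components_

def vq_histogram(descriptors, kmeans, M=800):
    """Hard vector quantization: each descriptor votes for its nearest codeword."""
    counts = np.bincount(kmeans.predict(descriptors), minlength=M)
    return counts / max(counts.sum(), 1)

def sparse_code_pool(descriptors, dictionary, alpha=1.0):
    """Sparse coding over the learned dictionary, aggregated here by max-pooling."""
    codes = sparse_encode(descriptors, dictionary, algorithm="lasso_lars", alpha=alpha)
    return np.abs(codes).max(axis=0)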
5.2.3.1 ScHSMK on the Oxford Flower database

For the Oxford Flower database, I use a dictionary of size M = 800, and while computing HSMK I also experiment with replacing the intersection kernel of formula (4.2) by a linear kernel (inner product); this variant is called the Linear Hierarchical Spatial Matching Kernel (HSMK-L). All other experimental parameters are the same as in Section 5.2.1.1.
The MIT Scene database is available at: http://people.csail.mit.edu/torralba/code/spatialenvelope/spatial_envelope_256x256_static_8outdoorcategories.zip

The UIUC Scene database is available at: http://www-cvr.ai.uiuc.edu/ponce_grp/data/scene_categories/scene_categories.zip


Table 5.8 shows that classification using sparse coding consistently outperforms classification using vector quantization (for example, with the K-means algorithm). When sparse coding is used, HSMK is also about 2 percentage points better than SPMK, regardless of whether the linear kernel or the intersection kernel serves as the base kernel on the subregions in formula (4.2). Furthermore, Table 5.8 indicates that sparse coding tends to make the features behave more linearly than vector quantization does, so replacing the intersection kernel with the linear kernel changes the results only slightly (a minimal sketch of these two base kernels is given after Table 5.8).
Table 5.8: Comparison of classification results using sparse coding versus vector quantization (K-means) on the Oxford Flower database

Method                                                    Accuracy (%)
SPMK                                                      69.12
Sparse coding + SPMK + linear kernel (ScSPMK-L)           71.18
Sparse coding + SPMK + intersection kernel (ScSPMK)       73.09
HSMK                                                      72.94
Sparse coding + HSMK + linear kernel (ScHSMK-L)           73.82
Sparse coding + HSMK + intersection kernel (ScHSMK)       75.00
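The two base kernels compared in Table 5.8 are simply the histogram intersection kernel and the ordinary dot product, as in the minimal sketch below; this is generic kernel code over region descriptors, not the thesis implementation of formula (4.2).

import numpy as np

def intersection_kernel(A, B):
    """K[i, j] = sum_k min(A[i, k], B[j, k]) for rows of non-negative histograms."""
    return np.array([[np.minimum(a, b).sum() for b in B] for a in A])

def linear_kernel(A, B):
    """K[i, j] = <A[i], B[j]>, the plain inner product."""
    return A @ B.T

With sparse-coded, max-pooled features the two kernel matrices lead to very similar SVM accuracy, which is exactly the observation made above about the features becoming more linear.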

5.2.3.2 ScHSMK on the CALTECH-101 database

For the CALTECH-101 database, I use two dictionary sizes, M = 400 and M = 800, and run experiments with 15 and 30 training samples per class; the other parameters are the same as in the CALTECH-101 experiments of Section 5.2.1.2. As in Section 5.2.3.1, I also test the variant in which the intersection kernel of formula (4.2) is replaced by a linear kernel (inner product). Table 5.9 shows that HSMK with sparse coding achieves state-of-the-art results on the CALTECH-101 database. HSMK is consistently better than SPMK by about 2 to 4 percentage points under the same codebook construction method, whether vector quantization or sparse coding, and with either the linear or the intersection kernel.


When sparse coding is used, the results improve compared with vector quantization alone, which can be explained by the optimization formulation presented earlier. Sparse coding also makes the features more linear: as the classification results show, sparse coding with a linear kernel yields results that are better than, or on par with, vector quantization combined with the intersection kernel.
Table 5.9: Comparison of classification results using sparse coding versus vector quantization (K-means) on the CALTECH-101 database

Method                                                 15 training samples   30 training samples
SPM (M = 400), vector quantization                     65.34                 73.46
SPM (M = 400), sparse coding + linear kernel           -                     73.54
SPM (M = 400), sparse coding + intersection kernel     -                     75.68
HSMK (M = 400), vector quantization                    67.91                 75.59
HSMK (M = 400), sparse coding + linear kernel          -                     77.15
HSMK (M = 400), sparse coding + intersection kernel    -                     79.02
SPM (M = 800), vector quantization                     66.84                 75.13
SPM (M = 800), sparse coding + linear kernel           -                     75.52
SPM (M = 800), sparse coding + intersection kernel     -                     76.96
HSMK (M = 800), vector quantization                    68.95                 77.33
HSMK (M = 800), sparse coding + linear kernel          72.14                 78.93
HSMK (M = 800), sparse coding + intersection kernel    73.44                 80.60
Boiman et al. [4]                                      72.8                  79.1


Conclusions and recommendations


Conclusions
I have proposed an effective and efficient kernel called the hierarchical spatial matching kernel (HSMK). HSMK applies a coarse-to-fine (C2F) model on subregions to improve the spatial pyramid matching kernel (SPMK): each subregion is described more accurately by exploiting more information from neighboring regions through a sequence of different resolutions, so the kernel captures both the general information at coarse resolutions and the detailed information of the subregion at finer resolutions. Furthermore, like SPMK and the pyramid matching kernel (PMK), HSMK handles sets of unordered features well, and it also handles sets of different cardinalities. Combining the proposed kernel with densely sampled local features proves highly effective: the resulting model achieves results that are at least comparable to, and in several cases better than, the state of the art across many databases, from object categorization databases such as Oxford Flower, CALTECH-101, and CALTECH-256 to scene categorization databases such as MIT Scene and UIUC Scene. Moreover, the proposed method is simple, since it uses only a single type of local feature with a nonlinear SVM, whereas other recent approaches are considerably more complex, relying on multiple kernel learning (MKL) or on combinations of many feature types. On standard object and scene categorization benchmarks, the proposed approach outperforms SPMK. In addition, SPMK is an important component of many systems that currently achieve state-of-the-art results, for example as a building block for the base kernels in MKL frameworks, which means that SPMK can be replaced by HSMK to increase the accuracy of systems built on such base kernels. Finally, when sparse coding is used instead of vector quantization, the effectiveness of HSMK improves further, reaching state-of-the-art results on the CALTECH-101 database, an important benchmark for evaluating image categorization.
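The coarse-to-fine mechanism summarized above can be illustrated, in heavily simplified form, by the sketch below: within a single subregion, codeword histograms are built over a sequence of increasingly fine grids and their intersection kernels are combined with resolution-dependent weights. The 1/2^(L-l) weighting, the grid sizes, and the normalization are assumptions made here in the spirit of pyramid matching, not the exact definition of formula (4.2).

import numpy as np

def multiresolution_histograms(code_ids, rel_pos, M, levels=2):
    """Codeword histograms of one subregion at 1x1, 2x2, ..., 2^levels x 2^levels
    grids; rel_pos holds (x, y) descriptor positions rescaled into [0, 1) within
    the subregion, and code_ids holds their codeword indices in [0, M)."""
    hists = []
    for l in range(levels + 1):
        n = 2 ** l
        cell = (rel_pos[:, 1] * n).astype(int) * n + (rel_pos[:, 0] * n).astype(int)
        h = np.zeros(n * n * M)
        np.add.at(h, cell * M + code_ids, 1.0)   # accumulate counts per (cell, word)
        hists.append(h / max(len(code_ids), 1))  # normalize by the point count
    return hists

def coarse_to_fine_kernel(hists_a, hists_b, levels=2):
    """Weighted sum of histogram intersection kernels over the resolution sequence."""
    return sum(np.minimum(ha, hb).sum() / 2 ** (levels - l)
               for l, (ha, hb) in enumerate(zip(hists_a, hists_b)))

Summing this subregion kernel over the spatial pyramid cells of two images would then give a full image-level kernel of the kind evaluated in the experiments above.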

Recommendations
Future work includes a theoretical study of the influence of the coarse-to-fine (C2F) model on image representation and kernel construction, and a theoretical study of the effect of sparse coding on the multi-resolution scheme used in HSMK.


List of the author's publications


Domestic:
[1] Lê Thanh Tâm, Trần Thái Sơn, Seiichi Mita (2009), "Real-time traffic sign detection and classification based on SVM" (in Vietnamese), Conference on Information and Communication Technology (ICTFIT), Ho Chi Minh City, Vietnam.

International:
[1] Tam T. Le, Son T. Tran, Seiichi Mita, Thuc D. Nguyen (2010), "Realtime Traffic Sign Detection Using Color and Shape-Based Features", The 2nd Asian Conference on Intelligent Information and Database Systems, Lecture Notes in Artificial Intelligence 5991, Hue, Vietnam.
[2] Tam T. Le, Yousun Kang, Akihiro Sugimoto, Son T. Tran, Thuc D. Nguyen (2011), "Hierarchical Spatial Matching Kernel for Image Categorization", International Conference on Image Analysis and Recognition (ICIAR), Burnaby, BC, Canada (accepted).

The papers above are archived on the author's personal research web page: http://sites.google.com/site/lttamvn


References


In English
[1] N. Aronszajn (1950), "Theory of reproducing kernels," Transactions of the American Mathematical Society, vol. 68:337-404.
[2] S. Boyd and L. Vandenberghe (2004), Convex Optimization, Cambridge University Press, Cambridge, England.
[3] C. Cortes and V. Vapnik (1995), "Support vector networks," Machine Learning, vol. 20(3):273-297.
[4] O. Boiman, E. Shechtman, and M. Irani (2008), "In defense of nearest-neighbor based image classification," in CVPR.
[5] N. Dalal and B. Triggs (2005), "Histograms of oriented gradients for human detection," in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
[6] C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka (2004), "Visual categorization with bags of keypoints," in ECCV International Workshop on Statistical Learning in Computer Vision.
[7] L. Fei-Fei, R. Fergus, and P. Perona (2004), "Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories," in Workshop on Generative-Model Based Vision.
[8] L. Fei-Fei and P. Perona (2005), "A Bayesian hierarchical model for learning natural scene categories," in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), Washington, DC, USA, vol. 2.
[9] P. Felzenszwalb, D. McAllester, and D. Ramanan (June 2008), "A discriminatively trained, multiscale, deformable part model," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska.


[10] R. Fergus, P. Perona, and A. Zisserman (2003), "Object class recognition by unsupervised scale-invariant learning," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2.
[11] M. Fischler and R. Elschlager (1973), "The representation and matching of pictorial structures," IEEE Transactions on Computers, pp. 67-92.
[12] P. Gehler and S. Nowozin (2009), "On feature combination for multiclass object classification," in ICCV, pp. 221-228.
[13] K. Grauman and T. Darrell (2005), "The pyramid match kernel: discriminative classification with sets of image features," in ICCV, pp. 1458-1465.
[14] G. Griffin, A. Holub, and P. Perona (2007), "Caltech-256 object category dataset," Technical Report 7694, California Institute of Technology, USA.
[15] M. Johnson (2008), "Semantic Segmentation and Image Search," PhD thesis, University of Cambridge, UK.
[16] M. Kloft, U. Brefeld, P. Laskov, and S. Sonnenburg (2008), "Non-sparse multiple kernel learning," in NIPS Workshop on Kernel Learning: Automatic Selection of Kernels.
[17] R. I. Kondor and T. Jebara (2003), "A kernel between sets of vectors," in ICML, pp. 361-368.
[18] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. Jordan (2004), "Learning the kernel matrix with semidefinite programming," Journal of Machine Learning Research, vol. 5:27-72.
[19] S. Lazebnik, C. Schmid, and J. Ponce (2006), "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in CVPR, vol. 2.
[20] D. G. Lowe (2004), "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60(2):91-110.


[21] D. G. Lowe (1999), "Object recognition from local scale-invariant features," in International Conference on Computer Vision, Corfu, Greece.
[22] J. Mairal, F. Bach, J. Ponce, and G. Sapiro (2009), "Online dictionary learning for sparse coding," in ICML, pp. 689-696.
[23] S. Maji, A. Berg, and J. Malik (2008), "Classification using intersection kernel support vector machines is efficient," in CVPR, pp. 1-8.
[24] F. Moosmann, B. Triggs, and F. Jurie (2008), "Randomized clustering forests for building fast and discriminative visual vocabularies," in NIPS Workshop on Kernel Learning: Automatic Selection of Kernels.
[25] M. E. Nilsback and A. Zisserman (2006), "A visual vocabulary for flower classification," in CVPR, vol. 2, pp. 1447-1454.
[26] M. E. Nilsback and A. Zisserman (2008), "Automated flower classification over a large number of classes," in ICVGIP.
[27] A. Oliva and A. Torralba (2001), "Modeling the shape of the scene: A holistic representation of the spatial envelope," IJCV, pp. 145-175.
[28] A. Rakotomamonjy, F. Bach, Y. Grandvalet, and S. Canu (2008), "SimpleMKL," Journal of Machine Learning Research, vol. 9:2491-2521.
[29] B. Schölkopf and A. J. Smola (2002), Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, USA.
[30] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf (2006), "Large scale multiple kernel learning," Journal of Machine Learning Research.
[31] V. Vapnik and A. Lerner (1963), "Pattern recognition using generalized portrait method," Automation and Remote Control, 24, 774-780.
[32] V. Vapnik and A. Chervonenkis (1964), "A note on one class of perceptrons," Automation and Remote Control, 25.


[33] M. Varma and D. Ray (2007), "Learning the discriminative power-invariance trade-off," in IEEE 11th International Conference on Computer Vision.
[34] S. C. Wang and Y. C. F. Wang (2010), "A multi-scale learning framework for visual categorization," in ACCV.
[35] L. Yang, R. Jin, R. Sukthankar, and F. Jurie (2008), "Unifying discriminative visual codebook generation with classifier training for object category recognition," in CVPR, Los Alamitos, CA, USA, pp. 1-8.
[36] J. Yang, K. Yu, Y. Gong, and T. Huang (2009), "Linear spatial pyramid matching using sparse coding for image classification," in CVPR, pp. 1794-1801.
[37] Q. Yuan, A. Thangali, V. Ablavsky, and S. Sclaroff (2008), "Multiplicative kernels: Object detection, segmentation and pose estimation," in Computer Vision and Pattern Recognition (CVPR).
