1.1. Basic classifiers
Decision trees
A decision tree is a tree structure in which each internal node represents a test (a split) on one attribute, each branch represents an outcome of that test, and each leaf node represents a class or a class distribution. The topmost node of the tree is called the root. As an illustration, Figure 2-1 revisits the cancer-cell classification example, with internal nodes drawn as rectangles and leaf nodes drawn as ellipses. To classify an unknown sample, its attribute values are tested against the decision tree. The path from the root to a leaf node is the basis for predicting the class of the sample. A decision tree can easily be converted into a set of classification rules. The mathematical basis of decision trees is a greedy algorithm that builds the tree recursively from the top down, using a divide-and-conquer approach.
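The two uses of a tree described above (following a root-to-leaf path to classify a sample, and reading each path as a classification rule) can be sketched as follows. This is a minimal sketch: the tree, the attribute names `cell_size` and `cell_shape`, and the class labels are invented for illustration and are not taken from Figure 2-1.

```python
# Hypothetical toy tree: internal nodes test one attribute, leaves hold a class label.
TREE = {
    "attribute": "cell_size",
    "branches": {
        "large": {"label": "malignant"},
        "small": {
            "attribute": "cell_shape",
            "branches": {
                "irregular": {"label": "malignant"},
                "regular": {"label": "benign"},
            },
        },
    },
}


def classify(tree, sample):
    """Follow the path from the root to a leaf and return the leaf's class."""
    node = tree
    while "label" not in node:
        value = sample[node["attribute"]]
        node = node["branches"][value]
    return node["label"]


def to_rules(tree, conditions=()):
    """Turn every root-to-leaf path into one IF ... THEN rule."""
    if "label" in tree:
        body = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {body} THEN class = {tree['label']}"]
    rules = []
    for value, child in tree["branches"].items():
        rules += to_rules(child, conditions + ((tree["attribute"], value),))
    return rules
```

Calling `to_rules(TREE)` yields one rule per leaf, so this toy tree produces three rules.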
Bayesian networks
Bayesian classification is a statistics-based classification method. It estimates the probability of each class in the data set and uses these probabilities to assign samples to separate classes. The Bayesian classification algorithm assumes that, within a class, the value of each attribute is independent of the values of the other attributes. This assumption, known as class conditional independence, simplifies the subsequent computations. A Bayesian network is a graph that makes it possible to represent the relationships among attributes.
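Under the class conditional independence assumption, the score of a class is its prior probability multiplied by the per-attribute conditional probabilities. The sketch below shows this for categorical attributes. The function names and the add-one smoothing are assumptions made for the example, not part of the original text.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    distinct = defaultdict(set)          # attribute index -> values seen in training
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[(y, i)][v] += 1
            distinct[i].add(v)
    return class_counts, value_counts, distinct


def predict(model, x):
    """Return the class maximizing P(class) * prod_i P(x_i | class)."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(class)
        for i, v in enumerate(x):
            # Add-one smoothing (an assumption here) avoids zero probabilities.
            score *= (value_counts[(c, i)][v] + 1) / (n_c + len(distinct[i]))
        scores[c] = score
    return max(scores, key=scores.get)
```

Because the attributes are treated independently, training reduces to counting, which is the simplification the assumption buys.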
Support Vector Machine
SVM is a relatively new method for classifying data. It is easier to use than a neural network. However, users who do not apply it correctly tend to skip some simple but necessary steps, which leads to unsatisfactory results. The goal of SVM is to produce a model from a training set such that the model can predict the class of test samples. SVM finds a nonlinear decision function on the training set by mapping the training samples into a high-dimensional feature space in which they can be separated linearly, and then classifies the data in that space by maximizing the geometric margin and minimizing the training error at the same time.
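To make the notion of geometric margin concrete, the sketch below computes it for a given linear separator w·x + b = 0 with labels +1 and -1. It only evaluates a candidate separator. It does not train an SVM, and the mapping into a feature space is left out. The function name and the sample points are assumptions for illustration.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all samples.

    Labels must be +1 or -1. A positive result means every sample lies on
    the correct side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )
```

Among separators that classify every training sample correctly, the one with the larger geometric margin is preferred.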
1.2. Problems with classification algorithms
Two problems affect the predictions of classifiers: bias, where the predictions tend to be wrong in the same way, and variance, where the predictions differ too much from one another. Imagine that we have several different training sets, all equally good. An algorithm is said to be biased for an input x if, whichever training set it is trained on, it is systematically wrong when predicting the output label of x. An algorithm is said to have variance for an input x if its predicted labels for x differ when it is trained on different training sets. The prediction error of a classifier is the sum of the bias error and the variance error of the machine learning algorithm it uses.

Figure 2.1. Visual illustration of bias and variance. Panels: (a) high bias, low variance; (b) low bias, high variance; (c) high bias, high variance; (d) low bias, low variance.


Figure 2.1 illustrates bias and variance with arrows shot at a target. Bias is shown by arrows that miss the bullseye systematically in one direction: the more the arrows lean to one side, the higher the bias. Variance is shown by the spread of the arrows: the farther apart they land, the higher the variance. The goal when building a good classifier is therefore to find a method whose bias and variance are both as low as possible.
A common way to reduce variance is to build a set of individual classifiers and then take a vote over their predictions for the input. In other words, for one algorithm we build more than two classifiers, and the final result is the class predicted by the largest number of them. There are many ways to create the individual classifiers, and many voting schemes for combining them. One proposed idea is to choose a machine learning algorithm with low bias and then use voting to reduce its variance. The next chapter describes in more detail how the individual classifiers are created and how their results are voted on.
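As a rough numerical illustration of why voting can help, the sketch below computes the probability that a majority of n classifiers is correct. It assumes a two-class problem, an odd n, and classifiers whose errors are fully independent, which is an idealization introduced here and not a claim of the text. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that more than half of n independent classifiers,
    each correct with probability p, give the right answer (n odd)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )
```

With p = 0.7, a single classifier is right 70% of the time, while a vote among three such classifiers is right with probability 3(0.7)^2(0.3) + (0.7)^3 = 0.784. Real classifiers trained on the same data are correlated, so the gain in practice is smaller.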
1.3. Classifier ensembles
The concept of combining classifiers
A classifier ensemble is a collection of base classifiers, each of which can be a classical classifier such as a decision tree, naive Bayes, or a neural network. When a new example is classified, it is processed by the base classifiers of the ensemble, and their outputs are combined in some way to produce the ensemble's final prediction for that example. We want the base classifiers to classify well, and we want their outputs not to be highly correlated with one another.
Approaches to combining classifiers
There are two approaches to building an ensemble:
The first is to build each classifier independently of the others and then use voting to select the final result of the ensemble. Each base classifier is built independently of the other classifiers by varying the input training set or by varying the features in the training set.
The second is to build the base classifiers and assign a weight to the result of each one. The choice of one base classifier affects the choice of the other base classifiers and the weights assigned to them.
THE BAGGING METHOD
How Bagging works
Bagging uses the first approach. It creates classifiers from sub-samples drawn with replacement from the original sample set and from one machine learning algorithm. Each sub-sample produces one base classifier.
The classifiers are combined by majority voting. When an example needs to be classified, each classifier produces a result, and the result that occurs most often is taken as the result of the ensemble.
The Bagging algorithm
Bagging creates M training sets, each drawn with replacement from the original training data. A training example may therefore be selected more than once or not at all. On each new training set, Bagging runs a machine learning algorithm Lb, which yields M base classifiers hm. When a new example is to be classified, the result of the ensemble is the result returned most often by the M base classifiers.
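The procedure above can be sketched in a few lines. This is a minimal sketch, assuming a fixed random seed for repeatability and a toy base learner (`learn_stump`, a threshold on one numeric feature) that stands in for the algorithm Lb. None of these names come from the original text.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement; some repeat, some are left out."""
    return [rng.choice(data) for _ in data]


def bagging(data, learn, m, seed=0):
    """Train m base classifiers, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [learn(bootstrap_sample(data, rng)) for _ in range(m)]


def vote(classifiers, x):
    """Return the label predicted by the largest number of base classifiers."""
    return Counter(h(x) for h in classifiers).most_common(1)[0][0]


def learn_stump(sample):
    """Hypothetical base learner: the best single threshold on a numeric feature."""
    labels = sorted({y for _, y in sample})
    best = None
    for t, _ in sample:
        for left in labels:
            for right in labels:
                # Predict `left` for x <= t and `right` otherwise; count mistakes.
                errors = sum((left if x <= t else right) != y for x, y in sample)
                if best is None or errors < best[0]:
                    best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right
```

A call such as `vote(bagging(data, learn_stump, m=5), x)` trains five stumps on five bootstrap samples and returns their majority label for `x`. Any learner that maps a list of `(x, y)` pairs to a prediction function can be passed in place of `learn_stump`.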

Figure 2.3: How Bagging works


In Figure 2.3, the three arrows on the left depict drawing a sample with replacement three times. The next three arrows depict calling the learning algorithm on the three samples to create three base models.
Bagging returns the function h(x) that receives the most votes among h1, h2, ..., hM. It classifies a new example by returning a class y from the set of possible classes Y. In Figure 2.3, three base classifiers vote to produce the final answer. In Bagging, the M training sets are generated so that they differ from one another. If this difference is large enough to make the M base models differ, while the base models still perform well enough, the ensemble performs better than the base models.
THE BOOSTING METHOD
How Boosting works
Unlike Bagging, which builds the ensemble from training examples that all carry equal weight, Boosting builds the ensemble from training examples that carry different weights. After each iteration, the weights of misclassified training examples are increased and the weights of correctly classified training examples are decreased. This lets Boosting concentrate on improving accuracy for the examples that were misclassified in the previous iteration.
The Boosting algorithm
A boosting algorithm was originally defined as an algorithm that converts a weak learning algorithm into a strong one. That is, it converts a learning algorithm that solves a two-class classification problem only slightly better than random guessing into an algorithm that solves the problem very well. Schapire's original boosting algorithm is recursive. At the last step of the recursion, it combines the hypotheses produced by the weak learning algorithm. The error probability of this combination has been proven to be smaller than the error probability of the weak hypotheses.
AdaBoost is an algorithm that combines a set of classifiers made diverse by running the learning algorithm with different distributions over the training set.
The AdaBoost algorithm
AdaBoost is the boosting algorithm we use to create a sequence of base models, each trained with a different weight distribution over the training data. The AdaBoost algorithm is described in Figure 2.4.

Figure 2.4: The AdaBoost algorithm
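Since the figure itself is not reproduced here, the following is a minimal sketch of the commonly published binary AdaBoost formulation, with labels +1 and -1 and threshold stumps on one numeric feature as the weak learner. It may differ in detail from the version in Figure 2.4. The function names, the stump learner, and the numerical guard on the error are assumptions made for the example.

```python
import math


def train_stump(xs, ys, weights):
    """Weak learner: the threshold/sign pair with the lowest weighted error."""
    best = None
    for t in xs:
        for sign in (1, -1):
            preds = [sign if x <= t else -sign for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, sign) weighted stumps."""
    n = len(xs)
    weights = [1.0 / n] * n  # start from equal weights
    model = []
    for _ in range(rounds):
        err, t, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, sign))
        # Raise the weight of misclassified examples, lower the others.
        weights = [
            w * math.exp(-alpha * y * (sign if x <= t else -sign))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (s if x <= t else -s) for a, t, s in model)
    return 1 if score >= 0 else -1
```

On the toy data `xs = [1, 2, 3, 4, 5, 6]`, `ys = [1, 1, -1, -1, 1, 1]`, no single stump classifies every point correctly, but the weighted vote of three boosted stumps does.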


THE RANDOM FOREST ALGORITHM
Random Forest (RF) is a particular algorithm based on the ensemble technique. In essence, RF is built on the classification and regression tree (CART) algorithm and uses a technique called bagging. The algorithm selects a small group of attributes at each node of a tree and uses them to split for the next level of the tree. Dividing the search space into smaller trees in this way lets the algorithm classify very quickly even when the attribute space is very large. The input parameters of the algorithm are quite simple: the main one is the number of attributes selected at each split. Its default value is the square root of p, where p is the number of attributes. The number of trees that can be generated is unlimited, and no technique is used to limit the growth of the trees. The parameter giving the number of trees must be chosen so that every attribute is tested several times. The algorithm uses the out-of-bag technique to build the training sets and to test on them.
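The two sources of randomness mentioned above can be sketched as follows: the attribute subset tried at a split, and the bootstrap sample whose left-over examples form the out-of-bag set. This is a sketch under stated assumptions. Rounding the square root of p down to an integer is a choice made here, and all function names are hypothetical.

```python
import math
import random


def features_per_split(p):
    """Default number of attributes tried at each split: floor(sqrt(p)), at least 1."""
    return max(1, math.isqrt(p))


def random_feature_subset(p, rng):
    """Indices of the attributes considered at one node."""
    return sorted(rng.sample(range(p), features_per_split(p)))


def bootstrap_with_oob(n, rng):
    """Bootstrap indices for one tree plus the out-of-bag indices it never saw."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag
```

Each tree can be evaluated on its own out-of-bag examples, which it did not see during training. For large n, roughly 1/e (about 37%) of the examples are out-of-bag for any given tree.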
History of the Random Forest algorithm
The algorithm for building a random forest was developed by Leo Breiman and Adele Cutler, and the term Random Forest became the common name for it. The term RF was first proposed in 1995. It was later combined with Leo Breiman's 1996 bagging method and with random attribute selection to build a collection of decision trees with controlled variation. In 2001 Breiman developed the RF algorithm with an additional layer of randomness for classification. Besides building each tree from a different data sample, random forests also change how the classification and regression trees are constructed. The library packages implementing RF, written in Fortran by Leo Breiman and Cutler, are available at (http://www.stat.berkeley.edu/).
The Random Forest algorithm
Random Forest (RF) is fundamentally based on decision trees. The idea behind RF can be compared to an election held under universal suffrage. Using a single decision tree is like holding an election in which only one person votes. Generating many decision trees from a data sample diversifies the votes that lead to the conclusion, much as every group, stratum, and class of society is allowed to vote. Generating different data samples and choosing splits at random produces imperfect trees in the forest, which is like letting citizens vote regardless of education, health, and so on. The more varied the voters and the more votes cast, the more multi-dimensional and detailed our view becomes, and so the conclusion is more accurate and closer to reality.
Definition: An RF is a classifier consisting of a collection of tree-structured classifiers $\{h(x, \Theta_k),\ k = 1, \ldots\}$, where the $\{\Theta_k\}$ are independent, identically distributed random vectors, and each tree casts one vote for the most popular class at input x [5].
Properties of the Random Forest algorithm
For random forests, an upper bound on the error can be derived in terms of two parameters: a measure of how accurate the individual classifiers in the forest are (strength, or accuracy) and a measure of the correlation between them (which this text also calls sensitivity).
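The margin function of the random forest is:

$$mr(X, Y) = P_\Theta(h(X, \Theta) = Y) - \max_{j \neq Y} P_\Theta(h(X, \Theta) = j)$$

The accuracy (strength) function is:

$$s = E_{X,Y}\, mr(X, Y)$$

Assuming $s \geq 0$, the following inequality holds, where $PE^*$ denotes the generalization error of the forest:

$$PE^* \leq \operatorname{var}(mr) / s^2$$

The variation of mr can be expressed as follows. If:

$$\hat{j}(X, Y) = \arg\max_{j \neq Y} P_\Theta(h(X, \Theta) = j)$$

then:

$$mr(X, Y) = P_\Theta(h(X, \Theta) = Y) - P_\Theta(h(X, \Theta) = \hat{j}(X, Y)) = E_\Theta\left[ I(h(X, \Theta) = Y) - I(h(X, \Theta) = \hat{j}(X, Y)) \right]$$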
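For a finite forest, the probabilities over Θ can be approximated by vote fractions among the trees. The sketch below computes this empirical margin for one example and averages it over a labelled sample as an estimate of s. The function names and the representation of votes as a list of labels are assumptions made for illustration.

```python
from collections import Counter


def margin(votes, true_label):
    """Empirical margin: share of trees voting for the true label
    minus the largest share received by any other label."""
    counts = Counter(votes)
    n = len(votes)
    correct = counts[true_label] / n
    best_other = max(
        (c / n for label, c in counts.items() if label != true_label),
        default=0.0,
    )
    return correct - best_other


def strength(all_votes, true_labels):
    """Average margin over a labelled sample, an estimate of s."""
    margins = [margin(v, y) for v, y in zip(all_votes, true_labels)]
    return sum(margins) / len(margins)
```

A positive margin means the forest votes for the correct class on that example. For instance, votes `["a", "a", "b", "c"]` with true label `"a"` give a margin of 0.5 - 0.25 = 0.25.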
Thus, in a random forest the two criteria for evaluating the classification method, the accuracy of each tree and the correlation between the trees in the forest, are inversely related. The higher the correlation between the trees in the forest, the lower the accuracy. Precision and sensitivity are not meaningful when considered separately. The two measures trade off against each other: the higher the precision, the lower the sensitivity, and vice versa. When either precision or sensitivity reaches its minimum, the system loses its ability to classify. The two measures therefore have to be combined into a single unified measure, and the question is how to balance them so that classification is as effective as possible. By definition, precision is the proportion of samples assigned to a class that truly belong to it: TP / (TP + FP). Sensitivity is the proportion of samples truly belonging to a class that are assigned to it: TP / (TP + FN). The table below describes the relationship between these two criteria.

Model prediction     | Actual: belongs | Actual: does not belong
Belongs              | TPi             | FPi
Does not belong      | FNi             | TNi

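Where:
TPi is the number of samples classified correctly and assigned to class ci.
FPi is the number of samples classified incorrectly and assigned to class ci.
FNi is the number of samples whose true label is ci but that were not assigned to class ci.
TNi is the number of samples whose true label is not ci and that were not assigned to class ci.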
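The four counts and the two ratios TP / (TP + FP) and TP / (TP + FN) can be computed directly from lists of actual and predicted labels. In this minimal sketch the function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def confusion_counts(actual, predicted, c):
    """TP, FP, FN, TN for class c, treated as 'belongs' vs 'does not belong'."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == c and p == c for a, p in pairs)
    fp = sum(a != c and p == c for a, p in pairs)
    fn = sum(a == c and p != c for a, p in pairs)
    tn = sum(a != c and p != c for a, p in pairs)
    return tp, fp, fn, tn


def precision_and_sensitivity(actual, predicted, c):
    """Precision TP / (TP + FP) and sensitivity TP / (TP + FN) for class c."""
    tp, fp, fn, _ = confusion_counts(actual, predicted, c)
    precision = tp / (tp + fp) if tp + fp else 0.0      # 0.0 if nothing was assigned to c
    sensitivity = tp / (tp + fn) if tp + fn else 0.0    # 0.0 if no sample truly is c
    return precision, sensitivity
```

For example, with actual labels `c1, c1, c1, c1, c2, c2` and predictions `c1, c1, c1, c2, c1, c1`, class c1 has TP = 3, FP = 2, FN = 1, TN = 0, so precision is 3/5 and sensitivity is 3/4.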
