
TABLE OF CONTENTS

INTRODUCTION
CHAPTER 1. NEURAL NETWORKS AND THEIR APPLICATION IN MACHINE LEARNING
1.1 Neural networks
1.1.1 Processing units
1.1.2 Processing functions
1.1.3 Network topology
1.2 Neural networks in data mining
1.2.1 Data mining
1.2.2 Financial data mining
1.3 Learning methods using neural networks
1.3.1 Supervised learning
1.3.2 Unsupervised learning
1.4 Conclusion of Chapter 1
CHAPTER 2. THE SOM ALGORITHM AND THE CLUSTERING PROBLEM
2.1 Clustering methods
2.2 Using neural networks for clustering
2.2.1 Competitive learning
2.2.2 The SOM algorithm
2.2.3 Using SOM in data mining
2.2.4 SOM and the clustering problem
2.2.5 Other clustering methods
2.3 Some applications of SOM
2.3.1 Selecting investment funds
2.3.2 Assessing credit risk across countries
2.4 Conclusion of Chapter 2
CHAPTER 3. APPLYING THE SOM MODEL TO A BANKING PROBLEM
3.1 Problem statement
3.2 Introduction to the SOM Toolbox
3.3 Program structure
3.3.1 Building the data set
3.3.2 Preprocessing the data before training
3.3.3 Initializing and training the SOM
3.3.4 Simulation (visualization)
3.3.5 Analysis of results
3.4 Some remarks
3.4.1 Computational complexity
3.4.2 Program results
3.4.3 Comparison with other tools
3.5 Conclusion of Chapter 3
CONCLUSION
REFERENCES


INTRODUCTION

The rapid development of technology in general, and of information technology in particular, has produced many information systems that automate business and management activities throughout society. This has generated enormous volumes of data, a phenomenon known as the information explosion. Many powerful database management systems, with rich and convenient tools, help people exploit these large data resources effectively. Beyond the operational use of databases, business success no longer depends only on the throughput of information systems: users also expect a database to yield knowledge derived from the data rather than just the data itself.

Knowledge Discovery in Databases (KDD) is a process that consolidates data from many different data systems into data warehouses and analyzes that information to obtain valuable latent knowledge. Within this process, data mining is the central step. Techniques and concepts from previously studied fields such as machine learning, pattern recognition, statistics, regression, classification, clustering, graphs, neural networks, and Bayesian networks are used in data mining to discover new patterns, new correlations, and meaningful trends.

This thesis, titled "Neural network learning with the SOM model and its application to managing bank loan customers", surveys the field of data mining with neural networks. It focuses on supervised and unsupervised neural network learning, and uses the SOM algorithm to solve the clustering problem within the neural network model. The main research method is to study scientific papers published in recent years on data mining with neural networks, and to apply the SOM Toolbox to the analysis of data on bank loan customers.


The thesis consists of an introduction, three chapters, and a conclusion. Chapter 1 introduces neural networks and their main components (Section 1.1), the use of neural networks in data mining in general and in financial data in particular (Section 1.2), and the learning methods used with neural networks: supervised learning (Section 1.3.1), illustrated by the BBP (Boosting-Based Perceptron) algorithm, and unsupervised learning (Section 1.3.2). Chapter 2 presents in detail the application of neural networks in data mining, especially data clustering (Sections 2.1 and 2.2), in connection with two unsupervised learning algorithms: competitive learning (Section 2.2.1) and the SOM algorithm (Section 2.2.2). On that basis, the thesis introduces some typical applications of SOM in finance (Section 2.3). Chapter 3 applies SOM to the analysis of information on bank loan customers. It covers the procedure for preparing loan customer files (Section 3.1) and the SOM Toolbox (Sections 3.2 and 3.3) used to build a program for this problem, and ends with some program results and remarks.

This thesis was carried out under the scientific supervision of Dr. Hà Quang Thụy. I sincerely thank him for his dedicated guidance, which made it possible for me to complete this thesis. I sincerely thank the teachers and colleagues in the Department of Information Systems for their useful comments during the work on this thesis. I am also deeply grateful for the help and encouragement of my family, my friends, and my colleagues at VPBank throughout the work on this thesis.

Hanoi, March 2004

Cẩm Vân


CHAPTER 1. NEURAL NETWORKS AND THEIR APPLICATION IN MACHINE LEARNING


1.1 Neural networks

The human brain contains about $10^{11}$ elements (called neurons) that are tightly interconnected. Each neuron has about $10^{4}$ connections to other neurons. A neuron consists of dendrites, a cell body, and a nerve fibre (axon). The dendrites carry electrical signals to the cell body, which sums the incoming signals and thresholds them. The axon carries the signal from the cell body to the dendrites of the connected neurons.

Figure 1. Biological neuron

The point of contact between an axon of one neuron and a dendrite of another neuron is called a synapse. The arrangement of the neurons and the strengths of the synapses, which are determined by complex chemical processes, establish the function of the neural network. When a person is born, part of the neural structure is already present in the brain, while other parts develop through learning, during which new connections are formed and old ones are discarded. The structure of the neural network keeps developing and changing. The changes mainly strengthen or weaken the connections at the synapses.


One of the typical approaches to machine learning problems is to build artificial neural networks. Artificial neural networks do not come close to the complexity of the brain. However, because they model how learning works in the brain, there are two basic similarities between artificial and biological neural networks. First, both are built from simple computing devices (cell bodies in the biological case, and far simpler units in the artificial case) that are tightly interconnected. Second, the connections between the neurons determine the function of the network.

A neural network can be regarded either as a connectionist model or as a parallel distributed model, and has the following distinct components:

1) A set of processing units;
2) A state of activation, or output, for each processing unit;
3) Connections between the units, each defined by a weight $w_{ji}$ that gives the effect of the signal of unit i on unit j;
4) A propagation rule that determines how the output signal of a unit is computed from its inputs;
5) An activation function that determines the new level of activation from the current one;
6) An adjustment term (bias) for each unit;
7) A method for gathering information (the learning rule);
8) An environment in which the system operates.

1.1.1 Processing units

A processing unit, also called a neuron or a node, performs a very simple job: it receives input signals from other units or from an external source and uses them to compute an output signal that is propagated to other units.


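$$a_j = \sum_{i=1}^{n} w_{ji}\, x_i + \theta_j, \qquad z_j = g(a_j)$$

Figure 2. Processing unit (inputs $x_0, x_1, \dots, x_n$ with weights $w_{j0}, w_{j1}, \dots, w_{jn}$; net input $a_j$; output $z_j$)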

where:
$x_i$: the inputs of unit j,
$w_{ji}$: the weight of the connection into unit j from input i,
$\theta_j$: the bias of unit j,
$a_j$: the net input of unit j,
$z_j$: the output of unit j,
$g(x)$: the activation function.

A neural network has three kinds of units:

1) Input units, which receive signals from outside;
2) Output units, which send signals out of the network;
3) Hidden units, whose inputs and outputs both lie inside the network.

As shown in Figure 2, each unit j can have one or more inputs $x_0, x_1, x_2, \dots, x_n$, but only one output $z_j$. An input of a unit can be data from outside the network, the output of another unit, or the output of the unit itself.
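The computation performed by a single processing unit can be illustrated with a minimal Python sketch. The function name `unit_output` and the example values are hypothetical and chosen only for illustration; the thesis itself gives no code for this step.

```python
import numpy as np

def unit_output(x, w, theta, g):
    """Return z_j = g(a_j), where a_j = sum_i w_ji * x_i + theta_j."""
    a = float(np.dot(w, x)) + theta  # net input a_j
    return g(a)

# Hypothetical example: three inputs and the identity activation.
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.25])
z = unit_output(x, w, theta=0.1, g=lambda a: a)
```

Passing the activation function `g` as an argument keeps the weighted sum separate from the choice of activation, which mirrors the split between combination and activation functions in Section 1.1.2.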


1.1.2 Processing functions

1.1.2.1 Combination function

Each unit in a neural network combines the signals delivered to it through its connections with other units and produces a value called the net input. The function that does this is called the combination function and is defined by a specific propagation rule. In most neural networks, each unit is assumed to supply one input to every unit it is connected to. The total input to unit j is then simply the weighted sum of the individual outputs of the units connected to it, plus a threshold or bias $\theta_j$:

$$a_j = \sum_{i=1}^{n} w_{ji}\, x_i + \theta_j$$

When $w_{ji} > 0$, the neuron is said to be in an excitatory state; conversely, when $w_{ji} < 0$, it is said to be in an inhibitory state. We call a unit with the propagation rule above a sigma unit. In some cases more complex propagation rules can be used. One of them is the sigma-pi rule, which has the form:

$$a_j = \sum_{i=1}^{n} w_{ji} \prod_{k=1}^{m} x_{ik} + \theta_j$$
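As a small illustration of the sigma-pi rule, the sketch below computes the net input for one unit in Python. It assumes the inputs are arranged as an array `x` with one row per weight, so that row i holds the factors $x_{ik}$; this layout and the name `sigma_pi_net_input` are assumptions made for the example.

```python
import numpy as np

def sigma_pi_net_input(x, w, theta):
    """a_j = sum_i w_ji * prod_k x_ik + theta_j, with x of shape (n, m)."""
    products = np.prod(x, axis=1)          # prod_k x_ik for every i
    return float(np.dot(w, products)) + theta

# Hypothetical example with n = 2 weights and m = 2 factors per product.
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
w = np.array([0.5, -1.0])
a = sigma_pi_net_input(x, w, theta=1.0)
```

With a single factor per row (m = 1) the function reduces to the plain weighted sum of a sigma unit.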

Many combination functions use a bias in computing the net input of a unit. For a linear output unit, the bias $\theta_j$ is usually chosen to be a constant, and in polynomial approximation problems $\theta_j = 1$.

1.1.2.2 Activation function

Most units in a neural network transform their net input with a scalar function called the activation function; the result of this function is a value called the activation level of the unit. Unless the unit belongs to the output layer, the activation value is fed to one or more other units. Activation functions are usually confined to a fixed range of values and are therefore often called squashing functions. The most commonly used activation functions are:

Identity function (linear function)

$$g(x) = x$$

If the inputs are regarded as units, they use this function. Sometimes the net input is multiplied by a constant to form a linear function.

Figure 3. Identity function

Binary step function (hard limit function)

This function is also known as the threshold function. Its output is restricted to one of two values:

$$g(x) = \begin{cases} 1, & \text{if } x \ge \theta \\ 0, & \text{if } x < \theta \end{cases}$$

This kind of function is used in single-layer networks. In the figure below, $\theta$ is chosen to be 1.

Figure 4. Binary step function


Sigmoid function

$$g(x) = \frac{1}{1 + e^{-x}}$$

This function is particularly convenient for networks that are trained, because it is easy to differentiate, which can substantially reduce the computation during training. It is used in applications whose desired outputs fall in the interval [0,1].
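The three activation functions above can be written down directly; the following Python sketch is only an illustration, and the function names are chosen for this example. It also includes the well-known identity $g'(x) = g(x)(1 - g(x))$ for the sigmoid, which is what makes its derivative cheap to compute during training.

```python
import numpy as np

def identity(x):
    """Identity function: g(x) = x."""
    return x

def binary_step(x, theta=1.0):
    """Binary step function: 1 where x >= theta, otherwise 0."""
    return np.where(x >= theta, 1.0, 0.0)

def sigmoid(x):
    """Sigmoid function: g(x) = 1 / (1 + exp(-x)), with values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid, reusing the function value: g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

The default `theta=1.0` in `binary_step` matches the threshold used in Figure 4.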
Figure 5. Sigmoid function

1.1.3 Network topology

The topology of a network is defined by the number of layers, the number of units in each layer, and the way the layers are connected. Networks are broadly divided into two classes according to how their units are connected.

1.1.3.1 Feedforward networks
Figure 6. Multilayer feedforward neural network (input layer $x_0, x_1, \dots, x_n$ and hidden layer $h_0, h_1, \dots, h_m$, each with a bias unit; output layer $y_1, y_2, \dots, y_n$; weights $w^{(1)}_{ji}$ and $w^{(2)}_{kj}$)


The data flow from the input units to the output units is strictly feedforward, in one direction only. Processing may extend over several layers, but there are no feedback connections: no connections extend from the outputs of units to the inputs of units in the same layer or in preceding layers.

1.1.3.2 Recurrent networks

A recurrent network contains feedback connections. Unlike in feedforward networks, the dynamical properties of the network, which arise from these feedback connections, are important. In some cases, the activation values of the units undergo a relaxation process until the network settles into a stable state in which the activations no longer change. In other applications, the changes in the activation values themselves are of interest, because the dynamical behaviour constitutes the output of the network.
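The difference between the two topologies can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not a construction from the thesis: it assumes sigmoid units throughout, one hidden layer for the feedforward case, and, for the recurrent case, a single layer whose activations are fed back through a weight matrix `W` until they stop changing. The names `feedforward` and `relax` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def feedforward(x, W1, b1, W2, b2):
    """One pass input -> hidden -> output; no value ever flows backwards."""
    h = sigmoid(W1 @ x + b1)      # hidden activations
    return sigmoid(W2 @ h + b2)   # output activations

def relax(x, W, b, tol=1e-9, max_iter=1000):
    """Recurrent layer: feed activations back until they no longer change."""
    a = np.zeros(W.shape[0])
    for _ in range(max_iter):
        a_next = sigmoid(W @ a + x + b)   # feedback term W @ a plus external input
        if np.max(np.abs(a_next - a)) < tol:
            return a_next                 # stable state reached
        a = a_next
    return a                              # not converged within max_iter
```

Whether `relax` reaches a stable state depends on the feedback weights; small weights, as in a contraction, settle quickly, while other choices may oscillate until `max_iter` is hit.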
Figure 7. Recurrent neural network (input layer, hidden layer, and output layer)

1.2 Neural networks in data mining

1.2.1 Data mining

An important goal of data mining is to understand the meaning and the deeper content of large data sets. The common approaches to this goal usually involve machine learning methods that inductively build models of the data. Neural networks have been applied in a wide range of data mining applications, including finance and banking, exchange-rate forecasting, and scheduling for the space shuttle. Neural network learning algorithms have been applied successfully in a number of areas involving both supervised and unsupervised learning. A newer direction in neural network learning is to make what is learned more comprehensible and the learning faster, two issues that are frequently given priority in data mining [12].

Inductive learning is one of the common approaches in data mining because it builds models that describe the collected data and so gives insight into that data. Depending on the task, the models can be learned by supervised or by unsupervised learning. In both cases, learning algorithms differ in the way they represent their models. Neural network learning methods represent their learned solutions using real-valued parameters in a network of simple processing units. Research results show that neural networks are quite effective tools in data mining, particularly with respect to their inductive bias.

We briefly review the notion of the inductive bias of a learning algorithm in data mining. Given a fixed set of training examples, a learning algorithm with an inductive bias determines the parameters of a model by iterative computation over the form of that model. Two aspects determine the bias of an algorithm. The restricted hypothesis space bias refers to the constraints that the learning algorithm places on the hypotheses it can construct. For example, the hypothesis space of a perceptron is limited to linear discriminant functions. The preference bias of an algorithm refers to the preference ordering it places on the models within its hypothesis space. For example, most learning algorithms first try to fit a simple hypothesis to a given training set and then explore progressively more complex hypotheses until they find an acceptable fit.


Neural networks are a popular learning method not because of the class of hypotheses they can represent, but simply because they yield hypotheses that generalize better than those of competing algorithms. Several studies have identified domains in which neural networks provide accurate predictions. The hypothesis represented by a trained neural network consists of:

(1) The topology of the network;
(2) The transfer functions used for the hidden and output units;
(3) The real-valued parameters associated with the network connections (the connection weights).

Such hypotheses are highly varied and hard to understand. First, typical networks have hundreds or thousands of real-valued parameters, which encode the relationship between the input x and the target value y. Although a single parameter of this kind is not hard to interpret, the sheer number of parameters in a network can make them difficult to understand. Second, in multilayer networks, the parameters may represent nonlinear, nonmonotonic relationships between the inputs and the outputs. It is therefore usually impossible to determine clearly the effect of a given feature on the target values.

The learning process of most neural network methods involves using some gradient-based optimization method to adjust the network parameters. Like other optimization methods, neural network learning iterates two basic steps: computing the gradient of the error function, and adjusting the network parameters in the direction indicated by the gradient. Learning can be very slow with such methods, because the optimization procedure often involves a large number of small steps and the cost of computing the gradient at each step can be high.

A desirable property of neural network learning methods is that they are online algorithms, meaning that they update their hypotheses after every example. Because the parameters are updated so frequently, online neural network learning algorithms are often faster than batch algorithms. This is an advantage for large data sets. A solution is considered good if the model can be found in just one pass through a large data set. For this reason, the training time of neural network learning methods is acceptable for data mining.

1.2.2 Financial data mining

According to Rao's assessment in 1993 [4], the notable results obtained with neural networks over the preceding years came from generalization by learning from basic examples (cases). The results also showed that networks are capable of forming an arbitrarily accurate approximation to any continuous nonlinear mapping.

In practice, neural networks are widely used in finance. Findings from many scientific papers on simple neural networks, recurrent networks, and data preprocessing suggest that neural networks are considerably more advantageous than other methods. The authors of [4] point out that: (1) simple neural networks are well suited to commercial financial systems; (2) fuzzy neural network systems are suitable for financial modelling and forecasting; (3) recurrent neural networks are used in finance to predict business profits.

Preprocessing is also widely used in generalization as well as in neural network applications in finance. A common form of preprocessing is to use the sigmoid function and other transformations to rescale values greater than 1. The aim is to speed up the training of the network. For example, in stock price forecasting, neural networks have three shortcomings: (1) limited explanatory capability; (2) a poor fit with the habit of reasoning in logical relations; (3) difficulty in handling missing data. Nevertheless, neural networks retain their advantages, such as fast response, tolerance of complexity, relative independence from domain expertise, flexibility, and compactness.

Recurrent neural networks are used in several fairly typical financial applications [4]. In particular, a recurrent neural network was developed to predict daily foreign exchange rates in combination with other techniques. Recurrent neural networks are used for two reasons. First, the model can capture temporal relationships in a time series by maintaining an internal state. Second, comprehensible explanatory rules can be extracted from the trained recurrent network. Specifically, the network consists of:

Three input neurons. The first is used to represent the time series data x(t), x(t-1), x(t-2), ..., x(t-k), where k is the number of time intervals. The subsequent inputs are used for the other two input neurons and reinforce the training process.

A hidden layer of five fully connected neurons.

Two output neurons. The first is trained to predict the probability of a positive change, and the second is trained to predict the probability of a negative change.
Figure 8. An example of a recurrent neural network in financial forecasting (input layer, hidden layer, and an output layer giving the probabilities of positive and negative changes of the time series)

A compact description, regarded as an index, is used to keep the neural network smaller. Kohonen's SOM technique (1997) is used to obtain the index. This is an unsupervised learning process that learns the distribution of a set of patterns without any class information. The details of the SOM algorithm, the way it classifies information, and its application to a specific problem are the main subject of this thesis and are covered in more detail in Chapter 2.


The steps for extracting rules from a recurrent neural network are:

Step 1: Cluster the activation values of the recurrent state neurons.
Step 2: Identify the states associated with the clusters.
Step 3: Insert the transitions between the clusters on the appropriate input symbols.

The result of this algorithm is a set of prediction rules labelled with meaningful symbols derived from a time series. Understanding how the neural network works makes it possible to extract the rules. The table below shows the results of the algorithm.

Rule set    Extracted prediction rules
1           Rule 1. If the last change in the series is negative, then the next change will be positive.
            Rule 2. If the last change in the series is positive, then the next change will be negative.
2           Rule 1. If the last change in the series is negative, then the next change will be positive.
            Rule 2. If the last change in the series is positive, then the next change will be positive.
3           Rule 1. If the last change in the series is positive, then the next change will be positive.
            Rule 2. If the last change in the series is negative and the preceding changes were not positive, then the next change will be positive.

1.3 Learning methods using neural networks

The function of a neural network is determined by factors such as the network topology (the number of layers, the number of units in each layer, and how the layers are connected) and the weights of the connections within the network. The topology is usually fixed, while the weights are determined by a training algorithm. The process of adjusting the weights so that the network captures the relationship between the inputs and the desired targets (outputs) is called learning or training. Learning algorithms fall into two main groups: supervised learning and unsupervised learning.

1.3.1 Supervised learning

Figure 9. Supervised learning model (the training data supply the inputs and the desired target outputs; the network output is compared with the target, and the resulting error feeds an objective function and a learning algorithm, an optimization method, that changes the weights)

The network is trained by presenting it with pairs of input patterns and desired outputs. These pairs are supplied by a teacher or by the system in which the network operates. The goal is to build a network whose output for each input in the training set matches the desired output. To achieve this, the network must be adjusted step by step, driven by the difference between the actual output and the desired output (which is known in advance). The learning algorithm uses this difference to adjust the weights of the network. This weight adjustment is often described as a numerical approximation problem: given training data consisting of pairs (an input pattern x and a corresponding target t), the goal is to find a function f(x) that fits all the training patterns.

The BBP (Boosting-Based Perceptron) algorithm

The BBP algorithm (Jackson & Craven, 1996) [12] is a supervised learning algorithm developed from the AdaBoost algorithm (Freund & Schapire, 1995) [11], which is a hypothesis boosting method. The algorithm learns a set of hypotheses and then combines them into a single overall hypothesis. A hypothesis boosting algorithm combines the hypotheses produced by a weak learning algorithm into a strong hypothesis. A weak hypothesis is one whose predictions are only slightly better than random guessing, whereas a strong hypothesis gives highly accurate predictions.

BBP is widely used in data mining applications because of its significant contributions to network learning. The method differs from traditional neural network methods in that it does not involve training with a gradient-based optimization method. However, because the hypotheses it learns are perceptrons, we regard it as a neural network method. The main idea is to add new input units to a learned hypothesis, using a probability distribution over the whole training set to select a suitable input. Because the algorithm adds weighted inputs to the hypothesis one at a time, the complexity of the hypothesis can be controlled easily.

The inputs incorporated into a hypothesis correspond to Boolean functions that map to {-1,+1}. In other words, the inputs are binary units with an activation of either -1 or +1. The inputs may correspond to Boolean values, to tests on nominal or numerical values (for example, color = red, x1 > 0.8), or to logical combinations of values (for example, [color = red] ∧ [shape = round]). In addition, the algorithm may incorporate an input that represents the identically true function. The weight associated with such an input corresponds to the threshold of the perceptron.

In each iteration, an input is selected from the set of candidates and added to the hypothesis. BBP measures the correlation of each input with the target function being learned and then selects the input with the greatest correlation. The correlation between a candidate and the target function changes from iteration to iteration, because it is controlled by a changing distribution over the training set. Initially, BBP assumes a uniform distribution over the training set: when selecting the first input, it assigns equal importance to every example in the training set. Each time an input is added, the distribution is adjusted so that more weight is given to the examples that the input does not predict correctly. In other words, the algorithm directs the learner to focus on the examples that the current hypothesis does not explain correctly.

The algorithm stops adding weighted inputs to the hypothesis after a predetermined number of iterations, or when no errors remain on the training set. Because only one input is added to the network in each iteration, the size of the final perceptron can be controlled by the number of iterations. The hypothesis returned by BBP is a perceptron in which the weight associated with each input is a function of the error of that input. The perceptron uses the sign function to determine the class it returns:

$$\operatorname{sign}(x) = \begin{cases} 1, & \text{if } x > 0 \\ -1, & \text{if } x \le 0 \end{cases}$$


The BBP algorithm has two limitations [12]. First, it is designed for binary classification tasks. It can be applied to multi-class problems by learning one perceptron per class. Second, it assumes that the inputs are Boolean functions, so domains with real-valued features must be handled by discretizing the values as described above.

Algorithm

Input: a set S of m examples, a set C of candidate inputs that map to {-1,+1}, the number of iterations T
Output: the function h(x)


Algorithm body:

for all x ∈ S                          /* the initial distribution is uniform */
    D_1(x) := 1/m
for t := 1 to T do
    /* add a hypothesis */
    h_t := argmax_{c_i ∈ C} | E_{D_t}[ f(x) · c_i(x) ] |
    /* determine the error */
    ε_t := 0
    for all x ∈ S
        if h_t(x) ≠ f(x) then ε_t := ε_t + D_t(x)
    /* update the distribution */
    β_t := ε_t / (1 - ε_t)
    for all x ∈ S
        if h_t(x) = f(x) then D_{t+1}(x) := β_t · D_t(x)
        else D_{t+1}(x) := D_t(x)
    /* renormalize the distribution */
    Z_t := Σ_{x ∈ S} D_{t+1}(x)
    for all x ∈ S
        D_{t+1}(x) := D_{t+1}(x) / Z_t

Return: h(x) = sign( Σ_{i=1}^{T} -ln(β_i) · h_i(x) )
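The pseudocode above can be turned into a short runnable sketch. The Python version below is one possible reading of it, under assumptions that are not in the thesis: the candidate inputs are given as a matrix `C` with one column per candidate and entries in {-1,+1}, the target values `f` are also in {-1,+1}, and the error is clipped away from 0 and 1 so that the weight -ln(β) stays finite. The names `bbp_train` and `bbp_predict` are hypothetical.

```python
import numpy as np

def bbp_train(C, f, T):
    """C: (m, k) candidate inputs in {-1,+1}; f: (m,) targets in {-1,+1}."""
    m = C.shape[0]
    D = np.full(m, 1.0 / m)                  # uniform initial distribution D_1
    chosen, weights = [], []
    for _ in range(T):
        corr = (D * f) @ C                   # E_D[f(x) * c_i(x)] for every candidate
        i = int(np.argmax(np.abs(corr)))     # candidate with the largest |correlation|
        h = C[:, i]
        eps = float(D[h != f].sum())         # weighted error of the chosen input
        # Assumption: clip eps so that beta is neither 0 nor infinite.
        eps = min(max(eps, 1e-10), 1.0 - 1e-10)
        beta = eps / (1.0 - eps)
        D = np.where(h == f, beta * D, D)    # down-weight correctly predicted examples
        D = D / D.sum()                      # renormalize (division by Z_t)
        chosen.append(i)
        weights.append(-np.log(beta))        # perceptron weight of this input
    return chosen, weights

def bbp_predict(C, chosen, weights):
    """h(x) = sign(sum_t -ln(beta_t) * h_t(x)), with sign(0) = -1 as in the text."""
    s = sum(w * C[:, i] for i, w in zip(chosen, weights))
    return np.where(s > 0, 1, -1)
```

A candidate that is negatively correlated with the target has an error above 0.5, so its weight -ln(β) is negative; the same input can therefore be used with the opposite sign without any special case.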

1.3.2 Unsupervised learning

Unsupervised neural network learning is learning without any feedback from the environment indicating how correct the outputs of the network are. The network must discover features, regularities, correlations, or categories in the input data automatically. In practice, for most variants of unsupervised learning, the targets are the same as the inputs. In other words, unsupervised learning performs a task similar to that of an auto-associative network, compressing the information in the input data. Some unsupervised learning algorithms are presented in detail in Chapter 2.

1.4 Conclusion of Chapter 1

This chapter presented the essentials of neural network structure: processing units, activation states, connections, the propagation rule, the activation function, the bias, the learning rule, and the environment in which the system operates. Overall, network topologies are divided into two classes, feedforward networks and recurrent networks. Making neural network learning more comprehensible and less time-consuming is a topical issue in data mining. Neural network learning algorithms fall into two main groups, supervised learning and unsupervised learning. Among them, BBP is a characteristic algorithm for supervised learning of single-layer neural networks.


CHAPTER 2. THE SOM ALGORITHM AND THE CLUSTERING PROBLEM


As presented in Chapter 1, unsupervised learning is one of the two main groups of neural network learning; it is learning without feedback from the environment. This chapter introduces competitive learning, the most common unsupervised learning algorithm, and then an algorithm that uses competitive learning and, through a self-organizing process, arranges its outputs to solve the clustering problem.

2.1 Clustering methods

The purpose of clustering is to reduce the size of the data by categorizing or grouping similar data items. Several typical clustering techniques exist [9]:

Hierarchical clustering follows one of two approaches. The first (bottom-up) merges smaller clusters into larger ones according to some criterion. The second (top-down) does the opposite and splits larger clusters into smaller ones. Either way, the result is a tree of clusters (called a dendrogram) that shows how the clusters are related.

Partitive clustering decomposes the data into a set of disjoint clusters. The clustering algorithm minimizes a criterion function. This criterion usually involves minimizing some measure of dissimilarity among the examples within each cluster while maximizing the dissimilarity between different clusters. There are several partitive methods, the most typical being the k-means algorithm.

Density-based clustering comprises methods based on connectivity and density functions.

Grid-based clustering uses a multi-level granular structure to grow the clusters progressively.


Model-based clustering hypothesizes a model for each cluster; the idea is to select the best-fitting model among the models of the clusters.

Other methods include the neural network approach and competitive learning.
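To make the first technique in the list above concrete, here is a minimal Python sketch of bottom-up hierarchical clustering. The text only says that clusters are merged "according to some criterion"; the sketch assumes single linkage (the distance between two clusters is the smallest distance between their members) and Euclidean distance, and the function name `agglomerative` is hypothetical.

```python
import numpy as np

def agglomerative(points, n_clusters):
    """Merge the two closest clusters (single linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]   # start with one cluster per point
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: smallest pairwise distance between the clusters.
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]    # merge cluster b into cluster a
        del clusters[b]
    return clusters
```

Recording the sequence of merges instead of stopping at `n_clusters` would give the dendrogram mentioned in the text; this naive version recomputes all cluster distances at every step and is meant for small examples only.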

Clustering techniques have been, and are being, applied in many research problems. Examples include: in medicine, the classification of diseases, treatments, or symptoms; in finance, particularly market research, the selection of investment funds and the assessment of credit risk; in image processing and pattern recognition; and on the web, document classification and the clustering of weblog data to discover groups with similar access patterns.

2.2 Using neural networks for clustering

2.2.1 Competitive learning

Unsupervised learning involves using inductive methods to discover the regularities present in a data set. There are many neural network algorithms for unsupervised learning, among them competitive learning (Rumelhart & Zipser, 1985) [12]. Competitive learning can be regarded as the unsupervised neural network learning algorithm best suited to data mining, and it also illustrates the suitability of single-layer neural network learning methods.

The learning task addressed by competitive learning is to partition a given set of training examples into a set of clusters. The clusters reflect the regularities present in the data set, in that similar instances are mapped to the same class.

The variant of competitive learning considered here is sometimes called simple competitive learning and involves learning in a single-layer neural network. The input units of the network take values associated with the domain under consideration, and the k output units represent the k classes into which the input examples are clustered.


The net input to each output unit in this method is a linear combination of the inputs:

$$net_j = \sum_i w_{ji}\, x_i$$

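Here $x_i$ is the i-th input, and $w_{ji}$ is the weight connecting the i-th input to the j-th output. The name of the algorithm comes from the way the activations of the units are decided: the output unit with the largest net input is the winner and has an activation of 1, while all other output units have an activation of 0.

$$a_j = \begin{cases} 1, & \text{if } \sum_i w_{ji} x_i > \sum_i w_{hi} x_i \quad \forall h \ne j \\ 0, & \text{otherwise} \end{cases}$$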


Training in competitive learning involves the cost function:

$$C = \frac{1}{2} \sum_j a_j \sum_i (x_i - w_{ji})^2$$

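where $a_j$ is the activation of the j-th output, $x_i$ is the i-th input, and $w_{ji}$ is the weight from the i-th input to the j-th output. The weight update rule is:

$$\Delta w_{ji} = -\eta \frac{\partial C}{\partial w_{ji}} = \eta\, a_j (x_i - w_{ji})$$

where $\eta$ is the learning rate.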
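Figure 10. Competitive processing unit (weights $W_{j1}, W_{j2}, \dots, W_{jn}$; net input $net_j = \sum_i w_{ji} x_i$; activation $a_j = 1$ if $net_j > net_h$ for all $h \ne j$, and 0 otherwise)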
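The winner-take-all activation and the weight update rule above fit into a few lines of Python. The sketch below is an illustration under assumptions: the examples are presented in their given order for a fixed number of epochs, the initial weights are supplied by the caller, and the names `competitive_learning` and `winner` are hypothetical. Because the winner is chosen by the largest net input rather than by distance, the sketch works best when the input vectors have comparable lengths.

```python
import numpy as np

def winner(W, x):
    """Index j of the output unit with the largest net input net_j = sum_i w_ji * x_i."""
    return int(np.argmax(W @ x))

def competitive_learning(X, W, eta=0.1, epochs=10):
    """Online training: after every example, move only the winner toward it."""
    W = W.astype(float).copy()
    for _ in range(epochs):
        for x in X:
            j = winner(W, x)               # a_j = 1 for the winner, 0 for all others
            W[j] += eta * (x - W[j])       # delta w_ji = eta * a_j * (x_i - w_ji)
    return W
```

Since only the winning row of `W` changes at each step, every weight vector drifts toward the centre of the examples it wins.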


The main idea of competitive learning is that each output unit takes responsibility for a subset of the training examples. Only one output unit wins for any given example, and the weight vector of the winning unit is moved toward the input vector. As training proceeds, the weight vector of each output unit moves toward the centroid of its examples. Once training is complete, each output unit represents a group of examples, and the weight vector of each unit corresponds to the centroid of its group.

Competitive learning is closely related to the well-known statistical method of k-means clustering. The essential difference between the two is that competitive learning is an online method: during learning it updates the network weights after every example is presented, rather than after all the examples have been presented, as k-means clustering does. Competitive learning is well suited to large data sets, because online algorithms usually reach a solution faster in such cases.

2.2.2 The SOM algorithm

Figure 11. The original space and the SOM

The SOM (Self-Organizing Map) algorithm was developed by Professor Teuvo Kohonen [10,11,13,15] in the 1980s and is a very suitable tool for data mining [9]. SOM performs a mapping that reduces the dimensionality of the training set. The mapping reflects the probability distribution of the data and is robust to missing data. It is easy to interpret, simple, and, most importantly, easy to visualize. Visualizing multidimensional data is a main application area of SOM.

SOM is a feedforward neural network technique that uses an unsupervised learning algorithm (competitive learning) and, through a self-organizing process, arranges its outputs into a geometric representation of the original data [10,11].

Algorithm

Consider a data set of vectors in n-dimensional space:

$$x = [x_1, x_2, \dots, x_n]^T \in \mathbb{R}^n$$

A SOM usually consists of M neurons arranged on a grid (usually two-dimensional). Neuron i is a prototype vector of dimension p:

$$m_i = [m_{i1}, \dots, m_{ip}]$$

The neurons of the grid are connected to adjacent neurons by a neighbourhood relation. The immediate neighbours of neuron i are the adjacent neurons within its neighbourhood radius:

$$N_i(d) = \{\, j : d_{i,j} \le d \,\}$$

where d is the neighbourhood radius.

Depending on the radius, the neighbouring neurons are arranged on the grid in a rectangular or hexagonal pattern. The number of neighbours determines the granularity of the resulting map, which affects the accuracy and the generalization capability of the SOM.

Figure 12. Neighbourhoods


In the SOM algorithm, the topological relations and the number of neurons are fixed from the beginning. The number of neurons is usually chosen to be as large as possible, with the neighbourhood size controlled accordingly. If the neighbourhood size is chosen appropriately, the map does not lose much information even when the number of neurons exceeds the number of input vectors. However, as the size of the map grows, for example to tens of thousands of neurons, training becomes computationally heavy and impractical for most applications.

Before training, initial values are assigned to the weight vectors. SOM does not depend strongly on the initial data (data may be missing), and the algorithm still converges quickly. One of the following three typical initialization procedures is used:

Random initialization: the initial weight vectors are assigned small random values.

Sample initialization: the initial weight vectors are set to random samples drawn from the data set.

Linear initialization: the initial weight vectors are placed in the linear subspace spanned by two vectors of the initial data set.

In each training step, a sample vector x is chosen at random from the input data set. The distances between x and all the prototype vectors are computed. The unit c whose prototype is closest to x, called the BMU (Best Matching Unit), is defined by:

$$\| x - m_c \| = \min_i \{ \| x - m_i \| \}$$

vi ||.|| l o khong cch. Sau khi tm c BMU, vect trng s ca SOM c cp nhp li. Vect trng s ca BMU v cc ln cn hnh thi ca n di chuyn dn n vect trong khng gian u vo. Th tc cp nhp ny tri di theo BMU v cc hnh trng ln cn ca n v pha vect v d.


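The SOM update rule for the weight vector of unit i is:

$$m_i(t+1) = m_i(t) + \alpha(t)\, h_{ci}(t)\, [x - m_i(t)]$$

where
t is time,
x is an input vector drawn at random from the input data set at time t,
α(t) is the learning rate,
h_{ci}(t) is the neighborhood kernel around c at time t, here a Gaussian neighborhood function.

Figure 13. BMU

The neighborhood kernel defines the region of influence that an input sample has on the SOM. The factor applied in the update has two parts, the neighborhood function h(d, t) and the learning rate function α(t):

$$\alpha(t)\, h_{ci}(t) = \alpha(t)\, h(\|r_c - r_i\|, t)$$

where r_c and r_i are the grid positions of neurons c and i. The simplest neighborhood function is the bubble function: it is constant over the whole neighborhood of the winning unit and zero elsewhere (Figure 14). Another choice is the Gaussian neighborhood function:

$$h_{ci}(t) = e^{-\frac{\|r_c - r_i\|^2}{2\sigma^2(t)}}$$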


where σ(t) is the neighborhood radius. The Gaussian neighborhood function gives better results but is more expensive to compute. The neighborhood radius usually starts large and decreases to 1 during training.

The learning rate α(t) is a function that decreases with time. Two commonly used forms are a linear function and a function inversely proportional to time:

$$\alpha(t) = \frac{A}{t + B}$$

where A and B are constants.
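The BMU search, the Gaussian kernel and the inverse-time learning rate can be combined into a single training step. The following is a minimal sketch in plain Python, not the SOM Toolbox implementation. The function names, the constants A = B = 10 and the tiny 1 x 3 map are assumptions chosen only for illustration.

```python
import math

def find_bmu(x, codebook):
    """Index c of the prototype closest to x (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))

def gaussian_kernel(r_c, r_i, sigma):
    """h_ci = exp(-||r_c - r_i||^2 / (2 sigma^2)), computed on grid positions."""
    return math.exp(-math.dist(r_c, r_i) ** 2 / (2.0 * sigma ** 2))

def learning_rate(t, A=10.0, B=10.0):
    """Inverse-time learning rate alpha(t) = A / (t + B); A and B are illustrative."""
    return A / (t + B)

def train_step(x, codebook, positions, t, sigma):
    """Apply m_i(t+1) = m_i(t) + alpha(t) h_ci(t) [x - m_i(t)] to every unit, in place."""
    c = find_bmu(x, codebook)
    alpha = learning_rate(t)
    for i, m in enumerate(codebook):
        h = gaussian_kernel(positions[c], positions[i], sigma)
        codebook[i] = [mk + alpha * h * (xk - mk) for mk, xk in zip(m, x)]
    return c

# Hypothetical 1 x 3 map with 2-dimensional prototypes.
positions = [(0, 0), (1, 0), (2, 0)]
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
bmu = train_step([1.2, 0.8], codebook, positions, t=0, sigma=1.0)
```

With these constants α(0) = 1, so the BMU lands on the sample itself, while its grid neighbors move only part of the way, in proportion to the kernel value.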

(a) Bubble neighborhood    (b) Gaussian neighborhood

Figure 14. Two basic neighborhood functions

Training is usually carried out in two phases. The first phase uses a large initial learning rate α and large neighborhood radii. In the second phase, both α and the neighborhood radius are small from the beginning. This procedure first roughly tunes the SOM to the same region of space as the input data and then fine-tunes the map.

There are many variants of the SOM. One theme is to use learning rates and neighborhood sizes specific to each neuron. Another is to use an adaptive or even a growing map structure. The aim of these variations is to let the SOM follow the topology of the data set better or to achieve better quantization results.

2.2.3 Using the SOM in data mining

Thanks to its advantages, the SOM algorithm has become a useful tool in data mining: it approximates the probability density function of the original data set, is easy to interpret and, most importantly, visualizes well [8,10,11]. Depending on the problem at hand, data mining experts may choose different methods to analyze the given data. With the SOM, however, several tasks can be performed at once, with results comparable to those of combining several other methods. As discussed above, the SOM is very effective for clustering and for reducing the size of the data. When integrated with other methods it can also be used to generate rules.

Visualization is very important in data mining. It is a key element in reporting results and creating knowledge [10]. Visual illustrations are used to understand a data set thoroughly and to summarize its structure. Visualization is arguably the main strength of the SOM. SOM-based visualization techniques include:

- Map visualization: visualizing the component planes of the vectors and the correlations between them; visualizing the unified distance matrix (U-matrix) to show the cluster structure of the data; the Sammon mapping [11], which shows the shape of the map in the input space; data histograms and projections of the data set for visualization purposes.
- Object visualization: essentially applying the SOM to pick out the dominant characteristics of the data components, by coloring each map unit automatically or assigning colors by hand. Each point of the object is marked with the color of its BMU.


Map measures: the quality of a SOM is usually estimated from its resolution and from how well it preserves the topology of the data set on the map. Other quality measures can be based on how accurately the map clusters the data, but these require labeled input samples. Beyond these measures, the quality of the SOM is related to the true dimension of the original data set. If the dimension of the SOM is larger than that of the input data, the map cannot follow the distribution of the original data set, which conflicts with the goals of preserving topology and resolution. A map with an inappropriate resolution may break the topology. Resolution is usually measured by the average quantization error over the whole test data set:

$$\varepsilon_q = \frac{1}{N} \sum_{i=1}^{N} \|x_i - m_c\|$$

where N is the number of test samples and m_c is the prototype of the BMU of x_i.
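As a small illustration of the average quantization error, the sketch below computes it in plain Python for a toy codebook. The function name and the data are assumptions made for the example and are not taken from the thesis.

```python
import math

def quantization_error(data, codebook):
    """Mean distance from each sample to its best-matching prototype."""
    total = 0.0
    for x in data:
        total += min(math.dist(x, m) for m in codebook)
    return total / len(data)

# Toy example: two prototypes, three samples.
codebook = [[0.0, 0.0], [10.0, 0.0]]
data = [[0.0, 3.0], [10.0, 4.0], [0.0, 0.0]]
eq = quantization_error(data, codebook)   # (3 + 4 + 0) / 3
```

A smaller value means that the prototypes sit closer to the data, that is, a higher map resolution.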

Clustering: data clustering algorithms such as K-means or ISODATA [9] usually minimize the distances within clusters and maximize the distances between clusters. The distance between clusters can be based on single linkage or complete linkage. Single linkage measures the distance from a cluster X to a cluster Y as the minimum distance between members q_X (q_X ∈ X) and q_Y (q_Y ∈ Y). Complete linkage takes the maximum instead:

$$d_s(X, Y) = \min\{\, d(q_X, q_Y) \mid q_X \in X,\ q_Y \in Y \,\}$$

$$d_c(X, Y) = \max\{\, d(q_X, q_Y) \mid q_X \in X,\ q_Y \in Y \,\}$$
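The two linkage distances translate directly into code. This is a minimal sketch that assumes the Euclidean distance for d. The function names and the sample clusters are illustrative only.

```python
import math
from itertools import product

def single_linkage(X, Y):
    """d_s(X, Y): smallest distance between a member of X and a member of Y."""
    return min(math.dist(qx, qy) for qx, qy in product(X, Y))

def complete_linkage(X, Y):
    """d_c(X, Y): largest distance between a member of X and a member of Y."""
    return max(math.dist(qx, qy) for qx, qy in product(X, Y))

X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(4.0, 0.0), (6.0, 0.0)]
ds = single_linkage(X, Y)     # closest pair: (1, 0) and (4, 0)
dc = complete_linkage(X, Y)   # farthest pair: (0, 0) and (6, 0)
```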


The drawback of single linkage is that clusters easily turn into long chains that are not typical of the data. Complete linkage, on the other hand, can be overly restrictive. The two can be combined by weighting the points of a cluster appropriately, so that the measure takes all points into account, like a distance, while still preserving the shape of the data cluster. The SOM itself can be used as such a measure.

2.2.4 The SOM and the clustering problem

The SOM is a clustering method based on the neural network approach and competitive learning. The weight vectors of the SOM serve as cluster centers, and the clustering can be improved by combining map units into larger clusters. An advantage of this approach is that, although the Voronoi regions of the map units are convex, combining several map units makes it possible to form non-convex clusters. Different distance measures and different linkage criteria can be used to form these larger clusters.

Distance matrix: the general strategy for clustering the units of a SOM is to compute the matrix of distances between reference vectors and to use large values in this matrix as indicators of cluster borders [11]. In a three-dimensional view, the clusters appear as "valleys". The question is how to decide which map units belong to a given cluster. This is usually solved with an agglomerative algorithm, which consists of the following steps:

1. Assign each map unit to its own cluster.
2. Compute the distances between all clusters.
3. Merge the two closest clusters.
4. If the number of remaining clusters equals the number specified by the user, stop; otherwise repeat from step 2.
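The four agglomerative steps can be sketched as follows. This is only an illustration: it assumes single linkage with the Euclidean distance between prototype vectors, and the function names and the one-dimensional toy codebook are hypothetical.

```python
import math

def cluster_distance(A, B, codebook):
    """Single-linkage distance between two clusters of map-unit indices."""
    return min(math.dist(codebook[i], codebook[j]) for i in A for j in B)

def agglomerate(codebook, n_clusters):
    # Step 1: every map unit starts as its own cluster.
    clusters = [[i] for i in range(len(codebook))]
    # Step 4: stop once the requested number of clusters remains.
    while len(clusters) > n_clusters:
        # Step 2: distances between all pairs of clusters.
        pairs = [(cluster_distance(clusters[a], clusters[b], codebook), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        # Step 3: merge the two closest clusters.
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Five map units with one-dimensional prototypes.
codebook = [[0.0], [0.5], [5.0], [5.4], [9.0]]
groups = agglomerate(codebook, 3)
```

Recomputing all pairwise distances in every pass keeps the sketch short. A practical implementation would cache them.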


The SOM is a clustering algorithm in the sense that each map unit is, from the outset, a small cluster consisting of the samples in its Voronoi set. The SOM can also be interpreted as a fuzzy clustering: each sample belongs to every cluster with a membership value proportional to the neighborhood function at its BMU. This interpretation is appropriate when the number of samples per initial cluster is small, or when a fuzzy method is used as a post-processing step on the output of the SOM.

Unlike most basic prototype-based methods, however, the optimal situation for the SOM is not the one in which the number of prototypes equals the number of clusters. Instead, the number of units in the SOM should be larger than the expected number of clusters. The neighborhood function ties neighboring map units together, so these units are more similar to each other than to units in other clusters. The transition from one cluster to another on the map takes place gradually over several map units. This means that if the desired number of clusters is small, the SOM itself must also be clustered.

Using the SOM as an intermediate step in clustering gives a two-level approach: first the data set is clustered by the SOM, and then the SOM is clustered. Each data vector of the original data set belongs to the same cluster as its nearest prototype. One advantage of this approach is reduced computation time, which sets it apart from other clustering algorithms, typically hierarchical trees, that become heavy even for a fairly small number of samples. For this reason it is more appropriate to cluster a set of prototypes than to work directly on the data.

Either partitive or hierarchical clustering methods can be used to cluster the SOM. The prototypes can be clustered directly or on the basis of some predefined characteristics of the SOM. In partitive clustering, interpolating units may be left out of the analysis [3]. In agglomerative clustering, the SOM neighborhood relation can be used to constrain the possible merges in the dendrogram. If this is combined with neighborhood constraints, the interpolating units form borders on the map that still follow the dendrogram structure.

The distance matrix can also be used directly as a basis for clustering. Because the distance matrix gives the average distance from each prototype vector to its neighbors and thus an estimate of the local probability density, the local minima of the matrix are used as cluster centers or seed points. The partition can then be obtained by assigning each map unit to the nearest center or by region growing from the local minima.

The SOM has also been applied to clustering non-normalized data sets. Using the competitive learning rule [5], the weight vectors adapt to the probability density function of the input vectors. The similarity between an input vector x and a weight vector w is computed with the Euclidean distance. During training, an arbitrary weight vector w_j is updated at time t by:

$$\Delta w_j(t) = \alpha(t)\, h_{cj}(t)\, (x(t) - w_j(t))$$

where α(t) is the learning rate, which decreases during training, and h_{cj}(t) is the neighborhood function between the winning weight vector w_c and the weight vector w_j; h_{cj}(t) also decreases during training. The neighborhood relation is determined by the grid geometry and is fixed throughout learning. At the end of learning, the neighborhood radius is small enough that only the winning weight vectors w_c and their closest neighbors are updated. For a one-dimensional structure, it can be shown that the learning rule above yields an approximation of a monotonic function of the probability density of the input vectors. For a two-dimensional structure, the result is a compromise between a density approximation and minimum mean-square-error vector quantization.

Given well-separated regions and a distribution of cluster centers, an estimate of how often each neuron wins can be used to visualize the clusters. Figure 15 shows five clusters by gray-level coding of the winning histogram. The Gaussian mixture data were generated by fixing five cluster centers and five different matrices. The generated data set and the experimental data set have the same size, and the overall estimates for the matrices are approximately equal. The units colored black in Figure 15 are dead neurons, which make it easy to tell the clusters apart.

Figure 15. Winner histogram of a SOM with 30x40 neurons for Gaussian mixture data

To preserve the neighborhood topology on the map, weight vectors that are close in the input space are also placed close together in the output space. The mapping from the input space to the output space is mostly continuous, but the converse does not hold. Hence two weight vectors that are geometrically close on the map do not necessarily belong to the same cluster: if the distance between them is small they may form one cluster, otherwise they belong to different clusters. The distances between neighboring weight vectors are visualized in the unified distance matrix (U-matrix). For each weight vector w_{x,y}, where x and y are the grid indices, the Euclidean distances dx and dy to the two adjacent neighbors and the distance dxy to the diagonal neighbor are computed as follows:

$$dx(x, y) = \|w_{x,y} - w_{x+1,y}\|$$

$$dy(x, y) = \|w_{x,y} - w_{x,y+1}\|$$

$$dxy(x, y) = \frac{1}{2}\left(\frac{\|w_{x,y} - w_{x+1,y+1}\|}{\sqrt{2}} + \frac{\|w_{x,y+1} - w_{x+1,y}\|}{\sqrt{2}}\right)$$


The distance du is computed as the mean of the eight surrounding distances. With the four distances dx, dy, dxy and du for each neuron, the unified distance matrix is easily determined. Its size is $(2n_x - 1) \times (2n_y - 1)$.
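The construction of the U-matrix from dx, dy, dxy and du can be sketched as below. This is an illustrative plain-Python version for a rectangular grid. The layout of the entries (du at even indices, dx, dy and dxy in between) and the handling of border units, which average only the entries that exist, are assumptions of this sketch.

```python
import math

def u_matrix(w):
    """w[x][y] is the weight vector of unit (x, y); returns a (2nx-1) x (2ny-1) grid."""
    nx, ny = len(w), len(w[0])
    U = [[0.0] * (2 * ny - 1) for _ in range(2 * nx - 1)]
    for x in range(nx):
        for y in range(ny):
            if x + 1 < nx:                      # dx: neighbor along x
                U[2 * x + 1][2 * y] = math.dist(w[x][y], w[x + 1][y])
            if y + 1 < ny:                      # dy: neighbor along y
                U[2 * x][2 * y + 1] = math.dist(w[x][y], w[x][y + 1])
            if x + 1 < nx and y + 1 < ny:       # dxy: mean of the two scaled diagonals
                d1 = math.dist(w[x][y], w[x + 1][y + 1])
                d2 = math.dist(w[x][y + 1], w[x + 1][y])
                U[2 * x + 1][2 * y + 1] = 0.5 * (d1 + d2) / math.sqrt(2)
    for x in range(nx):                         # du: mean of the surrounding entries
        for y in range(ny):
            around = [U[i][j]
                      for i in range(2 * x - 1, 2 * x + 2)
                      for j in range(2 * y - 1, 2 * y + 2)
                      if (i, j) != (2 * x, 2 * y)
                      and 0 <= i < 2 * nx - 1 and 0 <= j < 2 * ny - 1]
            U[2 * x][2 * y] = sum(around) / len(around) if around else 0.0
    return U

# 2 x 2 map with one-dimensional weights: the two columns differ by 1.
w = [[[0.0], [1.0]],
     [[0.0], [1.0]]]
U = u_matrix(w)
```

In this toy map the entries between units with equal weights are 0 and those across the two columns are 1, which is the kind of contrast the U-matrix uses to reveal cluster borders.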

Figure 16. Definition of a U-matrix

In Figure 17 the elements of the U-matrix are coded in gray levels: light areas correspond to low values and dark areas to high values. Clusters on the map are therefore regions with small distances between weights, while the distances between different clusters are large.

Figure 17. U-matrix of the SOM in Figure 15

2.2.5 Other clustering methods

a. Hierarchical tree [9]

The aim is to join objects successively into larger and larger clusters, using some measure such as distance or similarity. Consider a horizontal hierarchical tree diagram. Starting from the objects on the left of the diagram, imagine that at each step we gradually "relax" the criterion. In other words, we lower the threshold for deciding that two or more objects are members of the same group. In this way more and more objects are linked together, forming larger and larger clusters of increasingly dissimilar elements. Finally, all objects are joined together. In these diagrams the horizontal axis denotes the linkage distance, so for each node in the graph we can read off the criterion distance at which the corresponding elements were linked into a single cluster. When the data have a clear structure, in the sense of clusters of mutually similar objects, this structure appears in the hierarchical tree as distinct branches.

b. K-means (Hartigan, 1975) [9]

This is a demanding clustering method: it assumes that a hypothesis about the number of groups in the samples is always available. The goal is to arrange the samples exactly into that many clearly separated clusters. Studies indicate that this can be achieved with the K-means algorithm. In short, K-means produces exactly k clusters that are as distinct as possible.

Given a database of n objects and a prescribed number of clusters k, the algorithm partitions the objects into k parts (k ≤ n). The clusters are formed according to an objective partitioning criterion, often called a similarity function, which uses distance to determine that objects within a cluster are similar and objects in different clusters are dissimilar in terms of their data attributes. The K-means algorithm proceeds in four steps:

1. Partition the objects into k non-empty subsets.
2. Compute the seed points (centroids) of the clusters of the current partition.
3. Assign each object to the cluster with the nearest seed point.
4. Go back to step 2, and stop when no new assignment occurs.

Algorithm:
Input: the number of clusters k and a database of n objects.
Output: a set of k clusters that minimizes the squared-error criterion.

Method:
(1) Arbitrarily choose k objects as the initial cluster centers;
(2) Repeat
(3) Reassign each object to the cluster to which it is most similar, based on the mean value of the objects in the cluster;
(4) Update the cluster centers by computing the mean value of the objects in each cluster;
(5) Until no change occurs.
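Steps (1) to (5) can be written out as the short sketch below. It is an illustration only: it takes the first k objects as the "arbitrary" initial centers so that the result is reproducible, uses the Euclidean distance as the similarity criterion, and the names k_means and max_iter are choices made for this example.

```python
import math

def k_means(objects, k, max_iter=100):
    # (1) take the first k objects as initial cluster centers (an arbitrary choice)
    centers = [list(o) for o in objects[:k]]
    assignment = None
    for _ in range(max_iter):                       # (2) repeat
        # (3) assign each object to the nearest center
        new_assignment = [min(range(k), key=lambda c: math.dist(o, centers[c]))
                          for o in objects]
        if new_assignment == assignment:            # (5) until nothing changes
            break
        assignment = new_assignment
        # (4) recompute each center as the mean of its members
        for c in range(k):
            members = [o for o, a in zip(objects, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0)]
labels, centers = k_means(points, 2)
```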

c. Expectation Maximization [9]

This method is similar to K-means: it finds clusters among observed objects or variables and assigns the objects to those clusters. A typical application of this kind of analysis is a market research study of consumer attitudes toward the object under study, whose aim is to find "market segments". While the K-means algorithm produces a fixed number k of clusters, expectation maximization extends this approach to clustering in two ways:

- Instead of assigning cases or observations to clusters so as to maximize the differences between them, expectation maximization computes the probabilities of cluster membership based on probability distributions. The goal of the clustering algorithm is then to maximize the overall probability, or likelihood, of the data given the final clusters.
- Unlike K-means clustering, the expectation maximization algorithm can be applied to both continuous and categorical variables (although K-means can also be adapted to categorical variables).

2.3 Some applications of the SOM

The SOM algorithm has been used in many different fields, with more than 5000 applications [13], and has shown the following advantages:

- The SOM is very effective in analyses that require intelligence to make quick decisions in the market. It helps analysts understand problems better on relatively large data sets.
- It can represent multidimensional data for use in presentations and reports. This is also one of the main topics of this thesis.
- It identifies clusters in the data (for example customer groups), which helps optimize the allocation of resources (advertising, product search, ...).
- It can be used to detect credit card fraud and data errors.

Since this thesis deals with finance and banking, applications of the SOM in other fields are not covered. This section introduces two applications of the SOM in finance. The next chapter presents how to build a concrete SOM clustering application for a specific problem on the data of a Vietnamese bank.


2.3.1 Selecting investment funds

When selecting funds for investment, investors usually have to consider many criteria: financial results reported in recent years; risks; the financial capacity of the fund; turnover ratio; expenses; the tenure of the manager. In practice, most older programs work on only two or three criteria, and programs with graphical output are likewise limited in how they can display the data in space. The SOM, by contrast, handles data sets of any dimension and gives a compact representation of the data on a two-dimensional map. Attributes can be freely selected or columns weighted in order to build a composite index or to serve an overall objective.

To illustrate this, we use the Morningstar™ database [7] to search and analyze information on a set of funds. In this example we collect funds that invest in world stock markets. The selection criteria are (1) manager tenure >= 3 years; (2) number of shareholders >= B+ (B is a number >= 3); (3) Morningstar rating >= 4; (4) expense ratio >= 1%. On this basis we obtain about 50 equity funds. The main variables are selected from the fund database, and the input data set is scaled so that the values of every column carry equal weight. A map of the 50 funds is shown in the figure. The SOM shows the differences between funds rated 4 or 5. The resulting SOM, which is based on a description of the data, reveals differences between funds that are grouped under the same category name. Better information about the main differences between funds helps in composing investment portfolios that match the investor's wishes more closely. In summary, from the 50 world stock market funds the SOM yields three main groups, which supports the decision on which manager to choose.

Group 1: funds whose managers have less than 3 years of tenure; their portfolios have higher turnover and a high expense ratio.

Group 2: the majority; funds whose managers have longer tenure, lower portfolio turnover and a lower expense ratio.

Figure 18. SOM visualization of 50 equity funds


Group   Funds in group   Manager tenure   Asset value   Turnover ratio   Front load   Deferred load   Expense ratio
1       5                2.8              658.2         80.8             0            4.6             2.3
2       36               3.3              272.4         70.7             2.2          0.1             1.7
3       6                7.2              6638.3        52.7             4.8          0               1

Group 3: funds whose managers have the longest tenure (more than twice that of group 1), with lower portfolio turnover than group 2 and the lowest expense ratio.

2.3.2 Assessing country credit risk

Another example concerns the analysis of investment opportunities in emerging markets. It focuses on the risks involved in investing in stock markets. The SOM is used to analyze the risks and to group countries with similar risk. The analysis is based on an article by Greg Ip published in the Wall Street Journal (WSJ) in 1997 [7]. In this article on investment objectives and the "risk game", Greg Ip ranks 52 countries according to economic performance; political, economic and market risk; the liquidity of the stock markets; and the regulation and efficiency of each country's market. The countries are divided into five groups: (1) those most similar to the US; (2) other developed countries; (3) new and developing markets; (4) entirely new markets; (5) frontier markets. In that study the US serves as the benchmark for classifying countries; the countries are split into five groups; the criteria used for the split are not clearly stated; and the countries in group five have a lot of missing data.

Using the same data and the same analysis task, but building a SOM interactively and visually, we obtain a completely different result. Figure 20 shows the component planes for the current price ratio and the forward price ratio, yield, market capitalization, number of companies and volatility. In each component plane, the color of a node indicates the value range of that component: low values are colored blue and high values red, with intermediate values running from light blue through green and yellow to orange. Comparing component values across regions can reveal nonlinear dependencies and thus helps to identify the meaning of the clusters visually. A map constrained to a given number of clusters shows the following: the US, India and Japan lie in different clusters; the US and Japan have markets with large influence; India has a large number of listed companies; Turkey and Finland form one group; and the remaining countries are not differentiated. Clearly, constraining the SOM to five clusters does not provide much new information in this case.

Figure 19. Visualization of country credit risk based on WSJ data

Figure 20. Visualization of country credit risk based on WSJ data


If the assumed constraint is changed, we obtain completely different groups of countries based on country risk:

Cluster 1: Australia, New Zealand, Canada and most European countries.
Cluster 2: most Latin American and Eastern European countries.
Cluster 3: Mexico, the Philippines, North Africa and the Czech Republic.
Cluster 4: South Korea, Malaysia, Thailand and Indonesia.
Cluster 5: Singapore and Hong Kong.
Cluster 6: Hungary and Venezuela.
Cluster 7: Brazil.
Cluster 8: Finland.
Cluster 9: India and Pakistan.

2.4 Conclusion of Chapter 2

Unsupervised neural network learning, typified by competitive learning, is well suited to data mining. This chapter has focused on the SOM algorithm and the clustering problem. The SOM is a feed-forward neural network technique that uses unsupervised (competitive) learning and, through self-organization, arranges its outputs into a geometric representation of the input data. In data mining the SOM is used as an intermediate step in solving the clustering problem: first the SOM clusters the input data set, and then the resulting SOM is itself clustered by hierarchical or partitive clustering. Compared with older clustering methods such as hierarchical trees, K-means and expectation maximization, the SOM has several advantages:

- The input data can be large; there is no limit on the size of the data.
- It visualizes the data accurately, which helps in understanding the structure of the data.
- It saves time, because working on prototypes is faster than working directly on the data.

The chapter has also presented two typical applications of the SOM in financial data mining: selecting funds for financial investment and assessing the credit risk of countries on world stock markets.


CHAPTER 3. APPLYING THE SOM MODEL TO A BANKING PROBLEM

3.1 Problem statement

Many methods for knowledge discovery and data mining in finance and economics use unsupervised neural networks. In particular, SOM-based methods can give better visualization of large data sets; produce representations of complex relationships; improve clustering and data reduction; and facilitate knowledge discovery by identifying new structures and patterns in the data. Many applications of the SOM are found in engineering and other scientific fields, whereas applications in finance, economics and marketing are mostly still new. The section on SOM applications above gave two typical examples from finance: selecting investment funds and assessing the credit risk of countries in the stock market domain.

Based on the actual workflow of credit departments in banks, I propose applying the SOM to the assessment of corporate customers who wish to borrow. The visual displays produced by the SOM can help credit officers and management reach decisions when reviewing customers' loan applications. The lending process at the bank consists of the following steps:

- A customer applying for a loan must provide complete information, including: liquidity indicators (solvency), activity indicators (inventory turnover, average collection period, revenue to total assets), leverage indicators (liabilities), income indicators, the management's experience in the industry, and so on. In addition, the customer must present a business plan (use of the loan), on the basis of which the credit officer estimates the feasibility of the plan.
- The credit officer enters the data into the bank's customer management program and classifies the customer.
- The credit officer defends the customer's loan plan before the credit committee on the customer's behalf.
- The members of the credit committee approve or reject the loan on the basis of the customer's file as entered in the customer management program.

Applying the SOM to analyze borrower information over a large volume of data on prospective customers (who may not yet be customers, or who may have borrowed before) supports the credit officer's assessment and provides a basis for defending the customer before the credit committee. It also helps the committee members decide whether or not to approve the loan. The SOM Toolbox is used to visualize the customer information. The data are taken from the bank's general management program.

3.2 Introduction to the SOM Toolbox

The SOM Toolbox, a product of the SOM Toolbox team at Helsinki University of Technology, is a library of functions written in Matlab. It is an easy-to-use toolkit for building SOMs for research purposes. Researchers regard it as particularly tailored to data mining, which is why the SOM Toolbox places its emphasis on visualization functions. The toolbox can be used to preprocess data, to initialize and train SOMs with a range of topologies, to visualize SOMs in many different ways, and to analyze the properties of the SOM and of the original data, for example the quality of the SOM, the clusters on the map and the correlations between variables. In data mining, the Toolbox and the SOM in general are best suited to gaining an overall understanding of the data, although they can also be used for building models.

3.3 Program structure

The SOM Toolbox consists of functions written in Matlab. It is used here to build a system for analyzing customer information, in the following steps:

- Read the data;
- Build the data structure;
- Preprocess the data before training;
- Initialize the prototypes and train with the SOM algorithm;
- Visualize the results;
- Analyze the results.

3.3.1 Building the data set

First, the data must be loaded into Matlab. The data are taken from the bank's management program, which stores them in an SQL database management system. The main data are a filtered table of customer information containing only the attributes under consideration, as follows:

Table 1: Customer information (data for 30 customers)

Columns: (1) liquidity ratio; (2) inventory turnover; (3) average collection period; (4) revenue to total assets; (5) liabilities to total assets; (6) pre-tax income to revenue; (7) experience of the management board; (8) feasibility ratio of the business plan.

(1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
4.86    10      40      0.46    19.3    21.3    2.2     0.56
2.9     7       32      4       25      8       10.5    0.85
2.3     6.5     37      3.5     35      7.5     5.8     0.78
1.7     6       43      3       45      7       2.7     0.35
1.4     5.5     50      2.5     55      6.5     1.8     0.33
1.3     5.2     60      1.5     61      4.8     0.9     0.60
1.17    7       30.7    4.03    85.3    2.3     5       0.75
1.14    4.21    28.9    2.1     61      2.8     11      0.81
4.7     18      11      4       19      8.3     12      0.55
1.25    1.83    43      0.76    72      0.03    6       0.44
1.7     6       43      3       45      7       2.8     0.38
2.3     6.5     37      3.5     35      7.5     6       0.45
2.9     7       32      4       25      8       13      0.97
1.4     5.5     50      2.5     55      6.5     1       0.23
1       4       60      2       65      5       1       0.13
0       2.3     58      1.02    0       15      2.6     0.8
6.4     0       102     0.75    15      22      2.5     0.72
0.85    3       60      1.2     75      3.5     0.5     0.25
2.5     4.3     30      4.2     40      6.5     11      0.98
1       3.4     55      1.5     55      4       0.9     0.36
1.8     4       40      3.5     45      6       8       0.57
1.3     3.7     50      2.5     50      5       4.5     0.64
1       2       55      3.5     55      8       3.6     0.46
1.2     3       50      4.2     50      9       7.5     0.67
2.3     3.5     40      5       45      10      10.2    0.69
0.9     1       60      2.5     60      7       1       0.31
0.75    0.8     71      2.3     66      6.8     0.9     0.3
1.4     14.2    8       1.8     28      0.7     12      0.8
2.5     4.2     10      1.5     40      1.1     10.5    0.74
7.6     1.3     61      0.2     7       17      1.5     0.65

Each row of data is one sample, or vector. The values in the row are the components of the vector, that is, the variables of the data set. The variables may be attributes of the data or a set of values measured at the same point in time. Some values may be missing. The SOM Toolbox uses the following data structures:

Data struct: contains all information related to the data set.

Field          Type     Size              Meaning
.type          string   -                 Identifier of the struct type ('som_data')
.name          string   -                 Identifier of the data set
.data          matrix   [m x n]           Original data table
.labels        matrix   [m x k] (k<n)     Labels
.comp_names    matrix   [n x 1]           Attribute/component names
.comp_norm     matrix   [n x 1]           Normalization struct for each attribute
.label_names   matrix   [k x 1] (k<n)     Label names

Map struct: contains the complete information about the SOM.

Field          Type              Size            Meaning
.type          string            -               Identifier of the struct type ('som_map')
.name          string            -               Identifier of the map
.codebook      matrix            [munits x n]    Codebook matrix; each row is the weight vector of one map unit
.topol         topology struct   -               Map geometry: size, lattice type and shape
.labels        matrix            [munits x *]    Labels on the map
.neigh         string            -               Name of the neighborhood function (gaussian, cutgaussian, bubble, ep)
.mask          vector            [n x 1]         BMU search mask
.trainhist     struct            [* x 1]         Struct array of training structs
.comp_names    matrix            [n x 1]         Attribute/component names
.comp_norm     matrix            [n x 1]         Normalization struct for each attribute

Topology struct: contains information about the geometry of the map.

Field      Type     Size               Meaning
.type      string   -                  Identifier of the struct type ('som_topol')
.msize     vector   [* x 1] (*>2)      Size of the map
.lattice   string   -                  Lattice type, hexagonal by default
.shape     string   -                  Overall shape of the map

Normalization struct: normalization information.

Field     Type     Size   Meaning
.type     string   -      Identifier of the struct type ('som_norm')
.method   string   -      Normalization method (var, range, log, logistic, histD, histC)
.params   varies   -      Parameters, which depend on the method
.status   string   -      Normalization status

Training struct: contains information about initialization and training.

Field         Type     Size      Meaning
.type         string   -         Identifier of the struct type ('som_train')
.algorithm    string   -         Training/initialization algorithm
.data_name    string   -         Name of the training data
.mask         vector   [n x 1]   BMU search mask
.neigh        string   -         Name of the neighborhood function (gaussian, cutgaussian, bubble, ep)
.radius_ini   scalar   -         Initial neighborhood radius
.radius_fin   scalar   -         Final neighborhood radius
.alpha_ini    scalar   -         Initial learning rate at the start of training
.alpha_type   string   -         Type of the learning rate function
.trainlen     scalar   -         Training length
.time         string   -         Date and time of training

Grid struct: contains information for visualizing the SOM.

Field          Type                     Size                           Meaning
.type          string                   -                              Identifier of the struct type ('som_grid')
.lattice       string                   -                              Lattice type
.shape         string                   -                              Overall shape of the map
.msize         vector                   [1 x 2]                        Size of the map
.coord         matrix                   [munits x 2] or [munits x 3]   Coordinates of the map units (two- or three-dimensional)
.line          string                   -                              Line style for the connecting lines
.linecolor     string                   -                              Line color
.linewidth     scalar                   -                              Line width
.marker        string                   -                              Marker type for the map units
.markersize    scalar                   -                              Marker size
.markercolor   string                   -                              Marker color
.surf          empty, vector or RGB     -                              Empty by default. If a value is given, an additional surface is drawn; if RGB, it gives the colors used
.label         string                   -                              Label for each unit
.labelcolor    string                   -                              Label color
.labelsize     scalar                   -                              Font size of the labels

3.3.2 Preprocessing the data before training

In general, preprocessing may be just a simple transformation or normalization of the values, filtering to remove implausible values, and computing new values to replace them. Scaling the variables is especially important in this toolbox because the SOM algorithm uses the Euclidean metric to compute distances between vectors. If one variable takes values in the range [0,...,1000] and the others in the range [0,...,1], the former will dominate the organization of the map through its effect on the distance measure. In general, the purpose of normalization is to give all variables equal importance. The default way of doing this is to scale all variables linearly so that the variance of each equals one. This is done simply with sD = som_normalize(sD,'var') or D = som_normalize(D,'var'). One advantage of using data structs instead of plain data matrices is that the data struct keeps the normalization information in the field .comp_norm. The function som_denormalize(sD) restores the original values.

3.3.3 Initializing and training the SOM

There are two ways to initialize the SOM, random initialization and linear initialization, and two training algorithms, sequential training and batch training.
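The variance normalization described in 3.3.2 can be illustrated outside Matlab with the plain-Python sketch below. It is not the Toolbox code: the function names are hypothetical, and the use of the sample standard deviation (division by n - 1) is an assumption of this sketch.

```python
import math

def normalize_var(column):
    """Scale one variable to zero mean and unit variance; return values and (mean, std)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column], (mean, std)

def denormalize_var(values, params):
    """Invert normalize_var using the stored (mean, std)."""
    mean, std = params
    return [v * std + mean for v in values]

# Average collection period of the first four customers in Table 1.
col = [40.0, 32.0, 37.0, 43.0]
scaled, params = normalize_var(col)
restored = denormalize_var(scaled, params)
```

Keeping the pair (mean, std) plays the same role as the normalization information stored with the data struct: without it the original values cannot be restored.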


a. Sequential training algorithm

The SOM is trained iteratively. In each training step, one sample vector x is chosen at random from the input data set and the distances between x and all weight vectors of the SOM are computed using some distance measure. The neuron whose weight vector is closest to the input vector x is called the BMU, denoted c:

$$\|x - m_c\| = \min_i \{\|x - m_i\|\}$$

where ||.|| is the Euclidean distance measure. Here the distance computation takes two further factors into account:

- Missing values: these are represented by NaN in the data vector or matrix. Missing components are handled simply by excluding them (that is, their contribution to the distance ||x - m_i|| is assumed to be zero). Because the same components are ignored in every distance computation, this is a valid solution.
- Mask: each variable has an associated weight, defined in the .mask field of the map and training structs. It is mostly used in binary form to exclude certain variables from the BMU search (1 to include, 0 to exclude). However, the mask can take any values, so it can also be used to weight variables according to their importance.

With these changes, the distance measure becomes:

$$\|x - m\|^2 = \sum_{k \in K} w_k (x_k - m_k)^2$$

where K is the set of known (non-missing) variables of the sample vector x, x_k and m_k are the k-th components of the sample and weight vectors, and w_k is the k-th mask value.
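The masked distance with missing values can be sketched in a few lines. This illustration assumes that missing components are stored as NaN, as described above. The function name and the sample vectors are hypothetical.

```python
import math

def masked_sq_distance(x, m, mask):
    """||x - m||^2 = sum over k in K of w_k (x_k - m_k)^2, K = non-missing components of x."""
    return sum(w * (xk - mk) ** 2
               for xk, mk, w in zip(x, m, mask)
               if not math.isnan(xk))

x = [1.0, float("nan"), 3.0]     # second component is missing
m = [0.0, 5.0, 1.0]
d_all = masked_sq_distance(x, m, [1, 1, 1])      # NaN component skipped: 1 + 4
d_masked = masked_sq_distance(x, m, [1, 1, 0])   # third variable masked out: 1
```

Non-binary mask values work the same way. For example, a mask entry of 2 doubles the contribution of that variable to the squared distance.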


After the BMU has been found, the weight vectors of the SOM are updated so that the BMU moves closer to the input vector in the input space. The topological neighbors of the BMU are treated in the same way. This adaptation stretches the BMU and its topological neighbors toward the sample vector. The SOM update rule for the weight vector of unit i is:

$$m_i(t+1) = m_i(t) + \alpha(t)\, h_{ci}(t)\, [x(t) - m_i(t)]$$

where
t is time,
x(t) is an input vector drawn at random from the input data set at time t,
h_{ci}(t) is the neighborhood kernel around the winning unit c,
α(t) is the learning rate at time t.
The neighborhood kernel is a non-increasing function of time and of the distance of unit i from the winning unit c. It defines the region of influence that the input sample has on the SOM.

Training is usually performed in two phases. The first phase uses a relatively large initial learning rate α_0 and initial radius σ_0. In the second phase, both the learning rate and the radius are small compared with their initial values. This procedure first tunes the SOM approximately to the region of space occupied by the input data and then fine-tunes the map.

b. Batch training algorithm

The batch training algorithm is also iterative, but instead of using a single data vector at a time, the whole data set is presented to the map before any adjustment is made. In each training step, the data set is partitioned according to the Voronoi regions of the weight vectors. The new weight vectors are then computed as:

$$m_i(t+1) = \frac{\sum_{j=1}^{n} h_{ic}(t)\, x_j}{\sum_{j=1}^{n} h_{ic}(t)}$$

where $c = \arg\min_k \{\|x_j - m_k\|\}$ is the index of the BMU of data sample x_j. The new weight vector is a weighted average of the data samples, where the weight of each sample is the value of the neighborhood function h_{ic}(t) at its BMU c. As in the sequential training algorithm, missing values are ignored when computing the weighted average. Note that in the batch version of the K-means algorithm, the new weight vectors are simply the means of the Voronoi data sets.

Alternatively, the sum of the data vectors in each Voronoi set can be computed first:

$$s_i(t) = \sum_{j=1}^{n_{V_i}} x_j$$

where n_{V_i} is the number of samples in the Voronoi set of unit i. The new values of the weight vectors can then be computed as:

$$m_i(t+1) = \frac{\sum_{j=1}^{m} h_{ij}(t)\, s_j(t)}{\sum_{j=1}^{m} n_{V_j}\, h_{ij}(t)}$$

where m is the number of map units.

The function som_make selects the map size and the training parameters automatically, although it also accepts a number of arguments. For tighter control over the training parameters, the initialization and training functions can be called directly: som_lininit, som_randinit, som_seqtrain and som_batchtrain. In addition, the functions som_topol_struct and som_train_struct can be used to obtain default values for the map topology and the training parameters, respectively.

3.3.4 Visualization

The SOM is a suitable platform for displaying different characteristics of the SOM (or of the data). The SOM Toolbox provides a number of visualization functions, divided into three categories according to the kind of display:

a. Cell visualizations are based on showing the map grid in the output space. They display the SOM in the output space as a regular grid of cells whose properties show the associated values. Note that these visualizations work only with 1- or 2-dimensional maps, and that the cyl and toroid shapes are by default shown as sheet.

The basic tool is the function som_show: som_show(sM); by default it first shows the unified distance matrix computed over all variables and then the component planes.

- The unified distance matrix visualizes the distances between neighboring map units and thus helps to reveal the cluster structure of the map: high values of the U-matrix indicate cluster borders, and uniform areas of low values indicate the clusters themselves.
- Each component plane shows the values of one variable in each map unit. The values are shown using indexed colors. For different color schemes, the SOM Toolbox uses the colormap command with maps such as jet, hot and gray.

Other kinds of planes are also available:


- An empty grid shows only the edges of the units. It can serve as a base for labels or for other visualizations that need a lighter background.

- In a color plane every unit has a fixed color. This can be used, for example, to show clustering or other identification information that links different visualizations. Dedicated color tools include som_colorcode and som_clustercolor.
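The idea behind the U-matrix can be sketched as follows. This is a simplified variant, not the som_umat routine: it only computes, for each unit of a rectangular grid, the average distance to its four grid neighbors. The function name `unit_umatrix` and the row-major ordering of the codebook are assumptions made for illustration.

```python
import math

def unit_umatrix(codebook, rows, cols):
    """Average distance from each unit to its 4-connected grid neighbors."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            here = codebook[r * cols + c]            # row-major layout (assumed)
            neigh = [(r + dr, c + dc)
                     for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if neigh:
                u[r][c] = sum(dist(here, codebook[nr * cols + nc])
                              for nr, nc in neigh) / len(neigh)
    return u

# Hypothetical 1 x 4 map whose units form two groups: {0, 0.1} and {5, 5.1}.
umat = unit_umatrix([[0.0], [0.1], [5.0], [5.1]], rows=1, cols=4)
```

In this toy map the two middle units receive high values because they sit on the border between the two groups, while the outer units keep low values, which is the pattern the text describes for cluster borders and cluster interiors.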

The function som_show has many input arguments that control which planes are shown and how they are arranged. Normalized values can be converted back to the original data range (when possible), and several parameters change the general appearance of the visualization, such as the orientation of the colorbar. A related function, som_show_add, adds further information to a figure created by som_show, namely labels, hit histograms and trajectories.

- Labels, produced by the function som_autolabel, are used to mark categories of units (or a few individual units) by writing their names.
- Hit histograms are markers that show the distribution of the best matching units for a given data set. Several histograms can be drawn at once, distinguished by different colors and/or markers, so that data sets can be compared through the distribution of their hits on the map. The histograms can be computed with the function som_hits.
- Trajectories show the best matching units for a data set that is a time series (or any ordered series). A trajectory is either a line connecting consecutive best matching units, or a comet-like trail in which the current best matching unit (the first data sample) has the largest marker and the last best matching unit (the last data sample) the smallest. The function som_trajectory supports interactive trajectory analysis and provides tools for manipulating the map and the time series while the trajectory is being studied.

som_show is built on the routine som_cplane. This routine can be used to construct customized cell visualizations. Its optional parameters include:
- the color of the units,
- the scaled size of the units,
- the position of the units,
- the shape of the units (an arbitrary polygon),
- the form of the units (by scaling the positions of the vertices).
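A hit histogram is simply a count of how many data samples select each unit as their best matching unit. The sketch below assumes squared Euclidean distance and uses hypothetical names (`bmu_index`, `hit_histogram`) and toy data; it only illustrates the quantity that som_hits is described as computing.

```python
def bmu_index(x, codebook):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))

def hit_histogram(data, codebook):
    """Number of samples whose best matching unit is unit i, for every i."""
    hits = [0] * len(codebook)
    for x in data:
        hits[bmu_index(x, codebook)] += 1
    return hits

# Hypothetical codebook with three units and four 2-D samples.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
data = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.0, 4.0]]
hits = hit_histogram(data, codebook)
```

Computing one such histogram per data set and drawing them on the same map is what allows two data sets to be compared by the distribution of their hits.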

b. Graph visualizations show a simple chart in each map unit. They are mainly intended for the SOM codebook, which is a set of regular objects. The idea is that each codebook vector is drawn as a small chart, and the charts are laid out in the same way as the units in the cell visualizations.
- Pie charts (som_pieplane) are ideal for showing proportional values. The color and size of the slices can be changed with different arguments.
- Bar charts (som_barplane) are suitable for showing values of different categories. The color of each bar and the gaps between bars can be specified.
- Signal plots (som_plotplane) show the codebook vectors as simple line graphs. The line color can be set separately for each line.


c. Mesh visualizations show the map as a mesh or a scatter plot. The function som_grid draws such mesh-style plots. It is based on the idea that a visualization of a data set is simply a set of objects, each with a position, a color and a shape. In addition, connections between objects, for example neighborhood relations, can be shown as lines. With som_grid the user can assign arbitrary values to each of these properties. For example, the x, y and z coordinates, the object size and the object color can each stand for one variable, so that five variables are visualized simultaneously. The available options are:
- The position of an object may be 2- or 3-dimensional.
- The color of an object can be chosen freely as an RGB vector or through a color index.
- The shape of an object can be any MATLAB marker (., +, and so on).
- Labels associated with the objects can be displayed.
- Surfaces between the map units can be added to the mesh.

3.3.5 Analysis of results

Only a few tools exist for the quantitative analysis of a SOM. However, functions such as som_neighborhood, som_bmus and som_unit_dists make it easy to implement some analyses. Much research is under way in this area, and many new analysis functions are expected to be added to SOM Toolbox in the future, for example tools for clustering and for analyzing the properties of clusters. In addition, the function som_quality(sMap,D) measures the quality of the SOM with respect to the given data. It returns two values: the average distance between each data vector and its BMU (quantization error), and the proportion of all data vectors whose first and second BMUs are not adjacent units (topographic error).
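The two quality measures can be sketched directly from their definitions. The code below is an illustration only, not the som_quality routine: it assumes a rectangular lattice in which two units are adjacent when their grid coordinates differ by one step in exactly one direction, and the names `som_quality_sketch` and `coords` are hypothetical.

```python
import math

def som_quality_sketch(data, codebook, coords):
    """Return (quantization error, topographic error) for a rectangular map.

    coords[i] is the (row, col) grid position of unit i (assumed layout).
    """
    qe_total = 0.0
    te_count = 0
    for x in data:
        dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))
                 for w in codebook]
        order = sorted(range(len(codebook)), key=lambda i: dists[i])
        first, second = order[0], order[1]
        qe_total += dists[first]                 # distance to the BMU
        (r1, c1), (r2, c2) = coords[first], coords[second]
        if abs(r1 - r2) + abs(c1 - c2) != 1:     # first and second BMU not adjacent
            te_count += 1
    n = len(data)
    return qe_total / n, te_count / n

coords = [(0, 0), (0, 1), (0, 2)]                # three units in one row
data = [[0.2], [1.9]]
ordered = som_quality_sketch(data, [[0.0], [1.0], [2.0]], coords)
folded = som_quality_sketch(data, [[0.0], [2.0], [1.0]], coords)
```

In the ordered map both samples have adjacent first and second BMUs, so the topographic error is zero. In the folded map the sample 0.2 has its two best units at opposite ends of the row, so half of the samples count as topographic errors even though the quantization error is unchanged.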


3.4 Some remarks

3.4.1 Computational complexity

One epoch of the sequential training algorithm can be implemented as follows:

    for (j = 0; j < n; j++) {
        /* find the best matching unit (BMU) of sample j */
        bmu = -1; min = 1000000;
        for (i = 0; i < m; i++) {
            dist = 0;
            for (k = 0; k < d; k++) { diff = X[j][k] - M[i][k]; dist += diff*diff; }
            if (dist < min) { min = dist; bmu = i; }
        }
        /* move every unit towards the sample, weighted by the neighborhood */
        for (i = 0; i < m; i++) {
            h = alpha*exp(-U[bmu][i]/r);
            for (k = 0; k < d; k++) M[i][k] -= h*(M[i][k] - X[j][k]);
        }
    }

Here X[j][k] is the k-th component of the j-th sample, M[i][k] is the k-th component of the i-th unit, and U is a precomputed table of squared grid distances between the map units. A Gaussian neighborhood function is assumed, and the variable r in the listing stands for 2r(t)^2, where r(t) is the neighborhood radius. Similarly, one epoch of the batch training algorithm is:

    /* initialization */
    for (i = 0; i < m; i++) { vn[i] = 0; for (k = 0; k < d; k++) S[i][k] = 0; }
    /* accumulate the Voronoi sums and counts */
    for (j = 0; j < n; j++) {
        bmu = -1; min = 1000000;
        for (i = 0; i < m; i++) {
            dist = 0;
            for (k = 0; k < d; k++) { diff = X[j][k] - M[i][k]; dist += diff*diff; }
            if (dist < min) { min = dist; bmu = i; }
        }
        vn[bmu]++;   /* count the sample once, after the BMU is known */
        for (k = 0; k < d; k++) S[bmu][k] += X[j][k];
    }
    /* recompute every unit as a neighborhood-weighted average */
    for (i = 0; i < m; i++) for (k = 0; k < d; k++) M[i][k] = 0;
    for (i1 = 0; i1 < m; i1++) {
        htot = 0;
        for (i2 = 0; i2 < m; i2++) {
            h = exp(-U[i1][i2]/r);
            for (k = 0; k < d; k++) M[i1][k] += h*S[i2][k];
            htot += h*vn[i2];
        }
        for (k = 0; k < d; k++) M[i1][k] /= htot;
    }

There are 6nmd + 2nm operations (additions, subtractions, multiplications, divisions or exponentiations) in the sequential training algorithm and 3nmd + (2d+5)m^2 + (n+m)d operations in the batch training algorithm. The computational complexity of one epoch of the sequential algorithm is therefore O(nmd), and if n >= m the cost of batch training is only about half that of the sequential algorithm.

With the default parameters of the Toolbox functions, the complexity of the whole training process can also be estimated. The number of map units m is proportional to the square root of n, and the number of training epochs is proportional to m/n. The complexity of the whole SOM creation process is therefore O(nd), so the method can be applied to large data sets, although large maps require more time. Of course, in some cases the number of map units has to be chosen differently, for example m = 0.1n, in which case the complexity becomes O(n^2 d).

In practice, however, there are several notable variants of the SOM. In particular, some studies search only a small number of map units in order to speed up the winner search, which reduces its complexity from O(md) to O(log(m)d).
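The operation counts can be compared numerically. The sketch below evaluates 6nmd + 2nm for the sequential algorithm and 3nmd + (2d+5)m^2 + (n+m)d for the batch algorithm. The leading batch term 3nmd is taken from the BMU search loop of the batch listing, which performs three operations per component for every pair of sample and unit; the function names and the example sizes are assumptions chosen for illustration.

```python
def sequential_ops(n, m, d):
    """Operations in one epoch of sequential training: 6nmd + 2nm."""
    return 6 * n * m * d + 2 * n * m

def batch_ops(n, m, d):
    """Operations in one epoch of batch training: 3nmd + (2d+5)m^2 + (n+m)d."""
    return 3 * n * m * d + (2 * d + 5) * m * m + (n + m) * d

# Example sizes (n samples, m map units, d dimensions), chosen so that n >= m.
n, m, d = 3000, 300, 30
seq = sequential_ops(n, m, d)
bat = batch_ops(n, m, d)
ratio = bat / seq
```

For these sizes the ratio is a little above one half, in line with the statement that batch training costs about half as much as sequential training when n >= m. The measured running times also depend on the implementation, so this count is only a rough guide.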


The following results compare the sequential training algorithm with the batch training algorithm.

Table 2: Initial settings

Parameter | Values
Data dimension | 10, 30, 50, 100
Data length | 300, 1000, 3000, 10000, 30000
Number of map units | 30, 100, 300, 1000
Training function | som_batchtrain, som_seqtrain
Neighborhood function | 'gaussian'
Training length | 10 epochs

Table 3: Computation times (10 training epochs)

Data | Map units | som_batchtrain | som_seqtrain
[300 x 30] | 100 | 0.4 s | 4 s
[1000 x 30] | 100 | 1.0 s | 13 s
[3000 x 30] | 100 | 2.6 s | 40 s
[10000 x 30] | 100 | 8.6 s | 2.3 min
[3000 x 10] | 300 | 5.4 s | 43 s
[3000 x 30] | 300 | 7.7 s | 1.3 min
[3000 x 50] | 300 | 9.8 s | 1.8 min
[3000 x 100] | 300 | 16 s | 3.8 min
[3000 x 30] | 30 | 14 s | 4.4 min
[3000 x 30] | 100 | 26 s | 6.7 min
[3000 x 30] | 300 | 1.1 min | 13 min
[3000 x 30] | 1000 | 4.5 min | 34 min


3.4.2 Program results

This section presents some results of running the program on data structured as in Table 1. The data set was sampled at random from the customer information management program (150 customers).

% STEP 1: READ DATA FROM FILE
% ===========================
try,
  sD = som_read_data('custbank4.data');
data read ok
end
pause % Press any key to continue...


% STEP 2: PREPROCESS THE DATA
% ===========================
sD = som_normalize(sD,'var');
x = sD.data(1,:)
x =
    0.7042 1.6677 -0.1638 -0.9779 0.8998 -0.3327 -0.0307 2.6831

orig_x = som_denormalize(x,sD)
orig_x =
    2.5000 1.0000 4.3000 30.0000 4.2000 40.0000 6.5000 15.0000

pause % Press any key to start training...
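The 'var' normalization used in this step scales each variable to zero mean and unit variance, and denormalization reverses it. The sketch below shows the idea for a single variable. The use of the sample standard deviation (denominator n-1) is an assumption, as are the function names and the example values, which are not taken from the customer data.

```python
import math

def fit_var(column):
    """Return (mean, standard deviation) of one variable, n-1 denominator (assumed)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / (n - 1)
    std = math.sqrt(var)
    return mean, (std if std > 0.0 else 1.0)   # avoid dividing by zero for constants

def normalize(value, mean, std):
    """Map an original value to the zero-mean, unit-variance scale."""
    return (value - mean) / std

def denormalize(z, mean, std):
    """Map a normalized value back to the original scale."""
    return z * std + mean

column = [2.0, 4.0, 6.0]                       # hypothetical variable
mean, std = fit_var(column)
z = normalize(6.0, mean, std)
back = denormalize(z, mean, std)
```

Because the mean and standard deviation are stored, a normalized row such as x in the transcript can be mapped back to its original values, which is what som_denormalize does there.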


% BUOC 3: HUAN LUYEN DU LIEU % ==================== sM = som_make(sD); Determining map size... kich thuoc cua dlen: 150 kich thuoc cua munits: 62 kich thuoc cua munits: 62 kich thuoc cua sTopol.msize: 8 kich thuoc cua sTopol.msize: 8 map size [11, 6] Initialization... kich thuoc cua munits: 100 kich thuoc cua sTopol.msize: 10 kich thuoc cua sTopol.msize: 10 Training using batch algorithm... Rough training phase... kich thuoc cua munits: 66 kich thuoc cua dlen: 150 kich thuoc cua mpd: 4.400000e-001 kich thuoc cua traninlen: 5 Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Finetuning phase... kich thuoc cua munits: 66 kich thuoc cua dlen: 150 kich thuoc cua mpd: 4.400000e-001 kich thuoc cua traninlen: 18 Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Training: 0/ 0 s Final quantization error: 1.071 Final topographic error: 0.033 pause % An phim bat ky de tiep tuc...
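The first figures in the training log are consistent with a simple rule for the default number of map units: roughly five times the square root of the number of samples, rounded up. The sketch below reproduces only this step. The function name is hypothetical, and the later choice of the side lengths (the log reports a final map of size [11, 6]) depends on the data and is not reproduced here.

```python
import math

def default_munits(dlen):
    """Heuristic number of map units: ceil(5 * sqrt(number of samples))."""
    return math.ceil(5 * math.sqrt(dlen))

munits = default_munits(150)    # 150 customers, as in the log above
```

For 150 samples this gives 62 units, matching the value of munits printed in the log, and it illustrates why the number of map units grows with the square root of the data size in the complexity estimate of section 3.4.1.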


% STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_SHOW
% =====================================================
colormap(1-gray)
som_show(sM,'norm','d')
pause % Press any key to continue...

h=zeros(sM.topol.msize); h(1,2) = 1;
som_show_add('hit',h(:),'markercolor','r','markersize',0.5,'subplot','all')
pause % Press any key to continue...

som_show(sM,'umat','all','empty','Labels')
pause % Press any key to continue...

som_show_add('label',sM,'Textsize',8,'TextColor','r','Subplot',2)
pause % Press any key to continue...

The results above show that the customer information, analyzed with SOM Toolbox, falls into three clusters:
- Cluster 1: customer BR.
- Cluster 2: customers A, D, FA, CE, B, CA, BD, CH, BA, CD, BN, BC, CK, CS, BJ.
- Cluster 3: customers AL, AW, AX, CB, AP, AV, DO, AQ, CZ, CP, BF, BY, T, AJ, EQ, X, AH, CM, BE, H, P, CN, CL, BQ, J, S, BW, BM, CX, CU.


% STEP 5: CLUSTERING OF THE MAP
% =============================
sM = som_autolabel(sM,sD,'vote');
size of bmu: 150
size of Labels: 66

subplot(1,3,1)
[c,p,err,ind] = kmeans_clusters(sM, 7); % partition the SOM into up to 7 clusters
n_max: 7
c_max: 5
plot(1:length(ind),ind,'x-')
[dummy,i] = min(ind)
dummy =
    0.7652
i =
    5
% i is the number of clusters selected by the algorithm
cl = p{i}; % cluster assignment of the selected partition, needed by som_cplane below

subplot(1,3,2)
[Pd,V,me,l] = pcaproj(sD,2);
Pm = pcaproj(sM,V,me);
Code = som_colorcode(Pm);
hits = som_hits(sM,sD);
U = som_umat(sM);
Dm = U(1:2:size(U,1),1:2:size(U,2));
Dm = 1-Dm(:)/max(Dm(:));
Dm(find(hits==0)) = 0;
som_cplane(sM,Code,Dm);

subplot(1,3,3)
som_cplane(sM,cl)
pause % Strike any key to continue...
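In the transcript above, the number of clusters is chosen as the partition with the smallest value of ind. This value is understood here to be the Davies-Bouldin index, which is low when clusters are compact and far apart. The sketch below is a plain implementation of that index with Euclidean distances, given only as an illustration: it is not the Toolbox routine, the function name and data are hypothetical, and it assumes at least two clusters with distinct centroids.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index of a partition (lower means better separated)."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    clusters = sorted(set(labels))
    centroid, scatter = {}, {}
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        dim = len(members[0])
        centroid[c] = [sum(p[k] for p in members) / len(members) for k in range(dim)]
        # average distance of the members to their centroid
        scatter[c] = sum(dist(p, centroid[c]) for p in members) / len(members)

    total = 0.0
    for i in clusters:
        # worst-case similarity between cluster i and any other cluster
        total += max((scatter[i] + scatter[j]) / dist(centroid[i], centroid[j])
                     for j in clusters if j != i)
    return total / len(clusters)

points = [[0.0], [1.0], [10.0], [11.0]]
good = davies_bouldin(points, [0, 0, 1, 1])   # natural grouping
bad = davies_bouldin(points, [0, 1, 0, 1])    # mixed grouping
```

The natural grouping yields a much smaller index than the mixed one, which is why taking the minimum over candidate partitions selects a reasonable number of clusters.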


3.4.3 Comparison with other tools

To date, most SOM applications have been built with software written by the researchers themselves. SOM Toolbox and SOM_PAK are readily available tools that require no license. Within the scope of this thesis, SOM Toolbox is used for the problem of classifying the Bank's credit customers. SOM_PAK is a valuable tool from Helsinki University of Technology. It is particularly suited to scientific research on UNIX machines and is not intended for Microsoft operating systems (MS-DOS, Windows).

In addition, commercial SOM software is available on the market [8]. In essence these packages are built in the same way. However, the commercial packages are designed for standard operating systems and add data preprocessing and postprocessing steps. The commercial packages currently available are:
1. SAS Neural Network Application
2. Professional II+ from NeuralWorks
3. MATLAB Neural Network Toolbox
4. NeuroShell2/NeuroWindows
5. NeuroSolutions v3.0
6. NeuroLab, A Neural Network Library
7. havFmNet++
8. Neural Connection
9. Trajan 2.1 Neural Network Simulator
10. Viscovery

One of the newest tools is Viscovery, a product of Eudaptics Software GmbH. It has a friendly, flexible interface and is a powerful tool for creating SOMs. Viscovery provides several important features needed in financial, economic and marketing applications that the license-free tools do not offer. A comparison of the SOM tools is given below [8].

Criteria | Viscovery | SOM_PAK | SOM Toolbox | NeNet
Operating system | Windows 95, Windows NT 4.0 | UNIX, MS-DOS | MATLAB version 5.0 or later | Windows
Preprocessing | Yes, 4 options | None | |
SOM features: algorithm | Standard algorithm | Standard algorithm | Standard algorithm | Standard algorithm
Map size | Unlimited. Hexagonal. | Unlimited. Rectangular, hexagonal. | Unlimited. Rectangular, hexagonal. | Unlimited. Rectangular, hexagonal.
Map initialization | Principal plane | Linear, random | Linear, random | Linear, random
Training | Predefined | Arbitrary phases | Arbitrary phases | Arbitrary phases
Labeling | Automatic, manual, drag and drop | Automatic, manual | Automatic, manual | Automatic, manual
Missing values | Handled | Handled | Handled | Handled
Speed | Fast | Fast | Moderate | Fast
Input limit | None | None | None | At most [100x100]
Visualization | U-matrix, component planes, trajectories, iso-contours | U-matrix, component planes, trajectories | U-matrix, component planes, trajectories, hit histograms | U-matrix, component planes, trajectories, hit histograms
Postprocessing | Yes | Yes | No | No
Interface | Friendly. OLE interface: MS Excel, text file, SQL & DB2 | C command line. Text file | GUI (MATLAB). Text file | GUI (Windows 95). Text file

3.5 Conclusion of Chapter 3

This chapter applied the SOM method to a concrete banking problem: analyzing the information of customers that are enterprises seeking loans. The main contents of the chapter are:
- A study of the operational workflow at the Bank's Credit department in order to solve the problem.
- A study of SOM Toolbox, on the basis of which the program solving the problem was built.
- Some results obtained from running the program.
- An evaluation and comparison of SOM Toolbox with other tools on the market.


CONCLUSION

Neural networks are a well-suited method for data mining within the machine learning paradigm, especially for unsupervised learning. With more than 5000 applications in many fields, the SOM learning algorithm is very useful for problems in finance and economics. Many studies have confirmed that SOM is suitable for applications with large volumes of data, such as banking data.

1. The thesis achieved the following results:
- It presents an overview of the neural network model and of the application of neural networks in data mining.
- It presents systematically the supervised and unsupervised learning approaches for neural networks.
- It studies and analyzes the use of the SOM algorithm for solving the clustering problem with the neural network model.
- It studies the structure and operation of SOM Toolbox and the way the tool is used to solve the data clustering problem.
- It formulates the problem of analyzing customer information at the Bank and uses SOM Toolbox to solve the proposed problem. The experimental results agree with the analyses of experts in the banking field.

2. In the course of the research for this thesis, by synthesizing and analyzing a core banking activity, namely the analysis of information on borrowing customers, I found that developing the content of the thesis further is necessary in order to use neural networks in mining banking data. To extend the results of this thesis, the next direction of research and development is to study methods for extracting rules from neural networks (mentioned in Chapter 1) and their application to decision support in financial investment.


REFERENCES

REFERENCES IN VIETNAMESE

[1]. Nguyễn Đình Thúc (2000), Trí tuệ nhân tạo - Mạng nơron - phương pháp & ứng dụng, Nhà xuất bản Giáo Dục.
[2]. Trần Đức Minh (2002), Mạng nơron truyền thẳng và thuật toán lan truyền ngược, Luận văn Thạc sỹ cao học, Khoa Công nghệ, Trường Đại học Quốc gia Hà Nội.

REFERENCES IN ENGLISH

[3]. Bart De Ketelaere, Demitrios Moshou, Peter Coucke, Josse De Baerdemaeker (1997), A hierarchical Self-Organizing Map for classification problems.
[4]. Boris Kovalerchuk & Evgenii Vityaev (2001), Data mining in finance: advances in Relational and Hybrid Methods, Kluwer Academic Publishers.
[5]. David Sommer & Martin Golz (2001), Clustering of EEG-Segments Using Hierarchical Agglomerative Methods and Self-Organizing Maps, University of Applied Sciences Germany, Department of Computer Science.
[6]. Ed. Guido Deboeck & Teuvo Kohonen (1998), Visual Intelligence in Finance using Self-organizing Maps, Chapter 7: Self-organizing Maps for Initial Data Analysis: let Financial Data Speak for Themselves, Springer Verlag.
[7]. Guido Deboeck, Ph.D (1999), Self-Organizing Maps facilitate knowledge discovery in finance.
[8]. Guido Deboeck, Ph.D (2000), Public domain versus commercial tools for creating Self-Organizing Maps.
[9]. J. Han and M. Kamber (2001), Data Mining - Concepts and Techniques, Chapter 8: Cluster Analysis, Morgan Kaufmann.
[10]. Juha Vesanto (1997), Data Mining techniques based on the Self-Organizing Map, Thesis for the degree of Master in Engineering, Helsinki University of Technology.
[11]. Juha Vesanto (2000), Using SOM in Data Mining, Licentiate's thesis, Helsinki University of Technology.
[12]. Mark W. Craven & Jude W. Shavlik (2000), Using Neural Networks for Data Mining, submitted to the Future Generation Computer Systems special issue on Data Mining.
[13]. Merja Oja, Samuel Kaski, and Teuvo Kohonen (2003), Bibliography of Self-Organizing Map (SOM) Papers: 1998-2001 Addendum, Neural Computing Surveys, 3: 1-156.
[14]. Mark W. Craven (1996), Extracting comprehensible models from trained neural networks, Chapter 7: The Boosting Based Perceptron learning algorithm, Doctor of Philosophy (Computer Sciences).
[15]. Tom Germano (1999), Self Organizing Maps.
[16]. Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth & Ramasamy Uthurusamy (1996), Advances in Knowledge Discovery and Data Mining, AAAI Press/The MIT Press.
