
Pattern Recognition

MSc. Tran Thanh Hung, 0988565130, hungttbk@gmail.com

Chapter I. Overview of pattern recognition


Definition: Object recognition is the process of partitioning objects into sub-objects, which are assigned to classes by matching them against templates and against certain rules known in advance.

Limitations of the recognition problem


The recognition process:
Type 1: recognition of concrete objects: characters, images, ...
Type 2: recognition of abstract objects: problems, concepts, ...

Limitation 1: only problems of type 1 are considered. Recognizing an object then reduces to studying its characteristics in space and in time:
Objects characterized in space: characters, fingerprints, maps, physical objects;
Objects characterized in time: speech, electronic signals, time series.



Research directions in pattern recognition:
Studying the recognition ability of biological systems (humans, animals), which relates to fields such as psychology, biology and physiology;
Developing methods and theories for building devices and programs that solve recognition problems in specific fields.

Limitation 2: only the second approach is considered. The recognition problem is then to assign (partition) data into defined groups based on the properties or characteristic features of the data.

Basic pattern recognition problems


Example of a recognition problem:
Sorting fish on a conveyor line into two species. Steps:
Preprocess the fish image captured by the camera;
Segment the object (the fish);
Extract the features: lightness, length, width;
Classify the object (the fish).



Model of the recognition process

Building a recognition system



Foundations:
Measurement space; Features; Classifier; Decision boundaries; Training samples; Error probability

Assumption: the observed data contain some regularity. Problem statement: from noisy observations, determine the regularity in the data. Approaches: statistical and structural.



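Statistical and structural approaches
Statistical approach: objects are characterized by a statistical aggregate.
Example: all instances of the character 2 share some statistical properties.

Structural approach: structures are represented by one and the same construction mechanism.
Example: the sentences of a language.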



Basic difficulties:
Object segmentation; Feature selection; Invariance; Context; Noise.

Illustration of the difficulties:



Basic problems:
Acquiring and preprocessing the input data;
Data representation: the encoding process; vector-space representation;
Feature extraction: determining the characteristics of the object; reducing the dimension of the object's representation vector;
Building the classifier: determining optimal procedures for identifying and classifying objects. Model: classifying N object classes 1, 2, ..., N; the classifier is modelled by decision functions.



Adaptive recognition systems



The learning problem
Question: is there a method that solves the classification and recognition problem completely?
Conclusion: 40 years of research experience show that no such method exists.
Reason: the problem is ill-posed.
Approach: learning, i.e. solving the problem through a process of training on samples.
Learning: a method that combines empirical information from the environment with existing knowledge to build classifiers and to improve classification performance step by step.
Empirical information: training samples; Knowledge: invariants, correlation functions; Step-by-step improvement of classification performance.



Learning methods:
Supervised learning: class information accompanies the samples;
Unsupervised learning: no information at all from the environment;
Reinforcement learning: partial feedback from the environment.


Recognition principles

Characterization of classes:
Listing the elements that belong to the class; the most general properties shared by the elements of the class. A recognition system can thus be built on the principle of generalizing characteristics.

When the classes are examined, if the data tend to form clusters in the object space, recognition uses the clustering principle.



The element-enumeration principle
Classes are characterized by listing the objects that belong to them, so automatic recognition reduces to template matching. The set of objects of each class is stored in memory. When a new object appears, the recognition system matches it against the stored templates one by one. Decision: the object is assigned to the class containing the template most similar to it. The method is simple. Open issue: choosing an optimal set of templates.



The common-properties principle
Classes are characterized by features shared by their members; recognition consists of finding these common features in the object. Key assumption: objects in the same class must share common features, so some similarity among the objects of a group has to be assumed. The common properties are stored in memory. When an object appears, the system must encode it and compare it with the stored features. Advantage: compared with enumeration, this method needs less storage, recognizes more efficiently and is more robust to deformations.



The clustering principle
Objects of a class are represented by real-valued vectors, and classes are treated as clusters: elements of the same class are packed together, while elements of different classes are spread apart. A recognition system based on this principle is determined by the relative distribution of the classes. Example: minimum-distance classification.
If the groups are separable, no major problem arises;
If the groups are not separable, more complex classification criteria are needed;
Overlap between groups results from a lack of information caused by noise.
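The minimum-distance rule mentioned above can be sketched in Python as follows. This is an illustrative sketch, not the author's method: it assumes each class is summarized by the mean of its training vectors, and the class names and data points are hypothetical.

```python
import math

def class_means(training):
    """Mean vector of each class; training maps label -> list of feature tuples."""
    means = {}
    for label, vectors in training.items():
        dim = len(vectors[0])
        means[label] = tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dim))
    return means

def min_distance_classify(x, means):
    """Assign x to the class whose mean vector is nearest (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(x, means[label]))

# Hypothetical two-class training data.
training = {
    "class_1": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)],
    "class_2": [(6.0, 5.0), (7.0, 7.0), (6.5, 6.0)],
}
means = class_means(training)
print(min_distance_classify((2.0, 2.0), means))  # class_1
print(min_distance_classify((6.0, 6.5), means))  # class_2
```

Storing one mean per class instead of every template is what separates this rule from pure element enumeration.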

Recognition methods

Heuristic methods
The heuristic method is based on the experience of a human supervisor. It uses the element-enumeration principle and the search for common characteristics of objects; the algorithm is built for each specific problem, and the method depends heavily on experience.

Mathematical methods
Classification rules are based on mathematical representations of the characteristics of objects. These methods rely on the common-properties principle and the clustering principle. The difference from the heuristic method is that the solution is determined by rules closely tied to the properties of the problem.
There are two kinds of mathematical methods:
Statistical methods: use the tools of mathematical statistics, such as Bayes' rule;
Deterministic methods: do not use the statistical properties of the object classes under study, e.g. linear discriminant analysis or the perceptron.

Structural methods
An object is described by its primitive components and the relationships among them. Objects are represented by strings together with string-generation rules, so recognition becomes a parsing problem that relies on the principle of generalizing properties. Challenge: extracting the primitive components and determining the relationships among them.

Chapter II. Classification based on Bayesian decision theory


Using Bayes' Theorem in Classification
1. Introduction to Bayes' theorem

In data mining, Bayes' theorem (or Bayes' rule) is a classification technique based on computing conditional probabilities. Bayes' rule is widely used because it is easy to understand and easy to implement.
Bayes' rule (CT1):
$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$
where:
D: the data;
h: a hypothesis;
P(h): the probability of hypothesis h (the knowledge about h before the data D are seen), called the prior probability of h;
P(D | h): the conditional probability of D given h, called the likelihood;
P(D): the probability of the observed data D regardless of any hypothesis, called the prior probability of the data D;
The ratio $P(D \mid h)/P(D)$: the irrelevance index, which measures the dependence between two events (here h and D). If it equals 1, the two events are unrelated (independent);
P(h | D): the conditional probability of h given D, called the posterior probability of h.



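In many applications the hypotheses $h_i$ are mutually exclusive and together cover every possibility, so P(D) can be decomposed as (CT2):
$$P(D) = \sum_i P(D \cap h_i)$$
Since $P(D \cap h_i) = P(D \mid h_i)\,P(h_i)$, (CT2) can be rewritten as (CT3):
$$P(D) = \sum_i P(D \mid h_i)\,P(h_i)$$
Substituting this expression for P(D) into (CT1) gives (CT4):
$$P(h_i \mid D) = \frac{P(D \mid h_i)\,P(h_i)}{\sum_j P(D \mid h_j)\,P(h_j)}$$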
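A minimal Python sketch of (CT3) and (CT4), assuming the priors and likelihoods are given as lists indexed by hypothesis; the function name `posterior` is illustrative. The usage line plugs in the numbers of the cookie exercise later in this chapter.

```python
def posterior(priors, likelihoods):
    """Posterior P(h_i | D) for every hypothesis h_i, via (CT4).

    priors[i]      -- P(h_i); hypotheses assumed mutually exclusive and exhaustive
    likelihoods[i] -- P(D | h_i)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(D | h_i) P(h_i)
    evidence = sum(joint)                                  # P(D), by (CT3)
    return [j / evidence for j in joint]

# Two equally likely boxes; the observed cookie has likelihood 0.75 in box 1 and 0.5 in box 2.
print(posterior([0.5, 0.5], [0.75, 0.5]))  # approximately [0.6, 0.4]
```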



(CT4) is called Bayes' theorem.

The following example shows how Bayes' theorem is applied


Suppose we have observations on 250 people, collected to study the relationship between two variables: income (Low (D1), Medium (D2), High (D3)) and the type of car they buy (Car: Second hand (h1), New (h2)).



Now suppose that we only know the percentages by row (Percentage by Row) and the marginal percentages (Marginal Percentage, or Percentage by Total), as follows.

The question is: can the percentages by column be computed from the information in these two tables alone?



The percentage-by-column table has the following layout:

Using Bayes' rule we can easily compute the column percentages. For example:



In the same way we compute every value in the percentage-by-column table:

2. Applying Bayes' theorem to data classification (Naïve Bayes Classifier)


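The following examples illustrate the use of Bayes' theorem for classifying data. A classifier based on Bayes' theorem that treats the attributes as conditionally independent given the class is called a Naïve Bayes Classifier. Example 1: the weather training data are as follows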



Use the Naïve Bayes Classifier to decide whether to go out and play sports (Play = yes or no), given the weather observed on the following day:

From the training data we obtain:



Since the class attribute Play has only two values, yes (go out and play) and no (do not go out and play), we compute Pr(yes|E) and Pr(no|E) as follows, where E is the record to be classified (predicted).


Since Pr(no|E) > Pr(yes|E), the prediction is Play = no.



Example 2: the training data and the unseen data are as follows



Use the Naïve Bayes Classifier to classify the unseen data X.
Classes: C1: buys_computer = yes, C2: buys_computer = no.
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Compute P(x_k|Ci) for each class:
P(age<=30 | buys_computer=yes) = 2/9 = 0.222
P(age<=30 | buys_computer=no) = 3/5 = 0.6
P(income=medium | buys_computer=yes) = 4/9 = 0.444
P(income=medium | buys_computer=no) = 2/5 = 0.4
P(student=yes | buys_computer=yes) = 6/9 = 0.667
P(student=yes | buys_computer=no) = 1/5 = 0.2
P(credit_rating=fair | buys_computer=yes) = 6/9 = 0.667
P(credit_rating=fair | buys_computer=no) = 2/5 = 0.4
Compute P(X|Ci):
P(X|buys_computer=yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer=no) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019


Compute P(X|Ci) * P(Ci):
P(X|buys_computer=yes) * P(buys_computer=yes) = 0.044 * 9/14 = 0.028
P(X|buys_computer=no) * P(buys_computer=no) = 0.019 * 5/14 = 0.007
Therefore X belongs to the class buys_computer = yes.
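The arithmetic of Example 2 can be checked with a short Python sketch. It hard-codes the conditional probabilities listed above (the training table itself is not reproduced here) and uses exact fractions so that rounding does not affect the comparison; the names `likelihood`, `prior` and `naive_bayes_score` are illustrative.

```python
from fractions import Fraction as F

# P(x_k | class) for X = (age<=30, income=medium, student=yes, credit_rating=fair),
# taken from the worked example above.
likelihood = {
    "yes": [F(2, 9), F(4, 9), F(6, 9), F(6, 9)],
    "no":  [F(3, 5), F(2, 5), F(1, 5), F(2, 5)],
}
prior = {"yes": F(9, 14), "no": F(5, 14)}  # P(buys_computer = yes / no)

def naive_bayes_score(cls):
    """P(X | cls) * P(cls) under the naive (conditional independence) assumption."""
    score = prior[cls]
    for p in likelihood[cls]:
        score *= p
    return score

scores = {cls: naive_bayes_score(cls) for cls in prior}
for cls, s in scores.items():
    print(cls, round(float(s), 3))                 # yes 0.028, then no 0.007
print("prediction:", max(scores, key=scores.get))  # prediction: yes
```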

Exercise 1
Which box did the cookie come from? Suppose there are two boxes full of cookies. Box 1 contains 10 chocolate cookies and 30 butter cookies; box 2 contains 20 of each kind. Little Khoai picks a box at random and then takes one cookie from it. We may assume that Khoai is too young to tell the boxes apart and likes every kind of cookie equally. The cookie Khoai picked turns out to be a butter cookie. What is the probability that Khoai took it from box 1?

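Exercise 1: solution
Intuitively, the answer should clearly be greater than 1/2, because box 1 contains more butter cookies. The exact answer is computed with Bayes' theorem. Let H1 correspond to box 1 and H2 to box 2. We know that for Khoai the two boxes are equivalent, so P(H1) = P(H2), and since they must sum to 1, both equal 0.5. The data D is the observation of a butter cookie. From the contents of the two boxes we know that P(D|H1) = 30/40 = 0.75 and P(D|H2) = 20/40 = 0.5. Bayes' formula then gives:
$$P(H_1 \mid D) = \frac{P(D \mid H_1)\,P(H_1)}{P(D \mid H_1)\,P(H_1) + P(D \mid H_2)\,P(H_2)} = \frac{0.75 \times 0.5}{0.75 \times 0.5 + 0.5 \times 0.5} = 0.6$$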

Before seeing the cookie Khoai picked, the probability that Khoai chose box 1 is the prior probability P(H1) = 0.5. After seeing the cookie, we revise it to P(H1|D) = 0.6.

Identifying false positives with Bayes' theorem
For example, a medical test for a disease may return a positive result, indicating that the patient has the disease, even when the patient does not have it. We use Bayes' theorem to compute the probability that a positive result is in fact a false positive. It turns out that if the disease is rare, most positive results may be false positives, even when the test is highly accurate. Suppose a test for a disease has the following properties:
If the person tested really has the disease, the test returns a positive result in 99% of cases, i.e. with probability 0.99;
If the person tested does not have the disease, the test returns a negative result in 95% of cases, i.e. with probability 0.95;
Suppose only 0.1% of the population has the disease, so that a randomly chosen person has the disease with prior probability 0.001.

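Exercise 2
We can use Bayes' theorem to compute the probability that a positive test result is a false positive. Let A be the event that the patient has the disease, and B the evidence, a positive test result. The probability that the patient really has the disease, given a positive result, is
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} \approx 0.019$$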

and therefore the probability that a positive result is a false positive is about 1 - 0.019 = 0.981.
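The same computation as a small Python function, assuming the test is described by its sensitivity P(B|A), its specificity P(not B | not A) and the prevalence P(A); the function name is illustrative.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """P(A | B): probability of disease given a positive result, by Bayes' theorem."""
    true_pos = sensitivity * prevalence                # P(B | A) P(A)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(B | not A) P(not A)
    return true_pos / (true_pos + false_pos)

p = prob_disease_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.95)
print(round(p, 3))      # 0.019
print(round(1 - p, 3))  # 0.981: probability that a positive result is a false positive
```

Because the prior 0.001 is so small, the false-positive term 0.05 x 0.999 dominates the denominator, which is why a 99%-sensitive test still yields a low posterior.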

The Nearest Neighbors Rule (KNN)

Using the K-Nearest Neighbors algorithm for prediction and classification

Classification case: the dependent variable Y is categorical. Prediction case: the dependent variable Y is quantitative. To make the idea easy to follow, we first consider using KNN for prediction.

Prediction case:
The steps for using KNN to predict a quantitative dependent variable are:
1. Choose the parameter K (the number of nearest neighbors).
2. Compute the distance between the query point and every training sample.
3. Sort the distances and find the K nearest neighbors of the query point.
4. Take the values of the dependent variable Y for these K nearest neighbors.
5. Use the average of these Y values as the predicted value for the query point.
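A minimal Python sketch of these five steps for the one-dimensional case used in the example below (distance = absolute difference). The training pairs in the usage lines are hypothetical, since the example's own table is not reproduced in this text.

```python
def knn_predict(query, samples, k):
    """Predict Y at `query` as the mean Y of the k nearest training samples.

    samples: list of (x, y) pairs with scalar x; distance is |x - query|.
    """
    ranked = sorted(samples, key=lambda s: abs(s[0] - query))  # steps 2-3
    neighbors = ranked[:k]                                       # the K nearest
    return sum(y for _, y in neighbors) / k                      # steps 4-5

# Hypothetical training data.
data = [(1.0, 2.0), (2.0, 4.0), (4.0, 8.0), (7.0, 14.0)]
print(knn_predict(3.0, data, k=2))  # 6.0
```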


Example (KNN for prediction): there are 5 training samples (X, Y) as follows. The task is to use KNN to predict the value of the dependent variable Y for the query point X = 6.5.
1. Choose the number of nearest neighbors K (here K = 2).
2. Compute the distance between the query point and every training sample. In this example the data and the query point are one-dimensional (X), so the distance is simply the absolute difference between the query value and each X value in the training samples. For instance, for X = 5.1 the distance is |6.5 - 5.1| = 1.4, for X = 1.2 it is |6.5 - 1.2| = 5.3, and so on.
3. Sort the distances to find the K nearest neighbors (here K = 2).
4. Take the values of Y of the K (= 2) nearest neighbors: Y = 27 and Y = 8.
5. The predicted value is the mean of the Y values of the K (= 2) nearest neighbors. In this example it is (27 + 8)/2 = 17.5.

Exercise (Example 2)
Choose k = 5 and use KNN to produce the result.

Exercise
With k = 5, determine the value of Play.

The K-Nearest Neighbors algorithm for classification


K-NN classifies an object (the query point) based on the objects closest to it. An object is classified according to its K nearest neighbors, where K is a positive integer fixed before the algorithm runs. The Euclidean distance is commonly used to measure the distance between objects. The K-NN algorithm is:
1. Choose the value of the parameter K (the number of nearest neighbors).
2. Compute the distance between the object to be classified (the query point) and every object in the training data (usually the Euclidean distance).
3. Sort the distances in increasing order and find the K nearest neighbors of the query point.
4. Collect the classes of the K nearest neighbors.
5. Assign to the query point the majority class among these nearest neighbors.



To see how K-NN is used for classification, consider the illustration below. In the figure, the training data are marked with (+) and (-), and the object whose class is to be determined (the query point) is the circle. Our task is to estimate (predict) the class of the query point from the chosen number of its nearest neighbors; in other words, we want to know whether the query point belongs to class (+) or class (-). We see that:
1-nearest neighbor: the result is + (the query point is assigned to class +);
2-nearest neighbors: no class can be assigned, because of the 2 nearest neighbors one is + and one is - (neither class has more objects than the other);
5-nearest neighbors: the result is - (the query point is assigned to class -, because among its 5 nearest neighbors 3 belong to class - and only 2 to class +).
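A small Python sketch of the majority-vote rule. It uses `math.dist` (Python 3.8+) for the Euclidean distance and returns `None` when the vote is tied, mirroring the 2-NN case above; the points in the usage example are hypothetical and were chosen only to reproduce the 1-NN, 2-NN and 5-NN outcomes of the illustration.

```python
import math
from collections import Counter

def knn_classify(query, samples, k):
    """Majority-vote K-NN. samples: list of (vector, label). Returns None on a tie."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest).most_common()
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return None  # no majority, as in the 2-NN case
    return votes[0][0]

# Hypothetical training points around a query at the origin.
train = [((1, 0), "+"), ((0, 2), "-"), ((-3, 0), "-"),
         ((0, -4), "+"), ((5, 0), "-"), ((6, 6), "+")]
for k in (1, 2, 5):
    print(k, knn_classify((0, 0), train, k))  # 1 +, then 2 None, then 5 -
```

In practice an odd K is often chosen for two-class problems precisely to avoid such ties.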

THE K-MEANS ALGORITHM AND ITS APPLICATIONS


MAIN CONTENTS
I. Clustering
II. The K-Means algorithm
1. Overview of the algorithm
2. Steps of the algorithm
3. Worked example; algorithm demo
4. Evaluation of the algorithm
5. Generalization and variants
III. Applications of the K-Means algorithm

I. CLUSTERING
1. What is clustering?

Clustering is the process of dividing an initial data set into clusters such that objects in the same cluster are similar to each other and objects in different clusters are dissimilar. It addresses the problem of searching for and discovering clusters, or patterns, in an initial set of unlabeled data.

If X is a set of data points and $C_i$ is the i-th cluster, then
$$X = C_1 \cup \dots \cup C_i \cup \dots \cup C_k \cup C_{\text{outliers}}, \qquad C_i \cap C_j = \emptyset \ (i \neq j)$$

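2. Some measures used in clustering
Minkowski distance, for $x = (x_1, \dots, x_l)$ and $y = (y_1, \dots, y_l)$:
$$d(x, y) = \left( \sum_{i=1}^{l} |y_i - x_i|^p \right)^{1/p}$$
Euclidean distance: the case p = 2.
Similarity (closeness) measure: the cosine of the angle between two vectors:
$$\cos\theta = \frac{v \cdot w}{\lVert v \rVert \, \lVert w \rVert}$$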
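Both measures are easy to compute directly; the Python sketch below is illustrative (the function names are not from the original), with p = 1 giving the Manhattan distance and p = 2 the Euclidean distance.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p: p = 1 Manhattan, p = 2 Euclidean."""
    return sum(abs(b - a) ** p for a, b in zip(x, y)) ** (1.0 / p)

def cosine_similarity(v, w):
    """cos(theta) = v.w / (||v|| * ||w||)."""
    dot = sum(a * b for a, b in zip(v, w))
    return dot / (math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w)))

print(minkowski((0, 0), (3, 4), 2))       # 5.0
print(minkowski((0, 0), (3, 4), 1))       # 7.0
print(cosine_similarity((1, 0), (0, 1)))  # 0.0
```

Note that the cosine is a similarity (larger means closer), whereas the Minkowski family are distances (smaller means closer).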

3. Purpose of clustering

To determine the intrinsic grouping of the objects in an unlabeled data set. Clustering does not rely on any single general criterion; it relies on the criteria the user supplies in each case.

5. Some typical clustering methods

Partitional clustering; Hierarchical clustering; Density-based clustering; Grid-based clustering; Model-based clustering; Constrained clustering

II. PARTITIONAL CLUSTERING

Partitional clustering divides a given data set of n elements into k subsets (k ≤ n), each subset representing one cluster. Clusters are formed by optimizing a similarity-measure objective so that objects in the same cluster are similar and objects in different clusters are dissimilar. Properties: each object belongs to exactly one cluster; each cluster contains at least one object. Typical algorithms: K-means, PAM, CLARA, ...

II.2. The K-Means algorithm

Problem statement:
Input: a set of objects $X = \{x_i \mid i = 1, 2, \dots, N\}$, $x_i \in \mathbb{R}^d$, and the number of clusters K.
Output: disjoint clusters $C_i$ (i = 1, ..., K) such that the criterion function E reaches its minimum.

II.1. OVERVIEW OF THE ALGORITHM

The algorithm works on a set of d-dimensional vectors; the data set X contains N elements:
$$X = \{x_i \mid i = 1, 2, \dots, N\}$$
K-means repeats the following process: assign the data to clusters, then update the centroid positions. The iteration stops when the centroids converge and each object belongs to exactly one cluster.

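The similarity measure is the Euclidean distance, and the criterion function is
$$E = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2$$
where $c_j$ is the centroid of cluster $C_j$. This function is non-negative and never increases in either of the two steps (data assignment and centroid update); it decreases whenever one of these steps changes something.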

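II.2. STEPS OF THE ALGORITHM

Step 1 - Initialization: choose K centroids $\{c_i\}$ (i = 1, ..., K).
Step 2 - Distance computation (assignment):
$$S_i^{(t)} = \left\{ x_j : \lVert x_j - c_i^{(t)} \rVert \le \lVert x_j - c_{i^*}^{(t)} \rVert \ \text{for all } i^* = 1, \dots, K \right\}$$
Step 3 - Centroid update:
$$c_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j$$
Step 4 - Stopping condition: repeat steps 2 and 3 until the cluster centroids no longer change.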

Flowchart: Start → number of clusters K → centroids → distances from the objects to the centroids → group the objects into clusters → if no object changes group, End; otherwise recompute the centroids and repeat.

II.3 WORKED EXAMPLE

Object   Attribute 1 (X)   Attribute 2 (Y)
A        1                 1
B        2                 1
C        4                 3
D        5                 4

(Figure: scatter plot of the four objects.)

Step 1: Initialization. Choose 2 initial centroids, c1 = (1, 1) ≡ A and c2 = (2, 1) ≡ B, for clusters 1 and 2 respectively.

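Step 2: Compute the distances (squared Euclidean distances are used throughout this example). A and B coincide with c1 and c2, so A is in cluster 1 and B is in cluster 2.
d(C, c1) = (4 - 1)² + (3 - 1)² = 13
d(C, c2) = (4 - 2)² + (3 - 1)² = 8
d(C, c1) > d(C, c2), so C belongs to cluster 2.
d(D, c1) = (5 - 1)² + (4 - 1)² = 25
d(D, c2) = (5 - 2)² + (4 - 1)² = 18
d(D, c1) > d(D, c2), so D belongs to cluster 2.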

Step 3: Update the centroids.
Centroid of cluster 1: c1 ≡ A = (1, 1)
Centroid of cluster 2: c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3)

Step 4-1: Repeat step 2 (compute the distances):
d(A, c1) = 0 < d(A, c2) = 9.89, so A belongs to cluster 1
d(B, c1) = 1 < d(B, c2) = 5.56, so B belongs to cluster 1
d(C, c1) = 13 > d(C, c2) = 0.22, so C belongs to cluster 2
d(D, c1) = 25 > d(D, c2) = 3.56, so D belongs to cluster 2

Step 4-2: Repeat step 3 (update the centroids): c1 = (3/2, 1) and c2 = (9/2, 7/2).

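Step 4-3: Repeat step 2:
d(A, c1) = 0.25 < d(A, c2) = 18.5, so A belongs to cluster 1
d(B, c1) = 0.25 < d(B, c2) = 12.5, so B belongs to cluster 1
d(C, c1) = 10.25 > d(C, c2) = 0.5, so C belongs to cluster 2
d(D, c1) = 21.25 > d(D, c2) = 0.5, so D belongs to cluster 2
No object has changed cluster since step 4-1, so the algorithm stops with the clusters {A, B} and {C, D}.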
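The worked example can be reproduced with the short Python sketch below. It implements the assign-then-update iteration of Steps 2-4 with squared Euclidean distance and the initial centroids of Step 1; keeping an empty cluster's old centroid is an assumption of this sketch, since the text does not cover that case.

```python
def kmeans(points, centroids, max_iter=100):
    """K-means iterations starting from the given centroids.

    points, centroids: lists of tuples. Returns (centroids, labels).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # Step 4: no point changed cluster, so stop
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # assumption: keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, labels

points = [(1, 1), (2, 1), (4, 3), (5, 4)]  # A, B, C, D
centroids, labels = kmeans(points, centroids=[(1, 1), (2, 1)])
print(centroids)  # [(1.5, 1.0), (4.5, 3.5)]
print(labels)     # [0, 0, 1, 1]
```

Stopping when no label changes is equivalent to stopping when the centroids no longer move, since the centroids are a function of the labels.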


Exercise
A mineral sample is predicted to contain gold, silver and copper. We take 5 points to examine how the metals are distributed:
Point   A   B   C   D   E
X       1   3   4   3   2
Y       2   1   2   3   1

II.4 EVALUATION OF THE ALGORITHM: ADVANTAGES

1. Complexity: O(K·N·l), where l is the number of iterations (for a fixed dimension d).
2. Scalable; it can easily be adapted to new data.
3. Guaranteed to converge after a finite number of iterations.
4. Always produces K clusters.
5. Each cluster always contains at least one data point.
6. The clusters are non-hierarchical and do not overlap.
7. Every member of a cluster is closer to the centroid of its own cluster than to that of any other cluster.

II.4 EVALUATION OF THE ALGORITHM: DISADVANTAGES

1. It cannot find non-convex clusters or clusters of complex shape.
2. Choosing the initial cluster centroids is difficult:
- the initial centroids are chosen at random;
- the convergence of the algorithm depends on how the centroid vectors are initialized.
3. It is hard to choose the optimal number of clusters in advance; several trials are needed to find it.
4. It is very sensitive to noise and to outliers in the data.
5. In reality an object does not always belong to exactly one cluster; the method suits only data where the boundaries between clusters are clear.

II.5 GENERALIZATION AND VARIANTS

B. Variants
1. The K-medoid algorithm:
Similar to K-means, but each cluster is represented by one of its own objects: the object closest to the cluster centre is chosen as the cluster's representative. K-medoid copes better with noise, but its complexity is higher.

2. The Fuzzy c-means (FCM) algorithm:
It shares its clustering strategy with K-means. Whereas K-means performs hard clustering (each data point belongs to exactly one cluster), FCM performs fuzzy clustering (a data point can belong to more than one cluster with a certain degree of membership). It models the relationship between elements and clusters through the weights of a matrix that represents each member's degree of membership in each cluster.
FCM handles overlapping clusters on larger, higher-dimensional and noisier data sets, but it remains sensitive to noise and outliers.

III. APPLICATIONS OF THE ALGORITHM

Web document clustering:
1. Search for and extract the documents.
2. Preprocess the documents: tokenize and vectorize them, i.e. look up each word and replace it with its index in a dictionary, then represent the data as vectors.
3. Apply K-means; the result is a set of document clusters and their corresponding centroids.
Image segmentation


Chapter 3: Distance-based classification. In this technique the objects to be recognized are quantitative objects, each represented by a multidimensional vector. We first review some concepts such as space partitioning and discriminant functions, and then turn to some specific techniques.

Space partitioning


Suppose the object space X is defined as X = {Xi, i = 1, 2, ..., m}, where each Xi is a vector. P is called a partition of X into classes Ci, Ci ⊂ X, if $C_i \cap C_j = \emptyset$ for i ≠ j and $\bigcup_i C_i = X$. In general this is the ideal case, where X is completely separable; in practice the representation space is usually only partially separable. Classification therefore amounts to building a mapping f: X → P. The tools for building this mapping are discriminant functions.

Classification functions (decision functions)


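To assign objects to classes we must determine the number of classes and the boundaries between them. The classification function, or discriminant function, is a very important tool for this. Let {gi} be a family of classification functions, defined as follows: if gk(X) > gi(X) for all i ≠ k, we decide that X belongs to class k. Strictly, this rule uses one function per class, but one of them can be fixed to zero, so k - 1 independent discriminant functions suffice to separate k classes. The discriminant function g of a class is usually linear, that is:
$$g(X) = W_0 + W_1 X_1 + W_2 X_2 + \dots + W_n X_n$$
where the $W_i$ are the weights assigned to the components $X_i$, and $W_0$ is a bias term (the weight of a constant component equal to 1, written separately for compactness). When g is linear, the classification is said to be linear, or based on hyperplanes.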
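A minimal Python sketch of the decision rule above with one linear discriminant per class (the form used in the rule itself). The three classes and their weights are hypothetical, chosen only for illustration.

```python
def linear_discriminant(w0, w):
    """Return the function g(X) = w0 + sum_i w_i * X_i."""
    return lambda x: w0 + sum(wi * xi for wi, xi in zip(w, x))

def decide(x, discriminants):
    """Choose the class k whose discriminant g_k(X) is largest."""
    return max(discriminants, key=lambda k: discriminants[k](x))

# Hypothetical weights for three classes in a 2-D feature space.
g = {
    "C1": linear_discriminant(0.0, (1.0, 0.0)),
    "C2": linear_discriminant(0.0, (0.0, 1.0)),
    "C3": linear_discriminant(0.0, (-1.0, -1.0)),
}
print(decide((3.0, 1.0), g))    # C1
print(decide((1.0, 4.0), g))    # C2
print(decide((-2.0, -1.0), g))  # C3
```

The boundary between two classes i and k is the hyperplane g_i(X) = g_k(X), which is why linear discriminants give piecewise-linear decision regions.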



Discriminant functions are usually built on the notion of distance or on conditional probability. Distance is a very good tool for deciding whether objects are "close to each other". If the distance is smaller than some threshold, we consider the two objects similar and put them in the same class. Conversely, if the distance is larger than that threshold, they are different and we put them in separate classes.



In some cases, conditional probability is used to classify an object. The theory of conditional probability was studied in depth by Bayes, and we can apply it to discriminate objects. Let:
P(X|Ci) be the probability of observing X given that it comes from class Ci;
P(Ci|X) be the conditional probability that X belongs to class Ci,
where X is the object to be recognized and the Ci are the object classes. The learning process lets us determine P(X|Ci), and Bayes' formula for conditional probability in the multivariate setting then gives P(Ci|X):
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)} = \frac{P(X \mid C_i)\,P(C_i)}{\sum_{j=1}^{n} P(X \mid C_j)\,P(C_j)}$$
If P(Ci|X) > P(Ck|X) for all k ≠ i, then X ∈ Ci. Depending on the recognition method, the discriminant function takes different forms.

Some typical unsupervised recognition algorithms


In practice there are many unsupervised recognition algorithms. Here we consider three commonly used ones:
the maximum-distance algorithm; the K-means algorithm; the ISODATA algorithm.

The maximum-distance algorithm


a) Principle
Given a set of m objects, we compute the distances between the objects; the element at the maximum distance (the farthest one) forms a new class. Classes are formed gradually by computing the distances between objects and classes.
b) Algorithm
Step 1: Choose the initial seed: suppose X1 ∈ C1, called class g1, and let Z1 be the centre of g1. Compute all distances Dj1 = D(Xj, Z1) for j = 1, 2, ..., m. Find Dk1 = max_j Dj1; Xk is the element farthest from group g1, so Xk becomes the centre of a new class g2, denoted Z2. Compute d1 = D12 = D(Z1, Z2).



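Step 2: Compute the distances Dj1 = D(Xj, Z1) and Dj2 = D(Xj, Z2), and let Dj = min(Dj1, Dj2) be the distance from Xj to its nearest centre. Set Dk(2) = max_j Dj.
If Dk(2) < θ·d1, the algorithm stops: the classification is complete.
Otherwise a third group is created: Xk becomes the centre of g3, denoted Z3. Compute d3 = (D12 + D13 + D23)/3,
where θ is a given threshold, D13 = D(Z1, Z3) and D23 = D(Z2, Z3).
Selection rule: at each later step, the farthest element (in the sense of its distance to the nearest centre) starts a new class only if that distance is at least θ times the current average distance between centres. The process repeats until no new class is created. The result is a set of classes with representatives Z1, Z2, ....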
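A Python sketch of the maximum-distance procedure, under the assumptions spelled out above: Z1 is the first object, the distance of an object to the current centres is its distance to the nearest centre, and a new centre is accepted only while that largest distance is at least theta times the average distance between centres (theta = 0.5 here is an arbitrary choice). The final assignment of each object to its nearest centre and the sample points are also assumptions of this sketch.

```python
import math

def max_distance_clustering(points, theta=0.5):
    """Maximum-distance (maximin) clustering sketch.

    Returns (centres, labels), where labels[j] is the index of the centre nearest to points[j].
    """
    centres = [points[0]]                                                 # Step 1: Z1 = X1
    centres.append(max(points, key=lambda p: math.dist(p, centres[0])))  # Z2: farthest from Z1
    while True:
        # D_j = distance from X_j to its nearest centre; k = index of the largest D_j.
        nearest = [min(math.dist(p, z) for z in centres) for p in points]
        k = max(range(len(points)), key=lambda j: nearest[j])
        pair = [math.dist(a, b) for i, a in enumerate(centres) for b in centres[i + 1:]]
        average = sum(pair) / len(pair)                                   # d1, d3, ...
        if nearest[k] == 0 or nearest[k] < theta * average:
            break                                                         # no new class
        centres.append(points[k])                                         # X_k becomes a new centre
    labels = [min(range(len(centres)), key=lambda i: math.dist(p, centres[i])) for p in points]
    return centres, labels

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (20, 0), (21, 0)]
centres, labels = max_distance_clustering(points)
print(centres)  # [(0, 0), (21, 0), (10, 11)]
print(labels)   # [0, 0, 0, 2, 2, 2, 1, 1]
```

Unlike K-means, the number of classes is not fixed in advance; it is controlled indirectly through theta.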

The K-means algorithm


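Assume there are K classes.
a) Principle
Unlike the previous algorithm, we take the first K elements of the object space as seeds; in other words, we fix K classes in advance. The evaluation function is based on the Euclidean distance:
$$J_k = \sum_{X \in C_k} D^2(X, Z_k) = \sum_{j=1}^{N_k} D^2(X_j, Z_k) \qquad (1)$$
where $J_k$ is the criterion function of class $C_k$ and $N_k$ is the number of its elements. Objects are assigned to the K initial seeds by the minimum-distance principle. To minimize (1) we differentiate with respect to the variable $Z_k$; it is easy to see that (1) is minimal when
$$\frac{\partial J_k}{\partial Z_k} = -2 \sum_{j=1}^{N_k} (X_j - Z_k) = 0 \;\Longrightarrow\; Z_k = \frac{1}{N_k} \sum_{j=1}^{N_k} X_j \qquad (2)$$
that is, $Z_k$ is the mean of class $C_k$, which explains the name of the method.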



b) Algorithm
Choose Nc elements of the set T (assuming there are Nc classes) and take them as the class centres X1, X2, ..., XNc, denoted Z1, Z2, ..., ZNc.
Assign X to Ck if D(X, Zk) = min_j D(X, Zj), j = 1, ..., Nc; this is the first iteration.
Compute all Zk using formula (2).
Continue in the same way up to iteration q: X ∈ Ck(q-1) if D(X, Zk(q-1)) = min_l D(X, Zl(q-1)).
If Zk(q-1) = Zk(q) for every k, the algorithm stops; otherwise classification continues.

The ISODATA algorithm


ISODATA stands for Iterative Self-Organizing Data Analysis. It is a fairly flexible algorithm that does not require the classes to be fixed in advance. Its steps are:
Choose an initial partition based on arbitrary centres. According to experiments, the recognition result does not depend on the initial partition.
Partition the data by assigning each point to the nearest centre, using the Euclidean distance.
Split a class if the distance exceeds a threshold t1.
Determine a new partition based on the recomputed centres, and continue computing new centres.
Compute all distances to the new centres.
Merge regions whose centres are closer than a threshold t2.
Repeat the operations above until the partition criterion is satisfied.

Chapter 4: Classification using likelihood functions

Chapter 4. Neural networks


Traditional statistical modelling and neural networks are closely related fields. The main difference:
Traditional statistics focuses on linear problems; neural networks focus on non-linear problems.

The overlap between the two fields is the backpropagation technique.

Backpropagation is the central technique of neural networks, but it is in fact a statistical modelling tool.

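Statistical modelling
Consider the problem of a club's annual party.
Input: the number of people who registered with the club some time in advance.
Output: a prediction of the number of people who actually attend, based on the number who registered.
Solution: a statistical relationship observed over about 10 years: attendance = 0.85 × number registered.
Sample set: data on the parties over those 10 years. Independent variable: the number registered. Dependent variable: the number attending.
The linear regression equation of statistics:
$$y = a_0 + \sum_{i=1}^{I} a_i x_i$$
where y is the dependent variable, I is the number of independent variables $x_i$, and $a_0, \dots, a_I$ are parameters determined by regression. For the problem above the linear equation is y = 0.85x, so the mapping function is a straight line.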

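With 2 independent variables and 1 dependent variable, the regression equation defines a plane in three-dimensional space. With n independent variables it defines a hyperplane in (n+1)-dimensional space that determines the dependent variable. Because the dependent variable is determined from the independent variables statistically, there is error.
Computing the error: the mean of the squared deviations,
$$E = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{2} (y_n - t_n)^2$$
where E is the error, N is the number of samples in the sample set, and $y_n$ and $t_n$ are the actual and target values for the n-th sample.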
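For the one-variable model y = a·x used in the party example, the least-squares slope has the closed form a = Σ x_n t_n / Σ x_n². The Python sketch below computes it together with the error E defined above; the registration and attendance figures are hypothetical, since the club's data are not given.

```python
def fit_slope(xs, ts):
    """Least-squares slope a for the model y = a * x (no intercept)."""
    return sum(x * t for x, t in zip(xs, ts)) / sum(x * x for x in xs)

def mean_error(ys, ts):
    """E = (1/N) * sum_n 0.5 * (y_n - t_n)^2, as defined above."""
    return sum(0.5 * (y - t) ** 2 for y, t in zip(ys, ts)) / len(ts)

registered = [100, 200, 300, 400]  # hypothetical x_n
attended = [80, 175, 250, 345]     # hypothetical t_n
a = fit_slope(registered, attended)
predicted = [a * x for x in registered]
print(round(a, 3), round(mean_error(predicted, attended), 2))
```

Any other slope gives a larger E on the same data, which is what "least squares" means here.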

(Figure: number of attendees plotted against the number registered, comparing the mapping function y = 0.85x with the target function and showing the squared errors.)
Causes of error:
Noise: the collected data are inaccurate or incomplete;
The mapping does not have the same form as the target function, so it is not as accurate as desired; for example, the target function may be non-linear.

Backpropagation


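A backpropagation network is a non-linear function that can approximate, as closely as possible, a target function given by the samples of a sample set. Illustration of a backpropagation network (figure: Input layer, Hidden layer, Output layer).
How it operates: each node in the input layer receives the value of one independent variable and passes it into the network. The data from all input nodes are combined into what we call a weighted sum, and the result is passed to the nodes of the hidden layer. The nodes of the output layer, like those of the hidden layer, receive weighted sums of the signals from the hidden nodes. Each output node corresponds to one dependent variable.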



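A general backpropagation network has n (n > 2) layers: layer 1 is the input layer, layer n is the output layer, and the n - 2 layers in between are hidden layers. The numbers of input and output nodes are determined by the problem; the number of hidden nodes is chosen by the network designer. Every node in layer i is connected to every node in layer i+1, and nodes in the same layer are not connected. Each arc of the network carries a weight w ∈ R. A backpropagation network can be in one of two states: the mapping state and the learning state.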



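Mapping state:
Information propagates from the input layer to the output layer, and the network performs a mapping that computes the dependent variables from the given independent variables: Y = NN(X). In the learning state, information propagates in both directions, many times, in order to learn the weights. The network processes one sample at a time to compute Y = NN(X).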



First, the values of the independent variables are passed to the input layer of the network. The input nodes perform no computation: each input node passes its value to all hidden nodes. Each hidden node computes the weighted sum of its inputs by accumulating the products of each input node's value and the weight of the arc connecting that input node to the hidden node. Next, a transfer function is applied to this weighted sum, together with a threshold of the hidden node, to give the actual value of the hidden node. The transfer function simply squashes the value into a bounded range, as illustrated in the following figure:



After squashing its weighted sum, each hidden node in turn sends its result to all output nodes. Each output node performs the same operations as a hidden node to produce its output value. The values of the output nodes are the actual outputs, i.e. the values of the dependent variables to be determined.

(Figure: graph of the transfer function.)



The mapping performed by the network depends on the values of its weights. Backpropagation is a method for finding the best set of weights for the network to solve a given problem. Applying backpropagation is an iterative process that alternates two main procedures: mapping and error backpropagation. Both are applied on a given sample set. We call this process training the network, or learning.



Training starts with arbitrary weight values, which may be random numbers, and proceeds iteratively; each iteration is called an epoch. In each epoch the network adjusts its weights so that the error decreases (the error being the deviation between the actual outputs and the target outputs). Repeating this adjustment many times brings the weights gradually towards an optimal set of values. The network usually needs many epochs before training is complete.



To update the weights in each epoch, the network must process every sample in the sample set. For each sample it performs the following operations:
First, the network performs forward propagation: it maps the input variables of the current sample to output values, as described above, using the current weights. In the first epochs the outputs are usually inaccurate because the initial weights are not yet good. Next, the error is computed from the output values and the target values. Based on this error, the network updates the weights according to the error-backpropagation principle; this is called the backward pass.



The basic technique of backpropagation is to update the weights in the direction of decreasing gradient (gradient descent), a form of hill climbing guided by the first derivative. Gradient descent is also a common technique in statistics, and backpropagation can be viewed as a statistical modelling method. The mapping is performed in the forward phase: the network computes the dependent variables (its output nodes) from the independent variables (its input nodes). The weights of the network are the coefficients of the model.



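Gradient descent is used to update these coefficients so as to minimize the model's error, measured as the mean squared error. The algorithm can be written as follows:

    DO
        FOR n = 1 TO examples    // iterate over all samples
            GOSUB forward        // forward propagation
            GOSUB back           // compute the error and propagate it backwards
        NEXT n
        GOSUB changeWeights      // update the weights
    LOOP UNTIL E < epsilon OR epoch >= maxEpochs   // stopping condition (illustrative names)

Without a condition after LOOP the program would be an infinite loop, so a stopping condition is required; typical choices are stopping when the error E falls below a chosen threshold or after a maximum number of epochs.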
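The loop above can be sketched in runnable Python for a network with one hidden layer of sigmoid units and a single output. This is an illustration under stated assumptions, not the original program: the sigmoid transfer function, the per-node bias used as the threshold, the learning rate, the stopping rule (error threshold or maximum number of epochs) and the OR-like training data are all choices made for the example. As in the pseudocode, gradients are accumulated over all samples and the weights change once per epoch.

```python
import math
import random

def sigmoid(z):
    """Transfer function: squashes the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, n_hidden=3, rate=0.5, max_epochs=5000, tol=1e-3, seed=0):
    """Batch back-propagation for one hidden layer of sigmoid units and one output.

    samples: list of (inputs, target) with target in (0, 1).
    Returns (hidden weights, output weights, error E measured in the last epoch),
    where E = (1/N) * sum_n 0.5 * (y_n - t_n)^2.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each weight list ends with a bias weight, which plays the role of the node threshold.
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    err = float("inf")
    for _ in range(max_epochs):                  # DO ... LOOP UNTIL stop
        g_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g_o = [0.0] * (n_hidden + 1)
        err = 0.0
        for x, t in samples:                     # FOR n = 1 TO examples
            xb = list(x) + [1.0]
            # forward: weighted sums passed through the transfer function
            h = [sigmoid(sum(w * v for w, v in zip(wj, xb))) for wj in w_h]
            hb = h + [1.0]
            y = sigmoid(sum(w * v for w, v in zip(w_o, hb)))
            err += 0.5 * (y - t) ** 2
            # back: accumulate the gradient of the summed error
            delta_o = (y - t) * y * (1 - y)
            for j in range(n_hidden + 1):
                g_o[j] += delta_o * hb[j]
            for j in range(n_hidden):
                delta_h = delta_o * w_o[j] * h[j] * (1 - h[j])
                for i in range(n_in + 1):
                    g_h[j][i] += delta_h * xb[i]
        err /= len(samples)
        if err < tol:                            # stopping condition
            break
        # changeWeights: one gradient-descent step per epoch
        w_o = [w - rate * g for w, g in zip(w_o, g_o)]
        w_h = [[w - rate * g for w, g in zip(wj, gj)] for wj, gj in zip(w_h, g_h)]
    return w_h, w_o, err

data = [((0, 0), 0.1), ((0, 1), 0.9), ((1, 0), 0.9), ((1, 1), 0.9)]  # hypothetical OR-like targets
_, _, first_error = train(data, max_epochs=1)
_, _, final_error = train(data)
print(first_error, final_error)
```

Using the same seed in both calls means both runs start from identical weights, so the two printed errors compare the untrained and trained network directly.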

Chapter 5: The perceptron approach

Introduction
The perceptron is a single-layer neural network. It propagates signals in the same way as the McCulloch-Pitts neuron. The input values and the activation levels of the perceptron are -1 or 1; the weights are real numbers. The activation level is determined by the sum $\sum_i w_i x_i$.

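Given input values $x_i$, weights $w_i$, and a threshold $t$, the perceptron's threshold function $f$ returns:
$$f = \begin{cases} 1 & \text{if } \sum_i w_i x_i \ge t \\ -1 & \text{if } \sum_i w_i x_i < t \end{cases}$$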

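Using a hard-limiting activation function, the output of the perceptron for an example $x$ is $Out(x) = 1$ if $Net(w, x) \ge 0$ and $Out(x) = -1$ otherwise, where $Net(w, x) = \sum_i w_i x_i$ and the threshold is treated as the weight of a constant input.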
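The two forms of the output (explicit threshold $t$, or threshold folded into a bias weight) can be sketched in Python as follows. This is a minimal sketch; the function names are illustrative, and the bias form assumes the convention that the last weight multiplies a constant input of 1, so that a threshold $t$ corresponds to a bias weight of $-t$.

```python
def perceptron_output(weights, x, t):
    """Threshold form: 1 if sum(w_i * x_i) >= t, otherwise -1."""
    activation = sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= t else -1


def perceptron_output_bias(weights, x):
    """Bias form: the last weight multiplies a constant input of 1,
    so the output is the sign of Net(w, x), with 0 mapped to 1."""
    net = sum(w * xi for w, xi in zip(weights, list(x) + [1.0]))
    return 1 if net >= 0 else -1


# A threshold t = 0.6 behaves like a bias weight of -0.6:
print(perceptron_output([0.75, 0.5], [1, 1], t=0.6))        # 1
print(perceptron_output_bias([0.75, 0.5, -0.6], [1, 1]))    # 1
```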

Perceptron: limitations
The perceptron learning algorithm is proven to converge if:
- the training examples are linearly separable, and
- a sufficiently small learning rate is used.

The perceptron learning algorithm may fail to converge if the training examples are not linearly separable.

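In that case, the delta rule is applied instead. Given a sufficiently small learning rate, it guarantees convergence toward the best-fit approximation of the target function. The delta rule uses gradient descent to search the hypothesis space (the space of weight vectors) for the weight vector that best fits the training examples.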

Error function
Consider an artificial neural network with $n$ output neurons. For a training example $(x, d)$, the training error caused by the current weight vector $w$ is:

The training error caused by the current weight vector $w$ over the whole training set $D$ is:
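One common form of these two errors, assumed here rather than taken from the original slides, is half the squared error over the $n$ outputs for one example, averaged over the training set ($d_i$ and $Out_i$ are the desired and actual outputs of the $i$-th output neuron; the factor 1/2 only simplifies the derivative):

```latex
E_x(w) = \frac{1}{2} \sum_{i=1}^{n} \bigl(d_i - Out_i\bigr)^2
\qquad
E_D(w) = \frac{1}{|D|} \sum_{x \in D} E_x(w)
```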

The gradient technique
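The gradient of $E$, written $\nabla E$, is a vector that:
- points uphill (in the direction of the slope), and
- has a length proportional to the steepness of the slope.

$\nabla E = \left[\frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_N}\right]$ gives the direction of steepest increase of the error $E$, where $N$ is the total number of weights (connections) in the network. Therefore, the direction of steepest decrease is the negative of the gradient, $-\nabla E$.

Requirement: the activation functions used in the network must be continuous with respect to the weights and have continuous derivatives.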
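As an illustration of descending along $-\nabla E$, here is a minimal sketch of the delta rule for a single linear unit. It assumes the error $E(w) = \frac{1}{2}\sum_{(x,d) \in D}(d - w \cdot x)^2$ (no activation function) and batch updates $\Delta w = -\eta \nabla E$. The learning rate `eta`, the epoch count and the noise-free toy data are hypothetical choices.

```python
def delta_rule(data, eta=0.1, epochs=2000):
    """Batch gradient descent on E(w) = 1/2 * sum((d - w.x)^2) for a linear unit.

    data: list of (x, d) pairs, where x already includes a constant 1 for the bias.
    """
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        # -dE/dw_i = sum over examples of (d - w.x) * x_i
        neg_grad = [0.0] * n
        for x, d in data:
            err = d - sum(wi * xi for wi, xi in zip(w, x))
            for i in range(n):
                neg_grad[i] += err * x[i]
        # Move against the gradient: w <- w - eta * grad E
        w = [wi + eta * g for wi, g in zip(w, neg_grad)]
    return w


# Hypothetical noise-free targets d = 2*x1 - x2 + 0.5, with x = (x1, x2, 1)
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
toy = [((x1, x2, 1.0), 2 * x1 - x2 + 0.5) for x1, x2 in points]
print([round(wi, 4) for wi in delta_rule(toy)])   # [2.0, -1.0, 0.5]
```

Because $E$ is quadratic in $w$ here, a small enough `eta` makes the iteration converge to the least-squares weights, which for this exact linear target are the generating coefficients.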

The gradient technique: illustration

Perceptron learning algorithm


Consider a set of training examples $D = \{(x, d)\}$, where $x$ is the input vector and $d$ is the desired output (-1 or 1). Perceptron learning aims to find a weight vector that makes the perceptron produce the correct output (-1 or 1) for every training example.
- If an example $x$ is classified correctly, the weight vector $w$ is left unchanged.
- If $d = 1$ but the perceptron outputs -1 ($Out = -1$), $w$ must be changed so that $Net(w, x)$ increases.
- If $d = -1$ but the perceptron outputs 1 ($Out = 1$), $w$ must be changed so that $Net(w, x)$ decreases.



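The perceptron uses the following rule. Let $c$ be a given constant representing the learning rate, and $d$ the desired output. The perceptron adjusts the weight on the $i$-th component of the input vector by an amount $\Delta w_i$:
$$\Delta w_i = c\,\Bigl(d - f\bigl(\textstyle\sum_j w_j x_j\bigr)\Bigr)\,x_i$$
where $f(\sum_j w_j x_j)$ is the output of the perceptron, which is either +1 or -1.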



The difference between $d$ and $f(\sum_j w_j x_j)$ is therefore 0, 2, or -2. So, for each component of the input vector:
- If the desired output and the actual output are equal, do nothing.
- If the actual output is -1 and the desired output is 1, increase the weight on the $i$-th line by $2cx_i$.
- If the actual output is 1 and the desired output is -1, decrease the weight on the $i$-th line by $2cx_i$.

$c$ is called the learning-rate constant because when $c$ is large the adjustments $\Delta w_i$ are large, which can speed up the convergence of $w_i$ toward its correct value.
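The rule $\Delta w_i = c\,(d - f(\sum_j w_j x_j))\,x_i$ can be sketched as a complete training loop. This is a minimal sketch under stated assumptions: a constant bias input of 1 is appended to every vector, the cap `max_epochs` guarantees the loop stops even on non-separable data, and the data set (logical AND coded with -1/1) is a hypothetical, linearly separable example.

```python
def sign(net):
    return 1 if net >= 0 else -1


def train_perceptron(data, w, c=0.2, max_epochs=100):
    """Perceptron learning: w_i += c * (d - f(net)) * x_i for each example.

    data: list of (x, d) pairs with d in {-1, 1}; a bias input of 1 is appended.
    Returns the final weights and the number of epochs used.
    """
    w = list(w)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, d in data:
            xb = list(x) + [1.0]
            out = sign(sum(wi * xi for wi, xi in zip(w, xb)))
            if out != d:
                # d - out is 2 or -2, so each weight moves by +-2c * x_i
                w = [wi + c * (d - out) * xi for wi, xi in zip(w, xb)]
                errors += 1
        if errors == 0:       # every example classified correctly: stop
            return w, epoch
    return w, max_epochs


# Hypothetical separable data: logical AND with -1/1 coding
AND_DATA = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w, epochs = train_perceptron(AND_DATA, [0.0, 0.0, 0.0])
print(w, epochs)   # [0.4, 0.4, -0.4] 2
```

Here a single update in the first epoch already separates the data, and the second epoch confirms that no example is misclassified.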

Using a perceptron network for classification

Figure: A complete classification system



In this pipeline, the perceptron network in particular, and neural networks in general, act as the classifier. An input datum is represented as a vector of $n$ components $x_1, x_2, \ldots, x_n$, one per feature. Each datum belongs to one of $m$ classes $class_1, class_2, \ldots, class_m$. The classifier's task is to determine which class the input belongs to.

Perceptron example
The table below contains the perceptron's training data: two real-valued features ($x_1$ and $x_2$) and a desired output of either 1 or -1, representing the two classes of the data.


Table: The data set for the perceptron classification problem.

Figure: A two-dimensional plot of the data points in the table above. The perceptron provides a linear separation of the data sets.

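The perceptron used for this classification problem is designed as follows:
- Input signals: the two parameters $x_1$ and $x_2$, plus a bias input whose value is always 1.
- Activation level: $net = w_1 x_1 + w_2 x_2 + w_3$.
- The threshold function $f(net)$ is the sign function, also called the bipolar linear threshold function.

Figure: The perceptron network for the example in the table above.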


We now use the data points in the table above to train the perceptron shown in the figure.

Assume the initial weights $w_i$ are random, with values [0.75, 0.5, -0.6]. We use the perceptron learning algorithm described above, with the learning rate $c$ set to a small positive number, 0.2. Starting with the first example in the table:
$f(net)_1 = f(0.75 \cdot 1 + 0.5 \cdot 1 - 0.6 \cdot 1) = f(0.65) = 1$
Since $f(net)_1$ equals the desired output, the weights are not adjusted.

Hence $W_2 = W_1$, where $W$ is the weight vector $(w_1, w_2, w_3)$. For the second example:
$f(net)_2 = f(0.75 \cdot 9.4 + 0.5 \cdot 6.4 - 0.6 \cdot 1) = f(9.65) = 1$
The desired output here, however, is -1, so the weights are adjusted according to the learning rule:
$W_t = W_{t-1} + c\,(d_{t-1} - f(net)_{t-1})\,X_{t-1}$

where:
- $c$ is the learning constant;
- $W$ and $X$ are the weight vector and the input vector;
- $t$ is the time step.

In this case $c = 0.2$, $d_2 = -1$, and $f(net)_2 = 1$.

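Applying the learning rule gives:
$W_3 = \begin{bmatrix}0.75\\0.5\\-0.6\end{bmatrix} + 0.2\,(-1 - 1)\begin{bmatrix}9.4\\6.4\\1.0\end{bmatrix} = \begin{bmatrix}0.75 - 3.76\\0.5 - 2.56\\-0.6 - 0.4\end{bmatrix} = \begin{bmatrix}-3.01\\-2.06\\-1.00\end{bmatrix}$

Now consider the third example:
$f(net)_3 = f(-3.01 \cdot 2.5 - 2.06 \cdot 2.1 - 1.0 \cdot 1) = f(-12.85) = -1$
The desired output of this example is 1, so the weights are adjusted again.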


Continuing in this way, after 10 iterations the linear separator shown in the figure below appears. After repeatedly training the perceptron on the given data set, about 500 iterations in total, the weight vector converges to [-1.3, -1.1, 10.9]. These are the coefficients of the separating line in the figure:
$-1.3\,x_1 - 1.1\,x_2 + 10.9 = 0$
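The hand calculation can be checked with a short script. A minimal sketch: it replays the first two updates on the three examples quoted in the text, (1, 1) with desired output 1, (9.4, 6.4) with -1 and (2.5, 2.1) with 1 (the rest of the table is not reproduced here), prints the exact activation of the third example, and then checks that the converged weights [-1.3, -1.1, 10.9] put those three points on the correct side of the line. The helper names are illustrative.

```python
def sign(net):
    return 1 if net >= 0 else -1


def net(w, x):
    # net = w1*x1 + w2*x2 + w3, with the bias input fixed at 1
    return w[0] * x[0] + w[1] * x[1] + w[2]


def update(w, x, d, c=0.2):
    # W_t = W_{t-1} + c * (d - f(net)) * X, where X = (x1, x2, 1)
    f = sign(net(w, x))
    return [wi + c * (d - f) * xi for wi, xi in zip(w, (x[0], x[1], 1.0))]


examples = [((1.0, 1.0), 1), ((9.4, 6.4), -1), ((2.5, 2.1), 1)]
w1 = [0.75, 0.5, -0.6]
w2 = update(w1, *examples[0])    # f(0.65) = 1 = d, so no change
w3 = update(w2, *examples[1])    # f(9.65) = 1 but d = -1, so the weights change
print([round(v, 2) for v in w3])             # [-3.01, -2.06, -1.0]
print(round(net(w3, examples[2][0]), 3))     # -12.851

final = [-1.3, -1.1, 10.9]
print([sign(net(final, x)) == d for x, d in examples])   # [True, True, True]
```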

Chapter 6: Support Vector Machines (SVM)
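SVM theory originated in the 1970s with Vapnik and Chervonenkis, but it only began to attract wide attention in the 1990s. SVMs are a family of supervised learning methods used for classification. Their distinctive property is that they simultaneously minimize the empirical classification error and maximize the margin between the classes. Compared with other learning algorithms such as neural networks and decision trees, an advantage of SVMs is that they cope very well with overfitting. SVMs are widely used in classification and recognition (handwriting, faces, fingerprints, and so on), and more recently in biology, for example to classify genes and proteins.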

Support Vector Machine


SVM was originally a training algorithm for linear classifiers. It was later extended to regression analysis, principal component analysis, and nonlinear problems. SVM defines the classification function by maximizing the margin between the training samples and the decision boundary. The solution is expressed as a linear combination of the support samples, the subset of training samples that lie closest to the decision boundary; these are called the support vectors. In the nonlinear case, SVM maps the data from the input space into a higher-dimensional feature space through a nonlinear mapping, and the large-margin training algorithm is applied there.



The mapping, however, can be performed implicitly by kernel functions. In the high-dimensional feature space, a simple linear separation with maximal margin between the classes may exist.
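To illustrate how a kernel function can stand in for an explicit mapping, here is a minimal sketch using the degree-2 polynomial kernel $k(x, z) = (x \cdot z)^2$ on $\mathbb{R}^2$, an example chosen here rather than one from the original. Its explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, and the kernel returns $\phi(x) \cdot \phi(z)$ without ever computing $\phi$.

```python
import math


def phi(x):
    """Explicit map from R^2 to R^3 for the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, computed entirely in the input space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z))       # 1.0
print(dot(phi(x), phi(z)))     # the same value, up to floating-point rounding
```

The kernel costs one 2-D dot product, whereas the explicit route needs the 3-D feature vectors; for higher-degree kernels and higher input dimensions the saving grows quickly.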

SVM theory
The original form of SVM was developed for linearly separable data. In pattern classification, suppose we have $N$ training samples $\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{\pm 1\}$. In a linear SVM we want to train a linear classifier $f(x) = \operatorname{sign}(w \cdot x + b)$. We also want this hyperplane to have the largest possible margin between the two classes.

Maximal margin

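Specifically, we want to find the hyperplane $H: w \cdot x + b = 0$ and two hyperplanes parallel to it, at equal distance on either side:
$H_1: w \cdot x + b = +1$ and $H_2: w \cdot x + b = -1$,
such that no data point lies between $H_1$ and $H_2$, and the margin $M$ between $H_1$ and $H_2$ is as large as possible. The distance from a point $x$ on $H_1$ to $H$ is $\frac{|w \cdot x + b|}{\|w\|} = \frac{1}{\|w\|}$, so the distance between $H_1$ and $H_2$ is $M = \frac{2}{\|w\|}$.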

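Therefore, to maximize the margin we need to minimize $\|w\| = \sqrt{w^T w}$, subject to the condition that no data point lies between $H_1$ and $H_2$:
- $w \cdot x_i + b \ge +1$ for the positive examples ($y_i = +1$);
- $w \cdot x_i + b \le -1$ for the negative examples ($y_i = -1$).

The two conditions can be combined into $y_i(w \cdot x_i + b) \ge 1$. The problem can therefore be formulated as:
$$\min_{w, b}\ \tfrac{1}{2} w^T w \quad \text{subject to} \quad y_i(w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, N$$
This is a convex quadratic programming problem (in $w$ and $b$) over a convex set. It can be solved by introducing Lagrange multipliers $\alpha_1, \alpha_2, \ldots, \alpha_N \ge 0$, one for each training sample.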
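The margin and the constraints can be checked numerically. A minimal sketch on a hypothetical two-point data set, $x_1 = (1, 1)$ with $y_1 = +1$ and $x_2 = (-1, -1)$ with $y_2 = -1$, whose maximal-margin hyperplane is $w = (0.5, 0.5)$, $b = 0$: both points satisfy $y_i(w \cdot x_i + b) = 1$, so they lie on $H_1$ and $H_2$, and the margin $2/\|w\|$ equals the distance $2\sqrt{2}$ between them. The small tolerance in `feasible` is an added guard against rounding.

```python
import math


def margin(w):
    """Distance between H1: w.x + b = +1 and H2: w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def feasible(w, b, data, eps=1e-12):
    """True if every sample satisfies y_i (w . x_i + b) >= 1 (within eps)."""
    return all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1 - eps
               for x, y in data)


data = [((1.0, 1.0), 1), ((-1.0, -1.0), -1)]   # hypothetical toy data
print(feasible((0.5, 0.5), 0.0, data), round(margin((0.5, 0.5)), 4))   # True 2.8284
# A shorter w would give a wider margin, but it violates the constraints:
print(feasible((0.4, 0.4), 0.0, data), round(margin((0.4, 0.4)), 4))   # False 3.5355
```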

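We thus obtain the following Lagrangian:
$$L(w, b, \alpha) = \tfrac{1}{2} w^T w - \sum_{i=1}^{N} \alpha_i \bigl[y_i(w \cdot x_i + b) - 1\bigr]$$
We can now maximize $L(w, b, \alpha)$ with respect to $\alpha$, subject to the constraint that the gradient of $L(w, b, \alpha)$ with respect to the primal variables $w$ and $b$ vanishes, that is, $\frac{\partial L}{\partial w} = 0$ and $\frac{\partial L}{\partial b} = 0$. We then have:
$$w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0$$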
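Once the multipliers $\alpha_i$ are known, $w$ and $b$ follow from the stationarity conditions. A minimal sketch on a hypothetical two-point example ($x_1 = (1, 1)$, $y_1 = +1$; $x_2 = (-1, -1)$, $y_2 = -1$): the constraint $\sum_i \alpha_i y_i = 0$ forces $\alpha_1 = \alpha_2 = \alpha$, and the standard dual objective $\sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\, x_i \cdot x_j$ then reduces to $2\alpha - 4\alpha^2$, which peaks at $\alpha = 0.25$. A coarse grid scan stands in for a real quadratic-programming solver, and $b$ is recovered from a support vector through $y_s(w \cdot x_s + b) = 1$, i.e. $b = y_s - w \cdot x_s$.

```python
def dual_objective(alpha, data):
    """W(alpha) = sum(alpha) - 1/2 * sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    total = sum(alpha)
    for a_i, (x_i, y_i) in zip(alpha, data):
        for a_j, (x_j, y_j) in zip(alpha, data):
            k = sum(p * q for p, q in zip(x_i, x_j))
            total -= 0.5 * a_i * a_j * y_i * y_j * k
    return total


def recover_w_b(alpha, data):
    """w = sum_i alpha_i y_i x_i; b from a support vector (alpha_s > 0)."""
    dim = len(data[0][0])
    w = [sum(a * y * x[k] for a, (x, y) in zip(alpha, data)) for k in range(dim)]
    s = max(range(len(alpha)), key=lambda i: alpha[i])   # index of a support vector
    x_s, y_s = data[s]
    b = y_s - sum(wk * xk for wk, xk in zip(w, x_s))
    return w, b


data = [((1.0, 1.0), 1), ((-1.0, -1.0), -1)]   # hypothetical toy data
# sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2; scan that line on a grid
best = max((a / 1000 for a in range(1001)),
           key=lambda a: dual_objective([a, a], data))
w, b = recover_w_b([best, best], data)
print(best, w, b)    # 0.25 [0.5, 0.5] 0.0
```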

Chapter 7: Preprocessing and feature selection


An object usually has a great many features, which makes it hard to process and introduces noise. We therefore need to reduce the number of features to a minimum without losing the properties of the object. The main question of this chapter is: how do we find a set of features, select the most important ones, and reduce their number while at the same time preserving the discriminative information? If the features are chosen without care, the classifier designed afterwards will perform poorly.

Preprocessing
Outlier removal
An outlier is a point that lies very far from the mean of the corresponding random variable. This distance is measured against a chosen threshold, usually a few times the standard deviation. For a normally distributed random variable, a distance of two standard deviations from the mean covers about 95% of the points, and a distance of three standard deviations covers about 99.7%.
Points whose values differ greatly from the mean cause large errors during training and can have disastrous effects. These effects are even worse when the outliers are the result of noisy measurements. If there are only a few outliers, they are usually ignored; otherwise they must be removed, because they distort the computed results.
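The threshold rule can be sketched with Python's standard `statistics` module. This is a minimal sketch, assuming one-dimensional data and a threshold of `k` sample standard deviations around the mean (k = 2 here, a hypothetical choice; the readings are also hypothetical). The outlier itself inflates both the mean and the standard deviation, so on a small sample a larger `k` can fail to flag it.

```python
import statistics


def remove_outliers(values, k=2.0):
    """Keep only the values within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation (n - 1)
    return [v for v in values if abs(v - mean) <= k * sd]


# Hypothetical readings; 100 looks like a faulty measurement
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
print(remove_outliers(readings))          # [10, 11, 9, 10, 12, 10, 11, 9, 10]
print(len(remove_outliers(readings, 3)))  # 10: with k = 3 nothing is removed
```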

FEATURE EXTRACTION WITH PCA (PRINCIPAL COMPONENTS ANALYSIS)

2.1. Statistics
The idea is that we have a large data set and want to analyse it in terms of the relationships between the individual points in it, so we take a few measurements of the data set.
=> These lead to the principal components.

2.2. Standard deviation


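To understand this concept, consider statistics computed on a sample of a population. A sample is a subset of the population. Consider the sample $X = [X_1\ X_2\ \ldots\ X_n]$, where:
- $X_k$ is the $k$-th element of $X$;
- $X_1$ is the first element of $X$;
- $n$ is the number of elements of $X$.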



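The mean of the sample is:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
For example, consider two samples that both have mean 10 but are clearly different. How do they differ? To see the difference, we compute the standard deviation. What is the standard deviation?
It is, roughly, the average distance from the mean of the data set to its points. The standard deviation is computed as:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
where $s$ is the standard deviation of sample $X$.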



The sample $x = [10\ 10\ 10\ 10]$ has mean 10 but standard deviation 0: all the numbers are identical, so none of them deviates from the mean.
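The mean and the sample standard deviation can be sketched in a few lines of Python. A minimal sketch, assuming the $n - 1$ divisor of the sample standard deviation; the helper names and the second sample are illustrative.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def std_dev(xs):
    """Sample standard deviation s, dividing by n - 1."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


print(mean([10, 10, 10, 10]), std_dev([10, 10, 10, 10]))           # 10.0 0.0
print(mean([1, 2, 3, 4, 5]), round(std_dev([1, 2, 3, 4, 5]), 4))   # 3.0 1.5811
```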

Exercise: Compute the mean and standard deviation of the following data sets.

2.3. Covariance


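The standard deviation above applies to one-dimensional data. What about two-dimensional data? For that we use the covariance.
The variance, i.e. the covariance of a single dimension with itself:
$$var(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1}$$
The covariance between two dimensions:
$$cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$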
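The covariance can be sketched directly from its definition. A minimal sketch, assuming the $n - 1$ divisor; the study-hours and marks values are hypothetical and are not the table from the original. The variance is just the covariance of a dimension with itself.

```python
def covariance(xs, ys):
    """cov(X, Y) = sum((X_i - mean_X) * (Y_i - mean_Y)) / (n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def variance(xs):
    # The one-dimensional case: covariance of a dimension with itself
    return covariance(xs, xs)


hours = [1, 2, 3]    # hypothetical study hours
marks = [2, 4, 6]    # hypothetical marks
print(variance(hours), covariance(hours, marks))   # 1.0 2.0
```

A positive covariance means the two dimensions tend to increase together; a negative one means one tends to decrease as the other increases.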



The table below shows the covariance between the time students spent studying and the marks they received.


2.4. The covariance matrix


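Covariance is defined between one or two dimensions. How is it computed for data with more than two dimensions? We can define the covariance matrix of an $n$-dimensional data set as:
$$C^{n \times n} = (c_{i,j}), \quad c_{i,j} = cov(Dim_i, Dim_j)$$
where $C^{n \times n}$ is a matrix with $n$ rows and $n$ columns, and $Dim_x$ is the $x$-th dimension.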



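Consider a data set with three dimensions $x$, $y$, $z$. Its covariance matrix is a 3×3 matrix with entries:
$$C = \begin{pmatrix} cov(x,x) & cov(x,y) & cov(x,z) \\ cov(y,x) & cov(y,y) & cov(y,z) \\ cov(z,x) & cov(z,y) & cov(z,z) \end{pmatrix}$$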
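The covariance matrix can be built by applying the covariance to every pair of dimensions. A minimal sketch, assuming the data arrive as a list of $n$-dimensional rows and the $n - 1$ divisor is used; the 3-D data set is hypothetical.

```python
def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def covariance_matrix(rows):
    """C[i][j] = cov(Dim_i, Dim_j) for data given as n-dimensional rows."""
    dims = list(zip(*rows))          # one tuple of values per dimension
    return [[covariance(a, b) for b in dims] for a in dims]


# Hypothetical 3-D data: columns x, y, z
data = [(1, 2, 3), (2, 4, 1), (3, 6, 2)]
for row in covariance_matrix(data):
    print(row)
# [1.0, 2.0, -0.5]
# [2.0, 4.0, -1.0]
# [-0.5, -1.0, 1.0]
```

The matrix is symmetric because $cov(a, b) = cov(b, a)$, and its diagonal holds the variance of each dimension.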



Exercises:
1. Compute the covariance matrix of the following two-dimensional data set ($x$, $y$):

2. Compute the covariance matrix of the following three-dimensional data set ($x$, $y$, $z$):

2.5. Matrix algebra
Matrix algebra operations are used frequently in PCA, so we review a few of them.

Example 2 shows that a scaled eigenvector (any nonzero multiple of an eigenvector) is still an eigenvector.
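The scaling property from Example 2 can be checked numerically. A minimal sketch with a hypothetical matrix $A = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}$ (the matrix of the original example is not reproduced here): $v = (3, 2)$ satisfies $Av = 4v$, and so does the scaled vector $2v = (6, 4)$.

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def is_eigenvector(A, v, lam, tol=1e-12):
    """True if A v = lam * v, component by component, within tol."""
    return all(abs(av - lam * x) <= tol for av, x in zip(mat_vec(A, v), v))


A = [[2, 3], [2, 1]]   # hypothetical matrix
v = [3, 2]
print(mat_vec(A, v))                               # [12, 8], i.e. 4 * [3, 2]
print(is_eigenvector(A, v, 4))                     # True
print(is_eigenvector(A, [2 * x for x in v], 4))    # True: (6, 4) also works
```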

2.6. Eigenvectors and eigenvalues

Examples of finding eigenvectors and eigenvalues

Exercise
Find the eigenvectors and eigenvalues of the following matrix:



Consider a two-dimensional eigenvector. If the eigenvector is $(a, b)$, we can plot it as a line through the points $(0, 0)$ and $(a, b)$. For example, for the eigenvector $(2, 3)$ we draw the following plot:


3. Principal Components Analysis (PCA)

3.1. Method


Step 1: Get the data
In this example we work with a two-dimensional data set.

Step 2: Subtract the mean
Subtract the mean of each dimension from every value in that dimension.



Consider the data set:



Initial data distribution:



Step 3: Compute the covariance matrix
Because the data are two-dimensional, the covariance matrix is 2×2. We obtain:

Since the off-diagonal elements of the covariance matrix are positive, the variables $x$ and $y$ increase together.



Step 4: Compute the eigenvectors and eigenvalues of the covariance matrix
We obtain the eigenvectors and eigenvalues of the covariance matrix above.

Step 5: Choose components and form a feature vector


Plotting the eigenvectors of the covariance matrix together with the transformed data gives the following graph:

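Feature vector
The feature vector has the form $FeatureVector = (eig_1\ eig_2\ \ldots\ eig_n)$, a matrix whose columns are the chosen eigenvectors. The example data set above has 2 eigenvectors, so there are 2 choices of feature vector.
Option 1: the matrix of both eigenvectors.

Option 2: drop one eigenvector column, normally the one with the smaller eigenvalue.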

Step 6: Derive the new data set
- Choose the eigenvector(s) that form the feature vector.
- Transpose the feature vector.
- Compute the new data: the data expressed in the new coordinate axes given by the eigenvectors.
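Steps 1 to 6 can be combined into one short script. A minimal sketch under stated assumptions: two-dimensional data, the $n - 1$ covariance divisor, a closed-form eigen-decomposition of the symmetric 2×2 covariance matrix $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ (eigenvalues $\frac{a+c}{2} \pm \sqrt{(\frac{a-c}{2})^2 + b^2}$), and a hypothetical data set of points on the line $y = 2x + 1$ (the original data table is not reproduced here). With `keep=1` it keeps only the eigenvector with the largest eigenvalue, and each new coordinate is the dot product of a unit eigenvector with a mean-adjusted point, which is what multiplying the transposed feature vector by the mean-adjusted data amounts to.

```python
import math


def pca_2d(points, keep=1):
    """PCA on 2-D points: returns (eigenvalues, feature_vector, new_data)."""
    n = len(points)
    # Step 2: subtract the mean of each dimension
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    adjusted = [(x - mx, y - my) for x, y in points]
    # Step 3: covariance matrix [[a, b], [b, c]] with divisor n - 1
    a = sum(x * x for x, _ in adjusted) / (n - 1)
    b = sum(x * y for x, y in adjusted) / (n - 1)
    c = sum(y * y for _, y in adjusted) / (n - 1)
    # Step 4: eigenvalues and unit eigenvectors, largest eigenvalue first
    if b == 0:
        pairs = sorted([(a, (1.0, 0.0)), (c, (0.0, 1.0))], key=lambda p: -p[0])
    else:
        mid, r = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
        pairs = []
        for lam in (mid + r, mid - r):
            # (b, lam - a) solves the first row of (C - lam I) v = 0
            norm = math.hypot(b, lam - a)
            pairs.append((lam, (b / norm, (lam - a) / norm)))
    # Step 5: feature vector = the `keep` eigenvectors with largest eigenvalues
    feature_vector = [v for _, v in pairs[:keep]]
    # Step 6: new data = transposed feature vector times mean-adjusted data
    new_data = [[v[0] * x + v[1] * y for v in feature_vector] for x, y in adjusted]
    return [lam for lam, _ in pairs], feature_vector, new_data


points = [(0, 1), (1, 3), (2, 5), (3, 7)]   # hypothetical points on y = 2x + 1
eigenvalues, fv, new_data = pca_2d(points)
print(round(eigenvalues[0], 4))                 # 8.3333
print([round(v, 4) for v in fv[0]])             # [0.4472, 0.8944]
print([round(row[0], 4) for row in new_data])   # [-3.3541, -1.118, 1.118, 3.3541]
```

Because these points lie exactly on a line, the second eigenvalue is (numerically) zero, so dropping its eigenvector loses no information; on real data the discarded eigenvalue measures how much variance the reduced representation gives up.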

Test: perform principal component analysis on the following data set (9h25-10h15).

Round to 2 decimal places.
