
L c Hiu CH0401016

Graduate Program, Class II

THE K-MEANS ALGORITHM IN DATA CLUSTERING


After reading our lecturer's materials and several other sources, I have drawn the following points about clustering data with the K-Means algorithm.

Seen from an everyday point of view, clustering is an entirely ordinary task that we carry out every day: sorting the students in a class into good and excellent, classifying land, classifying assets, classifying the books in a library. Such classification gathers objects that share the same properties, or nearly the same properties, into groups. Whenever we classify a set of objects we ask: on which factor do we group them? Or: into how many groups do we intend to divide them? For example: divide the products into groups of meat, fish, beans, and vegetables. Or: arrange the products sensibly into 5 aisles in a supermarket.

When we use a computer to cluster data, we likewise have to tell the algorithm how we want the data clustered. If someone (the boss) told you to classify the products in a supermarket, you would certainly have to ask how the boss wants them classified. In clustering, the data is fixed, but the result differs when the clustering requirement differs. A different requirement also means different data normalization steps, because we then care only about the attributes needed for that particular clustering.

Example: the data space is the population of Vietnam.
- Clustering by social class gives k = 4 (farmers, workers, intellectuals, merchants).
- Clustering by age group gives k = 3 (old, young, middle-aged).
- Clustering by education level gives k = 7 (illiterate, primary school, lower secondary school, upper secondary school, university, master's, doctorate).

(For this reason, building a general similarity query for a data set is not simple at all.) Clustering algorithms require either the number of clusters to be specified (K-Means) or a criterion for deciding which objects are similar to one another (DBSCAN). K-Means is widely used in data clustering because it performs well and is easy to implement. However, besides needing the number of clusters in advance, K-Means also requires k points to be chosen beforehand as centers, and choosing them at random can produce different results.
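The sensitivity to the starting centers can be seen on a tiny data set. The sketch below is a minimal plain K-means written for illustration only; the function names `kmeans` and `total_squared_error`, the four sample points, and the two hand-picked starting configurations (standing in for two different random draws) are all assumptions, not something taken from the course material.

```python
from math import dist  # available from Python 3.8

def kmeans(points, initial_centers, max_iter=100):
    """Plain K-means started from explicitly given centers (illustrative sketch)."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

def total_squared_error(centers, clusters):
    """Sum of squared distances from every point to its cluster center."""
    return sum(dist(p, c) ** 2 for c, members in zip(centers, clusters) for p in members)

# Four corners of a 4 x 1 rectangle (hypothetical sample data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
run_a = kmeans(points, [(0, 0.5), (4, 0.5)])  # centers start on the left and right
run_b = kmeans(points, [(2, 0), (2, 1)])      # centers start on the bottom and top
print(total_squared_error(*run_a))
print(total_squared_error(*run_b))
```

Both runs stop immediately because their starting centers are already stable, yet the left/right split has a total squared error of 1 while the bottom/top split has 16. The same data and the same k therefore give two different final clusterings, depending only on the initial centers.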

Data Mining


The details of K-Means and of some of its improved variants, K-medoids and Fuzzy c-means, are covered thoroughly in our lecturer's course text, so I will not repeat them here. Instead I present another improved variant of K-Means.

Incremental K-Means:

As we know, K-Means starts either by choosing k points at random as cluster centers, or by choosing a random partition into k clusters and computing the centroid of each. As noted above, this random choice of the k centers can lead to different results depending on which points are chosen. Incremental K-Means is still based on K-Means, but it does not choose k centers up front. Instead it grows the number of clusters from 1 to k by inserting a new cluster center into the cluster with the largest distortion and then recomputing the cluster centers. The algorithm is as follows:

Set K = 1.

Phase 1:
Step 1: If K = 1, choose any point as the cluster center. If K > 1, add a new cluster center inside the cluster with the largest distortion.
Step 2: Assign each point to the cluster whose center is nearest to it, then update the cluster centers.
Step 3: If the cluster centers did not change, go to Phase 2. Otherwise, go back to Phase 1, Step 2.

Phase 2 (increase the number of clusters):
If K < the specified number of clusters, increase K by 1 and go to Phase 1, Step 1. Otherwise, stop.
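The two phases can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the reference implementation: the text does not say where inside the most distorted cluster the new center is placed, so the sketch assumes it is seeded at the member farthest from that cluster's current center. It also assumes the first data point is used as the single center when K = 1, measures distortion as the sum of squared distances from a cluster's members to its center, and expects the target number of clusters not to exceed the number of distinct points. All function names are hypothetical.

```python
from math import dist  # available from Python 3.8

def mean(cluster):
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def distortion(center, cluster):
    # Assumed measure: sum of squared distances from the members to the center.
    return sum(dist(p, center) ** 2 for p in cluster)

def assign(points, centers):
    """Put each point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def converge(points, centers, max_iter=100):
    """Phase 1, steps 2 and 3: reassign and update until the centers stop moving."""
    for _ in range(max_iter):
        clusters = assign(points, centers)
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def incremental_kmeans(points, k_target):
    # K = 1: any point may serve as the first center (here: the first one).
    centers, clusters = converge(points, [tuple(points[0])])
    # Phase 2: keep adding one cluster until the target count is reached.
    while len(centers) < k_target:
        # Phase 1, step 1: locate the cluster with the largest distortion.
        worst = max(range(len(centers)),
                    key=lambda i: distortion(centers[i], clusters[i]))
        # ASSUMPTION: seed the new center at the member farthest from the old one.
        seed = max(clusters[worst], key=lambda p: dist(p, centers[worst]))
        centers, clusters = converge(points, centers + [tuple(seed)])
    return centers, clusters

points = [(0, 0), (0, 1), (4, 0), (4, 1)]  # hypothetical sample data
centers, clusters = incremental_kmeans(points, 2)
print(sorted(centers))
```

On these four points the single initial cluster is split into a left pair and a right pair, with no random choice involved at any stage.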

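The steps are close to those of K-Means; the difference is that the algorithm introduces the distortion of each cluster and uses it to decide which cluster to split.

Distortion: the distortion of a cluster is computed as follows, with the distance term squared so that it matches S, which is a sum of squared distances:

$$I = S - N\,[d(w, x)]^2$$

where
w: the center of the cluster,
N: the number of objects in the cluster,
S: the sum of squared distances between the objects in the cluster and the origin of the Euclidean space,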


I: the distortion of the cluster,
d(w, x): the distance between the cluster center w and the origin x of the Euclidean space.

A cluster with a large distortion is therefore one whose center is badly placed. In fact, determining the clusters amounts to determining their centers: the algorithm's main work is to find accurate cluster centers and then reassign the objects to the clusters.

- Phase 1 is like K-Means, except that the k centers are not fixed in advance; k grows from 1 upward. The second difference is that the cluster with the largest distortion is chosen to be split in two. Splitting a highly distorted cluster into two lowers the distortion of the resulting clusters. The objects are then reassigned to the clusters and the cluster centers are updated.
- Once the cluster centers stop changing, we move to Phase 2, increase k by 1, and return to Phase 1 to find the cluster with the largest distortion and split it into two new clusters. The algorithm stops when k equals the required number of clusters.
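A short numerical check may help make the distortion concrete. The sketch below assumes that x is the origin, that the distance term in the formula is squared, and that w is the mean of the cluster; under those assumptions S - N * d(w, x)^2 is the same number as the sum of squared distances from the objects to w. The function names and the sample cluster are hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroid(cluster):
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def distortion_direct(cluster):
    """Sum of squared distances from the objects to the cluster mean w."""
    w = centroid(cluster)
    return sum(squared_distance(p, w) for p in cluster)

def distortion_from_origin(cluster):
    """I = S - N * d(w, x)^2, with x taken to be the origin (assumption)."""
    origin = (0.0,) * len(cluster[0])
    w = centroid(cluster)
    S = sum(squared_distance(p, origin) for p in cluster)  # squared distances to origin
    N = len(cluster)
    return S - N * squared_distance(w, origin)

cluster = [(1, 2), (3, 4), (5, 0)]  # hypothetical cluster with mean (3, 2)
print(distortion_direct(cluster), distortion_from_origin(cluster))
```

For this cluster S = 55, N = 3, and d(w, x)^2 = 13, so both functions give 16. The origin-based form is convenient because S and N can be accumulated without knowing w in advance.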

K-Means has complexity O(tkn) for n objects, k clusters, and t iterations, whereas Incremental K-Means has complexity O(k^2 nt). The complexity is therefore higher than that of K-Means, but still lower than that of K-medoids, because k is usually much smaller than n. The advantage of the algorithm is that it depends less on how the initial clusters are chosen, so we do not have to rerun it with different initial clusters in search of the best result, as we do with K-Means.
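One way to see where the extra factor of k comes from is to add up the cost of the successive stages. The derivation below assumes that the stage with K clusters behaves like an ordinary K-Means pass over all n objects and needs at most t iterations; that per-stage bound is an assumption made for illustration.

```latex
\sum_{K=1}^{k} O(K\,n\,t)
  = O\!\left(n\,t \sum_{K=1}^{k} K\right)
  = O\!\left(n\,t\,\frac{k(k+1)}{2}\right)
  = O(k^{2}\,n\,t)
```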
