Graduate Program, Class II
(Creating a general similarity query for a data set is therefore far from simple.) Clustering algorithms require either that the number of clusters be specified in advance (K-Means) or that a criterion be given for deciding which objects are similar to one another (DBSCAN). K-Means is widely used for data clustering because it performs well and is easy to implement. However, besides needing the number of clusters up front, K-Means also requires k points to be chosen in advance as centroids, and this random choice can produce different results.
Data Mining
L c Hiu CH0401016
The details of K-Means and of several of its refinements, K-medoids and Fuzzy c-means, are covered thoroughly by Thầy Phúc in the course textbook, so they are not repeated here. Instead, this section presents another refinement of K-Means.

Incremental K-Means:
As noted, K-Means starts either by choosing k points at random as cluster centers, or by choosing a random partition into k clusters and computing the centroid of each. The random choice of k centers can produce different results depending on which k points are picked. Incremental K-Means is still based on K-Means, but it does not choose k points as the centroids of k clusters. Instead, it grows the number of clusters from 1 to k by inserting a new cluster center into the cluster with the largest distortion (which increases the number of clusters) and then recomputing the cluster centroids.

The algorithm is as follows:
Set K = 1.
Phase 1:
Step 1: If K = 1, choose any point as the cluster center. If K > 1, add a new cluster center to the cluster with the largest distortion.
Step 2: Assign each point to the cluster whose center is nearest to it, and update the cluster centers.
Step 3: If the cluster centers did not change, go to Phase 2. Otherwise, go to Phase 1, Step 2.
Phase 2 (increase the number of clusters):
If K is less than the prescribed number of clusters, increase K by 1 and go to Phase 1, Step 1. Otherwise, stop.
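The two phases above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the text does not say where the new center is placed inside the worst cluster, so the sketch assumes it is seeded at the point farthest from that cluster's current center. Distortion is taken to be the sum of squared distances from a cluster's points to its center. The names `incremental_kmeans`, `assign`, `centroid`, `distortion`, and the `max_iter` cap are all hypothetical, and degenerate inputs (for example, all points identical) are not handled.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def distortion(points, center):
    """Sum of squared distances from the points to their cluster center."""
    return sum(dist(p, center) ** 2 for p in points)


def assign(points, centers):
    """Group the points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
        clusters[nearest].append(p)
    return clusters


def incremental_kmeans(points, k_target, max_iter=100):
    centers = [points[0]]  # K = 1: any point may serve as the first center
    while True:
        # Phase 1, steps 2-3: assign and update until the centers stop moving
        for _ in range(max_iter):
            clusters = assign(points, centers)
            # Assumption: an empty cluster simply keeps its previous center
            updated = [centroid(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
            converged = updated == centers
            centers = updated
            if converged:
                break
        clusters = assign(points, centers)
        # Phase 2: stop once the requested number of clusters is reached
        if len(centers) >= k_target:
            return centers, clusters
        # Phase 1, step 1 (K > 1): seed a new center in the worst cluster.
        # Assumption: the seed is that cluster's point farthest from its center.
        worst = max(range(len(centers)),
                    key=lambda j: distortion(clusters[j], centers[j]))
        seed = max(clusters[worst], key=lambda p: dist(p, centers[worst]))
        centers = centers + [seed]


# Usage: two well-separated groups of four points each
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
found_centers, found_clusters = incremental_kmeans(data, k_target=2)
print(found_centers)
```

Because the first center and every later seed are chosen deterministically, repeated runs on the same data give the same result, which is the property the text attributes to the incremental scheme.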
The steps closely resemble those of K-Means. The difference is that the algorithm computes a distortion for each cluster and uses that distortion to decide which cluster to split.
Distortion: the distortion of a cluster is computed as
$$I = S - N\,[d(w, x)]^2$$
where
w: the center of the cluster.
N: the number of objects in the cluster.
S: the sum of squared distances between the objects in the cluster and the center x of the Euclidean space.
I: the distortion of the cluster.
d(w, x): the distance between the cluster center w and the center x of the Euclidean space (a fixed reference point).
A cluster with a large distortion therefore has a poorly placed center. In fact, determining the clusters is equivalent to determining their centers: the algorithm mainly searches for accurate cluster centers and then reassigns the objects to clusters.
Phase 1 of the algorithm is similar to K-Means, except that the k points are not fixed in advance; instead, k grows from 1. The second difference is that the cluster with the largest distortion is chosen to be split into two. Splitting a high-distortion cluster into two clusters lowers the distortion of the resulting clusters. The objects are then reassigned to clusters and the cluster centers are updated. Once the centers stop changing, the algorithm moves to Phase 2, increases k by 1, and returns to Phase 1, where it finds the cluster with the largest distortion and again splits it into two new clusters. The algorithm stops when k equals the required number of clusters.
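The distortion definition and the claim that splitting lowers distortion can be checked on a toy cluster. The sketch below assumes that w is the centroid (mean) of the cluster and that the distance term is squared, in which case I = S - N d(w, x)^2 equals the sum of squared distances to w for any reference point x. The function names and the sample points are hypothetical.

```python
from math import dist


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def distortion_direct(points):
    """Sum of squared distances from each point to the cluster centroid w."""
    w = centroid(points)
    return sum(dist(p, w) ** 2 for p in points)


def distortion_via_reference(points, x):
    """I = S - N * d(w, x)^2, with S measured from the reference point x."""
    w = centroid(points)
    s = sum(dist(p, x) ** 2 for p in points)
    return s - len(points) * dist(w, x) ** 2


cluster = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
origin = (0.0, 0.0)  # reference point x, chosen arbitrarily for this sketch

# Both routes give the same distortion for the whole cluster
whole = distortion_direct(cluster)
same = distortion_via_reference(cluster, origin)

# Splitting the stretched cluster into two tight halves lowers total distortion
left, right = cluster[:2], cluster[2:]
split = distortion_direct(left) + distortion_direct(right)

print(whole, same, split)
```

Here the single cluster has its centroid at (6, 0), far from every member, which is the "poorly placed center" situation the text describes; after the split each half sits close to its own centroid.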
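K-Means has complexity O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations, whereas Incremental K-Means has complexity O(k^2 nt). The complexity is therefore higher than that of K-Means, but still lower than that of K-medoids, because k is usually much smaller than n. The advantage of the algorithm is that it depends less on how the initial clusters are chosen, so there is no need to rerun it with different initial clusters in search of the best result, as is done with K-Means.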
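One way to see where the extra factor of k comes from is to sum the cost of the successive stages. This is a sketch under an assumption not stated in the source: each stage K = 1, ..., k behaves like a K-Means run over n objects with K centers and at most t iterations.

```latex
\sum_{K=1}^{k} O(t\,K\,n)
  \;=\; O\!\left(t\,n \sum_{K=1}^{k} K\right)
  \;=\; O\!\left(t\,n\,\frac{k(k+1)}{2}\right)
  \;=\; O(k^{2}\,n\,t)
```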