
THE K-MEANS ALGORITHM AND ITS APPLICATIONS

Supervisor: CN. Trần Nam Khánh
Student: Phạm Huyền Trang
Class: K52CA

MAIN CONTENTS

I. Clustering
II. The K-Means algorithm
    1. Overview of the algorithm
    2. Steps of the algorithm
    3. Worked example (algorithm demo)
    4. Evaluation of the algorithm
    5. Generalization and variants
III. Applications of the K-Means algorithm

I. CLUSTERING
1. What is clustering?

Clustering is the process of partitioning an initial data set into clusters such that:
- Objects in the same cluster are similar to one another.
- Objects in different clusters are dissimilar.

It addresses the problem of searching for and discovering clusters and patterns in an initial collection of unlabeled data.

I. CLUSTERING

Let $X$ be a set of data points and $C_i$ the $i$-th cluster. Then

$$X = C_1 \cup \dots \cup C_k \cup C_{\text{outliers}}, \qquad C_i \cap C_j = \emptyset \quad (i \neq j)$$

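I. CLUSTERING
2. Some measures used in clustering

Minkowski distance between two $n$-dimensional vectors $x$ and $y$:

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

Euclidean distance: the special case $p = 2$.

Similarity (closeness) measure: the cosine of the angle between two vectors $v$ and $w$:

$$\cos\theta = \frac{v \cdot w}{\|v\| \, \|w\|}$$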
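The two measures above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `minkowski`, `euclidean` and `cosine_similarity` are chosen here and do not come from the slides, and the cosine function assumes both vectors are non-zero.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def euclidean(x, y):
    """Euclidean distance: the Minkowski distance with p = 2."""
    return minkowski(x, y, 2)

def cosine_similarity(v, w):
    """Cosine of the angle between v and w (assumes neither is the zero vector)."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_v * norm_w)

print(minkowski((1, 1), (4, 5), 1))       # p = 1: |1-4| + |1-5| = 7.0
print(euclidean((1, 1), (4, 5)))          # sqrt(9 + 16) = 5.0
print(cosine_similarity((1, 0), (0, 1)))  # orthogonal vectors: 0.0
```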

I. CLUSTERING
3. Purpose of clustering

- To determine the underlying nature of the grouping of objects in an unlabeled data set.
- Clustering does not rely on any single universal criterion; it relies on criteria supplied by the user for each particular case.

I. CLUSTERING
5. Some typical clustering methods

- Partitioning clustering
- Hierarchical clustering
- Density-based clustering
- Grid-based clustering
- Model-based clustering
- Constraint-based clustering

II. PARTITIONING CLUSTERING

Partition a given data set of n elements into k subsets (k ≤ n), each subset representing one cluster.
The clusters are formed by optimizing the value of a similarity function so that:
- Objects in the same cluster are similar.
- Objects in different clusters are dissimilar.

Characteristics:
- Each object belongs to exactly one cluster.
- Each cluster contains at least one object.

Some typical algorithms: K-Means, PAM, CLARA, ...

II.2. The K-Means algorithm

Problem statement:

Input
- A set of objects $X = \{x_i \mid i = 1, 2, \dots, N\}$, $x_i \in \mathbb{R}^d$
- Number of clusters: $K$

Output
- Disjoint clusters $C_i$ ($i = 1, \dots, K$) for which the criterion function $E$ attains its minimum value.

II.1. OVERVIEW OF THE ALGORITHM

The algorithm operates on a set of d-dimensional vectors, a data set X of N elements:
$X = \{x_i \mid i = 1, 2, \dots, N\}$

K-Means repeats the following two steps many times:
- Assign the data points to clusters.
- Update the centroid positions.

The iteration stops when the centroids have converged and every object is a member of a cluster.

II.1. OVERVIEW OF THE ALGORITHM

The similarity function uses the Euclidean distance:

$$E = \sum_{j=1}^{K} \sum_{x_i \in C_j} \|x_i - c_j\|^2$$

where $c_j$ is the centroid of cluster $C_j$.

This function is non-negative, and it decreases whenever one of the two steps (assigning the data, relocating the centroids) changes something.
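As a rough illustration of the criterion function E, the sketch below sums the squared Euclidean distances from each point to the centroid of its cluster. The function name `sse` and the sample points are hypothetical and used only for this illustration.

```python
def sse(clusters, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    clusters  -- list of clusters, each a list of points (tuples)
    centroids -- list of centroids; centroids[j] belongs to clusters[j]
    """
    total = 0.0
    for points, c in zip(clusters, centroids):
        for x in points:
            total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total

# Hypothetical data: two clusters in the plane.
clusters = [[(0, 0), (2, 0)], [(10, 10)]]

# Centroid of the first cluster placed on one of its points ...
print(sse(clusters, [(0, 0), (10, 10)]))  # 0 + 4 + 0 = 4.0
# ... and then moved to the mean of the cluster: E decreases.
print(sse(clusters, [(1, 0), (10, 10)]))  # 1 + 1 + 0 = 2.0
```

Moving a centroid to the mean of its cluster lowers E in this example, which is the behaviour the centroid-update step relies on.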

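II.2. STEPS OF THE ALGORITHM

Step 1 - Initialization
Choose K centroids $\{c_i\}$ ($i = 1, \dots, K$).

Step 2 - Compute distances and assign each object to its nearest centroid

$$S_i^{(t)} = \{\, x_j : \|x_j - c_i^{(t)}\| \le \|x_j - c_{i^*}^{(t)}\| \ \text{for all } i^* = 1, \dots, K \,\}$$

Step 3 - Update the centroids

$$c_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j$$

Step 4 - Stopping condition
Repeat steps 2 and 3 until the cluster centroids no longer change.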
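The four steps above can be sketched in plain Python as follows. This is a minimal sketch, not a reference implementation: the names `k_means`, `squared_distance` and `mean` are chosen here, the initial centroids are supplied by the caller (Step 1), ties go to the lowest cluster index, and an empty cluster simply keeps its previous centroid, which is an assumption the slides do not address. The sample points are hypothetical.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two points."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, centroids, max_iter=100):
    """Return (labels, centroids) after iterating steps 2-4."""
    centroids = [tuple(float(v) for v in c) for c in centroids]  # Step 1: given
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda i: squared_distance(x, centroids[i]))
            for x in points
        ]
        # Step 3: move every centroid to the mean of its members.
        updated = []
        for i, old in enumerate(centroids):
            members = [x for x, lab in zip(points, labels) if lab == i]
            # Assumption: an empty cluster keeps its old centroid.
            updated.append(mean(members) if members else old)
        # Step 4: stop when no centroid has moved.
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical data: two well-separated pairs of points.
labels, centers = k_means([(0, 0), (0, 2), (10, 0), (10, 2)], [(0, 0), (10, 0)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(0.0, 1.0), (10.0, 1.0)]
```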

II.2. STEPS OF THE ALGORITHM

[Flowchart: Start -> Number of clusters K -> Centroids -> Distances from the objects to the centroids -> Group the objects into clusters -> "No object changes group?" -> End]

II.3 WORKED EXAMPLE

Object   Attribute 1 (X)   Attribute 2 (Y)
A        1                 1
B        2                 1
C        4                 3
D        5                 4

II.3 WORKED EXAMPLE

Step 1: Initialization
Choose 2 initial centroids:
c1 = (1, 1) ≡ A and c2 = (2, 1) ≡ B, belonging to clusters 1 and 2 respectively.

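II.3 WORKED EXAMPLE

Step 2: Compute distances
(The values of d in this example are squared Euclidean distances.)

$d(C, c_1) = (4 - 1)^2 + (3 - 1)^2 = 13$
$d(C, c_2) = (4 - 2)^2 + (3 - 1)^2 = 8$
$d(C, c_1) > d(C, c_2)$, so C belongs to cluster 2.

$d(D, c_1) = (5 - 1)^2 + (4 - 1)^2 = 25$
$d(D, c_2) = (5 - 2)^2 + (4 - 1)^2 = 18$
$d(D, c_1) > d(D, c_2)$, so D belongs to cluster 2.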
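The distances in this step can be checked with a short script. It is only a verification sketch: the coordinates C = (4, 3) and D = (5, 4) are read off the expressions above, and the helper name `squared_distance` is chosen here.

```python
def squared_distance(p, q):
    """Squared Euclidean distance, as used in the worked example."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

C, D = (4, 3), (5, 4)
c1, c2 = (1, 1), (2, 1)  # initial centroids (points A and B)

for name, point in (("C", C), ("D", D)):
    d1 = squared_distance(point, c1)
    d2 = squared_distance(point, c2)
    cluster = 1 if d1 <= d2 else 2
    print(f"{name}: d(c1) = {d1}, d(c2) = {d2} -> cluster {cluster}")
# C: d(c1) = 13, d(c2) = 8 -> cluster 2
# D: d(c1) = 25, d(c2) = 18 -> cluster 2
```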

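II.3 WORKED EXAMPLE

Step 3: Update the centroid positions

Centroid of cluster 1: $c_1 \equiv A = (1, 1)$

Centroid of cluster 2: $c_2 = (x, y) = \left( \frac{2 + 4 + 5}{3}, \frac{1 + 3 + 4}{3} \right) = \left( \frac{11}{3}, \frac{8}{3} \right)$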
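The centroid update for cluster 2 can be reproduced exactly with Python's `fractions` module. This is a small sketch; the helper name `centroid` is chosen here, and the member coordinates B = (2, 1), C = (4, 3), D = (5, 4) are taken from the numerators of the formula above.

```python
from fractions import Fraction

def centroid(points):
    """Exact coordinate-wise mean of a non-empty list of integer points."""
    n = len(points)
    return tuple(Fraction(sum(coords), n) for coords in zip(*points))

cluster_2 = [(2, 1), (4, 3), (5, 4)]  # B, C, D
c2 = centroid(cluster_2)
print(c2)                                # (Fraction(11, 3), Fraction(8, 3))
print([round(float(v), 2) for v in c2])  # [3.67, 2.67]
```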

II.3 WORKED EXAMPLE

Step 4-1: Repeat step 2 - Compute distances

d(A, c1) = 0 < d(A, c2) = 9.89, so A belongs to cluster 1.
d(B, c1) = 1 < d(B, c2) = 5.56, so B belongs to cluster 1.
d(C, c1) = 13 > d(C, c2) = 0.22, so C belongs to cluster 2.
d(D, c1) = 25 > d(D, c2) = 3.56, so D belongs to cluster 2.

II.3 WORKED EXAMPLE

Step 4-2: Repeat step 3 - Update the centroids

c1 = (3/2, 1) and c2 = (9/2, 7/2)

II.3 WORKED EXAMPLE

Step 4-3: Repeat step 2

d(A, c1) = 0.25 < d(A, c2) = 18.5, so A belongs to cluster 1.
d(B, c1) = 0.25 < d(B, c2) = 12.5, so B belongs to cluster 1.
d(C, c1) = 10.25 > d(C, c2) = 0.5, so C belongs to cluster 2.
d(D, c1) = 21.25 > d(D, c2) = 0.5, so D belongs to cluster 2.

No object has changed cluster since the previous pass, so the algorithm stops.
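The whole example can be replayed with the short trace below. It is an illustrative sketch under a few assumptions: the point coordinates A = (1, 1), B = (2, 1), C = (4, 3), D = (5, 4) are the ones used in the calculations above, the helper names are chosen here, and the loop stops when no object changes cluster between two consecutive passes.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(members):
    return tuple(sum(coords) / len(members) for coords in zip(*members))

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroids = [(1.0, 1.0), (2.0, 1.0)]  # initial c1 = A, c2 = B
history = []                          # one (assignment, centroids) pair per pass

while True:
    # Assignment step: cluster number (1 or 2) of the nearest centroid.
    assignment = {
        name: min((0, 1), key=lambda i: squared_distance(p, centroids[i])) + 1
        for name, p in points.items()
    }
    history.append((assignment, centroids))
    print(assignment, centroids)
    # Stop when the assignment is identical to the previous pass.
    if len(history) > 1 and history[-2][0] == assignment:
        break
    # Update step: recompute both centroids from their members.
    centroids = [
        centroid([points[n] for n in points if assignment[n] == k])
        for k in (1, 2)
    ]
```

With these inputs the trace performs three assignment passes and ends with clusters {A, B} and {C, D} and centroids (1.5, 1.0) and (4.5, 3.5), matching the slides.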


II.4 EVALUATION OF THE ALGORITHM - ADVANTAGES

1. Complexity: O(K·N·l), where l is the number of iterations.
2. Scalable, and easy to adapt to new data.
3. Guaranteed to converge after a finite number of iterations.
4. There are always K clusters.
5. There is always at least one data point in each cluster.
6. The clusters are non-hierarchical and do not overlap.
7. Every member of a cluster is closer to its own cluster than to any other cluster.

II.4 EVALUATION OF THE ALGORITHM - DISADVANTAGES

1. Cannot find non-convex clusters or clusters with complex shapes.
2. Determining the initial cluster centroids is difficult:
   - The cluster centers are chosen at random at initialization.
   - The convergence of the algorithm depends on how the cluster center vectors are initialized.
3. It is hard to choose the optimal number of clusters up front; many trials are needed to find it.
4. Very sensitive to noise and to outliers in the data.
5. An object does not always belong to exactly one cluster; the algorithm is only suitable when the boundaries between clusters are clear.
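The dependence on initialization (point 2) can be seen on a tiny example. The sketch below is an illustration with hypothetical data: four points at the corners of a 4 x 1 rectangle, clustered with K = 2 from two different pairs of initial centroids. The function names are chosen here, and an empty cluster is assumed to keep its old centroid.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def run_k_means(points, centroids):
    """Plain K-Means from the given initial centroids; returns (labels, E)."""
    while True:
        labels = [
            min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            for p in points
        ]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else old  # assumption: empty cluster keeps its centroid
            )
        if updated == centroids:
            error = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
            return labels, error
        centroids = updated

corners = [(0, 0), (0, 1), (4, 0), (4, 1)]
print(run_k_means(corners, [(0, 0), (4, 0)]))  # ([0, 0, 1, 1], 1.0)
print(run_k_means(corners, [(0, 0), (0, 1)]))  # ([0, 1, 0, 1], 16.0)
```

Both runs converge, but the second start settles on a top/bottom split with a much larger value of E, which is why several random restarts are commonly tried.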

II.5 GENERALIZATION AND VARIANTS

B. Variants

1. The K-medoid algorithm:
- Similar to the K-Means algorithm.
- Each cluster is represented by one of the objects in the cluster.
- The object closest to the cluster center is chosen as the cluster's representative.
- K-medoid copes better with noise, but its complexity is higher.
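The choice of representative described above can be sketched as follows. This follows the slide's wording (take the object nearest to the cluster center); other K-medoid formulations such as PAM instead pick the object with the smallest total dissimilarity to the rest of the cluster. The function name `representative` and the sample cluster are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def representative(members):
    """Member of the cluster that lies closest to the cluster's mean."""
    center = tuple(sum(coords) / len(members) for coords in zip(*members))
    return min(members, key=lambda p: squared_distance(p, center))

# Hypothetical cluster with one outlier at x = 100.
cluster = [(1, 1), (2, 1), (3, 1), (100, 1)]
print(representative(cluster))  # (3, 1): an actual data point, unlike the mean (26.5, 1.0)
```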

II.5 GENERALIZATION AND VARIANTS

2. The Fuzzy c-means (FCM) algorithm:
- Shares its clustering strategy with K-Means.
- Where K-Means is hard clustering (a data point belongs to exactly one cluster), FCM is fuzzy clustering (a data point may belong to more than one cluster, each with a certain probability).
- It adds the relationship between elements and clusters through the weights of a matrix that records the degree of membership of each element in each cluster.
- FCM handles overlapping clusters on larger, higher-dimensional and noisier data sets, but it remains sensitive to noise and outliers.
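To make the membership matrix more concrete, the sketch below computes one row of it for a single point. It uses the commonly cited FCM membership formula $u_j = 1 / \sum_k (d_j / d_k)^{2/(m-1)}$, which does not appear on the slides; the fuzzifier value m = 2, the function name and the sample data are assumptions made for this illustration.

```python
import math

def fuzzy_memberships(x, centroids, m=2.0):
    """Degrees of membership of point x in each cluster (they sum to 1)."""
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c))) for c in centroids]
    # A point lying exactly on one or more centroids belongs fully to them.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]

# Hypothetical point at distance 1 from the first centroid and 2 from the second.
u = fuzzy_memberships((1, 0), [(0, 0), (3, 0)])
print([round(v, 3) for v in u])  # [0.8, 0.2]
```

Unlike the hard assignment of K-Means, the point keeps a non-zero membership in the farther cluster.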

III. APPLICATIONS OF THE ALGORITHM

Clustering web documents:
1. Search for and extract the documents.
2. Preprocess the documents: tokenize and vectorize each document by finding every word and replacing it with its index in the dictionary, then represent the data as vectors.
3. Apply K-Means.
The result is a set of document clusters and their corresponding centroids.

Image segmentation.
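The preprocessing step for documents might look like the sketch below. It is a simplified illustration: the tokenizer (lowercase runs of letters and digits), the word-count vectors, the function names and the three sample documents are all assumptions, not details given on the slides. The resulting vectors could then be passed to any K-Means routine.

```python
import re

def tokenize(text):
    """Split a document into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_dictionary(documents):
    """Map every distinct word to an index, in order of first appearance."""
    dictionary = {}
    for doc in documents:
        for word in tokenize(doc):
            dictionary.setdefault(word, len(dictionary))
    return dictionary

def vectorize(doc, dictionary):
    """Represent a document as a vector of word counts indexed by the dictionary."""
    vector = [0] * len(dictionary)
    for word in tokenize(doc):
        if word in dictionary:
            vector[dictionary[word]] += 1
    return vector

docs = ["web page clustering", "clustering web documents", "image pixel colors"]
dictionary = build_dictionary(docs)
vectors = [vectorize(d, dictionary) for d in docs]
print(dictionary)
for v in vectors:
    print(v)
# [1, 1, 1, 0, 0, 0, 0]
# [1, 0, 1, 1, 0, 0, 0]
# [0, 0, 0, 0, 1, 1, 1]
```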


THANK YOU FOR LISTENING
