
K-Means Clustering Example



We recall from the previous lecture that clustering allows for unsupervised learning. That is, the machine / software will learn on its own, using the data (learning set), and will classify the objects into a particular class. For example, if our class (decision) attribute is tumorType and its values are: malignant, benign, etc., these will be the classes. They will be represented by cluster1, cluster2, etc. However, the class information is never provided to the algorithm. The class information can be used later on, to evaluate how accurately the algorithm classified the objects.
        Curvature   Texture   Blood Consump   Tumor Type
x1      0.8         1.2                       Benign
x2      0.75        1.4                       Benign
x3      0.23        0.4                       Malignant
x4      0.23        0.5                       Malignant

We will now cluster the objects in the database (learning set). The way we do that is by plotting the objects from the database into space. Each attribute is one dimension:

[Figure: the objects x1, ..., x4 plotted in a 3-dimensional space whose axes are Curvature, Texture and Blood Consump.]


After all the objects are plotted, we calculate the distances between them, and the ones that are close to each other are grouped together, i.e. placed in the same cluster.

[Figure: the same plot, with the nearby points grouped into Cluster 1 (benign) and Cluster 2 (malignant).]

We recall that the K-Means algorithm works as follows:

K-means Clustering

- Partitional clustering approach.
- Each cluster is associated with a centroid (center point).
- Each point is assigned to the cluster with the closest centroid.
- Number of clusters, K, must be specified (is predetermined).
- The basic algorithm is very simple.

(Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004, slide 19)


K-means Clustering Details


- Initial centroids are often chosen randomly.
  - Clusters produced vary from one run to another.
- The centroid is (typically) the mean of the points in the cluster.
- "Closeness" is measured by Euclidean distance, cosine similarity, correlation, etc. (the distance measure / function will be specified).
- K-Means will converge (centroids move at each iteration). Most of the convergence happens in the first few iterations.
  - Often the stopping condition is changed to "Until relatively few points change clusters".

(Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004, slide 20)
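To make the slides' description concrete, here is a minimal Python sketch of that basic loop. It is not the lecture's own code; the names (kmeans, euclidean, max_iter) are illustrative, and Euclidean distance is assumed as one of the closeness measures mentioned above.

```python
import math
import random

def euclidean(a, b):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kmeans(points, k, dist=euclidean, max_iter=100):
    # 1. Choose k initial centroids (here: k random points from the data set).
    centroids = random.sample(points, k)
    for _ in range(max_iter):
        # 2. Assign every point to the cluster with the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # 3. Re-compute each centroid as the mean of the points in its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # 4. Stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

Because the initial centroids are picked randomly, different runs can produce different clusters, which is exactly the point made on the slide above.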

Example Problem: Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9).
Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as:
ρ(a, b) = |x2 - x1| + |y2 - y1|.
Use the k-means algorithm to find the three cluster centers after the second iteration.

Solution:

Iteration 1

Point        Dist. to Mean 1   Dist. to Mean 2   Dist. to Mean 3   Cluster
             (2, 10)           (5, 8)            (1, 2)
A1 (2, 10)
A2 (2, 5)
A3 (8, 4)
A4 (5, 8)
A5 (7, 5)
A6 (6, 4)
A7 (1, 2)
A8 (4, 9)
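Before filling in the table by hand, the setup of this exercise can also be written down in a few lines of Python. This is only a sketch; the names points, means and rho are ours, not part of the exercise.

```python
# The eight points and the three initial cluster centers (means) of the exercise.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
means = [(2, 10), (5, 8), (1, 2)]   # A1, A4, A7

def rho(a, b):
    # The given distance function: rho(a, b) = |x2 - x1| + |y2 - y1|.
    return abs(b[0] - a[0]) + abs(b[1] - a[1])
```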


First we list all points in the first column of the table above. The initial cluster centers (means) are (2, 10), (5, 8) and (1, 2), chosen randomly. Next, we will calculate the distance from the first point (2, 10) to each of the three means, by using the distance function:

point (x1, y1) = (2, 10)     mean1 (x2, y2) = (2, 10)     ρ(a, b) = |x2 - x1| + |y2 - y1|

ρ(point, mean1) = |x2 - x1| + |y2 - y1| = |2 - 2| + |10 - 10| = 0 + 0 = 0

point (x1, y1) = (2, 10)     mean2 (x2, y2) = (5, 8)     ρ(a, b) = |x2 - x1| + |y2 - y1|

ρ(point, mean2) = |x2 - x1| + |y2 - y1| = |5 - 2| + |8 - 10| = 3 + 2 = 5

point (x1, y1) = (2, 10)     mean3 (x2, y2) = (1, 2)     ρ(a, b) = |x2 - x1| + |y2 - y1|

ρ(point, mean3) = |x2 - x1| + |y2 - y1| = |1 - 2| + |2 - 10| = 1 + 8 = 9

So, we fill in these values in the table:

Point        Dist. to Mean 1   Dist. to Mean 2   Dist. to Mean 3   Cluster
             (2, 10)           (5, 8)            (1, 2)
A1 (2, 10)   0                 5                 9                 1
A2 (2, 5)
A3 (8, 4)
A4 (5, 8)
A5 (7, 5)
A6 (6, 4)
A7 (1, 2)
A8 (4, 9)

So, which cluster should the point (2, 10) be placed in? The one where the point has the shortest distance to the mean, that is mean 1 (cluster 1), since the distance is 0.

Cluster 1: (2, 10)
Cluster 2:
Cluster 3:
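Continuing the sketch above, the same decision for A1 can be computed rather than read off the table:

```python
# Distances from A1 = (2, 10) to mean 1, mean 2 and mean 3.
p = points["A1"]
distances = [rho(p, m) for m in means]       # 0, 5 and 9, as computed above
closest = distances.index(min(distances))    # index 0, i.e. mean 1
print(distances, "-> Cluster", closest + 1)  # -> Cluster 1
```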

So, we go to the second point (2, 5) and calculate the distance to each of the three means, by using the distance function:

point (x1, y1) = (2, 5)     mean1 (x2, y2) = (2, 10)     ρ(a, b) = |x2 - x1| + |y2 - y1|

ρ(point, mean1) = |x2 - x1| + |y2 - y1| = |2 - 2| + |10 - 5| = 0 + 5 = 5

point (x1, y1) = (2, 5)     mean2 (x2, y2) = (5, 8)     ρ(a, b) = |x2 - x1| + |y2 - y1|

ρ(point, mean2) = |x2 - x1| + |y2 - y1| = |5 - 2| + |8 - 5| = 3 + 3 = 6

point (x1, y1) = (2, 5)     mean3 (x2, y2) = (1, 2)     ρ(a, b) = |x2 - x1| + |y2 - y1|

ρ(point, mean3) = |x2 - x1| + |y2 - y1| = |1 - 2| + |2 - 5| = 1 + 3 = 4

So, we fill in these values in the table:

Iteration 1

Point        Dist. to Mean 1   Dist. to Mean 2   Dist. to Mean 3   Cluster
             (2, 10)           (5, 8)            (1, 2)
A1 (2, 10)   0                 5                 9                 1
A2 (2, 5)    5                 6                 4                 3
A3 (8, 4)
A4 (5, 8)
A5 (7, 5)
A6 (6, 4)
A7 (1, 2)
A8 (4, 9)

So, which cluster should the point (2, 5) be placed in? The one where the point has the shortest distance to the mean, that is mean 3 (cluster 3), since the distance is 4.

Cluster 1: (2, 10)
Cluster 2:
Cluster 3: (2, 5)


,nalogicall", we fill in the rest of the ta#le, and place each point in one of the clusters& 7teration ' Point (, '*! (, /! +, 0! /, +! ., /! 5, 0! ', (! 0, 6! (, '*! Dist Mean 1 * / '( / '* '* 6 1 /, +! Dist Mean 2 / 5 . * / / '* ( ', (! Dist Mean 3 6 0 6 '* 6 . * '* Cluster ' 1 ( ( ( ( 1 (

,' ,( ,1 ,0 ,/ ,5 ,. ,+
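As a quick check, the whole Iteration 1 table can be reproduced with the earlier sketch:

```python
# Distance from every point to every mean, and the resulting cluster.
for name, p in points.items():
    d = [rho(p, m) for m in means]
    print(name, d, "-> Cluster", d.index(min(d)) + 1)
```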

Cluster 1: (2, 10)
Cluster 2: (8, 4), (5, 8), (7, 5), (6, 4), (4, 9)
Cluster 3: (2, 5), (1, 2)

Next, we need to re-compute the new cluster centers (means). We do so by taking the mean of all points in each cluster. For Cluster 1, we only have one point A1 (2, 10), which was the old mean, so the cluster center remains the same. For Cluster 2, we have ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5) = (6, 6). For Cluster 3, we have ((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5).
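The re-computation of the means can also be checked in code, continuing the same sketch (clusters and centroid are again our own illustrative names):

```python
# Iteration 1 membership, as read off the table above.
clusters = {
    1: [points["A1"]],
    2: [points[n] for n in ("A3", "A4", "A5", "A6", "A8")],
    3: [points[n] for n in ("A2", "A7")],
}

def centroid(pts):
    # New cluster center = (mean of the x's, mean of the y's).
    xs, ys = zip(*pts)
    return (sum(xs) / len(pts), sum(ys) / len(pts))

new_means = [centroid(clusters[c]) for c in (1, 2, 3)]
print(new_means)   # (2.0, 10.0), (6.0, 6.0), (1.5, 3.5), matching the text
```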


The initial cluster centers are shown as red dots; the new cluster centers are shown as red x's. That was Iteration 1 (epoch 1). Next, we go to Iteration 2 (epoch 2), Iteration 3, and so on, until the means do not change anymore. In Iteration 2, we basically repeat the process from Iteration 1, this time using the new means we computed.
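In code, Iteration 2 and the following iterations are just the same two steps repeated until the means stop moving. A rough sketch, building on the snippets above:

```python
# Repeat: assign every point to its closest mean, then re-compute the means,
# until no mean changes anymore.
# (Assumes no cluster ever becomes empty, which holds for this data set.)
while True:
    assignment = {name: min(range(3), key=lambda i: rho(p, new_means[i]))
                  for name, p in points.items()}
    updated = [centroid([points[n] for n, c in assignment.items() if c == i])
               for i in range(3)]
    if updated == new_means:   # converged: the means did not move
        break
    new_means = updated

print(assignment)   # final cluster index (0, 1, 2) for each point
print(new_means)    # final cluster centers
```

When the loop stops, the printed means are the cluster centers the algorithm has converged to for this data set.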

