AI Session 6

K-MEANS CLUSTERING ALGORITHM
Clustering Concept
Clustering is the classification of objects into different groups, or
more precisely, the partitioning of a data set into subsets (clusters),
so that the data in each subset (ideally) share some common trait,
often according to some defined distance measure.
o Clustering groups data points or objects into clusters (groups) so that
the members of each cluster are as similar to one another as possible.
o K-Means is an unsupervised learning method.
o In clustering, the output (label) of each data point is not known in
advance.
o Method for measuring cluster quality: sum of squared error (SSE):
$$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$$
where p ∈ C_i is each data point in cluster i, m_i is the centroid of
cluster i, and d(p, m_i) is the distance between p and m_i.
o The SSE value depends on the number of clusters and how the data is
grouped into clusters. The smaller the SSE value, the better the
clustering results.
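As a minimal sketch of the SSE formula, the Python function below sums the squared Euclidean distance from every point to the centroid of its cluster. The function name `sse`, the (centroid, points) layout and the four sample points are assumptions chosen for illustration; they do not come from the slides.

```python
def sse(clusters):
    """Sum of squared errors for a list of (centroid, points) pairs.

    Each centroid and point is a tuple of coordinates. The distance d is
    assumed to be Euclidean, so d**2 is the sum of squared differences.
    """
    total = 0.0
    for centroid, points in clusters:
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, centroid))
    return total


# Hypothetical data: two clusters of two points each, with their centroids.
clusters = [
    ((1.0, 2.0), [(1, 1), (1, 3)]),   # centroid m1 and the members of C1
    ((4.0, 3.0), [(4, 2), (4, 4)]),   # centroid m2 and the members of C2
]
print(sse(clusters))  # 4.0
```

Every point here lies exactly one unit from its centroid, so each contributes 1 to the total.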
K-Means Clustering Algorithm
o K-Means is a partitioning clustering method.
o The objects are grouped into k clusters.
o The value of k must be chosen before the clustering process starts.
o Each cluster has a central value called the centroid.
o Similarity measures are used to group objects.
o Similarity is expressed through the concept of distance (d).
o The smaller the distance between two objects or data points, the
higher their similarity.
o The purpose of K-Means is to minimize the total distance between each
element and the centroid of its own cluster.

1. Select the desired number of clusters, k.
2. Initialize the k cluster centres (centroids) randomly.
3. Assign each data point or object to the nearest cluster. Proximity
between two objects is determined by distance. The distance used in the
K-Means algorithm is the Euclidean distance (d):
$$d_{Euclidean}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) are two
objects and n is the number of attributes.
4. Recalculate the cluster centres using the current cluster membership.
The cluster centre is the average (mean) of all data or objects in a
particular cluster.
5. Repeat steps 3 and 4 until cluster membership no longer changes.
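The steps above can be sketched in Python as follows. This is an illustrative sketch, not the slides' own implementation: the names `euclidean` and `k_means`, the optional `centroids` and `seed` parameters, the choice to stop when no label changes, and the sample points are all assumptions.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def k_means(points, k, centroids=None, seed=None):
    """Cluster `points` into k groups; returns (labels, centroids)."""
    if centroids is None:
        # Initialization: pick k distinct data points as random centroids.
        centroids = random.Random(seed).sample(points, k)
    centroids = [tuple(c) for c in centroids]
    labels = None
    while True:
        # Assignment: every point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:      # stop when no point changes cluster
            return labels, centroids
        labels = new_labels
        # Update: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:               # an empty cluster keeps its old centre
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))


# Hypothetical points forming two well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = k_means(points, k=2, centroids=[(0, 0), (10, 10)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

Passing explicit initial centroids makes the run repeatable. Leaving `centroids` unset falls back to a random choice among the data points, which matches the random initialization described in step 2.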
Example 1
Table 1. Data points

Instance  X  Y
A         1  3
B         3  3
C         4  3
D         5  3
E         1  2
F         4  2
G         1  1
H         2  1

1. Set the number of clusters to k = 2.
2. Choose the initial centroids randomly, for example m1 = (1, 1) and
m2 = (2, 1), taken from the data in Table 1.
3. Assign each object to the cluster whose centroid is nearest. The
results are shown in Table 2: cluster 1 = {A, E, G}, cluster 2 =
{B, C, D, F, H}. The SSE value is computed as:
$$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$$
[Figure: initial data display]
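The first assignment step of Example 1 can be checked with a short Python sketch. The data points and initial centroids are those of Table 1 and step 2. The helper name `sq_dist` is an assumption, and the printed SSE is what this sketch computes relative to the initial centroids, not a figure quoted from the slides.

```python
# Table 1 data points and the initial centroids from step 2.
points = {"A": (1, 3), "B": (3, 3), "C": (4, 3), "D": (5, 3),
          "E": (1, 2), "F": (4, 2), "G": (1, 1), "H": (2, 1)}
centroids = [(1, 1), (2, 1)]  # m1, m2


def sq_dist(p, m):
    """Squared Euclidean distance between point p and centroid m."""
    return sum((a - b) ** 2 for a, b in zip(p, m))


clusters = [[] for _ in centroids]
sse = 0
for name, p in points.items():
    # Index of the nearest centroid (0 for m1, 1 for m2).
    nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
    clusters[nearest].append(name)
    sse += sq_dist(p, centroids[nearest])

print(clusters)  # [['A', 'E', 'G'], ['B', 'C', 'D', 'F', 'H']]
print(sse)       # 36
```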
Example 1 (continued)
[Table 2]
4. Calculate the new centroid values:
$$m_1 = \left(\frac{1+1+1}{3}, \frac{3+2+1}{3}\right) = (1, 2)$$
$$m_2 = \left(\frac{3+4+5+4+2}{5}, \frac{3+3+3+2+1}{5}\right) = (3.6, 2.4)$$
5. Reassign each object using the new cluster centres. The new SSE
value is given in Table 3.
[Figure: clusters and centroids after the first stage]
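The centroid update is a per-coordinate mean, which a few lines of Python can confirm. The function name `centroid` is an assumption. The coordinates are those of Table 1, grouped by the cluster memberships found in the first assignment.

```python
def centroid(members):
    """Mean of a list of coordinate tuples, one mean per coordinate."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))


# Cluster memberships after the first assignment (coordinates from Table 1).
cluster1 = [(1, 3), (1, 2), (1, 1)]                    # A, E, G
cluster2 = [(3, 3), (4, 3), (5, 3), (4, 2), (2, 1)]    # B, C, D, F, H

print(centroid(cluster1))  # (1.0, 2.0)
print(centroid(cluster2))  # (3.6, 2.4)
```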
Example 1 (continued)
[Table 3]
• The clusters are now cluster 1 = {A, E, G, H} and cluster 2 =
{B, C, D, F}. The new centroid values are m1 = (1.25, 1.75) and
m2 = (4, 2.75).
• Reassign each object using the new cluster centres. The new SSE value
is given in Table 4.
[Figure: clusters and centroids after the second stage]
Example 1 (continued)
[Table 4]
• Table 4 shows that cluster membership no longer changes.
• The final result is cluster 1 = {A, E, G, H} and cluster 2 =
{B, C, D, F}, with an SSE value of 6.25 after 3 iterations.
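The whole of Example 1 can be replayed with the Python sketch below, starting from the initial centroids m1 = (1, 1) and m2 = (2, 1). The names `sq_dist` and `run_k_means` are assumptions. Counting one iteration per assignment pass is also an assumption, chosen because it reproduces the count of 3 stated in the example.

```python
def sq_dist(p, m):
    """Squared Euclidean distance between point p and centroid m."""
    return sum((a - b) ** 2 for a, b in zip(p, m))


def run_k_means(points, centroids):
    """Alternate assignment and update; returns (labels, centroids, passes)."""
    names = list(points)
    labels, passes = None, 0
    while True:
        passes += 1
        new_labels = {
            n: min(range(len(centroids)),
                   key=lambda i: sq_dist(points[n], centroids[i]))
            for n in names
        }
        if new_labels == labels:          # membership unchanged: converged
            return labels, centroids, passes
        labels = new_labels
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [points[n] for n in names if labels[n] == i]
            new_centroids.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else old
            )
        centroids = new_centroids


points = {"A": (1, 3), "B": (3, 3), "C": (4, 3), "D": (5, 3),
          "E": (1, 2), "F": (4, 2), "G": (1, 1), "H": (2, 1)}
labels, centroids, passes = run_k_means(points, [(1, 1), (2, 1)])
sse = sum(sq_dist(points[n], centroids[labels[n]]) for n in points)

print(sorted(n for n in points if labels[n] == 0))  # ['A', 'E', 'G', 'H']
print(sorted(n for n in points if labels[n] == 1))  # ['B', 'C', 'D', 'F']
print(centroids)                                    # [(1.25, 1.75), (4.0, 2.75)]
print(passes, sse)                                  # 3 6.25
```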
K-Means Clustering Visual Basic Code

Sub kMeanCluster(Data() As Variant, numCluster As Integer)
    ' main function to cluster data into k number of clusters
    ' input:
    '   + Data matrix (0 to 2, 1 to TotalData);
    '     Row 0 = cluster, 1 = X, 2 = Y; data in columns
    '   + numCluster: number of clusters the user wants the data to be clustered into
    '   + private variables: Centroid, TotalData
    ' output:
    '   o) update centroid
    '   o) assign cluster number to the Data (= row 0 of Data)
    ' dist(x1, y1, x2, y2) is assumed to be a helper, defined elsewhere,
    ' that returns the distance between two points
    Dim i As Integer
    Dim j As Integer
    Dim X As Single
    Dim Y As Single
    Dim min As Single
    Dim cluster As Integer
    Dim d As Single
    Dim sumXY()
    Dim isStillMoving As Boolean

    isStillMoving = True
    If totalData <= numCluster Then
        ' only the last data is put here because it is designed to be interactive
        Data(0, totalData) = totalData              ' cluster No = total data
        Centroid(1, totalData) = Data(1, totalData) ' X
        Centroid(2, totalData) = Data(2, totalData) ' Y
    Else
        ' calculate minimum distance to assign the new data
        min = 10 ^ 10 ' big number
        X = Data(1, totalData)
        Y = Data(2, totalData)
        For i = 1 To numCluster
            d = dist(X, Y, Centroid(1, i), Centroid(2, i))
            If d < min Then
                min = d
                cluster = i
            End If
        Next i
        Data(0, totalData) = cluster

        Do While isStillMoving
            ' this loop is guaranteed to converge
            ' calculate new centroids
            ' 1 = X, 2 = Y, 3 = count number of data
            ReDim sumXY(1 To 3, 1 To numCluster)
            For i = 1 To totalData
                sumXY(1, Data(0, i)) = Data(1, i) + sumXY(1, Data(0, i))
                sumXY(2, Data(0, i)) = Data(2, i) + sumXY(2, Data(0, i))
                sumXY(3, Data(0, i)) = 1 + sumXY(3, Data(0, i))
            Next i
            For i = 1 To numCluster
                Centroid(1, i) = sumXY(1, i) / sumXY(3, i)
                Centroid(2, i) = sumXY(2, i) / sumXY(3, i)
            Next i
            ' assign all data to the new centroids
            isStillMoving = False
            For i = 1 To totalData
                min = 10 ^ 10 ' big number
                X = Data(1, i)
                Y = Data(2, i)
                For j = 1 To numCluster
                    d = dist(X, Y, Centroid(1, j), Centroid(2, j))
                    If d < min Then
                        min = d
                        cluster = j
                    End If
                Next j
                If Data(0, i) <> cluster Then
                    Data(0, i) = cluster
                    isStillMoving = True
                End If
            Next i
        Loop
    End If
End Sub
Exercise 1
The following table is a dataset of 15 students taking a Data Mining
course. Group the 15 students into three clusters, namely the smart,
normal and poor groups, and calculate the SSE value.

No.  Student name  Midterm (UTS)  Assignment (TUGAS)  Final (UAS)
1    Roy           89             90                  75
2    Sintia        90             71                  95
3    Iqbal         70             75                  80
4    Dilan         45             65                  59
5    Ratna         65             75                  53
6    Merry         80             70                  75
7    Rudi          90             85                  81
8    Hafiz         70             70                  73
9    Gede          96             93                  85
10   Christian     60             55                  48
11   Justin        45             60                  58
12   Jesika        60             70                  72
13   Ayu           85             90                  88
14   Siska         52             68                  55
15   Reitama       40             60                  7
Exercise 2
Perform the clustering process on the following data. Run experiments
with several values of k (number of clusters) to determine the optimal k
based on the minimum SSE value, and draw a scatter plot for each value
of k. Implement the clustering process using MATLAB.
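The experiment over several values of k can be outlined as below. This is a sketch under stated assumptions: it is written in Python rather than MATLAB, and because the exercise data is not included in this text it reuses the Table 1 points from Example 1 as stand-in data. To keep the run repeatable it takes the first k points as initial centroids instead of random ones. The names `sq_dist` and `k_means_sse` are hypothetical, and plotting is omitted.

```python
def sq_dist(p, m):
    """Squared Euclidean distance between point p and centroid m."""
    return sum((a - b) ** 2 for a, b in zip(p, m))


def k_means_sse(points, k):
    """Run K-Means with the first k points as initial centroids; return SSE."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    while True:
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Table 1 points from Example 1 (A to H), reused here as stand-in data.
points = [(1, 3), (3, 3), (4, 3), (5, 3), (1, 2), (4, 2), (1, 1), (2, 1)]
sse_by_k = {k: k_means_sse(points, k) for k in range(1, 5)}
for k, value in sse_by_k.items():
    print(k, round(value, 3))
```

SSE generally keeps falling as k grows and reaches zero when every point is its own cluster, so the literal minimum is not informative. A common reading of the results is to pick the k after which the drop in SSE levels off.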