ALGORITHM
K-Means Clustering Algorithm
Clustering Concept
o Clustering groups data points (objects) into clusters so that the objects within each cluster are as similar to one another as possible.
o K-Means is an unsupervised learning method.
o In clustering, the output (label) of each data point is not known in advance.
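o Cluster quality can be measured with the sum of squared errors (SSE):
$$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$$
  where k is the number of clusters, C_i is the i-th cluster, m_i is the centroid of C_i, and d(p, m_i) is the distance between point p and m_i. A lower SSE means more compact clusters.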
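A minimal Python sketch of the SSE formula, assuming Euclidean distance for d (the slides do not fix the metric) and points given as coordinate tuples. The function name `sse` and the encoding of cluster membership as an integer index per point (`labels`) are illustrative choices, not from the slides.

```python
def sse(points, labels, centroids):
    """SSE = sum over clusters i and points p in C_i of d(p, m_i)^2, with Euclidean d."""
    total = 0
    for p, i in zip(points, labels):
        # squared Euclidean distance from p to the centroid m_i of its cluster
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[i]))
    return total

# Two clusters: {(0, 0), (2, 0)} with centroid (1, 0), and {(5, 5)} with centroid (5, 5).
print(sse([(0, 0), (2, 0), (5, 5)], [0, 0, 1], [(1, 0), (5, 5)]))  # 2
```

Working with squared differences directly avoids taking a square root only to square it again.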
Example 1
Table 1. Data points
Instance  X  Y
A         1  3
B         3  3
C         4  3
D         5  3
E         1  2
F         4  2
G         1  1
H         2  1
Figure: initial data view

1. Determine the number of clusters, k = 2.
2. Choose the initial centroids randomly; for example, from the data above, m1 = (1, 1) and m2 = (2, 1).
3. Assign each object to the cluster whose centroid is nearest (smallest distance).
   The results are shown in Table 2: cluster 1 = {A, E, G}, cluster 2 = {B, C, D, F, H}. The SSE is computed as:
$$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$$
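Steps 1 to 3 of Example 1 can be checked with a short Python sketch. It assumes Euclidean distance (the slides only say "distance") and compares squared distances, which selects the same nearest centroid. The helper names `sq_dist` and `assign` are hypothetical. With these assumptions the SSE of this first assignment comes out to 5 for cluster 1 plus 31 for cluster 2.

```python
# Example 1 data (Table 1) and the initial centroids from step 2.
points = {"A": (1, 3), "B": (3, 3), "C": (4, 3), "D": (5, 3),
          "E": (1, 2), "F": (4, 2), "G": (1, 1), "H": (2, 1)}
centroids = [(1, 1), (2, 1)]  # m1, m2

def sq_dist(p, q):
    """Squared Euclidean distance; its minimum is at the same centroid as the plain distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Step 3: put each point in the cluster of its nearest centroid (ties go to the lower index)."""
    clusters = [[] for _ in centroids]
    for name, p in points.items():
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[nearest].append(name)
    return clusters

clusters = assign(points, centroids)
print(clusters)  # [['A', 'E', 'G'], ['B', 'C', 'D', 'F', 'H']]

# SSE of this assignment, measured against m1 and m2
total = sum(sq_dist(points[n], centroids[i]) for i, c in enumerate(clusters) for n in c)
print(total)  # 36
```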
Table 2. Clusters and centroids after the first stage.
4. Calculate the new centroid values:
$$m_1 = \left(\frac{1+1+1}{3}, \frac{3+2+1}{3}\right) = (1, 2)$$
$$m_2 = \left(\frac{3+4+5+4+2}{5}, \frac{3+3+3+2+1}{5}\right) = (3.6, 2.4)$$
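Step 4 is a coordinate-wise average over each cluster's members. A minimal Python sketch, where the function name `update_centroids` is hypothetical and empty clusters are not handled (none occur in this example):

```python
# Example 1 coordinates (Table 1) and the clusters after the first stage (Table 2).
points = {"A": (1, 3), "B": (3, 3), "C": (4, 3), "D": (5, 3),
          "E": (1, 2), "F": (4, 2), "G": (1, 1), "H": (2, 1)}
clusters = [["A", "E", "G"], ["B", "C", "D", "F", "H"]]

def update_centroids(points, clusters):
    """Step 4: each new centroid is the mean of its cluster's points, axis by axis."""
    new = []
    for members in clusters:
        coords = [points[n] for n in members]
        # zip(*coords) groups all x values together, then all y values
        new.append(tuple(sum(axis) / len(coords) for axis in zip(*coords)))
    return new

print(update_centroids(points, clusters))  # [(1.0, 2.0), (3.6, 2.4)]
```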
Table 3
• Assigning each object to the nearest of the new centroids gives cluster 1 = {A, E, G, H} and cluster 2 = {B, C, D, F}. Recomputing the centroids then gives m1 = (1.25, 1.75) and m2 = (4, 2.75).
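Repeating steps 3 and 4 until no object changes cluster gives the whole procedure. The sketch below assumes Euclidean distance and stops when an assignment pass leaves every cluster unchanged; `kmeans`, `mean` and `sq_dist` are hypothetical names. On Example 1 it reproduces the two stages above, and a third pass leaves the clusters unchanged, so under this sketch the loop stops there with an SSE of 6.25.

```python
# Example 1 data (Table 1).
points = {"A": (1, 3), "B": (3, 3), "C": (4, 3), "D": (5, 3),
          "E": (1, 2), "F": (4, 2), "G": (1, 1), "H": (2, 1)}

def sq_dist(p, q):
    """Squared Euclidean distance (same nearest centroid as the plain distance)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(coords):
    """Coordinate-wise mean; this sketch does not handle empty clusters."""
    if not coords:
        raise ValueError("empty cluster")
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))

def kmeans(points, centroids, max_iter=100):
    """Repeat step 3 (assign) and step 4 (update) until memberships stop changing."""
    clusters = None
    for _ in range(max_iter):
        new_clusters = [[] for _ in centroids]
        for name, p in points.items():
            nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            new_clusters[nearest].append(name)
        if new_clusters == clusters:
            break  # no object moved: converged
        clusters = new_clusters
        centroids = [mean([points[n] for n in c]) for c in clusters]
    return clusters, centroids

clusters, centroids = kmeans(points, [(1, 1), (2, 1)])
print(clusters)   # [['A', 'E', 'G', 'H'], ['B', 'C', 'D', 'F']]
print(centroids)  # [(1.25, 1.75), (4.0, 2.75)]
sse = sum(sq_dist(points[n], centroids[i]) for i, c in enumerate(clusters) for n in c)
print(sse)        # 6.25
```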
Table 4
Exercise 1
The following table contains the scores of 15 students taking a Data Mining course. The students are to be grouped into three clusters: smart, normal, and poor. Calculate the SSE value.
No  Student name  UTS (midterm)  Tugas (assignment)  UAS (final exam)
1   Roy           89             90                  75
2   Sintia        90             71                  95
3   Iqbal         70             75                  80
4   Dilan         45             65                  59
5   Ratna         65             75                  53
6   Merry         80             70                  75
7   Rudi          90             85                  81
8   Hafiz         70             70                  73
9   Gede          96             93                  85
10  Christian     60             55                  48
11  Justin        45             60                  58
12  Jesika        60             70                  72
13  Ayu           85             90                  88
14  Siska         52             68                  55
15  Reitama       40             60                  7
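One way to approach Exercise 1 in code is sketched below in Python, treating each student as a 3-D point (UTS, Tugas, UAS) and assuming Euclidean distance. The initial centroids (Gede, Hafiz and Christian, as one high, one middle and one low scorer) are a hypothetical choice, and so is the rule that an empty cluster keeps its previous centroid. Different seeds can give different clusters and a different SSE, so the printed values are only one possible answer.

```python
# Exercise 1 scores as (UTS, Tugas, UAS), copied from the table above.
students = {
    "Roy": (89, 90, 75), "Sintia": (90, 71, 95), "Iqbal": (70, 75, 80),
    "Dilan": (45, 65, 59), "Ratna": (65, 75, 53), "Merry": (80, 70, 75),
    "Rudi": (90, 85, 81), "Hafiz": (70, 70, 73), "Gede": (96, 93, 85),
    "Christian": (60, 55, 48), "Justin": (45, 60, 58), "Jesika": (60, 70, 72),
    "Ayu": (85, 90, 88), "Siska": (52, 68, 55), "Reitama": (40, 60, 7),
}

def sq_dist(p, q):
    """Squared Euclidean distance between two score vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Assign/update loop that stops when no student changes cluster."""
    clusters = None
    for _ in range(max_iter):
        new_clusters = [[] for _ in centroids]
        for name, p in points.items():
            nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            new_clusters[nearest].append(name)
        if new_clusters == clusters:
            break  # assignments unchanged: converged
        clusters = new_clusters
        new_centroids = []
        for c, old in zip(clusters, centroids):
            if c:  # coordinate-wise mean of the members
                new_centroids.append(tuple(sum(axis) / len(c) for axis in zip(*(points[n] for n in c))))
            else:  # assumption: an empty cluster keeps its previous centroid
                new_centroids.append(old)
        centroids = new_centroids
    return clusters, centroids

# Hypothetical seeds: one high, one middle and one low scorer from the table.
seeds = [students["Gede"], students["Hafiz"], students["Christian"]]
clusters, centroids = kmeans(students, seeds)
sse = sum(sq_dist(students[n], centroids[i]) for i, c in enumerate(clusters) for n in c)
for c, m in zip(clusters, centroids):
    print([round(v, 2) for v in m], c)
print("SSE:", round(sse, 2))
```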
Exercise 2
Perform the clustering process on the following data. Also run experiments to determine the optimal k (number of clusters) based on the minimum SSE value, and draw a scatter plot for each value of k. Implement the clustering process in MATLAB.
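The exercise asks for MATLAB, but the idea of the k experiment can be sketched in Python first. Because the exercise data is not reproduced here, the sketch uses the Example 1 points as a stand-in and seeds each run with the first k points (a hypothetical choice). Note that SSE generally shrinks as k grows and reaches 0 when every point is its own cluster, so in practice one often looks for the "elbow" where the decrease levels off rather than the smallest value. Plotting is left out; with matplotlib, `plt.scatter(xs, ys, c=labels)` colours points by cluster.

```python
# Stand-in data: the Example 1 points (the Exercise 2 data is not given here).
points = [(1, 3), (3, 3), (4, 3), (5, 3), (1, 2), (4, 2), (1, 1), (2, 1)]

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Return one cluster index per point plus the final centroids."""
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # assumption: an empty cluster keeps its previous centroid
            new_centroids.append(tuple(sum(a) / len(members) for a in zip(*members)) if members else old)
        centroids = new_centroids
    return labels, centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its own centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

results = {}
for k in range(1, 5):
    labels, centroids = kmeans(points, points[:k])  # hypothetical seeding: first k points
    results[k] = sse(points, labels, centroids)
    print(k, results[k])
```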