CLUSTERING: PARTITIONING APPROACH
OBJECTIVES
https://discuss.cryosparc.com/t/using-particles-from-cluster-mode-in-3d-va-for-refinement-fails/3665/2
MAJOR CLUSTERING APPROACHES
• Partitioning approach:
• Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
• Typical methods: k-means, k-medoids, CLARANS
• Hierarchical approach:
• Create a hierarchical decomposition of the set of data (or objects) using some criterion
• Typical methods: DIANA, AGNES, BIRCH, CHAMELEON
• Density-based approach:
• Based on connectivity and density functions
• Typical methods: DBSCAN, OPTICS, DenClue
• Grid-based approach:
• Based on a multiple-level granularity structure over a finite number of cells
• Typical methods: STING, WaveCluster, CLIQUE
PARTITIONING ALGORITHMS: BASIC CONCEPT
[Figure: two rows of three scatter plots; x axis from -2 to 2, y axis from 0 to 2]
EXAMPLE OF k-Means CLUSTERING
[Figure: k-means with K=2 on objects plotted over 0 to 10 on both axes]
• Arbitrarily choose K centers
• Assign each object to the most similar center
• Update the cluster means
• Reassign the objects and update the cluster means again
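The assign/update/reassign loop shown in the figure can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the `max_iter` cap, the toy data, and the choice to pass the initial centers explicitly (instead of picking them arbitrarily) are all assumptions made for the example.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Alternate assignment and mean-update steps until no label changes."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest (most similar) center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no object was reassigned
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers

# Hypothetical toy data with K = 2; the initial centers are fixed for repeatability.
toy_points = [(1, 1), (1, 2), (9, 9), (8, 9)]
labels, centers = kmeans(toy_points, centers=[(1, 1), (9, 9)])
print(labels)
print(centers)
```

Stopping when no object changes cluster is one common convergence test; other variants stop when the centers move less than a tolerance.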
THE k-Means CLUSTERING APPROACH
You are given 10 points with variables x and y. Find 2 clusters using the k-means algorithm with the given initial centroids: c1 = (3,4) and c2 = (7,4).

Point   x   y
X1      2   6
X2      3   4
X3      3   8
X4      4   7
X5      6   2
X6      6   4
X7      7   3
X8      7   4
X9      8   5
X10     7   6

Tips:
• Numeric variables
• Use Euclidean distance for d(i,j)

[Figure: scatter plot of the points X1 to X10]
Example: d(X1, c1) and d(X1, c2), with X1 = (2,6), c1 = (3,4) and c2 = (7,4)

$$d(X_1, c_1) = \sqrt{(2-3)^2 + (6-4)^2} = \sqrt{5} \approx 2.24$$

$$d(X_1, c_2) = \sqrt{(2-7)^2 + (6-4)^2} = \sqrt{29} \approx 5.39$$

X1 is closer to c1, so it is assigned to the cluster of c1. A third distance on the slide is given as 4.
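The same distance calculation can be repeated for all ten points to carry out the first assignment step of the exercise. The sketch below uses the points and initial centroids given above; the variable names (`points`, `members`, `centroid`) and the tie-breaking rule (ties go to c1) are assumptions made for illustration.

```python
import math

# Points and initial centroids from the exercise.
points = {
    "X1": (2, 6), "X2": (3, 4), "X3": (3, 8), "X4": (4, 7), "X5": (6, 2),
    "X6": (6, 4), "X7": (7, 3), "X8": (7, 4), "X9": (8, 5), "X10": (7, 6),
}
c1, c2 = (3, 4), (7, 4)

# First assignment step: each point goes to the nearer centroid.
members = {"c1": [], "c2": []}
for name, p in points.items():
    d1, d2 = math.dist(p, c1), math.dist(p, c2)
    nearest = "c1" if d1 <= d2 else "c2"  # assumed tie-break: prefer c1
    members[nearest].append(name)
    print(f"{name}: d(c1)={d1:.2f}  d(c2)={d2:.2f}  -> {nearest}")

def centroid(names):
    """Mean of the named points, coordinate by coordinate."""
    xs = [points[n][0] for n in names]
    ys = [points[n][1] for n in names]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Update step: recompute each cluster mean from its members.
new_c1 = centroid(members["c1"])
new_c2 = centroid(members["c2"])
print(new_c1, new_c2)
```

On this data the first pass groups X1 to X4 around c1 and X5 to X10 around c2, moving the means to (3, 6.25) and roughly (6.83, 4). Repeating the assignment with the updated means leaves every point in the same cluster.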
$$E = \sum_{i=1}^{k} \sum_{p \in C_i} (p - m_i)^2$$
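As a rough check on the criterion E, the sketch below evaluates it for the two clusters obtained from the first assignment step of the exercise. It assumes that (p - m_i)^2 means the squared Euclidean distance between point p and center m_i. The helper name `sse` and the hard-coded cluster lists are illustrative choices, not part of the original slides.

```python
def sse(clusters, centers):
    """Sum over clusters of squared Euclidean distances from points to their center."""
    return sum(
        (px - mx) ** 2 + (py - my) ** 2
        for members, (mx, my) in zip(clusters, centers)
        for (px, py) in members
    )

# Clusters after the first assignment with c1 = (3,4) and c2 = (7,4).
C1 = [(2, 6), (3, 4), (3, 8), (4, 7)]                  # X1 to X4
C2 = [(6, 2), (6, 4), (7, 3), (7, 4), (8, 5), (7, 6)]  # X5 to X10

def mean(pts):
    """Coordinate-wise mean of a list of 2-D points."""
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

e_initial = sse([C1, C2], [(3, 4), (7, 4)])    # E measured against the initial centroids
e_means = sse([C1, C2], [mean(C1), mean(C2)])  # E measured against the cluster means
print(e_initial, e_means)
```

Replacing the initial centroids with the cluster means lowers E from 44 to about 23.58, which is the effect the update step is meant to have.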
VARIATIONS OF THE k-Means APPROACH
THANK YOU
Shuzlina Abdul Rahman | Sofianita Mutalib | Siti Nur Kamaliah Kamarudin