Two kinds of clustering: hierarchical and partitional.
(How-to) Hierarchical Clustering
The number of possible dendrograms with $n$ leaves is
$$\frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$

Number of leaves    Number of possible dendrograms
2                   1
3                   3
4                   15
5                   105
...                 ...
10                  34,459,425

Since we cannot test all possible trees, we have to perform a heuristic search over the space of possible trees. We could do this as follows:

Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
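To see how quickly the count explodes, the formula can be evaluated directly and checked against the table. This is a minimal Python sketch; the function name num_dendrograms is our own, not from the slides.

```python
from math import factorial


def num_dendrograms(n: int) -> int:
    """Number of distinct rooted binary dendrograms with n labeled leaves (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two leaves")
    # (2n-3)! / (2^(n-2) * (n-2)!) is always a whole number, so exact integer division is safe
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (2, 3, 4, 5, 10):
    print(n, f"{num_dendrograms(n):,}")
```

Each row printed matches the table: 1, 3, 15, 105 and 34,459,425 dendrograms for 2, 3, 4, 5 and 10 leaves.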
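Distance matrix for five example objects (only the upper triangle is shown, because the matrix is symmetric):

      1  2  3  4  5
  1   0  8  8  7  7
  2      0  2  4  4
  3         0  5  5
  4            0  3
  5               0

For example, D(1, 2) = 8 and D(4, 5) = 3.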
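Because D is symmetric with a zero diagonal, listing the upper triangle is enough. The sketch below expands the example matrix into full form and finds the closest pair of objects, which is the first pair a bottom-up method would merge. It assumes 0-based indices (object 1 in the matrix is index 0), and the helper name full_matrix is hypothetical.

```python
# Upper triangle of the example distance matrix: row i lists D(i, j) for j >= i.
UPPER = [
    [0, 8, 8, 7, 7],
    [0, 2, 4, 4],
    [0, 5, 5],
    [0, 3],
    [0],
]


def full_matrix(upper):
    """Expand an upper-triangular listing into a full symmetric matrix."""
    n = len(upper)
    d = [[0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + offset
            d[i][j] = d[j][i] = value  # symmetry: D(i, j) == D(j, i)
    return d


D = full_matrix(UPPER)
# Closest pair of distinct objects (0-based indices).
i, j = min(((a, b) for a in range(len(D)) for b in range(a + 1, len(D))),
           key=lambda p: D[p[0]][p[1]])
print(i, j, D[i][j])  # 1 2 2
```

Indices 1 and 2 correspond to objects 2 and 3 in the matrix, whose distance of 2 is the smallest entry.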
A generic technique for measuring similarity
To measure the similarity between two objects, transform one
of the objects into the other, and measure how much effort it
took. The measure of effort becomes the distance measure.
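String edit (Levenshtein) distance is one familiar instance of this idea. The "effort" is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The slides do not name a specific measure, so this is an illustrative choice, sketched here with the standard dynamic-programming recurrence.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to transform string a into string b."""
    # prev[j] = distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca by cb (free if equal)
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Three edits are needed: substitute k by s, substitute e by i, and insert g at the end.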
• Basic algorithm
1. Compute the distance matrix between the input data points
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the distance matrix
6. Until only a single cluster remains
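The basic algorithm can be sketched in Python as shown below. This is a naive, roughly cubic-time illustration under two assumptions. First, the distance matrix is precomputed. Second, clusters are compared by single linkage, which is one of the linkage criteria described below. The function name agglomerate and the example matrix reuse are ours.

```python
def agglomerate(dist):
    """Naive bottom-up (agglomerative) clustering with single linkage.

    dist: full symmetric distance matrix between the input points.
    Returns one (members of the new cluster, merge distance) tuple per merge.
    """
    n = len(dist)
    # Step 2: each data point starts as its own cluster (a frozenset of indices).
    clusters = {frozenset([i]) for i in range(n)}
    # Step 1: cluster-to-cluster distances, keyed by the unordered pair of clusters.
    d = {frozenset([frozenset([i]), frozenset([j])]): dist[i][j]
         for i in range(n) for j in range(i + 1, n)}
    merges = []
    while len(clusters) > 1:  # Steps 3 and 6: repeat until one cluster remains
        # Step 4: find and merge the two closest clusters.
        pair = min(d, key=d.get)
        a, b = pair
        height = d[pair]
        merged = a | b
        clusters -= {a, b}
        # Step 5: update the matrix. Under single linkage, the distance from the
        # new cluster to another cluster c is the smaller of the two old distances.
        for c in clusters:
            d[frozenset([merged, c])] = min(d[frozenset([a, c])], d[frozenset([b, c])])
        # Drop every entry that still refers to a or b.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters.add(merged)
        merges.append((sorted(merged), height))
    return merges


# The five-object example matrix (0-based indices).
D = [
    [0, 8, 8, 7, 7],
    [8, 0, 2, 4, 4],
    [8, 2, 0, 5, 5],
    [7, 4, 5, 0, 3],
    [7, 4, 5, 3, 0],
]
for members, height in agglomerate(D):
    print(members, height)
```

On the example matrix, the merges happen at distances 2, 3, 4 and 7. The merge distances are exactly the heights at which a dendrogram joins its branches.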
Figure: at each step, consider all possible merges and choose the best one, i.e., merge the two closest clusters.
• Single linkage (nearest neighbor): In this method, the distance between two
clusters is determined by the distance between the two closest objects (nearest
neighbors) in the different clusters.
• Complete linkage (furthest neighbor): In this method, the distances
between clusters are determined by the greatest distance between any two
objects in the different clusters (i.e., by the "furthest neighbors").
• Group average linkage: In this method, the distance between two clusters is
calculated as the average distance between all pairs of objects in the two
different clusters.
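The three criteria differ only in how they summarize the object-to-object distances between two clusters. They use the minimum, the maximum, or the mean. The sketch below computes each one from a full distance matrix. It assumes the five-object example matrix, hardcoded in full with 0-based indices, and the function names are illustrative.

```python
from itertools import product

# Full symmetric version of the five-object example matrix (0-based indices).
D = [
    [0, 8, 8, 7, 7],
    [8, 0, 2, 4, 4],
    [8, 2, 0, 5, 5],
    [7, 4, 5, 0, 3],
    [7, 4, 5, 3, 0],
]


def single_link(A, B, d):
    """Distance between the two closest objects, one from each cluster."""
    return min(d[i][j] for i, j in product(A, B))


def complete_link(A, B, d):
    """Distance between the two furthest objects, one from each cluster."""
    return max(d[i][j] for i, j in product(A, B))


def average_link(A, B, d):
    """Mean distance over all pairs of objects, one from each cluster."""
    return sum(d[i][j] for i, j in product(A, B)) / (len(A) * len(B))


A, B = {1, 2}, {3, 4}
for name, link in [("single", single_link), ("complete", complete_link),
                   ("average", average_link)]:
    print(name, link(A, B, D))
```

For the clusters {1, 2} and {3, 4}, the cross-cluster distances are 4, 4, 5 and 5. This gives 4 for single linkage, 5 for complete linkage and 4.5 for average linkage. Because the criteria give different values, they can produce different merge orders and different dendrograms.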
Figure: example dendrograms (30 leaves) built with single linkage and with average linkage.
Summary of Hierarchical Clustering Methods
Answer:
Matrix X stores these points.
Next, compute the distance between points 1 and 2, points 1 and 3, and so on, until the distances between all pairs of points are known. The MATLAB function that does this is
pdist.
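The same pairwise computation can be sketched in plain Python. Rows are compared in the order (1, 2), (1, 3), ..., (1, m), (2, 3), ..., which gives a single condensed vector like the one pdist returns. The sketch assumes Euclidean distance, which is pdist's default metric. The function name pairwise_distances and the sample points are hypothetical.

```python
from math import dist


def pairwise_distances(X):
    """Euclidean distance for every pair of rows i < j, listed in the order
    (1,2), (1,3), ..., (1,m), (2,3), ...: a condensed distance vector."""
    return [dist(X[i], X[j]) for i in range(len(X)) for j in range(i + 1, len(X))]


X = [[0, 0], [3, 4], [6, 8]]
print(pairwise_distances(X))  # [5.0, 10.0, 5.0]
```

For m points, the vector has m(m-1)/2 entries, one per pair. This is half the off-diagonal entries of the full symmetric matrix.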
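Z = linkage(pdist(X));  % cluster tree from the pairwise distances (single linkage by default)
dendrogram(Z)           % plot the dendrogram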