HIERARCHICAL
CLUSTERING in I.R.
By:
Suraj Jogani (20117101)
Pankaj Agarwal (20117068)
Aditya Kumar Dubey (20117901)
Electrical Engineering
6th Semester
What is clustering?
Grouping a set of documents into subsets, or clusters.
The goal of a clustering algorithm is to create clusters that are coherent internally
but clearly different from each other:
Documents within a cluster should be as similar as possible; and
Documents in one cluster should be as dissimilar as possible from documents in
other clusters.
Clustering algorithms
Flat algorithms
Usually start with a random (partial) partitioning
Refine it iteratively by updating the cluster centroids
K-means clustering
Model-based clustering
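As an illustration of the "start from a partition, then refine the centroids" idea, here is a minimal K-means sketch in plain Python. It is not taken from the slides: the function name `kmeans`, the squared Euclidean distance, and the choice to let the caller pass the starting centroids (rather than drawing them at random) are all assumptions made to keep the example small and reproducible.

```python
def kmeans(points, centroids, max_iter=100):
    """Minimal K-means sketch on tuples of floats.

    points    : list of equal-length tuples (hypothetical document vectors)
    centroids : starting centroids supplied by the caller (an assumption;
                in practice they are usually chosen at random)
    Returns (assignments, centroids).
    """
    centroids = [tuple(c) for c in centroids]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_assignments = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # the partition stopped changing
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assignments, centroids
```

Calling `kmeans(points, points[:2])` would use the first two points as starting centroids; a different start can lead to a different final partition.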
Hierarchical algorithms
Hierarchical algorithms are algorithms that carry an explicit notion of a
hierarchy.
In hierarchical clustering, we group the documents into a certain number of clusters,
then group those clusters in turn into larger clusters, and so on, until we have a
hierarchy.
Bottom-up (agglomerative)
Top-down (divisive)
Hard vs soft clustering
Bottom-up algorithms treat each document as a singleton cluster at the outset and then successively merge pairs of
clusters until all clusters have been merged into a single cluster that contains all documents.
Common criteria for choosing which pair of clusters to merge:
1. Single link
2. Complete link
3. Group-average
4. Centroid similarity
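The bottom-up merging procedure can be sketched in a few lines of Python. This is a naive illustration under stated assumptions, not an efficient or canonical implementation: the function name `agglomerative`, the precomputed similarity matrix `sim`, and the `linkage` parameter are all hypothetical. Passing `max` reduces the pairwise similarities to that of the most similar pair (single link), and passing `min` to that of the least similar pair (complete link).

```python
def agglomerative(sim, n, linkage=max):
    """Naive bottom-up clustering of items 0..n-1.

    sim[i][j] is the similarity of items i and j (hypothetical input format).
    linkage reduces the pairwise similarities between two clusters to a
    single number: max for single link, min for complete link.
    Returns the merges in order as (cluster_a, cluster_b, similarity).
    """
    clusters = [(i,) for i in range(n)]  # every document starts as a singleton
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the highest cluster similarity.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = linkage(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        merges.append((clusters[a], clusters[b], s))
        # Replace the two clusters by their union.
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

The returned merge list records the hierarchy: reading it from first to last gives the order in which clusters were joined, and stopping early leaves more than one cluster.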
Single-link and complete-link clustering
Single-link clustering: In single-link (single-linkage) clustering, the similarity of two
clusters is the similarity of their most similar members.
Complete-link clustering: In complete-link (complete-linkage) clustering, the
similarity of two clusters is the similarity of their most dissimilar members.
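The two definitions differ only in whether the best or the worst pair of members is used. The sketch below makes that concrete; it assumes documents are represented as numeric term vectors compared by cosine similarity, which the slides do not specify, and the names `cosine`, `single_link`, and `complete_link` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_link(c1, c2, sim=cosine):
    """Similarity of the most similar pair of members, one from each cluster."""
    return max(sim(u, v) for u in c1 for v in c2)


def complete_link(c1, c2, sim=cosine):
    """Similarity of the most dissimilar pair of members, one from each cluster."""
    return min(sim(u, v) for u in c1 for v in c2)
```

For the same two clusters, `single_link` is never smaller than `complete_link`, since one takes the maximum and the other the minimum over the same set of pairs.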
Divisive clustering