Difference between K-means and Hierarchical Clustering
k-means is a method of cluster analysis that partitions data into a pre-specified number of clusters; it requires advance knowledge of 'K'.
Hierarchical clustering, also known as hierarchical cluster analysis (HCA), is a method of cluster analysis that builds a hierarchy of clusters without a fixed number of clusters.
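To make the contrast concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). The 1-D data, the iteration count, and the seed are arbitrary illustrative choices, not from the article; the point is that K must be supplied up front.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D k-means sketch: k must be chosen in advance."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centres
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[i].append(p)
        # Update step: each centre moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two obvious groups around 1.0 and 8.1; k=2 recovers them.
data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers, clusters = kmeans(data, k=2)
```

On this toy data the algorithm converges to one centre near 1.0 and one near 8.07, with three points in each cluster.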
The main differences between k-means and hierarchical clustering are:

- k-means: Using a pre-specified number of clusters, the method assigns each record to one of K mutually exclusive, roughly spherical clusters based on distance.
  Hierarchical: Hierarchical methods can be either divisive or agglomerative.

- k-means: Requires advance knowledge of K, i.e. the number of clusters into which you want to divide your data.
  Hierarchical: One can stop at any number of clusters, choosing whichever number seems appropriate by interpreting the dendrogram.

- k-means: One can use the median or mean as a cluster centre to represent each cluster.
  Hierarchical: Agglomerative methods begin with n clusters and sequentially merge the most similar clusters until only one cluster remains.

- k-means: The methods used are normally less computationally intensive and are suited to very large datasets.
  Hierarchical: Divisive methods work in the opposite direction, beginning with one cluster that includes all the records. Hierarchical methods are especially useful when the goal is to arrange the clusters into a natural hierarchy.
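The agglomerative process described above can be sketched in a few lines of pure Python using single linkage (the distance between two clusters is the distance between their closest points). The data and the stopping count are made-up illustrative values; stopping at any requested number of clusters is the "cut the dendrogram where you like" property mentioned earlier.

```python
def agglomerative(points, n_clusters):
    """Bottom-up single-linkage clustering sketch on 1-D data."""
    clusters = [[p] for p in points]  # begin with n singleton clusters
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest single-linkage
        # distance (closest pair of points across the two clusters).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Stop merging once three clusters remain.
parts = agglomerative([0.5, 1.0, 1.1, 5.0, 5.2, 9.9], 3)
```

On this data the three surviving clusters are {0.5, 1.0, 1.1}, {5.0, 5.2}, and {9.9}; re-running with a different n_clusters reuses the same merge sequence, just stopped earlier or later.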

- k-means: Since one starts with a random choice of initial clusters, the results produced by running the algorithm several times may differ.
  Hierarchical: Results are reproducible across runs.

- k-means: Simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
  Hierarchical: A set of nested clusters that are arranged as a tree.
- k-means: Found to work well when the clusters are hyper-spherical (like a circle in 2D or a sphere in 3D).
  Hierarchical: Does not work as well as k-means when the clusters are hyper-spherical.

- k-means advantages: 1. Convergence is guaranteed. 2. Specialized to clusters of different sizes and shapes.
  Hierarchical advantages: 1. Ease of handling any form of similarity or distance. 2. Consequently, applicability to any attribute type.

- k-means disadvantages: 1. The K-value is difficult to predict. 2. It does not work well with global clusters.
  Hierarchical disadvantage: 1. Requires the computation and storage of an n×n distance matrix, which can be expensive and slow for very large datasets.
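A quick back-of-the-envelope calculation shows why that n×n matrix becomes a problem. Assuming 8 bytes per distance entry (an assumption for illustration, not a figure from the article):

```python
def distance_matrix_gib(n, bytes_per_entry=8):
    """Memory needed for a full n x n distance matrix, in GiB.

    Assumes 8-byte floating-point entries (illustrative assumption).
    """
    return n * n * bytes_per_entry / 2**30

# For n = 100,000 records the full matrix already needs about 74.5 GiB,
# which is why hierarchical methods struggle on very large datasets.
size = distance_matrix_gib(100_000)
```

Even exploiting symmetry to store only half the matrix merely halves this, so the quadratic growth still dominates.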
