
CLUSTER EVALUATION TECHNIQUES

ATDS ASSIGNMENT

Archa E S
M180017MS
Introduction
Clustering involves grouping similar objects into a set known as a cluster. Objects in
one cluster are likely to differ from objects grouped under another cluster. Clustering is
one of the main tasks in exploratory data mining and is also a technique used in
statistical data analysis. It is a type of unsupervised learning method in which we draw
inferences from datasets consisting of input data without labelled responses. Generally,
it is used as a process to find meaningful structure, explanatory underlying processes,
generative features, and groupings inherent in a set of examples.
Clustering is important because it determines the intrinsic grouping among the
unlabelled data. There is no single criterion for a good clustering; the criteria depend on
the user and on what satisfies their need. For instance, we could be interested in finding
representatives for homogeneous groups (data reduction), in finding “natural clusters”
and describing their unknown properties (“natural” data types), in finding useful and
suitable groupings (“useful” data classes), or in finding unusual data objects (outlier
detection). Every clustering algorithm must make some assumptions about what
constitutes the similarity of points, and each set of assumptions yields different, equally
valid clusterings.

Clustering Methods
• Density-Based Methods: These methods consider clusters to be dense regions of the
space, separated from regions of lower density. They have good accuracy and the
ability to merge two clusters, and they can recover clusters of arbitrary shape.
Examples: DBSCAN (Density-Based Spatial Clustering of Applications with Noise),
OPTICS (Ordering Points To Identify the Clustering Structure), etc. A minimal
DBSCAN sketch follows this list.

• Hierarchical-Based Methods: The clusters formed in this method form a tree-like
structure based on the hierarchy. New clusters are formed using the previously formed
ones. It is divided into two categories:
  • Agglomerative (bottom-up approach)
  • Divisive (top-down approach)
Examples: CURE (Clustering Using Representatives), BIRCH (Balanced Iterative
Reducing and Clustering using Hierarchies), etc.

• Partitioning Methods: These methods partition the objects into k groups, where each
partition forms one cluster. They optimise an objective criterion similarity function,
typically one in which distance is the major parameter. Examples: K-means, CLARANS
(Clustering Large Applications based upon Randomized Search), etc.

• Grid-based Methods: In this method the data space is divided into a finite number of
cells that form a grid-like structure. All the clustering operations performed on these
grids are fast and largely independent of the number of data objects. Examples: STING
(Statistical Information Grid), WaveCluster, CLIQUE (Clustering In QUEst), etc.
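
The density-based sketch promised above: a minimal DBSCAN example, assuming
scikit-learn is available; the eps and min_samples values are illustrative choices for
this toy data, not tuned settings.

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_moons

    # Two crescent-shaped clusters: hard for K-means, natural for DBSCAN.
    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

    # eps is the neighbourhood radius; min_samples is the number of points
    # needed to form a dense region. Both values are assumptions here.
    db = DBSCAN(eps=0.2, min_samples=5).fit(X)

    # The label -1 marks noise points that belong to no dense region.
    n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
    print("clusters found:", n_clusters)
    print("noise points:", int(np.sum(db.labels_ == -1)))
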
Clustering Algorithms

1. K-means clustering algorithm – It is the simplest unsupervised learning algorithm
that solves the clustering problem. The K-means algorithm partitions n observations
into k clusters, where each observation belongs to the cluster with the nearest mean,
which serves as a prototype of the cluster.
The k-Means clustering algorithm is an unsupervised hard clustering method which
assigns the n data objects O1,…,On to a pre-defined number of exactly k clusters
C1,…,Ck. Initial clusters are iteratively re-organised by assigning each object to its
closest cluster centroid and re-calculating the cluster centroids until no further changes
take place. The k-Means algorithm is sensitive to the selection of the initial partition, so
the initialisation should be varied. k-Means imposes a Gaussian parametric design on
the clustering result and generally works well on data sets with isotropic cluster shape,
since it tends to create compact clusters.
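
To make the assign/re-centre loop described above concrete, here is a minimal
plain-NumPy sketch of k-Means; the toy data and the choice k = 3 are illustrative
assumptions, not part of the algorithm.

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # Initialisation: pick k observations as the initial centroids.
        # (k-Means is sensitive to this choice, so in practice it is varied.)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assignment step: each object joins its closest centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update step: re-calculate each centroid as the mean of its
            # cluster (an empty cluster keeps its old centroid).
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break  # no further changes take place: converged
            centroids = new_centroids
        return labels, centroids

    # Illustrative data: three compact, isotropic blobs, the setting where
    # k-Means is expected to work well.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                   for c in ([0, 0], [3, 3], [0, 3])])
    labels, centroids = kmeans(X, k=3)
    print(centroids)
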

2. Hierarchical clustering – Hierarchical clustering methods impose a hierarchical
structure on the data objects and their step-wise clusters: one extreme of the clustering
structure is a single cluster containing all objects; the other extreme is a number of
clusters equal to the number of objects. To obtain a certain number k of clusters, the
hierarchy is cut at the relevant depth. Hierarchical clustering is a rigid procedure, since
it is not possible to re-organise clusters established in a previous step. Depending on
whether the clustering is performed top-down, i.e. from a single cluster towards the
maximum number of clusters, or bottom-up, i.e. from the maximum number of clusters
towards a single cluster, we distinguish divisive and agglomerative clustering. Divisive
clustering is computationally more expensive than agglomerative clustering, because it
needs to consider all possible divisions into subsets.
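
A minimal agglomerative (bottom-up) sketch, assuming SciPy is available: the
hierarchy is built by step-wise merging of the closest clusters and is then cut at the
depth giving k = 2 clusters. The toy data and the 'ward' linkage are illustrative
assumptions.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    rng = np.random.default_rng(0)
    # Illustrative data: two well-separated groups of 20 points each.
    X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(4, 0.3, (20, 2))])

    # Agglomerative clustering: start with one cluster per object and
    # merge step-wise; 'ward' is one common linkage criterion.
    Z = linkage(X, method="ward")

    # Cut the hierarchy at the relevant depth to obtain k = 2 clusters.
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(labels)
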

Difference between K-means and Hierarchical Clustering


• Hierarchical clustering can’t handle big data well, but K-means clustering can. This is
because the time complexity of K-means is linear, i.e. O(n), while that of hierarchical
clustering is quadratic, i.e. O(n²).
• In K-means clustering we start with a random choice of initial centroids, so the results
produced by running the algorithm multiple times may differ; results are reproducible
in hierarchical clustering (see the sketch after this list).
• K-means is found to work well when the shape of the clusters is hyper-spherical (like
a circle in 2D or a sphere in 3D).
• K-means clustering requires prior knowledge of K, i.e. the number of clusters you
want to divide your data into; in hierarchical clustering, by contrast, you can stop at
whatever number of clusters you find appropriate by interpreting the dendrogram.
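
The sketch referred to in the list above: a small demonstration of the reproducibility
point, assuming scikit-learn. With different random initialisations, K-means runs may
(not must) disagree, while agglomerative clustering has no random start, so repeated
runs coincide.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering, KMeans
    from sklearn.datasets import make_blobs

    # Illustrative data: four blobs, deliberately clustered with k = 3 so
    # that K-means is more sensitive to its random initialisation.
    X, _ = make_blobs(n_samples=200, centers=4, random_state=0)

    # K-means: one run per seed, each with a different random initialisation.
    for seed in (0, 1, 2):
        km = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X)
        print(f"k-means seed {seed}: inertia = {km.inertia_:.1f}")

    # Hierarchical clustering: deterministic, so repeated runs agree.
    a = AgglomerativeClustering(n_clusters=3).fit_predict(X)
    b = AgglomerativeClustering(n_clusters=3).fit_predict(X)
    print("hierarchical runs identical:", np.array_equal(a, b))
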
