
Indian Institute of Management Rohtak/Kozhikode/Raipur

Cluster Analysis

Basic concept of cluster analysis

Cluster analysis is a class of techniques used to classify objects or cases into relatively
homogeneous groups called clusters. Objects in each cluster tend to be similar to each other
and dissimilar to objects in other clusters. Both cluster analysis and discriminant analysis are
concerned with classification. However, discriminant analysis requires prior knowledge of
the cluster or group membership for each object or case included to develop the classification
rule. In contrast, in cluster analysis there is no prior information about the group or cluster
membership for any of the objects. Groups or clusters are suggested by the data, not defined a
priori.

Application of cluster analysis

Cluster analysis has been used in marketing for a variety of purposes, including the following:

Segmenting the market
Understanding buying behaviour
Identifying new market opportunities
Reducing data

Approach of cluster analysis

There are two approaches: hierarchical and non-hierarchical. Some marketing researchers
suggest running the non-hierarchical approach directly. However, the non-hierarchical
approach requires the number of clusters to be decided first, which is a subjective choice,
although one can verify at a later stage whether that number is appropriate. If it is not, the
non-hierarchical analysis has to be run again. This subjectivity can be mitigated by
combining the two approaches (Malhotra and Dash, 2014): the number of clusters obtained
from the hierarchical approach (e.g., with the help of the dendrogram) is used as the input
to the non-hierarchical approach. Thus, in line with most researchers (e.g., Malhotra and
Dash, 2014; Hair et al., 2015), we also suggest treating cluster analysis as a
hierarchical-cum-non-hierarchical approach.

SPSS commands
Recommended approach for Cluster Analysis: Hierarchical CUM Non-hierarchical
The hierarchical approach identifies the number of clusters and the objects belonging to each
cluster (both can be read from the dendrogram).
SPSS commands for Hierarchical cluster analysis
Analyze > Classify > Hierarchical Cluster
Move all variables into the Variable(s) box.
Then go to Statistics, select Agglomeration schedule and Proximity matrix,
and click Continue.
Then go to Plots, select Dendrogram, and click Continue.
Then go to Method, select Ward's method (from the Cluster Method option) and Squared
Euclidean distance (from the Interval option), and click Continue.
(Please note that results may vary depending on the combination of distance measure and
clustering method.)
Finally, click OK.
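
To see what Ward's method with squared Euclidean distance does behind the menu, the steps above can be sketched in plain Python. This is a minimal illustration and not SPSS's implementation: the six cases are made-up ratings, the function names are hypothetical, and the coefficient is assumed to be the cumulative within-cluster sum of squares, which is how the Coefficients column is read in this sketch.

```python
from itertools import combinations

def merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    (na, ca), (nb, cb) = a, b
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean distance
    return na * nb / (na + nb) * d2

def ward_schedule(cases):
    """Return rows of (stage, cluster_a, cluster_b, coefficient)."""
    # Each cluster is (size, centroid), labelled by its lowest case number.
    clusters = {i + 1: (1, list(case)) for i, case in enumerate(cases)}
    schedule, total = [], 0.0
    for stage in range(1, len(cases)):
        # Pick the pair whose merge adds the least within-cluster variation.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        total += merge_cost(clusters[a], clusters[b])
        (na, ca), (nb, cb) = clusters[a], clusters.pop(b)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a] = (na + nb, centroid)
        schedule.append((stage, a, b, round(total, 3)))
    return schedule

# Hypothetical data: six respondents rated on two 7-point items.
cases = [(6, 7), (2, 1), (4, 4), (7, 6), (1, 2), (5, 4)]
schedule = ward_schedule(cases)
for row in schedule:
    print(row)
```

On this toy data the coefficient rises slowly for the first three stages and then jumps at stage 4, which is the pattern one looks for when reading the agglomeration schedule.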

How to interpret the outputs of the Hierarchical approach (to decide the number of clusters
and their objects)
How to decide the number of clusters?
No firm rules exist, but guidelines on this issue include:
(a) Theoretical, conceptual, or practical considerations may suggest a
certain number of clusters.
(b) In hierarchical clustering, the distances at which the clusters are being
combined are an important criterion. Dendrograms can help us to decide
the number of clusters (refer to class discussion).
(c) The relative sizes of the clusters should be meaningful. It is not
meaningful to have a cluster with only one case, so a three-cluster
solution is preferable in this situation.
(d) You can also use the Elbow rule (draw a scatter plot of the Stage and
Coefficients columns from the Ward Linkage agglomeration schedule; refer
to the handout distributed in class).
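
The Elbow rule in (d) is normally applied by eye on the scatter plot. As a rough numerical stand-in, one can look for the stage after which the coefficient jumps the most and stop just before that merge. The sketch below assumes this "largest jump" heuristic; the function name and the coefficient values are hypothetical, and the result should still be checked against guidelines (a) to (c).

```python
def suggest_clusters(coefficients):
    """Suggest a cluster count from the Coefficients column.

    coefficients[i] is the coefficient at stage i + 1; n cases give n - 1 stages,
    and n - s clusters remain after stage s.
    """
    n_cases = len(coefficients) + 1
    # Jump in the coefficient when moving from one stage to the next.
    jumps = [later - earlier
             for earlier, later in zip(coefficients, coefficients[1:])]
    # Stop at the stage just before the largest jump.
    last_good_stage = jumps.index(max(jumps)) + 1
    return n_cases - last_good_stage

# Hypothetical Coefficients column for 8 cases (7 stages).
coefficients = [0.5, 1.2, 2.0, 3.1, 4.5, 20.0, 30.0]
suggested = suggest_clusters(coefficients)
print(suggested)  # 3
```

In practice the last one or two merges often show the largest absolute jumps, so a purely mechanical rule tends to favour very few clusters. Treat the number it returns as a starting point, not a decision.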
The non-hierarchical approach starts directly from a number of clusters chosen in advance.
Here, the number of clusters obtained from the hierarchical approach can be used as the
input to the non-hierarchical approach. Suppose the dendrogram suggests 3 clusters; use that
number as the input to the non-hierarchical approach (this is how we combine the hierarchical
approach with the non-hierarchical approach). The non-hierarchical approach is also known as
K-means or Quick Cluster.

SPSS commands for Non-hierarchical cluster analysis


Analyze > Classify > K-Means Cluster
Move all variables into the Variables box.
Then, in the Number of Clusters box, enter the number of clusters (use the number
found with the hierarchical method).
Then go to Options, select Cluster information for each case, and click Continue.
Then click OK.
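
For intuition about what K-means does with the number of clusters you enter, here is a minimal sketch of the assign-and-update loop in plain Python. It assumes the first k cases serve as initial centers, which is a simplification: SPSS chooses its own initial centers, so memberships on real data may differ. The data and function names are hypothetical.

```python
def squared_euclidean(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(cases, k, max_iter=10):
    """Return (membership, centers); clusters are numbered from 1."""
    # Assumption: the first k cases are used as the initial centers.
    centers = [list(c) for c in cases[:k]]
    membership = None
    for _ in range(max_iter):
        # Assign every case to its nearest center.
        new_membership = [
            min(range(k), key=lambda j: squared_euclidean(case, centers[j])) + 1
            for case in cases
        ]
        if new_membership == membership:
            break  # no case changed cluster, so the solution has converged
        membership = new_membership
        # Recompute each center as the mean of its current members.
        for j in range(k):
            members = [c for c, m in zip(cases, membership) if m == j + 1]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return membership, centers

# Hypothetical data: six respondents rated on two 7-point items.
cases = [(6, 7), (2, 1), (4, 4), (7, 6), (1, 2), (5, 4)]
membership, centers = k_means(cases, k=3)  # k taken from the hierarchical step
print(membership)
print(centers)
```

The value k=3 plays the role of the Number of Clusters box: it is not discovered by K-means but supplied from the dendrogram.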

How to interpret the outputs of the Non-hierarchical approach (objects under each cluster
and cluster profile)
Look at the Cluster Membership table, which shows how the objects are distributed across
the clusters (cross-check with the Dendrogram from the Hierarchical approach).
For cluster profiling: use the Final Cluster Centers table and profile each cluster by looking
at the value of each variable under each cluster. Also take the measurement scale used in the
case into account when profiling the clusters.
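
The Final Cluster Centers table is essentially the mean of each variable within each cluster. A small sketch of that calculation follows, using made-up data and hypothetical variable names (item_1, item_2); it illustrates the idea only and does not reproduce SPSS output.

```python
def final_cluster_centers(cases, membership, variables):
    """Mean of each variable within each cluster, keyed by cluster number."""
    centers = {}
    for cluster in sorted(set(membership)):
        members = [c for c, m in zip(cases, membership) if m == cluster]
        centers[cluster] = {
            name: round(sum(col) / len(members), 2)
            for name, col in zip(variables, zip(*members))
        }
    return centers

variables = ["item_1", "item_2"]  # hypothetical variable names
cases = [(6, 7), (2, 1), (4, 4), (7, 6), (1, 2), (5, 4)]
membership = [1, 2, 3, 1, 2, 3]   # as read from a Cluster Membership table
profile = final_cluster_centers(cases, membership, variables)
for cluster, means in profile.items():
    print(cluster, means)
```

On a 7-point scale, cluster 1 here scores high on both items, cluster 2 scores low on both, and cluster 3 sits near the midpoint. Reading the means against the scale in this way is what profiling amounts to.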