
Indian Institute of Management Bangalore

Cluster Analysis

RMD PGP – IV (2021-2022)

Basic concept of cluster analysis

Cluster analysis is a class of techniques used to classify samples/customers into relatively homogeneous
groups called clusters. Customers/samples in each cluster tend to be similar to each other and dissimilar to
samples/customers in other clusters. Both cluster analysis and discriminant analysis are concerned with
classification. However, discriminant analysis requires prior knowledge of the cluster or group
membership for each sample/customer included to develop the classification rule. In contrast, in cluster
analysis there is no prior information about the group or cluster membership for any of the
samples/customers. Groups or clusters are suggested by the data, not defined a priori.

Application of cluster analysis

Cluster analysis has been used in marketing for a variety of purposes, including the following:

• Segmenting the market. Subsequently, managers can target one or more of these market segments.
• Understanding buying behaviour
• Identifying new market opportunities
• Reducing data/customers/samples

Two Managerial Questions Pertaining to Cluster Analysis

Answering the following two questions helps managers understand and perform cluster analysis effectively:

1. How many segments (clusters) can be derived from a given set of customers/data?
2. What are the profiles of these segments (i.e., the characteristics of these segments)?

Approach of cluster analysis – Hierarchical or Non-hierarchical?

There are two approaches to cluster analysis: hierarchical and non-hierarchical. The hierarchical approach
can answer managerial question 1 above but not question 2. The non-hierarchical approach, on the other
hand, can answer question 2 but not question 1. Some marketing researchers suggest running the
non-hierarchical approach directly, since it gives the profiles of the segments/clusters asked for in
question 2. However, the non-hierarchical approach requires the number of clusters (question 1) to be
decided first. Managers can choose this number subjectively, based on their requirements, and verify at a
later stage whether it is appropriate; if it is not, the non-hierarchical analysis has to be run again. This
subjectivity in deciding the number of clusters can be mitigated by combining the two approaches (Das et
al., 2019; Malhotra and Dash, 2014). In that case, the number of clusters obtained from the hierarchical
approach (e.g., with the help of the dendrogram) is used as an INPUT to the non-hierarchical approach.
Thus, in line with most researchers (e.g., Das et al., 2019; Hair et al., 2015), we also suggest treating
cluster analysis as a Hierarchical CUM Non-hierarchical approach.

SPSS commands
Recommended approach for Cluster Analysis: Hierarchical CUM Non-hierarchical
The hierarchical approach identifies the number of clusters and the customers belonging to each cluster
(both can be read from the dendrogram).

SPSS commands for Hierarchical cluster analysis


Analyze > Classify > Hierarchical Cluster
Next, move all variables into the Variable(s) box.
Then go to Statistics, select “Agglomeration schedule” and “Proximity matrix”, and click Continue.
Then go to Plots, select “Dendrogram”, and click Continue.
Then go to Method, select “Ward’s method” (from the Cluster Method option) and “Squared Euclidean
distance” (from the Interval option), and click Continue.
(Please note that the results may vary depending on the combination of distance measure and clustering
method.) Finally, click OK.
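
For readers who want to see what these settings compute, the following is a minimal Python sketch of Ward's method on squared Euclidean distances. It is an illustration only, not SPSS's implementation: the five customers and the names `ward_schedule`, `centroid` and `squared_euclidean` are assumptions made for this example, and SPSS may report the agglomeration coefficient differently (for example as a running total).

```python
from itertools import combinations


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points, members):
    """Mean point of the cases whose indices are listed in `members`."""
    rows = [points[i] for i in members]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def ward_schedule(points):
    """Merge cases step by step; return (members_a, members_b, cost) per step.

    `cost` is the increase in within-cluster sum of squares caused by the
    merge: n_a * n_b / (n_a + n_b) * squared distance between centroids.
    """
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> case indices
    schedule = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            n_a, n_b = len(clusters[a]), len(clusters[b])
            gap = squared_euclidean(centroid(points, clusters[a]),
                                    centroid(points, clusters[b]))
            cost = n_a * n_b / (n_a + n_b) * gap
            if best is None or cost < best[0]:
                best = (cost, a, b)
        cost, a, b = best
        schedule.append((sorted(clusters[a]), sorted(clusters[b]), cost))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return schedule


# Hypothetical answers of five customers to two 7-point items (not the case data).
customers = [(1, 1), (1, 2), (6, 6), (7, 7), (6, 3)]
schedule = ward_schedule(customers)
for step, (left, right, cost) in enumerate(schedule, start=1):
    print(step, left, right, round(cost, 2))
```

Each printed row plays the role of one line of an agglomeration schedule: which two groups were joined at that step and at what cost. A large jump in cost between consecutive steps suggests that two dissimilar groups were forced together.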

Hierarchical approach

How to interpret the Outputs of Hierarchical approach (to decide the number of clusters and their
customers – Managerial question 1)
How to decide the number of clusters?
No firm rules exist, but guidelines on this issue include:
(a) Theoretical, conceptual, or practical considerations may suggest a certain number
of clusters.
(b) In hierarchical clustering, the distances at which the clusters are being combined
are an important criterion. Dendrograms can help us to decide the number of clusters
(refer to the class discussion).
(c) The relative sizes of the clusters should be meaningful. It is not meaningful to
have a cluster with only one case.

Following the guidelines above, if we cut the dendrogram at distance 10 on the “Rescaled Distance
Cluster Combine” axis (refer to the dendrogram above), we get 3 clusters. So, the answer to
managerial question 1 is 3 here.
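
The idea of cutting the dendrogram at a chosen distance can be sketched in a few lines of Python. This is an illustration under stated assumptions: the merge heights below are hypothetical (they are not the handout's data), the function names are invented for this example, and the simple proportional rescaling to a 0 to 25 range may differ from what SPSS does internally.

```python
def rescale_to_25(heights):
    """Proportionally rescale merge heights so that the largest becomes 25."""
    top = max(heights)
    return [25 * h / top for h in heights]


def clusters_at_cut(heights, n_cases, cut):
    """Number of clusters left when only merges at or below `cut` are kept.

    Assumes the heights never decrease from one merge to the next, which
    is the case for Ward's method.
    """
    merges_done = sum(1 for h in heights if h <= cut)
    return n_cases - merges_done


# Hypothetical merge heights for 8 cases (7 merges), in the order they happened.
merge_heights = [0.5, 0.8, 1.0, 1.2, 2.0, 30.0, 50.0]
rescaled = rescale_to_25(merge_heights)
n_clusters = clusters_at_cut(rescaled, n_cases=8, cut=10)
print(n_clusters)
```

With these made-up heights, five merges happen below the cut at 10 and the last two happen well above it, so the 8 cases collapse into 3 clusters.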

The non-hierarchical approach starts directly with an arbitrary choice of the number of clusters. In this
case, the number of clusters obtained from the hierarchical approach can be used as an INPUT to the
non-hierarchical approach. For example, the hierarchical analysis above gave 3 clusters, which can be used
as the number of clusters for the non-hierarchical approach (this is how we combine the hierarchical
approach with the non-hierarchical approach). The non-hierarchical approach is also known as K-means or
Quick Cluster.

SPSS commands for Non-hierarchical cluster analysis
Analyze > Classify > K-Means Cluster
Move all variables into the Variables box.
Then, in the “Number of Clusters” box, enter the number of clusters (use the number found with the
hierarchical method).
Then go to Options, select “Cluster information for each case”, and click Continue.
Then click OK.
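
The logic behind the K-means step can be sketched in plain Python as below. This is a minimal illustration, not SPSS's Quick Cluster algorithm: the six respondents, the choice of starting centers and the function names are assumptions made for this example. The number of clusters (3) is fixed in advance, as it would be after the hierarchical step.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(rows):
    """Column-wise mean of a list of points."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def k_means(rows, initial_centers, max_iter=100):
    """Alternate between assigning cases and updating centers until stable."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case goes to its nearest center.
        new_labels = [
            min(range(len(centers)),
                key=lambda j: squared_euclidean(row, centers[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # no case changed cluster, so the solution is stable
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_point(members)
    return labels, centers


# Hypothetical answers of six respondents to two 7-point items (not the case data).
respondents = [(6, 2), (7, 3), (2, 6), (1, 7), (4, 4), (4, 5)]
# Assumption: one starting center per cluster suggested by the hierarchical step.
start = [respondents[0], respondents[2], respondents[4]]
labels, centers = k_means(respondents, start)
print(labels)
print(centers)
```

Here `labels` corresponds to a cluster membership listing (numbered from 0 rather than 1) and `centers` to the final cluster centers, i.e., the mean answer of each cluster on each variable.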

How to interpret the Outputs of Non-hierarchical approach (Customers under each cluster and cluster
profile – Managerial question 2)
Look at the “Cluster Membership” table, which shows how the customers are distributed across the clusters
(optionally, cross-check with the dendrogram from the hierarchical approach).

For cluster profiling: use the “Final Cluster Centers” table and profile each cluster by looking at the
value of each variable under each cluster. Please also look at the scale (from the case) while profiling the
clusters; refer to the following table and discussion.

Final Cluster Centers


                                                      Cluster
Variable                                              1    2    3
Shopping is fun (V1)                                  4    2    6
Shopping is bad for your budget (V2)                  6    3    4
I combine shopping with eating out (V3)               3    2    6
I try to get best buys when shopping (V4)             6    4    3
I don’t care about shopping (V5)                      4    6    2
You can save a lot of money by comparing prices (V6)  6    3    4

Note: A 7-point Likert-type scale was used to collect the data: 1 = strongly disagree, 2 = disagree,
3 = somewhat disagree, 4 = neutral, 5 = somewhat agree, 6 = agree, and 7 = strongly agree.

The scores under each cluster are the average scores of the opinions/perceptions of all customers in that
cluster on the clustering variables (V1, V2, V3, ...). For example, the average score of all customers
belonging to cluster 1 on the variable “shopping is fun” (V1) is 4 (refer to the table above). In the same
way, we describe the characteristics of the customers belonging to each cluster in terms of the
segmenting variables. For example, the profile of cluster 1 is as follows:
Cluster 1: The average opinion of all customers belonging to cluster 1 on V1 (shopping is fun) is 4, i.e.,
neutral (refer to the scale points mentioned above). Similarly, the average is agree (6) for V2 (shopping
is bad for your budget), somewhat disagree (3) for V3, agree (6) for V4, neutral (4) for V5, and agree (6)
for V6.
The characteristics of all the other clusters need to be described in the same way.
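
As a small aid for profiling, the lookup from cluster centers to scale labels can be sketched in Python. The numbers are those of the Final Cluster Centers table above and the labels are those of the 7-point scale; the names `LIKERT_LABELS`, `FINAL_CLUSTER_CENTERS` and `profile_cluster` are assumptions made for this example. Non-integer centers would be rounded to the nearest scale point, which is a simplification.

```python
LIKERT_LABELS = {
    1: "strongly disagree",
    2: "disagree",
    3: "somewhat disagree",
    4: "neutral",
    5: "somewhat agree",
    6: "agree",
    7: "strongly agree",
}

# Final cluster centers per variable, in the order (cluster 1, cluster 2, cluster 3).
FINAL_CLUSTER_CENTERS = {
    "V1": (4, 2, 6),
    "V2": (6, 3, 4),
    "V3": (3, 2, 6),
    "V4": (6, 4, 3),
    "V5": (4, 6, 2),
    "V6": (6, 3, 4),
}


def profile_cluster(cluster_number):
    """Translate one cluster's centers into scale labels (clusters numbered from 1)."""
    return {
        variable: LIKERT_LABELS[round(centers[cluster_number - 1])]
        for variable, centers in FINAL_CLUSTER_CENTERS.items()
    }


for cluster_number in (1, 2, 3):
    print(cluster_number, profile_cluster(cluster_number))
```

Reading the three printed profiles side by side makes it easier to give each cluster a short descriptive name before choosing a target segment.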
Then we can select the cluster(s) that match our marketing and product strategies.
This act is known as targeting, which is one of the main pillars of long-term profit and sustainability.
