
DR NEENA SONDHI

CHAPTER-18

CLUSTER ANALYSIS
DR DEEPAK CHAWLA

RESEARCH METHODOLOGY CONCEPTS AND CASES


SLIDE 18-1

What is cluster analysis?

Cluster analysis is a technique for grouping objects, cases or
entities on the basis of multiple variables. One advantage of
the technique is that it can be applied to both metric and
non-metric data.

Second, the grouping can be done post hoc, i.e. after the
primary data survey is over. The technique has wide
applications in all branches of management, but it is
most often used for market segmentation analysis.



SLIDE 18-2

Cluster analysis: basic tenets

• Can be used to cluster objects, individuals and entities
• Similarity is based on multiple variables
• Measures the proximity between objects on the study variables
• Objects grouped in one cluster are more homogeneous with one
  another than with objects in other clusters
• Can be conducted on metric, non-metric as well as mixed data



SLIDE 18-3

Usage of cluster analysis


• Market segmentation – customers/potential customers can be
  split into smaller, more homogeneous groups.
• Segmenting industries – the same grouping principle can be
  applied to industrial consumers.
• Segmenting markets – cities or regions with similar or common
  traits can be grouped on the basis of climatic or
  socio-economic conditions.



SLIDE 18-4

Usage of cluster analysis


• Career planning and training analysis – for human resource
  planning, people can be grouped into clusters on the basis of
  their education, experience, aptitude and aspirations.
• Segmenting financial sectors/instruments – factors such as raw
  material cost, financial allocations and seasonality are used
  to group sectors together in order to understand the growth
  and performance of a group of industries.



SLIDE 18-5

Statistics associated with cluster analysis

• Metric data analysis

$$ d_{ij} = \sqrt{\sum_{k=1}^{3} \left( X_{ik} - X_{jk} \right)^{2}} $$

Where,
• $d_{ij}$ = distance between objects (e.g. persons) i and j
• $X_{ik}$, $X_{jk}$ = values of objects i and j on variable k
• k = variable (interval/ratio)
• i, j = objects
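The distance formula above can be sketched in Python as a minimal illustration. The sketch assumes that each object is a list of interval/ratio scores on the same clustering variables, in the same order, and it works for any number of variables k. The function names and the two respondents' ratings are hypothetical. The squared Euclidean distance named in the cluster analysis process is the same sum without the square root.

```python
import math


def squared_euclidean_distance(x_i, x_j):
    """Sum over variables k of (X_ik - X_jk) squared."""
    if len(x_i) != len(x_j):
        raise ValueError("both objects must be scored on the same variables")
    return sum((x_ik - x_jk) ** 2 for x_ik, x_jk in zip(x_i, x_j))


def euclidean_distance(x_i, x_j):
    """d_ij: the square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean_distance(x_i, x_j))


# Hypothetical ratings of two respondents on three metric variables.
respondent_i = [4, 2, 5]
respondent_j = [1, 6, 5]
print(squared_euclidean_distance(respondent_i, respondent_j))  # 25
print(euclidean_distance(respondent_i, respondent_j))          # 5.0
```

Identical profiles give a distance of 0. The larger $d_{ij}$ is, the less similar the two objects are.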



SLIDE 18-6

Statistics associated with cluster analysis
Non-metric data

• Simple matching coefficient = $\dfrac{P + N}{P + N + M}$
• Jaccard coefficient = $\dfrac{P}{P + M}$

Where
• P = positive matches
• N = negative matches
• M = mismatches
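The two coefficients above can be sketched for a pair of respondents as follows. The sketch assumes that non-metric answers are coded as binary 0/1 attributes. A positive match is (1, 1), a negative match is (0, 0) and a mismatch is any differing pair. The function names and the answers are hypothetical. The key difference is that the Jaccard coefficient ignores negative matches.

```python
def match_counts(x, y):
    """Return (P, N, M) for two binary (0/1) profiles of equal length."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    p = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # positive matches
    n = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # negative matches
    m = sum(1 for a, b in zip(x, y) if a != b)             # mismatches
    return p, n, m


def simple_matching(x, y):
    """(P + N) / (P + N + M): counts both kinds of match."""
    p, n, m = match_counts(x, y)
    return (p + n) / (p + n + m)


def jaccard(x, y):
    """P / (P + M): negative (0, 0) matches are left out."""
    p, _, m = match_counts(x, y)
    if p + m == 0:
        raise ValueError("Jaccard is undefined when there are only negative matches")
    return p / (p + m)


# Hypothetical yes/no (1/0) answers of two respondents to six questions.
a = [1, 1, 0, 0, 1, 0]
b = [1, 0, 0, 0, 1, 1]
print(match_counts(a, b))     # (2, 2, 2)
print(simple_matching(a, b))  # 0.6666666666666666
print(jaccard(a, b))          # 0.5
```

Jaccard suits attributes where a shared "no" says little about similarity. The simple matching coefficient treats shared "no" answers as evidence of similarity.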



SLIDE 18-7

Statistics associated with cluster analysis
Mixed Data



SLIDE 18-8

Key concepts in cluster analysis


• Agglomeration schedule: An output of the hierarchical method
  that starts with the most similar pair of objects and, at each
  later stage, shows which object or cluster joins the clusters
  already formed.
• ANOVA table: The univariate (one-way) ANOVA statistics for each
  clustering variable. The higher the ANOVA F value, the greater
  the difference between the clusters on that variable.
• Cluster variate: The variables or parameters that represent the
  objects to be clustered and are used to calculate the
  similarity between objects.
• Cluster centroid: The average values of the objects in the
  cluster on all the variables in the cluster variate.
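The cluster centroid and the per-variable ANOVA F described above can be computed directly once every object has a cluster label. This from-scratch sketch assumes metric scores and already-assigned labels. It is not how any particular statistics package computes its output. The function names and the six respondents are hypothetical.

```python
def cluster_centroids(data, labels):
    """Cluster centroid: mean of every cluster-variate variable within each cluster."""
    groups = {}
    for row, label in zip(data, labels):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def one_way_anova_f(values, labels):
    """One-way ANOVA F for one clustering variable, using the clusters as groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores of six respondents on two variables, already in clusters 1 and 2.
data = [[1, 4], [2, 4], [3, 5], [5, 4], [6, 5], [7, 5]]
labels = [1, 1, 1, 2, 2, 2]
print(cluster_centroids(data, labels))
for var in range(2):
    column = [row[var] for row in data]
    print(f"variable {var + 1}: F = {one_way_anova_f(column, labels):.2f}")
# variable 1: F = 24.00
# variable 2: F = 0.50
```

Variable 1 separates the two clusters far more sharply than variable 2, and its much higher F reflects that.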



SLIDE 18-9

Key concepts in cluster analysis


• Cluster seeds: The initial cluster centres in non-hierarchical
  clustering, i.e. the starting points around which the clusters
  are then created.
• Cluster membership: The cluster to which a particular
  person/object belongs.
• Dendrogram: A tree-like diagram used to present the cluster
  results graphically. The vertical axis represents the objects
  and the horizontal axis represents the inter-respondent
  distance. The figure is read from left to right.
• Distances between final cluster centres: The distances between
  each pair of clusters. A robust solution, one that demarcates
  the groups distinctly, has large inter-cluster distances; the
  larger the distance, the more distinct the clusters.
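Cluster seeds, cluster membership and the distances between final cluster centres fit together in one short sketch. The sketch assumes a k-means-style procedure, which is one common non-hierarchical method; the slides do not say which method they use. Each object joins its nearest seed, the centres are recomputed as cluster means, and the loop repeats until the centres stop moving. The data, seeds and function names are hypothetical.

```python
import math
from itertools import combinations


def assign_membership(data, centres):
    """Cluster membership: index of the nearest centre for every object."""
    return [min(range(len(centres)), key=lambda c: math.dist(row, centres[c]))
            for row in data]


def cluster_from_seeds(data, seeds, max_iter=100):
    """Grow clusters around the seeds, updating centres until they stop changing."""
    centres = [list(map(float, s)) for s in seeds]
    for _ in range(max_iter):
        membership = assign_membership(data, centres)
        new_centres = []
        for c, old in enumerate(centres):
            members = [row for row, m in zip(data, membership) if m == c]
            # Keep the previous centre if no object joined this cluster.
            new_centres.append([sum(col) / len(members) for col in zip(*members)]
                               if members else old)
        if new_centres == centres:
            break
        centres = new_centres
    return assign_membership(data, centres), centres


def centre_distances(centres):
    """Euclidean distance between every pair of final cluster centres."""
    return {(a, b): math.dist(centres[a], centres[b])
            for a, b in combinations(range(len(centres)), 2)}


data = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
seeds = [[1, 1], [8, 8]]
membership, final_centres = cluster_from_seeds(data, seeds)
print(membership)                      # [0, 0, 0, 1, 1, 1]
print(centre_distances(final_centres))
```

The choice of seeds can change the final solution, which is why the process slide suggests using a hierarchical method to pick them.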



SLIDE 18-10

Key concepts in cluster analysis


• Entropy group: The individuals or small groups that do not seem
  to fit into any cluster.
• Final cluster centres: The mean value of the cluster on each of
  the variables that are part of the cluster variate.
• Hierarchical methods: A step-wise process that starts with the
  most similar pair and builds a tree-like structure composed of
  separate clusters.
• Non-hierarchical methods: Cluster seeds or centres are the
  starting points, and individual clusters are built around them
  based on a pre-specified distance from the seeds.
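The step-wise hierarchical process can be sketched as a small agglomerative routine. At each stage it records which two clusters merge and at what distance, which is the agglomeration schedule; a dendrogram plots the same merge distances. This is an illustrative, unoptimised sketch that assumes Euclidean distance. The `linkage` options correspond to the single, complete and average linkage methods in the cluster analysis process; Ward's and the centroid method are not covered. The function name and data are hypothetical.

```python
import math
from itertools import combinations


def agglomeration_schedule(data, linkage="single"):
    """Merge the two closest clusters at each stage; return (stage, a, b, distance) rows."""
    rules = {"single": min, "complete": max,
             "average": lambda ds: sum(ds) / len(ds)}
    if linkage not in rules:
        raise ValueError(f"unknown linkage: {linkage}")
    combine = rules[linkage]
    clusters = {i: [i] for i in range(len(data))}  # every object starts as its own cluster
    schedule = []
    for stage in range(1, len(data)):
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pair_distances = [math.dist(data[i], data[j])
                              for i in clusters[a] for j in clusters[b]]
            d = combine(pair_distances)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[a].extend(clusters.pop(b))  # merged cluster keeps the lower label
        schedule.append((stage, a, b, d))
    return schedule


# Hypothetical one-variable scores for five objects (labelled 0 to 4).
scores = [[0], [1], [5], [6], [15]]
for row in agglomeration_schedule(scores, linkage="single"):
    print(row)
# (1, 0, 1, 1.0)
# (2, 2, 3, 1.0)
# (3, 0, 2, 4.0)
# (4, 0, 4, 9.0)
```

The brute-force search recomputes every pairwise distance at each stage. That is fine for small samples such as a 25-respondent survey but would be slow for large data sets.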



SLIDE 18-11

Key concepts in cluster analysis


• Proximity matrix: A data matrix of pair-wise
  distances/similarities between the objects. It is an N x N
  matrix, where N is the number of objects being clustered.
• Summary: In the non-hierarchical clustering method, the summary
  indicates the number of cases in each cluster.
• Vertical icicle diagram: Quite similar to the dendrogram, it is
  a graphical method of showing the composition of the clusters.
  The objects are displayed individually at the top. At any given
  stage, the columns correspond to the objects being clustered
  and the rows correspond to the number of clusters. An icicle
  diagram is read from bottom to top.
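The N x N proximity matrix can be built in a few lines. This minimal sketch assumes metric data and uses Euclidean distance by default through the standard library's `math.dist`. Any other pairwise measure can be passed as `measure`, for example a matching coefficient for binary data. The function name and the three objects are hypothetical.

```python
import math


def proximity_matrix(objects, measure=math.dist):
    """N x N matrix whose entry [i][j] is the proximity between objects i and j."""
    n = len(objects)
    return [[measure(objects[i], objects[j]) for j in range(n)] for i in range(n)]


# Hypothetical scores of three objects on two metric variables.
objects = [[0, 0], [3, 4], [6, 8]]
for row in proximity_matrix(objects):
    print(row)
# [0.0, 5.0, 10.0]
# [5.0, 0.0, 5.0]
# [10.0, 5.0, 0.0]
```

With a distance measure, the matrix is symmetric and has zeros on the diagonal, since every object is at distance 0 from itself.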



SLIDE 18-12
Cluster analysis process

Stage 1  RESEARCH OBJECTIVES
         Exploratory versus confirmatory objectives
         Select variables used to cluster objects

Stage 2  CLUSTER ASSUMPTIONS
         Are the cluster variables metric or nonmetric?
         Metric data: distance measures of similarity
           (squared Euclidean distance)
         Nonmetric data: association measures of similarity
           (matching coefficients)

Stage 3  CLUSTERING ALGORITHM
         Is a hierarchical, nonhierarchical, or combination of
         the two methods used?
         Hierarchical methods: single linkage, complete linkage,
           average linkage, Ward's method, centroid method
         Nonhierarchical methods: sequential threshold, parallel
           threshold, optimization
         Two-step cluster
         Combination: use a hierarchical method to specify
           cluster seeds for a nonhierarchical method

Stage 4  NUMBER OF CLUSTERS
         Hierarchical methods
         Examine dendrogram
         Cluster membership
         Conceptual consideration

Stage 5  INTERPRETING THE CLUSTERS
         Examine cluster variables
         Name clusters

Stage 6  VALIDATING AND PROFILING THE CLUSTERS
         Validation
         Profiling



SLIDE 18-13

Illustration: Nano study

[Figure: dendrogram for the Nano study. The horizontal axis shows
the inter-respondent distance at which clusters combine (scale 0
to 25); the vertical axis lists the 25 cases, top to bottom: 18,
25, 7, 13, 11, 21, 6, 3, 8, 5, 10, 17, 22, 15, 2, 16, 20, 12, 19,
14, 9, 24, 1, 23, 4.]



SLIDE 18-14

Illustration: Nano study


ANOVA

Statement | F | Sig.
I think in India we have been able to achieve technological standard of high order | 39.036 | .000
I prefer to buy things made in India | 44.896 | .000
I usually buy things which provide value for money | 53.716 | .000
Convenience is more important than style | 65.008 | .000
I do not like wasteful expenditure | 92.103 | .000
When it comes to safety I believe there should be no compromises. | 50.579 | .000
I'm a "saver" rather than a "spender." | 23.468 | .000
I like to try new and different things. | 164.223 | .000
I always want to be a part of changing world | 96.749 | .000
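Every statement in this table is significant (Sig. = .000). Following the rule that a higher F means a larger difference between the clusters, sorting by F shows which statements separate the three Nano clusters most. The statements and F values are copied from the table. The variable names in this sketch are only illustrative.

```python
# F values copied from the Nano study ANOVA table.
anova_f = {
    "I think in India we have been able to achieve technological standard of high order": 39.036,
    "I prefer to buy things made in India": 44.896,
    "I usually buy things which provide value for money": 53.716,
    "Convenience is more important than style": 65.008,
    "I do not like wasteful expenditure": 92.103,
    "When it comes to safety I believe there should be no compromises.": 50.579,
    "I'm a \"saver\" rather than a \"spender.\"": 23.468,
    "I like to try new and different things.": 164.223,
    "I always want to be a part of changing world": 96.749,
}

# Higher F = larger difference between the clusters on that statement.
ranked = sorted(anova_f.items(), key=lambda item: item[1], reverse=True)
for statement, f_value in ranked:
    print(f"{f_value:8.3f}  {statement}")
```

"I like to try new and different things." ranks first and the saver/spender statement ranks last. So in this sample, novelty seeking discriminates between the clusters most strongly.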



SLIDE 18-15

Illustration: Nano study

Cluster centroids for Nano sample survey

Statement | Cluster 1 | Cluster 2 | Cluster 3
I think in India we have been able to achieve technological standard of high order | 2.17 | 2.00 | 4.40
I prefer to buy things made in India | 1.67 | 2.22 | 4.70
I usually buy things which provide value for money | 4.67 | 1.44 | 2.70
Convenience is more important than style | 4.67 | 1.78 | 2.10
I do not like wasteful expenditure | 4.33 | 1.00 | 2.80
When it comes to safety I believe there should be no compromises. | 4.67 | 1.22 | 2.60
I'm a "saver" rather than a "spender." | 4.17 | 1.00 | 2.60
I like to try new and different things. | 1.50 | 4.78 | 1.20
I always want to be a part of changing world | 1.33 | 4.33 | 1.40
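The "distances between final cluster centres" idea from the key concepts can be applied to these centroids. This sketch computes the Euclidean distance between each pair of Nano clusters. It assumes that Euclidean distance on the published centroids is appropriate. Because those centroids are rounded to two decimals, the results are approximate and are not figures reported by any software. The cluster names come from the cluster summary.

```python
import math
from itertools import combinations

# Centroids copied from the Nano study table (statements in the same row order).
centroids = {
    1: [2.17, 1.67, 4.67, 4.67, 4.33, 4.67, 4.17, 1.50, 1.33],  # cautious consumer
    2: [2.00, 2.22, 1.44, 1.78, 1.00, 1.22, 1.00, 4.78, 4.33],  # innovative consumer
    3: [4.40, 4.70, 2.70, 2.10, 2.80, 2.60, 2.60, 1.20, 1.40],  # patriotic consumer
}

distances = {(a, b): math.dist(centroids[a], centroids[b])
             for a, b in combinations(sorted(centroids), 2)}
for (a, b), d in distances.items():
    print(f"cluster {a} vs cluster {b}: {d:.2f}")
```

On these rounded values, the cautious and innovative clusters are the furthest apart (about 8.5). The cautious and patriotic clusters are the closest (about 5.8).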



SLIDE 18-16

Illustration: Nano study


Cluster summary: Nano sample survey

Cluster 1 (cautious consumer) | 6.000
Cluster 2 (innovative consumer) | 9.000
Cluster 3 (patriotic consumer) | 10.000
Valid | 25.000
Missing | .000



SLIDE 18-17

Validating the cluster solution


• Use two-step clustering to measure the stability of the
  obtained solution.
• Split the data in half, cluster each half separately and
  compare the cluster centroids.
• Use subjective judgment to evaluate both the group formation
  and the clusters' potential for managerial decisions.
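The split-half check can be sketched as a centroid comparison. Cluster numbers are arbitrary from one run to the next, so centroids have to be matched before they are compared. The sketch assumes each half has already been clustered into the same number of clusters on the same variables. Each first-half centroid is paired with its nearest second-half centroid. The solution is treated as stable here only if the pairing is one-to-one and no pair is further apart than a tolerance. The function names, the tolerance and the centroid values are hypothetical choices, not rules from the text.

```python
import math


def match_centroids(half_a, half_b):
    """Pair each centroid from half A with its nearest centroid from half B."""
    pairs = []
    for i, centre_a in enumerate(half_a):
        j = min(range(len(half_b)), key=lambda k: math.dist(centre_a, half_b[k]))
        pairs.append((i, j, math.dist(centre_a, half_b[j])))
    return pairs


def looks_stable(half_a, half_b, tolerance):
    """Stable if every half-A centroid has a distinct, close counterpart in half B."""
    pairs = match_centroids(half_a, half_b)
    distinct = len({j for _, j, _ in pairs}) == len(pairs)
    return distinct and all(d <= tolerance for _, _, d in pairs)


# Hypothetical two-variable centroids from clustering each half separately.
first_half = [[4.5, 1.5], [1.5, 4.6]]
second_half = [[1.4, 4.7], [4.6, 1.4]]
print(match_centroids(first_half, second_half))
print(looks_stable(first_half, second_half, tolerance=0.5))  # True
```

Here cluster 0 in the first half corresponds to cluster 1 in the second half. That is exactly the relabelling the matching step is meant to absorb.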




END OF CHAPTER

