MktRes MARK7362 Lecture8 - 002
Chapter Outline
1) Overview
2) Basic Concept
3) Statistics Associated with Cluster Analysis
4) Conducting Cluster Analysis
i. Formulating the Problem
ii. Selecting a Distance or Similarity Measure
iii. Selecting a Clustering Procedure
iv. Deciding on the Number of Clusters
v. Interpreting and Profiling the Clusters
vi. Assessing Reliability and Validity
Cluster Analysis
• Used to classify objects (cases) into
homogeneous groups called clusters.
• Objects in each cluster tend to be similar to each other
and dissimilar to objects in the other clusters.
• Both cluster analysis and discriminant analysis
are concerned with classification.
• Discriminant analysis requires prior knowledge
of group membership.
• In cluster analysis groups are suggested by the
data.
Fig. 20.1: An Ideal Clustering Situation (axes: Variable 1, Variable 2)
Fig. 20.2: More Common Clustering Situation (axes: Variable 1, Variable 2)
Statistics Associated with Cluster Analysis
• Clustering procedures are hierarchical or nonhierarchical.
• Hierarchical procedures are agglomerative or divisive.
Ward’s Method
• Complete linkage: the distance between Cluster 1 and Cluster 2 is the
maximum distance between their members.
• Average linkage: the distance between Cluster 1 and Cluster 2 is the
average distance between their members.
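The linkage rules above differ only in how they reduce the pairwise distances between the members of two clusters to a single number. The following is a minimal sketch of that idea, assuming Euclidean distance; the function names and the two small clusters are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import dist


def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance between every point in cluster_a and every point in cluster_b."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def complete_linkage(cluster_a, cluster_b):
    """Complete linkage: the maximum distance between members of the two clusters."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Average linkage: the mean of all between-cluster pairwise distances."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical two-variable data, not taken from the lecture.
cluster_1 = [(0.0, 0.0), (0.0, 1.0)]
cluster_2 = [(3.0, 0.0), (4.0, 0.0)]

print("complete linkage:", complete_linkage(cluster_1, cluster_2))
print("average linkage: ", average_linkage(cluster_1, cluster_2))
```

Because the average can never exceed the maximum, average linkage always reports a distance no larger than complete linkage for the same pair of clusters.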
Hierarchical Agglomerative Clustering: Variance and Centroid Methods
• Variance methods generate clusters to minimize the within-cluster
variance.
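• Centroid method: the distance between two clusters is the distance
between their centroids (the means of all the variables).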
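The two criteria can be compared on a small example. The sketch below assumes the usual formulation of Ward's method, in which the cost of merging two clusters is the increase in the within-cluster sum of squares, and takes the centroid method to measure the Euclidean distance between cluster means. The function names and data are hypothetical.

```python
from math import dist


def centroid(cluster):
    """Mean of each variable across the points in the cluster."""
    n = len(cluster)
    return tuple(sum(values) / n for values in zip(*cluster))


def within_ss(cluster):
    """Sum of squared Euclidean distances from each point to the cluster centroid."""
    center = centroid(cluster)
    return sum(dist(point, center) ** 2 for point in cluster)


def ward_increase(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters were merged."""
    merged = cluster_a + cluster_b
    return within_ss(merged) - within_ss(cluster_a) - within_ss(cluster_b)


def centroid_distance(cluster_a, cluster_b):
    """Centroid method: distance between the two cluster centroids."""
    return dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical two-variable data, not taken from the lecture.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(4.0, 0.0), (4.0, 2.0)]

print("Ward increase:    ", ward_increase(cluster_a, cluster_b))
print("centroid distance:", centroid_distance(cluster_a, cluster_b))
```

At each agglomeration step, a variance method of this kind would merge the pair of clusters with the smallest increase, while the centroid method would merge the pair whose centroids are closest.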
Nonhierarchical Clustering Methods
Cluster membership of cases, by number of clusters
Case   4 clusters   3 clusters   2 clusters
1      1            1            1
2      2            2            2
3      1            1            1
4      3            3            2
5      2            2            2
6      1            1            1
7      1            1            1
8      1            1            1
9      2            2            2
10     3            3            2
11     2            2            2
12     1            1            1
13     2            2            2
14     3            3            2
15     1            1            1
16     3            3            2
17     1            1            1
18     4            3            2
19     3            3            2
20     2            2            2
Vertical Icicle Plot
Fig. 20.7
Dendrogram
Fig. 20.8
Cluster Centroids
Table 20.3
Means of Variables
Cluster No. V1 V2 V3 V4 V5 V6
ANOVA
           Cluster              Error
Variable   Mean Square   df     Mean Square   df     F        Sig.
V1         29.108        2      0.608         17     47.888   0.000
V2         13.546        2      0.630         17     21.505   0.000
V3         31.392        2      0.833         17     37.670   0.000
V4         15.713        2      0.728         17     21.585   0.000
V5         22.537        2      0.816         17     27.614   0.000
V6         12.171        2      1.071         17     11.363   0.001
The F tests should be used only for descriptive purposes because the clusters have been
chosen to maximize the differences among cases in different clusters. The observed
significance levels are not corrected for this, and thus cannot be interpreted as tests of the
hypothesis that the cluster means are equal.
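Each F value in the ANOVA table is the cluster mean square divided by the error mean square. The short sketch below recomputes the ratios from the mean squares reported in the table. The dictionary layout and the function name are illustrative assumptions, and small differences from the reported F values are expected because the mean squares are rounded to three decimals.

```python
# Values copied from the ANOVA table: (cluster mean square, error mean square, reported F).
anova = {
    "V1": (29.108, 0.608, 47.888),
    "V2": (13.546, 0.630, 21.505),
    "V3": (31.392, 0.833, 37.670),
    "V4": (15.713, 0.728, 21.585),
    "V5": (22.537, 0.816, 27.614),
    "V6": (12.171, 1.071, 11.363),
}


def f_ratio(ms_cluster, ms_error):
    """F statistic: between-cluster mean square over within-cluster (error) mean square."""
    return ms_cluster / ms_error


for variable, (ms_cluster, ms_error, reported_f) in anova.items():
    computed = f_ratio(ms_cluster, ms_error)
    print(f"{variable}: computed F = {computed:.3f}, reported F = {reported_f:.3f}")
```

As the note above says, these ratios only describe how strongly each variable separates the clusters. They are not valid hypothesis tests, because the clusters were formed to maximize exactly these differences.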