
Cluster Analysis

1. Single Link Cluster Analysis
2. Ward’s Minimum Sum of Squares
3. k-Means Cluster Analysis
4. SPSS TwoStep Cluster Analysis

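Single-Link Clustering
(most popular method)

[Figure: scatter plot of a Left and a Right cluster with a new item (star). Vertical axis: Cost (Importance). Horizontal axis: Complete Pain Relief (Importance). Distances from the star are labeled A, B, and C.]

Single Link: Join the item to the cluster that has the single closest member.

Since B is shorter than the distance to the closest member of the Right cluster, join the star to the Left cluster, even though A and C are longer than that distance.
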
Cluster Analysis
Single Chain Agglomerative Procedure
(most popular method)

Part-Worth Coefficients of “Complete Pain Relief”

Therapy      A    B    C    D    E
Part-worth   2    5    9   10   15

Single Link: Join the item to the cluster that has the single closest member.

First Stage:   A = 2   B = 5   C = 9   D = 10   E = 15

Second Stage (Euclidean distance):
   AB = 3    BC = 4    CD = 1    DE = 5
   AC = 7    BD = 5    CE = 6
   AD = 8    BE = 10
   AE = 13
   Smallest distance: CD = 1, so C and D are joined.

Third Stage:   CDA = 7   CDB = 4   CDE = 5   AB = 3   AE = 13   BE = 10
   Smallest distance: AB = 3, so A and B are joined.

Fourth Stage:  ABCD = 4   ABE = 10   CDE = 5
   Smallest distance: ABCD = 4, so AB and CD are joined.

Fifth Stage:   ABCDE = 5
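
The stages above can be reproduced with a short script. This is a minimal sketch, not a library implementation: the function names `single_link` and `agglomerate` are made up for illustration, and the distance between two clusters is taken to be the distance between their closest members, as the slide defines it.

```python
from itertools import combinations

# Part-worth values from the slide.
values = {"A": 2, "B": 5, "C": 9, "D": 10, "E": 15}

def single_link(c1, c2):
    """Cluster-to-cluster distance = distance between the two closest members."""
    return min(abs(values[a] - values[b]) for a in c1 for b in c2)

def agglomerate(items):
    """Repeatedly join the two clusters with the smallest single-link distance."""
    clusters = [(name,) for name in items]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: single_link(*p))
        d = single_link(c1, c2)
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        merges.append(("".join(merged), d))
    return merges

merges = agglomerate(values)
for name, d in merges:
    print(name, d)
```

The printed join order is CD (1), AB (3), ABCD (4), ABCDE (5), which are the heights at which the branches of the dendrogram meet.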
Single Chain Agglomerative Clustering Output: Dendrogram

[Figure: dendrogram over items A, B, C, D, E; the final join is at height 5.]

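Ward’s Clustering

[Figure: scatter plot of a Left and a Right cluster with a new item (star). Vertical axis: Strength (Importance). Horizontal axis: Water Resistance (Importance). A marker shows the mean location of the points in the proposed cluster; A, B, C, and D are the distances of the points from that mean.]

Ward’s Cluster: Join the item to the cluster that gives the smallest error sum of squares (ESS), measured from the mean location of the points in the proposed cluster.

In this case, if the star is joined to the Left cluster, ESS = A² + B² + C² + D².
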


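Ward’s Minimum Variance
Agglomerative Clustering Procedure

First Stage:   A = 2   B = 5   C = 9   D = 10   E = 15

Second Stage (ESS of each proposed pair):
   AB = 4.5     BC = 8.0     CD = 0.5     DE = 12.5
   AC = 24.5    BD = 12.5    CE = 18.0
   AD = 32.0    BE = 50.0
   AE = 84.5
   Smallest ESS: CD = 0.5, so C and D are joined.

Third Stage:   CDA = 38.0   CDB = 14.0   CDE = 20.67   AB = 5.0   AE = 85.0   BE = 50.5
   From this stage on, each value is the total ESS over all clusters after the proposed join (for example, AB = 4.5 + 0.5 = 5.0).
   Smallest ESS: AB = 5.0, so A and B are joined.

Fourth Stage:  ABCD = 41.0   ABE = 93.17   CDE = 25.17
   Smallest ESS: CDE = 25.17, so E is joined to CD.

Fifth Stage:   ABCDE = 98.8

Ward’s Minimum Variance
Agglomerative Clustering Output

[Figure: dendrogram over items A, B, C, D, E. C and D join at 0.5, E joins CD at 25.17, and all items join at 98.8.]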
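
The ESS values in Ward's procedure can be checked with a short script. This is a sketch under stated assumptions: the names `ess`, `total_ess`, and `ward` are illustrative, and each stage picks the join that gives the smallest total ESS over all clusters. That is equivalent to picking the smallest increase in ESS, because the total before the join is the same for every candidate.

```python
from itertools import combinations

# Part-worth values from the slide.
values = {"A": 2, "B": 5, "C": 9, "D": 10, "E": 15}

def ess(cluster):
    """Error sum of squares: squared distances of the members from the cluster mean."""
    xs = [values[name] for name in cluster]
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def total_ess(clusters):
    return sum(ess(c) for c in clusters)

def ward(items):
    """At each stage, make the join that leaves the smallest total ESS."""
    clusters = [(name,) for name in items]
    history = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            rest = [c for c in clusters if c not in (c1, c2)]
            candidate = rest + [tuple(sorted(c1 + c2))]
            score = total_ess(candidate)
            if best is None or score < best[0]:
                best = (score, candidate)
        clusters = best[1]
        history.append(("".join(clusters[-1]), round(best[0], 2)))
    return history

history = ward(values)
for name, height in history:
    print(name, height)
```

The last digit of a printed value can differ from a hand-computed figure, depending on whether intermediate results were rounded.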
k-Means Clustering
1. Begin with two starting center points and allocate each item to the nearest cluster center.
2. Recalculate the center of each cluster. Stop if the centers haven’t changed.
3. Allocate items to the nearest cluster center. Go to 2.
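
The three steps can be sketched as a loop. The example below is illustrative only: it reuses the five part-worth values from the earlier slides as one-dimensional data, and the starting centers 2 and 15 are an assumption, since the slide does not say how the starting points are chosen.

```python
def k_means(points, centers):
    """Allocate, recalculate, and stop once the centers no longer change."""
    while True:
        # Allocate each item to the nearest cluster center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Recalculate each center as the mean of its items
        # (an empty cluster keeps its previous center).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            return centers, groups
        centers = new_centers

# Assumed data and starting centers, for illustration only.
centers, groups = k_means([2, 5, 9, 10, 15], [2.0, 15.0])
print(centers, groups)
```

With these inputs the loop settles on the groups {2, 5} and {9, 10, 15}. Different starting centers can lead to a different final grouping.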


k-Means Clustering
[Figure: numbered panels showing items labeled A or B according to their assigned cluster at successive steps.]

SPSS TwoStep Cluster Method
- A scalable cluster analysis algorithm designed to handle very large data sets.
- Can handle both continuous and categorical variables or attributes.
- Automatically selects the number of clusters.

Step 1: Pre-cluster the cases (or records) into many small sub-clusters.

Step 2: Cluster the sub-clusters resulting from the pre-cluster step into the desired number of clusters.
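
The two-step idea can be illustrated with a toy example. This is not SPSS's algorithm: the one-pass threshold rule in `pre_cluster`, the closest-means merge in `cluster_subclusters`, the sample data, and the threshold value are all assumptions chosen to show the structure (a cheap pass that summarises cases into sub-clusters, then a clustering of those summaries).

```python
def pre_cluster(cases, threshold):
    """Step 1: one pass over the cases, building many small sub-clusters."""
    subs = []  # each sub-cluster is summarised as [count, sum]
    for x in cases:
        # Find the sub-cluster whose mean is nearest to this case.
        best = min(subs, key=lambda s: abs(x - s[1] / s[0]), default=None)
        if best is not None and abs(x - best[1] / best[0]) <= threshold:
            best[0] += 1
            best[1] += x
        else:
            subs.append([1, x])
    return subs

def cluster_subclusters(subs, k):
    """Step 2: merge the sub-clusters with the closest means until k remain."""
    subs = sorted((list(s) for s in subs), key=lambda s: s[1] / s[0])
    while len(subs) > k:
        # In one dimension the closest pair of means is adjacent after sorting.
        i = min(range(len(subs) - 1),
                key=lambda j: subs[j + 1][1] / subs[j + 1][0] - subs[j][1] / subs[j][0])
        subs[i] = [subs[i][0] + subs[i + 1][0], subs[i][1] + subs[i + 1][1]]
        del subs[i + 1]
    return [(count, total / count) for count, total in subs]

# Hypothetical one-dimensional cases, for illustration only.
cases = [1.0, 1.2, 0.9, 5.0, 5.3, 9.8, 10.1, 10.0, 5.1]
subs = pre_cluster(cases, threshold=0.5)
final = cluster_subclusters(subs, k=2)
print(len(subs), final)
```

Step 2 only ever sees the sub-cluster summaries, never the individual cases, which is what makes the approach practical for very large data sets.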
