Cluster Analysis


Cluster analysis (also called classification analysis or numerical taxonomy) is a class of techniques used to classify objects or cases into relatively homogeneous groups, called clusters, based on the set of variables considered. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters.

The objects may be either variables or observations; likeness is calculated from the measurements taken for each object.

Applications:

1. Market segmentation: clustering consumers on the basis of the benefits sought from the purchase of a product.

2. Understanding buyer behaviour: by clustering consumers into homogeneous groups, a firm can examine the buying behaviour or information-seeking behaviour of each group.

3. Identifying new product opportunities: by clustering brands and products into competitive sets within the market, a firm can examine its current offerings compared to those of its competitors and identify potential new product opportunities.

4. Selecting test markets: by grouping cities into homogeneous clusters, a firm can select comparable cities to test various marketing strategies.

Measuring similarity between observations

To measure the similarity between two observations, a distance measure is needed. With a single variable, similarity is straightforward. Example: income. Two individuals are similar if their income levels are similar, and the level of dissimilarity increases as the income gap increases.

With many characteristics (e.g. income, age, consumption habits, brand loyalty, purchase frequency, family composition, education level, ...), it becomes more difficult to define similarity with a single value. The usual measure is distance, the concept we use in everyday life for spatial coordinates.

Model:

Data: each object is characterized by a set of numbers (measurements), e.g.,

object 1: (x11, x12, ..., x1n)
object 2: (x21, x22, ..., x2n)
...
object p: (xp1, xp2, ..., xpn)

Distance: the Euclidean distance between objects i and j is

d_ij = sqrt( (xi1 - xj1)^2 + (xi2 - xj2)^2 + ... + (xin - xjn)^2 )

Example (income unit: 10K):

Household   Income   Household Size
A           50K      5
B           50K      4
C           20K      2
D           20K      1

Euclidean distances (with income in units of 10K):
d(A,B) = 1.00, d(C,D) = 1.00, d(B,C) = 3.61, d(A,C) = d(B,D) = 4.24, d(A,D) = 5.00
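These distances can be checked with a short script (a minimal sketch; the data are the four households above, with income rescaled to units of 10K so both variables are on comparable scales):

```python
import math

# Household data: (income in units of 10K, household size)
households = {"A": (5, 5), "B": (5, 4), "C": (2, 2), "D": (2, 1)}

def euclidean(p, q):
    """Euclidean distance between two measurement vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

for i, j in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]:
    print(f"d({i},{j}) = {euclidean(households[i], households[j]):.2f}")
```

Run as-is, this reproduces the distances listed above (1.00, 3.61, 4.24 and 1.00).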

Between-Cluster and Within-Cluster Variation

Between-Cluster Variation: maximize
Within-Cluster Variation: minimize

[Figure: observations plotted by frequency of eating out (high/low) against frequency of going to fast food restaurants (high/low), illustrating high between-cluster and low within-cluster variation.]

Hierarchical Cluster Analysis

[Figure: profile chart of scores (1-7) on the clustering variables for Respondents A, B, C and D.]

Clustering procedures:
- Hierarchical procedures
  - Agglomerative (start from n clusters to get to 1 cluster)
  - Divisive (start from 1 cluster to get to n clusters)
- Non-hierarchical procedures
  - K-means clustering

Hierarchical clustering

Agglomerative:
- Each of the n observations initially constitutes a separate cluster.
- The two clusters that are most similar according to some distance rule are merged, so that after step 1 there are n-1 clusters.
- In the second step another cluster is formed (n-2 clusters) by nesting the two clusters that are most similar, and so on.
- There is a merge at each step, until all observations end up in a single cluster at the final step.

Divisive:
- All observations are initially assumed to belong to a single cluster.
- The most dissimilar observation(s) is extracted to form a separate cluster.
- After step 1 there are 2 clusters, after the second step 3 clusters, and so on, until the final step produces as many clusters as there are observations. This technique is used in medical research and is not in the scope of our course.
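The agglomerative sequence can be sketched in a few lines of Python. This is an illustrative implementation (not SPSS's), using single linkage on the four-household example from earlier:

```python
import math

# Observations: (income in 10K units, household size); each starts as its own cluster
points = {"A": (5, 5), "B": (5, 4), "C": (2, 2), "D": (2, 1)}
clusters = [{name} for name in points]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def linkage(c1, c2):
    """Single linkage: minimum distance over all cross-cluster pairs."""
    return min(dist(points[i], points[j]) for i in c1 for j in c2)

# At each step, merge the two closest clusters until one cluster remains
while len(clusters) > 1:
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
    )
    clusters[i] |= clusters.pop(j)
    print([sorted(c) for c in clusters])
```

On these data the loop first merges A with B, then C with D, and finally joins the two pairs: exactly the n-1, n-2, ..., 1 sequence described above.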

Non-hierarchical clustering algorithms

These algorithms do not follow a hierarchy and produce a single partition:
- Knowledge of the number of clusters (c) is required.
- In the first step, initial cluster centres (the seeds) are determined for each of the c clusters, either by the researcher or by the software.
- Each iteration allocates observations to each of the c clusters, based on their distance from the cluster centres.
- Cluster centres are then recomputed, and observations may be reallocated to the nearest cluster in the next iteration.
- When no observations can be reallocated, or a stopping rule is met, the process stops.

Hierarchical algorithms vary according to the way the distance between two clusters is defined. The most common algorithms for hierarchical methods include:
- the centroid method
- the single linkage method
- the complete linkage method
- the average linkage method
- the Ward algorithm

Linkage methods:
- Single linkage (nearest neighbour): the distance between two clusters is the minimum distance among all possible distances between observations belonging to the two clusters.
- Complete linkage (furthest neighbour): nests two clusters using as a basis the maximum distance between observations belonging to the separate clusters.
- Average linkage: the distance between two clusters is the average of all distances between observations in the two clusters.
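For two given clusters, the three linkage rules differ only in how the set of cross-cluster distances is summarised. A sketch using the household example, with cluster 1 = {A, B} and cluster 2 = {C, D}:

```python
import math

points = {"A": (5, 5), "B": (5, 4), "C": (2, 2), "D": (2, 1)}
cluster1, cluster2 = ["A", "B"], ["C", "D"]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# All distances between observations in different clusters
cross = [dist(points[i], points[j]) for i in cluster1 for j in cluster2]

print(f"single linkage (min):   {min(cross):.2f}")
print(f"complete linkage (max): {max(cross):.2f}")
print(f"average linkage (mean): {sum(cross) / len(cross):.2f}")
```

Here the single-linkage distance is 3.61 (the B-C pair), the complete-linkage distance is 5.00 (the A-D pair), and the average-linkage distance is 4.27.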

Ward algorithm:
1. The sum of squared distances is computed within each cluster, considering all distances between observations within the same cluster.
2. The algorithm proceeds by choosing the merge of two clusters that generates the smallest increase in the total sum of squared distances.

It is a computationally intensive method, because at each step all the sums of squared distances need to be computed, together with all potential increases in the total sum of squared distances for each possible merge of clusters.
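Step 2's criterion, the increase in the total within-cluster sum of squares caused by a merge, can be computed directly. A minimal sketch on the four singleton household clusters (data and names are illustrative, not from SPSS):

```python
def centroid(pts):
    return tuple(sum(coords) / len(pts) for coords in zip(*pts))

def wss(pts):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(pts)
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in pts)

points = {"A": (5, 5), "B": (5, 4), "C": (2, 2), "D": (2, 1)}
clusters = {name: [p] for name, p in points.items()}

# Increase in total within-cluster SS for every possible pairwise merge
names = list(clusters)
increase = {}
for i, x in enumerate(names):
    for y in names[i + 1:]:
        increase[(x, y)] = (
            wss(clusters[x] + clusters[y]) - wss(clusters[x]) - wss(clusters[y])
        )

best = min(increase, key=increase.get)
print(f"cheapest merge: {best}, SS increase = {increase[best]:.2f}")
```

Merging A and B (or C and D) increases the sum of squares by only 0.5, whereas merging A and C would cost 9.0; Ward's method therefore picks one of the cheap merges first.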

Non-hierarchical clustering: the K-means method
1. The number k of clusters is fixed.
2. An initial set of k seeds (aggregation centres) is provided, e.g. the first k elements.
3. Given a certain fixed threshold, all units are assigned to the nearest cluster seed.
4. New seeds are computed.
5. Go back to step 3 until no reclassification is necessary.

Units can be reassigned in successive steps (optimising partitioning).
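The steps above (Lloyd's algorithm) can be sketched in pure Python. The data here are hypothetical 1-to-7 commitment scores, and the seeds are simply the first k elements, as mentioned:

```python
def kmeans(data, k, max_iter=100):
    """Basic K-means on 1-D data: seeds = first k elements, iterate until stable."""
    centres = list(data[:k])
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign every unit to the nearest cluster seed
        new = [min(range(k), key=lambda c: abs(x - centres[c])) for x in data]
        if new == assignment:  # no reclassification: stop
            break
        assignment = new
        # Step 4: recompute each seed as the mean of its cluster
        for c in range(k):
            members = [x for x, a in zip(data, assignment) if a == c]
            if members:
                centres[c] = sum(members) / len(members)
    return centres, assignment

scores = [6.5, 6.8, 6.2, 2.1, 1.8, 2.4, 4.5, 4.2]  # hypothetical ratings
centres, labels = kmeans(scores, k=3)
print(centres, labels)
```

Note that with these seeds (three high scores) the low and moderate scores end up sharing a cluster, which illustrates why the choice of initial seeds matters.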

Hierarchical methods:
- No need to specify the number of clusters in advance.
- Problems when data contain a high level of error.
- Can be very slow; preferable with small data sets.
- At each step they require computation of the full proximity matrix.

Non-hierarchical methods:
- Faster; preferable with large data sets.
- Need to specify the number of clusters.
- Need to set the initial seeds.
- Only the distances to the cluster seeds need to be computed in each iteration.

How many clusters? There are no hard and fast rules; useful guidelines include:
a. theoretical, conceptual, or practical considerations;
b. the distances at which clusters are combined in a hierarchical clustering;
c. the relative size of the clusters should be meaningful; etc.

Outliers
- An outlier will affect your cluster solution if you don't remove it.
- Removing an outlier can also affect your cluster solution (small sample size).

How should we select the clustering variables? What is the effect of multi-collinearity in cluster analysis? Variables are normally measured metrically, but the technique can be applied to non-metric variables with caution, and the variables chosen should relate to a single underlying concept or construct.

Variable Description (Type)

Work Environment Measures
X1  I am paid fairly for the work I do. (Metric)
X2  I am doing the kind of work I want. (Metric)
X3  My supervisor gives credit and praise for work well done. (Metric)
X4  There is a lot of cooperation among the members of my work group. (Metric)
X5  My job allows me to learn new skills. (Metric)
X6  My supervisor recognizes my potential. (Metric)
X7  My work gives me a sense of accomplishment. (Metric)
X8  My immediate work group functions as a team. (Metric)
X9  My pay reflects the effort I put into doing my work. (Metric)
X10 My supervisor is friendly and helpful. (Metric)
X11 The members of my work group have the skills and/or training to do their job well. (Metric)
X12 The benefits I receive are reasonable. (Metric)

Relationship Measures
X13 I have a sense of loyalty to McDonald's restaurant. (Metric)
X14 I am willing to put in a great deal of effort beyond that expected to help McDonald's restaurant to be successful. (Metric)
X15 I am proud to tell others that I work for McDonald's restaurant. (Metric)

Classification Variables
X16 Intention to Search (Metric)
X17 Length of Time an Employee (Nonmetric)
X18 Work Type = Part-Time vs. Full-Time (Nonmetric)
X19 Gender (Nonmetric)
X20 Age (Metric)
X21 Performance (Metric)

For this example we are looking for subgroups among all 63 employees of the McDonald's restaurant using the organizational commitment variables. The SPSS click-through sequence is: Analyze > Classify > Hierarchical Cluster. This will take you to a dialog box where you select and move variables X13, X14 and X15 into the Variables box.

Next, go to the Statistics box; the agglomeration schedule is selected as the default option, and Cluster Membership is set to None by default. We shall continue with the default options here. Next click on the Plots box, check Dendrogram, and in the Icicle window click on the None button; then Continue.

Next click on the Method box and select Ward's method under Cluster Method (it is the last option). Squared Euclidean Distance is the default under Measure and we will use it; we do not need to standardize this data. We will not select anything under the Save option for now. Now click on OK to run the program.

Note the jump in the agglomeration coefficients in the last two stages, and identify the number of clusters from the dendrogram.

The SPSS click-through sequence is: Analyze > Classify > K-Means Cluster. This will take you to a dialog box where you select and move variables X13, X14 and X15 into the Variables box. In the Number of Clusters box, put 3 in place of 2. Next, go to the Save box and check Cluster Membership. Next click on Options, uncheck the Initial Cluster Centers option and check the ANOVA table. Now click on OK to run the program.


ANOVA with cluster IDs and organizational commitment variables.

ANOVA
1. Move the organizational commitment variables into the Dependent List window.
2. Move the cluster ID variable into the Factor window.
3. Click on Options, check Descriptive, next Continue, and then OK.

2-Cluster ANOVA Results

Three issues to examine: (1) statistical significance, (2) cluster sample sizes, and (3) variable means.

Conclusion:
Cluster 1 = More Committed
Cluster 2 = Less Committed

3-Cluster ANOVA
- Post-hoc tests must be run.
- Take the 2-cluster ID variable out and insert the 3-cluster ID variable.
- Click on the Post Hoc button and check Scheffe.

Conclusions:
Cluster 1 = Least Committed
Cluster 2 = Moderately Committed
Cluster 3 = Most Committed

Individual cluster sample sizes are OK. The clusters are significantly different, but we must examine the post hoc tests.

4-Cluster ANOVA
1. Remove the 3-cluster ID variable and insert the 4-cluster ID variable.
2. Click OK to run.

4-Cluster ANOVA Conclusions:
1. Group sample sizes are still OK.
2. Clusters are significantly different.
3. The means of four clusters are more difficult to interpret; we may want to examine polar extremes. The most likely approach is to combine clusters 1 and 2 and do a three-cluster solution, or to remove groups 1 and 2 and compare the extreme groups (3 & 4).

Post Hoc Results
1. All clusters are significantly different.
2. The largest differences are consistently between clusters 3 and 4.

Error Reduction (from the agglomeration schedule error coefficients):
1 to 2 clusters = 58.4%
2 to 3 clusters = 25.5%
3 to 4 clusters = 22.8%
4 to 5 clusters = 22.2%

Conclusion: the benefit is similar or smaller after 3 clusters.

Deciding on the final solution:
1. Examine the cluster analysis Agglomeration Schedule (the error coefficients).
2. Consider cluster sample sizes.
3. Consider statistical significance.
4. Evaluate differences in cluster means.
5. Evaluate interpretation and communication issues.
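The error-reduction percentages are just the relative change in the agglomeration coefficient as clusters are added. A sketch of the calculation with hypothetical coefficients (the real values come from the SPSS agglomeration schedule; these numbers are made up for illustration):

```python
# Hypothetical agglomeration coefficients by number of clusters
# (read from the last rows of the SPSS agglomeration schedule)
coef = {1: 45.2, 2: 18.8, 3: 14.0, 4: 10.8, 5: 8.4}

# Error reduction when moving from k to k+1 clusters
for k in range(1, 5):
    reduction = 100 * (coef[k] - coef[k + 1]) / coef[k]
    print(f"{k} to {k + 1} clusters: {reduction:.1f}% error reduction")
```

With these illustrative values the reductions flatten out after three clusters, which is the pattern used above to justify a three-cluster solution.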

Profiling the clusters with demographic variables:
1. Use ANOVA.
2. Remove the clustering variables from the Dependent List window.
3. Insert the demographic variables.
4. Change the Factor variable if necessary.

Assign value labels for the clusters:
1. Go to Variable View.
2. Under the Values column, click on None beside the variable for the number of cluster groups you will examine.
3. Assign value labels to each cluster.
4. Run the ANOVA on the demographics.

Conclusions for the 3-cluster solution:
- Clusters are significantly different.
- The more committed cluster (you must know the coding to interpret) consists of employees who are:
  - less likely to search (lower mean),
  - full-time (code = 0),
  - female (code = 1),
  - high performers (higher mean).

Thank you
