bow83755_app H_001-012.

qxd

23/5/08

4:52 PM

Page H1


APPENDIX H: Factor Analysis, Cluster Analysis, and Multidimensional Scaling¹
In the following two exercises, we illustrate factor analysis, cluster analysis, and multidimensional scaling.

H.1 Factor Analysis: An Application of Correlation


A personnel officer interviewed and rated 48 job applicants on the following 15 variables.
 1 Form of application letter
 2 Appearance
 3 Academic ability
 4 Likeability
 5 Self-confidence
 6 Lucidity
 7 Honesty
 8 Salesmanship
 9 Experience
10 Drive
11 Ambition
12 Grasp
13 Potential
14 Keenness to join
15 Suitability


In order to better understand the relationships between the 15 variables, the personnel officer will
use a technique called factor analysis. The first step in factor analysis is to standardize each variable. A variable is standardized by calculating the mean and the standard deviation of the 48 values
of the variable and then subtracting from each value the mean and dividing the resulting difference
by the standard deviation. The variance of the values of each standardized variable can be shown
to be equal to 1, and the pairwise correlations between the standardized variables can be shown to
be equal to the pairwise correlations between the original variables. Although we will not give the
48 values of each of the 15 variables (see Kendall (1980)), we present in Table H.1 a matrix containing the pairwise correlations of these variables. Considering the matrix, we note that there are
so many fairly large pairwise correlations that it is difficult to understand the relationships between
the 15 variables. When we use factor analysis, we determine whether there are uncorrelated
factors, fewer in number than 15, that (1) explain a large percentage of the total variation in the 15
variables and (2) help us to understand the relationships between the 15 variables.
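
The standardization step described above can be sketched in a few lines. This is a minimal illustration, not the SPSS procedure: the two rating lists are hypothetical (the 48 applicant ratings are not reproduced in this appendix), and the helper names `standardize` and `correlation` are our own. It checks the two facts stated in the text: a standardized variable has variance 1, and standardizing does not change pairwise correlations.

```python
import math
import statistics


def standardize(values):
    """Subtract the mean from each value and divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return [(v - mean) / sd for v in values]


def correlation(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings for two variables (NOT the applicant data).
appearance = [7, 8, 6, 9, 5, 7, 8, 6]
likeability = [6, 9, 5, 8, 4, 7, 7, 5]

z_appearance = standardize(appearance)
z_likeability = standardize(likeability)

var_z = statistics.variance(z_appearance)          # should be 1
r_raw = correlation(appearance, likeability)       # correlation before
r_std = correlation(z_appearance, z_likeability)   # correlation after

print(round(var_z, 6), round(r_raw, 6), round(r_std, 6))
```

The last line prints the variance of the standardized variable followed by the correlation computed before and after standardization; the two correlations agree up to floating-point rounding.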
To find the desired factors, we first find what are called principal components. The first
principal component is the composite of the 15 standardized variables that explains the highest percentage of the total of the variances of these variables. The SPSS output in Figure H.1 tells
us that the first principal component is
$$y_{(1)} = .44676x_1 + .58285x_2 + .10900x_3 + \cdots + .64584x_{15}$$
where $x_1, x_2, \ldots, x_{15}$ denote the 15 standardized variables. Here, the coefficient multiplied by
each $x_i$ is called the factor loading of $y_{(1)}$ on $x_i$ and can be shown to equal the pairwise correlation between $y_{(1)}$ and $x_i$. For example, the factor loading .58285 says that the pairwise correlation between $y_{(1)}$ and $x_2$ is .58285. The SPSS output also tells us that the variance (or eigenvalue)
of the 48 values of $y_{(1)}$ is 7.50395. Furthermore, since the sum of the variances of the 15 standardized variables is 15, the SPSS output tells us that the variance of $y_{(1)}$ explains
$(7.50395/15) \times 100\% = 50\%$ of the total variation in the standardized variables. Similarly, the
SPSS output shows the second principal component, which has a variance of 2.06148 and explains $(2.06148/15) \times 100\% = 13.7\%$ of the total variation in the standardized variables. In all,
there are 15 principal components that are uncorrelated with each other and explain a cumulative
percentage of 100 percent of the total variation in the 15 variables. Also, note that the variance
of a particular principal component can be shown to equal the sum of the squared pairwise correlations between the principal component and the 15 standardized variables. For example, examining the first column of pairwise correlations in the upper portion of Figure H.1, it follows
that the variance of the first principal component is
$$(.44676)^2 + (.58285)^2 + \cdots + (.64584)^2 = 7.50395$$
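
The sum-of-squares identity above can be checked directly from the output. The short sketch below copies the 15 Factor 1 loadings from the upper portion of Figure H.1 and recomputes the variance (eigenvalue) of the first principal component and the percentage of total variation it explains. The variable names are ours, and small differences in the last decimal place are expected because the printed loadings are rounded.

```python
# Factor 1 loadings of x1, ..., x15, copied from the upper portion of Figure H.1.
loadings_pc1 = [
    0.44676, 0.58285, 0.10900, 0.61698, 0.79807,
    0.86688, 0.43330, 0.88244, 0.36549, 0.86261,
    0.87185, 0.90776, 0.91310, 0.71033, 0.64584,
]

# Each standardized variable has variance 1, so the total variation equals
# the number of variables.
total_variation = len(loadings_pc1)

# Variance of the first principal component = sum of squared loadings.
eigenvalue_1 = sum(loading ** 2 for loading in loadings_pc1)
pct_explained = 100 * eigenvalue_1 / total_variation

print(f"variance of first component: {eigenvalue_1:.4f}")
print(f"percent of total variation:  {pct_explained:.1f}")
```

The recomputed variance agrees with the reported 7.50395 to about four decimal places, and the percentage rounds to the 50.0 shown in the output.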
Although the SPSS output shows the percentage of the total variation explained by each of
the 15 principal components, it only shows 7 of these principal components. The reason is that,
¹ Some of the discussion and three examples in this appendix are based on Chapters 15 and 16 in Intermediate Statistical
Methods and Applications, A Computer Package Approach (Prentice Hall, 1983) by Mark L. Berenson, David M. Levine, and Mathew Goldstein.



TABLE H.1  A Matrix of Pairwise Correlations for the Applicant Data

Variable     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
       1  1.00   .24   .04   .31   .09   .23   .11   .27   .55   .35   .28   .34   .37   .47   .59
       2        1.00   .12   .38   .43   .37   .35   .48   .14   .34   .55   .51   .51   .28   .38
       3              1.00   .00   .00   .08   .03   .05   .27   .09   .04   .20   .29   .32   .14
       4                    1.00   .30   .48   .65   .35   .14   .39   .35   .50   .61   .69   .33
       5                          1.00   .81   .41   .82   .02   .70   .84   .72   .67   .48   .25
       6                                1.00   .36   .83   .15   .70   .76   .88   .78   .53   .42
       7                                      1.00   .23   .16   .28   .21   .39   .42   .45   .00
       8                                            1.00   .23   .81   .86   .77   .73   .55   .55
       9                                                  1.00   .34   .20   .30   .35   .21   .69
      10                                                        1.00   .78   .71   .79   .61   .62
      11                                                              1.00   .78   .77   .55   .43
      12                                                                    1.00   .88   .55   .53
      13                                                                          1.00   .54   .57
      14                                                                                1.00   .40
      15                                                                                      1.00

Source: Reproduced by permission of the Publishers, Charles Griffin & Company Ltd., of London and High Wycombe, from Kendall, Multivariate
Analysis, 2nd ed. (1980).

since we wish to obtain final factors that are fewer in number than the number of original variables, we have instructed SPSS to retain 7 principal components for further study. The choice
of 7 principal components, while somewhat arbitrary, is based on the belief that 7 principal components will explain a high percentage of the total variation in the 15 variables. The SPSS output
FIGURE H.1  SPSS Output of a Factor Analysis of the Applicant Data (7 Factors Used)

FACTOR MATRIX USING PRINCIPAL FACTOR, NO ITERATIONS

       FACTOR 1(a)  FACTOR 2(b)  FACTOR 3   FACTOR 4   FACTOR 5   FACTOR 6   FACTOR 7
x1       0.44676      0.61880     0.37635   -0.12148    0.10168    0.42496    0.08504
x2       0.58285     -0.05019    -0.01995    0.28167    0.75188   -0.03325    0.00345
x3       0.10900      0.33907    -0.49450    0.71393   -0.18095    0.16113    0.18206
x4       0.61698     -0.18150     0.57968    0.35707   -0.09904    0.07837   -0.05714
x5       0.79807     -0.35611    -0.29930   -0.17939    0.00025    0.00377    0.06620
x6       0.86688     -0.18544    -0.18414   -0.06923   -0.17813    0.11744   -0.30132
x7       0.43330     -0.58195     0.36036    0.44570   -0.06052   -0.21591    0.06539
x8       0.88244     -0.05647    -0.24821   -0.22786    0.02960   -0.06262    0.00981
x9       0.36549      0.79438     0.09258    0.07431   -0.08999   -0.25962   -0.06758
x10      0.86261      0.06908    -0.09993   -0.16645   -0.17554   -0.17549    0.29665
x11      0.87185     -0.09840    -0.25565   -0.20948    0.13698    0.07573    0.12514
x12      0.90776     -0.03023    -0.13453    0.09726   -0.06359    0.10194   -0.24685
x13      0.91310      0.03250    -0.07327    0.21842   -0.10489    0.04666   -0.00366
x14      0.71033     -0.11478     0.55801   -0.23496   -0.10071    0.05911    0.14353
x15      0.64584      0.60374     0.10687   -0.02889    0.06431   -0.29308   -0.10537

FACTOR   EIGENVALUE   PCT OF VAR   CUM PCT
   1     7.50395(a)      50.0        50.0
   2     2.06148(b)      13.7        63.8
   3     1.46768          9.8        73.6
   4     1.20910          8.1        81.6
   5     0.74143          4.9        86.6
   6     0.48402          3.2        89.8
   7     0.34408          2.3        92.1
   8     0.31027          2.1        94.1
   9     0.25965          1.7        95.9
  10     0.20575          1.4        97.2
  11     0.15093          1.0        98.3
  12     0.09327          0.6        98.9
  13     0.07628          0.5        99.4
  14     0.05766          0.4        99.8
  15     0.03441          0.2       100.0

VARIMAX ROTATED FACTOR MATRIX

       FACTOR 1   FACTOR 2   FACTOR 3   FACTOR 4   FACTOR 5   FACTOR 6   FACTOR 7   COMMUNALITY
x1      0.12359    0.04204    0.42738   -0.00497    0.85336    0.09437    0.01521     0.93708
x2      0.32636    0.21176    0.11729    0.05621    0.07715    0.90226    0.01101     0.98841
x3      0.05396   -0.02816    0.13368    0.97451   -0.01201    0.03936    0.00014     0.97293
x4      0.22106    0.85846    0.13049   -0.01215    0.26494    0.09997    0.11479     0.89636
x5      0.91144    0.15413   -0.08310   -0.04208   -0.05072    0.13904   -0.06943     0.88989
x6      0.87938    0.25709    0.10119    0.01702    0.05912   -0.00285    0.32778     0.96088
x7      0.20161    0.87606   -0.13423    0.00066   -0.22952    0.16057   -0.06982     0.90948
x8      0.90070    0.07788    0.21967   -0.05953    0.05564    0.16142   -0.04510     0.90031
x9      0.06497   -0.03039    0.88690    0.16270    0.20105   -0.01159   -0.00158     0.85878
x10     0.79694    0.20942    0.35909    0.02333    0.10603   -0.02550   -0.34034     0.93618
x11     0.89427    0.06033    0.08680   -0.01813    0.16585    0.26018   -0.11304     0.91921
x12     0.79690    0.30629    0.23598    0.14095    0.12915    0.14954    0.29053     0.92786
x13     0.73031    0.40428    0.29019    0.26489    0.16244    0.14080    0.06079     0.90108
x14     0.45932    0.56662    0.16988   -0.38607    0.42522   -0.03753   -0.16248     0.91856
x15     0.33966    0.07614    0.84300   -0.01002    0.18417    0.17055    0.00713     0.89497

(a) First principal component has variance 7.50395.
(b) Second principal component has variance 2.06148.


tells us that this choice is reasonable: the first 7 principal components explain 92.1 percent of
the total variation in the 15 variables. The reason that we need to further study the 7 principal
components is that, in general, principal components tend to be correlated with many of the variables (see the factor loadings on the SPSS output) and thus tend to be difficult to interpret in a
meaningful way. For this reason, we rotate the 7 principal components by using VARIMAX rotation. This technique attempts to find final uncorrelated factors each of which loads highly on
(that is, is strongly correlated with) a limited number of the 15 original standardized variables
and loads as low as possible on the rest of the standardized variables. The SPSS output shows
the results of the VARIMAX rotation. Examining the largest loadings in each column of the rotated
output, we see that Factor 1 loads heavily on variables 5 (self-confidence), 6 (lucidity), 8 (salesmanship), 10 (drive), 11 (ambition), 12 (grasp), and 13 (potential). Therefore, Factor 1 might be
interpreted as an "extroverted personality" dimension. Factor 2 loads heavily on variables 4 (likeability) and 7 (honesty). Therefore, Factor 2 might be interpreted as an "agreeable personality"
dimension. Similarly, Factors 3 through 7 might be interpreted as the following dimensions:
Factor 3: "experience"; Factor 4: "academic ability"; Factor 5: "form of application letter";
Factor 6: "appearance"; Factor 7: no discernible dimension. Note that, although variable
14 (keenness to join) does not load heavily on any factor, its correlation of .56662 with Factor 2
("agreeable personality") might mean that it should be interpreted to be part of the agreeable personality dimension.
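
To make the idea of rotation concrete, here is a minimal sketch of a raw varimax rotation using Kaiser's classic pairwise planar rotations. It is an illustration under stated assumptions, not the routine SPSS or SAS runs (those packages typically also normalize the rows before rotating), and the function name `varimax` and the small loading matrix are hypothetical. The toy matrix has two groups of variables that load equally on both unrotated factors; after rotation each variable loads on only one factor, while each variable's communality (row sum of squared loadings) is unchanged.

```python
import math


def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax rotation by repeated pairwise planar rotations."""
    rotated = [row[:] for row in loadings]
    p, k = len(rotated), len(rotated[0])
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                u = [row[i] ** 2 - row[j] ** 2 for row in rotated]
                v = [2 * row[i] * row[j] for row in rotated]
                a, b = sum(u), sum(v)
                c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the varimax criterion for this pair.
                angle = 0.25 * math.atan2(d - 2 * a * b / p,
                                          c - (a * a - b * b) / p)
                cos_t, sin_t = math.cos(angle), math.sin(angle)
                for row in rotated:
                    xi, xj = row[i], row[j]
                    row[i] = xi * cos_t + xj * sin_t
                    row[j] = -xi * sin_t + xj * cos_t
                largest_angle = max(largest_angle, abs(angle))
        if largest_angle < tol:  # no pair needed further rotation
            break
    return rotated


# Hypothetical unrotated loadings: 4 variables on 2 factors.
unrotated = [[0.5, 0.5], [0.6, 0.6], [0.5, -0.5], [0.4, -0.4]]
rotated = varimax(unrotated)

comm_before = [sum(x * x for x in row) for row in unrotated]
comm_after = [sum(x * x for x in row) for row in rotated]

for row in rotated:
    print([round(x, 4) for x in row])
```

Because the rotation is orthogonal, it redistributes variance among the factors without changing how much of each variable the factors explain together, which is why the communalities before and after agree.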
We next note that the communality to the right of each variable in Figure H.1 is the percentage of the variance of the variable that is explained by the 7 factors. The communality for
each variable can be shown to equal the sum of the squared pairwise correlations between the
variable and the 7 factors. For example, examining the first row of pairwise correlations in the
lower portion of Figure H.1, it follows that the communality for variable 1 is
$$(.12359)^2 + (.04204)^2 + \cdots + (.01521)^2 = .93708$$
All of the communalities in Figure H.1 seem high. However, some statisticians might say that
we have retained too many factors. To understand this, note that the upper portion of Figure H.1
tells us that the sum of the variances of the first seven factors is
$$7.50395 + 2.06148 + 1.46768 + 1.20910 + .74143 + .48402 + .34408 = 13.81174$$
This variance is $(13.81174/15) \times 100\% = 92.1\%$ of the sum of the variances of the 15 standardized variables. Some statisticians would suggest that we retain a factor only if its variance exceeds 1, the variance of each standardized variable. If we do this, we would retain 4
factors, since the variance of the fourth factor is 1.20910 and the variance of the fifth factor
is .74143. Figure H.2 gives the SAS output obtained by using 4 factors. Examining the largest
loadings in each column of the rotated output, we see that Factors 1 through 4 might be interpreted
as follows: Factor 1: "extroverted personality"; Factor 2: "experience"; Factor 3: "agreeable
personality"; Factor 4: "academic ability." Variable 2 (appearance) does not load heavily on
any factor and thus is its own factor, as Factor 6 on the SPSS output in Figure H.1 indicated is true. Variable 1 (form of application letter) loads heavily on Factor 2 ("experience").
In summary, there is not much difference between the 7 factor and 4 factor solutions. We
might therefore conclude that the 15 variables can be reduced to the following 5 uncorrelated
factors: "extroverted personality," "experience," "agreeable personality," "academic ability,"
and "appearance."
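
The two calculations in this discussion, a communality and the "retain a factor only if its variance exceeds 1" rule, can be reproduced from the numbers printed in Figure H.1. The sketch below is only an arithmetic check with variable names of our choosing; the loadings and eigenvalues are copied from the figure.

```python
# Rotated loadings of variable 1 (x1) on the 7 factors, from the lower
# portion of Figure H.1.
x1_rotated = [0.12359, 0.04204, 0.42738, -0.00497, 0.85336, 0.09437, 0.01521]

# Communality = sum of squared loadings of the variable on the retained factors.
communality_x1 = sum(loading ** 2 for loading in x1_rotated)

# Eigenvalues (variances) of the 15 principal components, from Figure H.1.
eigenvalues = [
    7.50395, 2.06148, 1.46768, 1.20910, 0.74143,
    0.48402, 0.34408, 0.31027, 0.25965, 0.20575,
    0.15093, 0.09327, 0.07628, 0.05766, 0.03441,
]
total_variation = len(eigenvalues)  # 15 standardized variables, variance 1 each

# Percentage of total variation explained by the first 7 components.
cum_pct_7 = 100 * sum(eigenvalues[:7]) / total_variation

# Retain only the components whose variance exceeds 1.
retained = [ev for ev in eigenvalues if ev > 1]
cum_pct_retained = 100 * sum(retained) / total_variation

print(f"communality of x1:        {communality_x1:.4f}")
print(f"7 factors explain:        {cum_pct_7:.1f}%")
print(f"factors with variance >1: {len(retained)}")
print(f"those factors explain:    {cum_pct_retained:.1f}%")
```

The rule keeps four factors, which together account for roughly 81.6 percent of the total variation, matching the cumulative percentage shown for the fourth factor in the output.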
a In Applied Multivariate Techniques (John Wiley and Sons, 1996), Subhash Sharma considers a study
in which 143 respondents rated three brands of laundry detergents on 12 product attributes using a
5-point Likert scale. The 12 product attributes are:
V1: Gentle to natural fabrics
V2: Won't harm colors
V3: Won't harm synthetics
V4: Safe for lingerie
V5: Strong, powerful
V6: Gets dirt out
V7: Makes colors bright
V8: Removes grease stains
V9: Good for greasy oil
V10: Pleasant fragrance
V11: Removes collar soil
V12: Removes stubborn stains



FIGURE H.2  SAS Output of a Factor Analysis of the Applicant Data (4 Factors Used)

PRINCIPAL AXIS
PRIOR ESTIMATES OF COMMUNALITY

      X1        X2        X3        X4        X5        X6        X7        X8
1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000

      X9        XA        XB        XC        XD        XE        XF
1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000

                     1         2         3         4         5         6         7         8
EIGENVALUES   7.503986  2.061498  1.467686  1.209097  0.741423  0.484018  0.344075  0.310272
PORTION          0.500     0.137     0.098     0.081     0.049     0.032     0.023     0.021
CUM PORTION      0.500     0.638     0.736     0.816     0.866     0.898     0.921     0.941

                     9        10        11        12        13        14        15
EIGENVALUES   0.259652  0.205746  0.150932  0.093269  0.076283  0.057655  0.034407
PORTION          0.017     0.014     0.010     0.006     0.005     0.004     0.002
CUM PORTION      0.959     0.972     0.983     0.989     0.994     0.998     1.000

4 FACTORS WILL BE RETAINED.

FACTOR PATTERN

      FACTOR1    FACTOR2    FACTOR3    FACTOR4
X1    0.44676    0.61880    0.37635   -0.12148   FORM OF APPLICATION LETTER
X2    0.58285   -0.05019   -0.01995    0.28166   APPEARANCE
X3    0.10900    0.33907   -0.49449    0.71391   ACADEMIC ABILITY
X4    0.61699   -0.18149    0.57967    0.35706   LIKEABILITY
X5    0.79807   -0.35610   -0.29930   -0.17939   SELF CONFIDENCE
X6    0.86688   -0.18543   -0.18414   -0.06923   LUCIDITY
X7    0.43330   -0.58195    0.36035    0.44569   HONESTY
X8    0.88244   -0.05647   -0.24821   -0.22786   SALESMANSHIP
X9    0.36549    0.79437    0.09258    0.07431   EXPERIENCE
XA    0.86261    0.06908   -0.09993   -0.16645   DRIVE
XB    0.87186   -0.09840   -0.25564   -0.20948   AMBITION
XC    0.90776   -0.03023   -0.13453    0.09726   GRASP
XD    0.91310    0.03250   -0.07327    0.21842   POTENTIAL
XE    0.71033   -0.11478    0.55800   -0.23495   KEENNESS TO JOIN
XF    0.64584    0.60373    0.10687   -0.02889   SUITABILITY

VARIMAX
ROTATED FACTOR PATTERN

      FACTOR1    FACTOR2    FACTOR3    FACTOR4
X1    0.11447    0.83336    0.11063   -0.13808   FORM OF APPLICATION LETTER
X2    0.43964    0.14979    0.39417    0.22555   APPEARANCE
X3    0.06115    0.12744    0.00557    0.92792   ACADEMIC ABILITY
X4    0.21559    0.24667    0.87360   -0.08137   LIKEABILITY
X5    0.91896   -0.10368    0.16241   -0.06219   SELF CONFIDENCE
X6    0.86439    0.10195    0.25878    0.00642   LUCIDITY
X7    0.21715   -0.24607    0.86440    0.00341   HONESTY
X8    0.91799    0.20635    0.08773   -0.04938   SALESMANSHIP
X9    0.08530    0.84871   -0.05537    0.21919   EXPERIENCE
XA    0.79576    0.35407    0.15950   -0.05026   DRIVE
XB    0.91641    0.16268    0.10496   -0.04184   AMBITION
XC    0.80415    0.25872    0.34049    0.15153   GRASP
XD    0.73917    0.32885    0.42493    0.22980   POTENTIAL
XE    0.43597    0.36420    0.54105   -0.51862   KEENNESS TO JOIN
XF    0.37950    0.79807    0.07847    0.08221   SUITABILITY

VARIANCE EXPLAINED BY EACH FACTOR

      FACTOR1    FACTOR2    FACTOR3    FACTOR4
     5.745474   2.735065   2.413961   1.347767


Table H.2 is a matrix containing the pairwise correlations between the variables, and Figure H.3 is the
SPSS output of a factor analysis of the detergent data. Why did the analyst choose to retain two
factors? Discuss why Factor 1 can be interpreted to be the ability of the detergent to clean clothes.
Discuss why Factor 2 can be interpreted to be the mildness of the detergent.
b Table H.3 shows the output of a factor analysis of the ratings of 82 respondents who were asked to
evaluate a particular discount store on 29 attributes using a 7-point Likert scale. Interpret and give
names to the five factors.
[Margin note: Good service, friendly; Price level; Attractiveness; Spaciousness; Size]

FIGURE H.3  SPSS Output of a Factor Analysis of the Detergent Data²

INITIAL STATISTICS:                                               ROTATED FACTOR MATRIX:

VARIABLE  COMMUNALITY   FACTOR  EIGENVALUE  PCT OF VAR  CUM PCT             FACTOR 1  FACTOR 2
V1          .42052         1     6.30111       52.5       52.5      V1       .12289    .65101
V2          .39947         2     1.81757       15.1       67.7      V2       .13900    .64781
V3          .56533         3      .66416        5.5       73.2      V3       .24971    .78587
V4          .56605         4      .57155        4.8       78.0      V4       .29387    .74118
V5          .60467         5      .55995        4.7       82.6      V5       .73261    .15469
V6          .57927         6      .44517        3.7       86.3      V6       .73241    .20401
V7          .69711         7      .41667        3.5       89.8      V7       .77455    .22464
V8          .74574         8      .32554        2.7       92.5      V8       .85701    .20629
V9          .66607         9      .27189        2.3       94.8      V9       .80879    .19538
V10         .59287        10      .25690        2.1       96.9      V10      .69326    .23923
V11         .71281        11      .19159        1.6       98.5      V11      .77604    .25024
V12         .64409        12      .17789        1.5      100.0      V12      .79240    .19822

TABLE H.2  Correlation Matrix for Detergent Data²

          V1       V2       V3       V4       V5       V6       V7       V8       V9      V10      V11      V12
V1   1.00000  0.41901  0.51840  0.56641  0.18122  0.17454  0.23034  0.30647  0.24051  0.21192  0.27443  0.20694
V2   0.41901  1.00000  0.57599  0.49886  0.18666  0.24648  0.22907  0.22526  0.21967  0.25879  0.32132  0.25853
V3   0.51840  0.57599  1.00000  0.64325  0.29080  0.34428  0.41083  0.34028  0.32854  0.38828  0.39433  0.36712
V4   0.56641  0.49886  0.64325  1.00000  0.38360  0.39637  0.37699  0.40391  0.42337  0.36564  0.33691  0.36734
V5   0.18122  0.18666  0.29080  0.38360  1.00000  0.57915  0.59400  0.67623  0.69269  0.43873  0.55485  0.65261
V6   0.17454  0.24648  0.34428  0.39637  0.57915  1.00000  0.57756  0.70103  0.62280  0.62174  0.59855  0.57845
V7   0.23034  0.22907  0.41083  0.37699  0.59400  0.57756  1.00000  0.67682  0.68445  0.54175  0.78361  0.63889
V8   0.30647  0.22526  0.34028  0.40391  0.67623  0.70103  0.67682  1.00000  0.69813  0.68589  0.71115  0.71891
V9   0.24051  0.21967  0.32854  0.42337  0.69269  0.62280  0.68445  0.69813  1.00000  0.58579  0.64637  0.69111
V10  0.21192  0.25879  0.38828  0.36564  0.43873  0.62174  0.54175  0.68589  0.58579  1.00000  0.62250  0.63494
V11  0.27443  0.32132  0.39433  0.33691  0.55485  0.59855  0.78361  0.71115  0.64637  0.62250  1.00000  0.63973
V12  0.20694  0.25853  0.36712  0.36734  0.65261  0.57845  0.63889  0.71891  0.69111  0.63494  0.63973  1.00000

² The source of Table H.2 and Figure H.3 is Applied Multivariate Techniques by Subhash Sharma, John Wiley and Sons, Inc., New York, 1996.

H.2 Cluster Analysis and Multidimensional Scaling


Professional baseball and tennis were less popular in 2000 than in the late 1970s and early
1980s. To see why this might be true, we consider a study by Levine (1977) concerning the
perceptions of various sports in 1977. Levine had 45 undergraduate students give each of boxing (BX), basketball (BK), golf (G), swimming (SW), skiing (SK), baseball (BB), ping pong
(PP), hockey (HK), handball (H), track and field (TF), bowling (BW), tennis (T), and football
(F) an integer rating of 1 to 7 on six scales: fast moving (1) versus slow moving (7); complicated
rules (1) versus simple rules (7); team oriented (1) versus individual (7); easy to play (1) versus
hard to play (7); noncontact (1) versus contact (7); competition against opponent (1) versus competition against standard (7). The first two rows of Table H.4 present a particular undergraduate's
ratings of boxing and basketball on each of the six scales, and Table H.5 presents the average
rating by all 45 undergraduates of each sport on each of the six scales.
To better understand the perceptions of the 13 sports, we will cluster them into groups. The first
step in doing this is to consider the distance between each pair of sports for each undergraduate. For
example, to calculate the distance between boxing and basketball for the undergraduate whose ratings are given in Table H.4, we calculate the paired difference between the ratings on each of the six
scales, square each paired difference, sum the six squared paired differences, and find the square root
of this sum. The resulting distance is 5.9161. A distance for each undergraduate for each pair of
sports can be found, and then an average distance over the 45 undergraduates for each pair of sports
can be calculated. Statistical software packages do this, but these packages sometimes standardize
the individual ratings before calculating the distances. We will not discuss the various ways in which



TABLE H.3  Factor Analysis of the Discount Store Data³

                                                      Factor
Scale                                      I     II    III    IV     V    Communality
 1. Good service                          .79   .15   .06   .12   .07       .67
 2. Helpful salespersons                  .75   .03   .04   .13   .31       .68
 3. Friendly personnel                    .74   .07   .17   .09   .14       .61
 4. Clean                                 .59   .31   .34   .15   .25       .65
 5. Pleasant store to shop in             .58   .15   .48   .26   .10       .67
 6. Easy to return purchases              .56   .23   .13   .03   .03       .39
 7. Too many clerks                       .53   .00   .02   .23   .37       .47
 8. Attracts upper-class customers        .46   .06   .25   .00   .17       .31
 9. Convenient location                   .36   .30   .02   .19   .03       .26
10. High quality products                 .34   .27   .31   .12   .25       .36
11. Good buys on products                 .02   .88   .09   .10   .03       .79
12. Low prices                            .03   .74   .14   .00   .13       .59
13. Good specials                         .35   .67   .05   .10   .14       .60
14. Good sales on products                .30   .67   .01   .08   .16       .57
15. Reasonable value for price            .17   .52   .11   .02   .03       .36
16. Good store                            .41   .47   .47   .12   .11       .63
17. Low pressure salespersons             .20   .30   .28   .03   .05       .18
18. Bright store                          .02   .10   .75   .26   .05       .61
19. Attractive store                      .19   .03   .67   .34   .24       .66
20. Good displays                         .33   .15   .61   .15   .20       .57
21. Unlimited selections of products      .09   .00   .29   .03   .00       .09
22. Spacious shopping                     .00   .20   .00   .70   .10       .54
23. Easy to find items you want           .36   .16   .10   .57   .01       .49
24. Well-organized layout                 .02   .05   .25   .54   .17       .39
25. Well-spaced merchandise               .20   .15   .27   .52   .16       .43
26. Neat                                  .38   .12   .45   .49   .34       .72
27. Big store                             .20   .15   .06   .07   .65       .49
28. Ads frequently seen by you            .03   .20   .07   .09   .42       .23
29. Fast checkout                         .30   .16   .00   .25   .33       .28
Percentage of variance explained           16    12     9     8     5
Cumulative variance explained              16    28    37    45    50

³ The source of Table H.3 is Marketing Research, Sixth Edition by David A. Aaker, V. Kumar, and George S. Day, John Wiley and
Sons, Inc., New York, 1998.

TABLE H.4  A Particular Undergraduate's Ratings of Boxing and Basketball

                    (1) Fast Mvg.   (1) Compl.   (1) Team    (1) Easy to Play   (1) Ncon.   (1) Comp Opp.
Sport               (7) Slow Mvg.   (7) Simple   (7) Indv.   (7) Hard to Play   (7) Con.    (7) Comp Std.
Boxing                    3             5            7              4               6             1
Basketball                2             3            2              4               4             2
Paired Difference         1             2            5              0               2            -1

$$\text{Distance} = \sqrt{(1)^2 + (2)^2 + (5)^2 + (0)^2 + (2)^2 + (-1)^2} = \sqrt{35} = 5.9161$$
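
The distance calculation shown in Table H.4 is the ordinary Euclidean distance between two six-dimensional rating vectors. A minimal sketch, using the two rows of ratings from the table and a helper name (`euclidean_distance`) of our own choosing:

```python
import math

# One undergraduate's ratings on the six scales, from Table H.4.
boxing = [3, 5, 7, 4, 6, 1]
basketball = [2, 3, 2, 4, 4, 2]


def euclidean_distance(a, b):
    """Square the paired differences, sum them, and take the square root."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


distance = euclidean_distance(boxing, basketball)
print(round(distance, 4))  # 5.9161
```

Repeating this for every pair of sports and every student, and then averaging over the 45 students, would give unstandardized average distances. The entries of Table H.6 additionally involve a standardization step that this sketch does not attempt.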

such standardization can be done. Rather, we note that Table H.6 presents a matrix containing the
average distance over the 45 undergraduates for each pair of sports, and we note that this matrix has
been obtained by using a software package that uses a standardization procedure. There are many
different approaches to using the average distances to cluster the sports. We will discuss one approach: the hierarchical, complete linkage approach. Hierarchical clustering implies that, once
two sports are clustered together at a particular stage, they are considered to be permanently joined
and cannot be separated into different clusters at a later stage. Complete linkage bases the merger
of two clusters of sports (either cluster of which can be an individual sport) on the maximum distance between sports in the clusters. For example, since Table H.6 shows that the smallest average
distance is the average distance between football and hockey, which is 2.20, football and hockey are

TABLE H.5  Average Rating of Each Sport on Each of the Six Scales

                 (1) Fast Mvg.   (1) Compl.   (1) Team    (1) Easy to Play   (1) Ncon.   (1) Comp Opp.
Sport            (7) Slow Mvg.   (7) Simple   (7) Indv.   (7) Hard to Play   (7) Con.    (7) Comp Std.
Boxing               3.07           4.62         6.62           4.78            6.02          1.73
Basketball           1.84           3.78         1.56           3.82            4.89          2.27
Golf                 6.13           4.49         6.58           3.84            1.82          4.11
Swimming             2.87           5.02         5.29           3.64            2.22          4.36
Skiing               2.13           4.60         5.96           5.22            2.51          4.71
Baseball             4.78           4.18         2.16           3.33            3.60          2.67
Ping-Pong            3.18           5.13         5.38           2.91            2.04          2.20
Hockey               1.71           3.22         1.82           5.04            5.96          2.49
Handball             2.53           4.67         4.78           3.71            2.78          2.31
Track & field        2.82           4.38         4.47           3.84            2.89          3.82
Bowling              5.07           5.16         5.40           3.11            1.60          3.73
Tennis               2.89           3.78         5.47           4.09            2.16          2.42
Football             2.42           2.76         1.44           5.00            6.47          2.33

TABLE H.6  A Matrix Containing the Average Distances

Sport       BX      BK       G      SK      SW      BB      PP      HK       H      TF      BW       T
BK        3.85
G         4.33    4.88
SK        3.80    4.05    3.73
SW        3.81    3.81    3.56    2.84
BB        4.12    3.15    3.83    4.16    3.60
PP        3.74    3.56    3.61    3.67    2.72    3.41
HK        3.85 2.58[3]    5.11    4.02    4.17    3.49    4.27
H         3.41    3.24    3.92    3.25    2.80    3.34 2.58[4]    3.52
TF        3.81    3.36    3.88    3.20    2.84    3.37    3.06    3.72    2.75
BW        4.07    4.23    2.72    3.75    2.89    3.32    2.87    4.58    3.13    3.26
T         3.49    3.32    3.59    3.19    2.82    3.25    2.54    3.58 2.33[2]    2.72    2.85
F         3.86    2.51    5.15    4.38    4.41    3.43    4.35 2.20[1]    3.68    3.84    4.67    3.69

[1] through [4] mark the distances that determine clustering stages 1 through 4, as discussed in the text.

Source of Tables H.4, H.5, and H.6 and of Figures H.4 and H.5: D. M. Levine, "Nonmetric Multidimensional Scaling and Hierarchical Clustering:
Procedures for the Investigation of the Perception of Sports," Research Quarterly, Vol. 48 (1977), pp. 341-348.

clustered together in the first stage of clustering (see the tree diagram in Figure H.4). Since the second smallest average distance is the average distance between tennis and handball, which is 2.33,
tennis and handball are clustered together in the second stage of clustering. The third smallest average distance is the average distance between football and basketball, which is 2.51, but football has
already been clustered with hockey. The average distance between hockey and basketball is 2.58,
and so the average distance between basketball and the cluster containing football and hockey is
2.58, the maximum of 2.51 and 2.58. This average distance is equal to the average distance between ping pong and the cluster containing tennis and handball, which (as shown in Table H.6) is
the maximum of 2.54 and 2.58, that is, 2.58. There is no other average distance as small as 2.58.
Furthermore, note that the distance between basketball and football is 2.51, whereas the distance
between ping pong and tennis is 2.54. Therefore, we will break the tie between the two average
distances of 2.58 by adding basketball to the cluster containing football and hockey in the third stage
of clustering. Then, we add ping pong to the cluster containing tennis and handball in the fourth
stage of clustering. Figure H.4 shows the results of all 12 stages of clustering.
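
The stage-by-stage reasoning above can be replayed in code. The sketch below is a small hand-written implementation of hierarchical, complete linkage clustering applied to the average distances of Table H.6. It is not the software used in the study, and the names `LOWER`, `linkage`, and `complete_linkage` are ours. Ties in the complete linkage distance are broken by the smaller closest-pair distance, which mirrors the tie-breaking argument in the text (2.51 versus 2.54).

```python
from itertools import combinations

SPORTS = ["BX", "BK", "G", "SK", "SW", "BB", "PP", "HK", "H", "TF", "BW", "T", "F"]

# Lower triangle of Table H.6: row i lists the distances from SPORTS[i]
# to SPORTS[0], ..., SPORTS[i - 1].
LOWER = [
    [],
    [3.85],
    [4.33, 4.88],
    [3.80, 4.05, 3.73],
    [3.81, 3.81, 3.56, 2.84],
    [4.12, 3.15, 3.83, 4.16, 3.60],
    [3.74, 3.56, 3.61, 3.67, 2.72, 3.41],
    [3.85, 2.58, 5.11, 4.02, 4.17, 3.49, 4.27],
    [3.41, 3.24, 3.92, 3.25, 2.80, 3.34, 2.58, 3.52],
    [3.81, 3.36, 3.88, 3.20, 2.84, 3.37, 3.06, 3.72, 2.75],
    [4.07, 4.23, 2.72, 3.75, 2.89, 3.32, 2.87, 4.58, 3.13, 3.26],
    [3.49, 3.32, 3.59, 3.19, 2.82, 3.25, 2.54, 3.58, 2.33, 2.72, 2.85],
    [3.86, 2.51, 5.15, 4.38, 4.41, 3.43, 4.35, 2.20, 3.68, 3.84, 4.67, 3.69],
]

DIST = {}
for i, row in enumerate(LOWER):
    for j, d in enumerate(row):
        DIST[frozenset((SPORTS[i], SPORTS[j]))] = d


def linkage(c1, c2):
    """Return (complete linkage distance, closest-pair distance) for two clusters."""
    ds = [DIST[frozenset((a, b))] for a in c1 for b in c2]
    return max(ds), min(ds)


def complete_linkage(stages):
    """Run the given number of merge stages; return final clusters and merge history."""
    clusters = [frozenset([s]) for s in SPORTS]
    history = []
    for _ in range(stages):
        # Merge the pair with the smallest maximum distance (ties: smallest minimum).
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        history.append((linkage(c1, c2)[0], c1 | c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters, history


clusters_after_7, history = complete_linkage(7)
for stage, (dist, merged) in enumerate(history, start=1):
    print(stage, dist, sorted(merged))
```

Running seven stages reproduces the merges described in the text: hockey with football at 2.20, handball with tennis at 2.33, then basketball and ping pong joining those two clusters at 2.58.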
At the end of seven stages of clustering, six clusters have been formed. They are:
Cluster 1: Boxing
Cluster 2: Skiing
Cluster 3: Swimming, Ping Pong, Handball, Tennis, Track and Field
Cluster 4: Golf, Bowling
Cluster 5: Basketball, Hockey, Football
Cluster 6: Baseball



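FIGURE H.4  A Tree Diagram Showing Clustering of the 13 Sports
[Tree diagram not reproduced. Leaves, from top to bottom: Boxing, Skiing, Swimming, Ping Pong, Handball, Tennis, Track & Field, Golf, Bowling, Basketball, Hockey, Football, Baseball. The merges are labeled with their stage numbers (1 through 12), and the horizontal axis is Distance.]
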

In Figure H.5 we present a two dimensional graph in which we place ovals around these six clusters. This graph is the result of a procedure called multidimensional scaling. To understand this
procedure, note that, since each sport is represented by six ratings, each sport exists geometrically
as a point in six dimensional space. Multidimensional scaling uses the relative average distances
between the sports in the six dimensional space (that is, the relationships between the average distances in Table H.6) and attempts to find points in a lesser dimensional space that approximately
have the same relative average distances between them. In this example we illustrate mapping the
six dimensional space into a two dimensional space, because a two dimensional space allows us to
most easily interpret the results of multidimensional scaling, that is, to study the location of the
sports relative to each other and thereby determine the overall factors or dimensions that appear to
separate the sports. Figure H.5 gives the output of multidimensional scaling that is given by a standard statistical software system (we will not discuss the numerical procedure used to actually carry
out the multidimensional scaling). By comparing the sports near the top of Axis II with the sports
near the bottom, and by using the average ratings in Table H.5, we see that Axis II probably represents the factor "team versus individual." By comparing the sports on the left of Axis I with the
sports on the right, and by using the average ratings in Table H.5, we see that Axis I probably represents the factor "degree of action," which combines contact/noncontact aspects with fast moving/slow moving aspects. Also, note that the two clusters that have been formed at the end of 11
stages of clustering in Figure H.4 support the existence of the "team versus individual" factor.
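
The text does not describe the numerical procedure behind Figure H.5, and the title of Levine's study indicates a nonmetric method. As a rough illustration of the general idea only, the sketch below implements the simpler classical (metric) multidimensional scaling: it double-centers the squared distances and uses the two leading eigenvectors, found here by power iteration, as coordinates. The function name `classical_mds`, the starting vector, and the rectangle example are our assumptions. The example uses four corner points of a 2-by-1 rectangle, for which a two-dimensional map can reproduce the distances exactly.

```python
import math


def classical_mds(dist, dims=2, iterations=500):
    """Classical (metric) MDS: coordinates whose distances approximate dist."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centered matrix B = -1/2 * (D^2 with row and column means removed).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the leading eigenvector (assumes it belongs to a
        # positive eigenvalue, as it does for truly Euclidean distances).
        v = [math.sqrt(i + 1.0) for i in range(n)]  # arbitrary fixed start
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                         for i in range(n))
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords


# Hypothetical example: corners of a 2-by-1 rectangle.
rect_points = [(0, 0), (2, 0), (2, 1), (0, 1)]
rect_dist = [[math.dist(p, q) for q in rect_points] for p in rect_points]

coords = classical_mds(rect_dist)
max_error = max(abs(math.dist(coords[i], coords[j]) - rect_dist[i][j])
                for i in range(4) for j in range(4))
print(max_error < 1e-6)
```

The recovered map may be rotated or reflected relative to the original rectangle, since only the distances between points are preserved. The same function could be given the full symmetric version of Table H.6, but the resulting map need not match Figure H.5, which came from a different procedure.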
FIGURE H.5  Multidimensional Scaling of the 13 Sports
[Two-dimensional plot not reproduced. Horizontal axis: Axis I; vertical axis: Axis II. Points plotted: F, HK, BK, BB, TF, H, T, BX, PP, SW, BW, G, SK.]


TABLE H.7  Average Ratings of the Food Types on Three Scales

Food               Spicy/Bland   Heavy/Light   High/Low Calories
Japanese (JPN)         2.8           3.2              3.4
Cantonese (CNT)        2.6           5.3              5.4
Szechuan (SCH)         6.6           3.6              3.0
French (FR)            3.5           4.5              5.1
Mexican (MEX)          6.4           4.3              4.3
Mandarin (MAN)         3.4           4.1              4.2
American (AMR)         2.3           5.8              5.7
Spanish (SPN)          4.7           5.4              4.9
Italian (ITL)          4.6           6.0              6.2
Greek (GRK)            5.3           4.7              6.0

Although the perceptions of sports in 1977 relate to sports in general (and not just to professional sports), and although these perceptions do not directly relate to the popularity of sports,
note that high action, team oriented sports (football and basketball) tended to be popular in
2000. Considering the Axis II factor (team versus individual), it might be that high baseball
player salaries, free agency, frequent player moves, and the inability of small market teams to
compete made baseball seem less team oriented to fans in 2000. Perhaps more revenue sharing
between small and large market teams would improve the situation. Considering the Axis I factor (degree of action), it might be that power tennis (partially due to new tennis racquet technologies) and the resulting shorter rallies made tennis seem less action oriented to fans in 2000.
Perhaps limiting the power of tennis racquets and thus allowing smaller, exciting players (like
Jimmy Connors and John McEnroe of the 1970s and 1980s) to be major competitors might help
increase the degree of action in tennis.
a In Intermediate Statistical Methods and Applications, A Computer Package Approach (Prentice Hall,
1983), Mark L. Berenson, David M. Levine, and Mathew Goldstein consider a marketing research
study concerning the similarities and differences between the ten types of food shown in Table H.7.
Each type of food was given an integer rating of 1 to 7 on three scales: bland (1) versus spicy (7);
light (1) versus heavy (7); and low calories (1) versus high calories (7). Table H.7 gives the average
value for each of the food types on the three scales. Figures H.6 and H.7 present the results of a
cluster analysis and multidimensional scaling of the 10 food types.
(1) Discuss why the two axes in Figure H.7 may be interpreted as oriental versus western and
spicy versus bland.
(2) Using Table H.7 and Figures H.6 and H.7, discuss the similarities and differences between the
food types.
(3) Suppose that you are in charge of choosing restaurants to be included in a new riverfront
development that initially will include a limited number of restaurants. How might Table H.7
and Figures H.6 and H.7 help you to make your choice?

FIGURE H.6  A Cluster Analysis of the 10 Food Types
[Tree diagram not reproduced. Leaves, from top to bottom: Japanese, Cantonese, Mandarin, French, American, Italian, Spanish, Greek, Mexican, Szechuan.]



FIGURE H.7  Multidimensional Scaling of the 10 Food Types
[Two-dimensional plot not reproduced. Axes: Axis I and Axis II. Points plotted: CNT, JPN, SCH, MAN, ITL, FR, SPN, GRK, MEX, AMR.]

b Automakers use multidimensional scaling to measure the images of their cars. Customer surveys ask
owners of different car makes to rank their autos from 1 to 10 for such qualities as youthfulness,
luxury, and practicality. The responses are used to carry out multidimensional scaling, which
produces a perceptual map showing the images of the different cars. Figure H.8 is a perceptual map
showing car images in 1984. After viewing the map in Figure H.8, Chrysler concluded that
Plymouth, Dodge, and Chrysler needed to present a more youthful image and that Plymouth and

FIGURE H.8  Multidimensional Scaling Showing Car Images in 1984
[Perceptual map of brand images not reproduced. Brands plotted: Cadillac, Lincoln, Mercedes, Ford, Pontiac, Chevrolet, Datsun, Toyota, Dodge, Plymouth, Porsche, BMW, Chrysler, Buick, Oldsmobile, VW. Axis-end labels: "Has a Touch of Class / A Car I'd Be Proud to Own / Distinctive Looking"; "Conservative Looking / Appeals to Older People"; "Has Spirited Performance / Appeals to Young People / Fun to Drive / Sporty Looking"; "Very Practical / Provides Good Gas Mileage / Affordable".]

Source: Chrysler Corp.
Source: Marketing Research: Methodological Foundations (page 492), by Gilbert A. Churchill, Jr., The Dryden Press, Orlando,
1995.
Source: John Koten, "Car Makers Use Image Map as Tool to Position Products," The Wall Street Journal (March 22, 1984),
p. 31. Reprinted by permission of The Wall Street Journal, Dow Jones & Company, Inc., 1984. All Rights Reserved
Worldwide.


Dodge needed to move up on the luxury scale. By the year 2000 Chrysler had introduced cars such
as the Dodge Neon, Plymouth Breeze, Dodge Intrepid, Chrysler Concorde, and Chrysler 300 M.
These cars are more youthful and/or luxurious and tremendously increased Chrysler sales. What does
the perceptual map say about the Buick and Oldsmobile divisions of General Motors? By 2000
General Motors had made Buick the family car division and had introduced new Oldsmobiles that
were more youthful and performance oriented. Do you think a perceptual map in 2002 would show
the same relationships between the Buick and Oldsmobile divisions?
