One may also express each of the p variables as a linear combination of the m factors:

$$X_j = A_{1j}F_1 + A_{2j}F_2 + \dots + A_{mj}F_m + U_j$$

where $U_j$ is the variance that is unique to variable j, variance that cannot be explained by any of the common factors.
Click Descriptives and then check Initial Solution, Coefficients, and KMO and
Bartlett’s Test of Sphericity. Click Continue.
Click Extraction and then select Correlation Matrix, Unrotated Factor Solution,
Scree Plot, and Eigenvalues Over 1. Click Continue.
Click Rotation. Select Varimax and Rotated Solution. Click Continue.
Click Options. Select Exclude Cases Listwise and Sorted By Size. Click Continue.
Checking For Unique Variables
Aside from the raw data matrix, the first matrix you are likely to encounter in a
PCA or FA is the correlation matrix. Here is the correlation matrix for our data:
A large partial correlation indicates that the variables involved share variance that is not shared by the other variables in the data set. Kaiser's Measure of Sampling Adequacy (MSA) for a variable $X_i$ is the ratio of the sum of the squared simple r's between $X_i$ and each other X to that same sum plus the sum of the squared partial r's between $X_i$ and each other X. Recall that squared r's can be thought of as variances.
$$MSA = \frac{\sum r_{ij}^2}{\sum r_{ij}^2 + \sum pr_{ij}^2}$$
Small values of MSA indicate that the correlations between $X_i$ and the other variables are largely unique, that is, the variance $X_i$ shares with any one variable is not also shared with the remaining variables. Kaiser has described MSAs above .9 as "marvelous," above .8 as "meritorious," above .7 as "middling," above .6 as "mediocre," above .5 as "miserable," and below .5 as "unacceptable."
The MSA option in SAS’ PROC FACTOR [Enter PROC FACTOR MSA;] gives
you a matrix of the partial correlations, the MSA for each variable, and an overall MSA
computed across all variables. Variables with small MSAs should be deleted prior to FA
or the data set should be supplemented with additional relevant variables that one hopes will be correlated with the offending variables. SPSS gives only the overall MSA.
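If you want to compute the MSAs yourself outside SAS, here is a minimal Python sketch, assuming only NumPy and a correlation matrix R already at hand; the partial correlations are obtained from the inverse of R, and the function name is mine, not from any package.

```python
import numpy as np

def kaiser_msa(R):
    """Per-variable MSAs and the overall MSA for a correlation matrix R."""
    P = np.linalg.inv(R)                    # precision (inverse correlation) matrix
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)           # partial correlations, controlling for all other Xs
    np.fill_diagonal(partial, 0.0)          # exclude the diagonal from the sums
    r2 = R ** 2
    np.fill_diagonal(r2, 0.0)
    pr2 = partial ** 2
    msa_each = r2.sum(axis=0) / (r2.sum(axis=0) + pr2.sum(axis=0))
    msa_overall = r2.sum() / (r2.sum() + pr2.sum())
    return msa_each, msa_overall
```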
For our sample data the partial correlation matrix looks like this:
COST SIZE ALCOHOL REPUTAT COLOR AROMA TASTE
COST 1.00 .54 -.11 -.26 -.10 -.14 .11
SIZE .54 1.00 .81 .11 .50 .06 -.44
ALCOHOL -.11 .81 1.00 -.23 -.38 .06 .31
REPUTAT -.26 .11 -.23 1.00 .23 -.29 -.26
COLOR -.10 .50 -.38 .23 1.00 .57 .69
AROMA -.14 .06 .06 -.29 .57 1.00 .09
TASTE .11 -.44 .31 -.26 .69 .09 1.00
___________________________________________________________
MSA .78 .55 .63 .76 .59 .80 .68
OVERALL MSA = .67
These MSAs may not be marvelous, but they are not low enough to make me drop any variables (especially since I have only seven variables, already an unrealistically small number).
The SPSS output is much less detailed, giving only the KMO and Bartlett's Test table (the overall measure of sampling adequacy and Bartlett's test of sphericity).
Extracting Principal Components
We are now ready to extract principal components. We shall let the computer do
most of the work, which is considerable. From p variables we can extract p
components. This will involve solving p equations with p unknowns. The variance in
the correlation matrix is “repackaged” into p eigenvalues. Each eigenvalue represents
the amount of variance that has been captured by one component.
Each component is a linear combination of the p variables. The first component
accounts for the largest possible amount of variance. The second component, formed
from the variance remaining after that associated with the first component has been
extracted, accounts for the second largest amount of variance, etc. The principal
components are extracted with the restriction that they are orthogonal. Geometrically
they may be viewed as dimensions in p-dimensional space where each dimension is
perpendicular to each other dimension.
Each of the p variables has its variance standardized to one. Each component's eigenvalue may be compared to 1 to see how much more (or less) variance it represents than does a single variable. With p variables there are p × 1 = p units of variance to distribute. The principal
components extraction will produce p components which in the aggregate account for
all of the variance in the p variables. That is, the sum of the p eigenvalues will be equal
to p, the number of variables. The proportion of variance accounted for by one
component equals its eigenvalue divided by p.
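As a concrete illustration, here is a minimal Python sketch of this repackaging, assuming only NumPy; the data matrix here is random stand-in data, not the beer ratings.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((220, 7))        # stand-in data: 220 cases, p = 7 variables
R = np.corrcoef(X, rowvar=False)         # p x p correlation matrix
eigvals = np.linalg.eigvalsh(R)[::-1]    # the p eigenvalues, largest first
print(eigvals.sum())                     # the eigenvalues sum to p (here, 7)
print(eigvals / 7)                       # proportion of variance per component
```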
For our beer data, here are the eigenvalues and proportions of variance for the
seven components:
Initial Eigenvalues
Component    Total    % of Variance    Cumulative %
    1        3.313        47.327           47.327
    2        2.616        37.369           84.696
    3         .575         8.209           92.905
    4         .240         3.427           96.332
    5         .134         1.921           98.252
    6         .09          1.221           99.473
    7         .04           .527          100.000
Extraction Method: Principal Component Analysis.
A common rule of thumb is to retain only those components with eigenvalues of one or more. That is, drop any component that accounts for less
variance than does a single variable. Another device for deciding on the number of
components to retain is the scree test. This is a plot with eigenvalues on the ordinate
and component number on the abscissa. Scree is the rubble at the base of a sloping
cliff. In a scree plot, scree is those components that are at the bottom of the sloping plot
of eigenvalues versus component number. The plot provides a visual aid for deciding at
what point including additional components no longer increases the amount of variance
accounted for by a nontrivial amount. Here is the scree plot produced by SPSS:
[Scree plot: eigenvalues (0.0 to 3.5) on the ordinate, component number (1 to 7) on the abscissa]
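If you want to draw your own scree plot, a minimal matplotlib sketch follows, using the eigenvalues from the table above (the last two values are approximate).

```python
import matplotlib.pyplot as plt

eigvals = [3.313, 2.616, 0.575, 0.240, 0.134, 0.09, 0.04]
plt.plot(range(1, 8), eigvals, marker="o")   # eigenvalue vs. component number
plt.xlabel("Component Number")
plt.ylabel("Eigenvalue")
plt.title("Scree Plot")
plt.show()
```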
For our beer data, only the first two components have eigenvalues greater than
1. There is a big drop in eigenvalue between component 2 and component 3. On a
scree plot, components 3 through 7 would appear as scree at the base of the cliff
composed of components 1 and 2. Together components 1 and 2 account for 85% of
the total variance. We shall retain only the first two components.
I often find it useful to try at least three different solutions and then decide which one packages the variance in the way most pleasing to me. Here I could try a one-component, a two-component, and a three-component solution.
Component Matrix^a
Component
1 2
COLOR .760 -.576
AROMA .736 -.614
REPUTAT -.735 -.071
TASTE .710 -.646
COST .550 .734
ALCOHOL .632 .699
SIZE .667 .675
Extraction Method: Principal Component Analysis.
a. 2 components extracted.
As you can see, almost all of the variables load well on the first component, all
positively except reputation. The second component is more interesting, with three large positive loadings and three large negative loadings. Component 1 seems to reflect
concern for economy and quality versus reputation. Component 2 seems to reflect
economy versus quality.
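For those curious where these loadings come from: each column of the loading matrix is an eigenvector of the correlation matrix scaled by the square root of its eigenvalue, which is what makes loadings interpretable as variable-component correlations. A minimal sketch (the function name is mine):

```python
import numpy as np

def component_loadings(R, n_components):
    """Loadings = eigenvectors scaled by sqrt(eigenvalues) of correlation matrix R."""
    vals, vecs = np.linalg.eigh(R)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]           # reorder: largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    return vecs[:, :n_components] * np.sqrt(vals[:n_components])
```

Note that eigenvector signs are arbitrary, so a column may come out reflected (all signs flipped) relative to the SPSS output.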
Remember that each component represents an orthogonal (perpendicular)
dimension. Fortunately, we retained only two dimensions, so I can plot them on paper.
If we had retained more than two components, we could look at several pairwise plots
(two components at a time).
For each variable I have plotted in the vertical dimension its loading on
component 1, and in the horizontal dimension its loading on component 2. Wouldn’t it
be nice if I could rotate these axes so that the two dimensions passed more nearly
through the two major clusters (COST, SIZE, ALCOHOL and COLOR, AROMA, TASTE)?
Imagine that the two axes are perpendicular wires joined at the origin (0,0) with a pin. I
rotate them, preserving their perpendicularity, so that the one axis passes through or
near the one cluster, the other through or near the other cluster. The number of
degrees by which I rotate the axes is the angle psi (ψ). For these data, rotating the axes -40.63 degrees has the desired effect.
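A minimal sketch of such a rigid rotation, assuming a two-column loading matrix A like the component matrix above (the function name is mine):

```python
import numpy as np

def rotate_loadings(A, psi_degrees):
    """Rigidly rotate a two-column loading matrix A by psi degrees."""
    psi = np.radians(psi_degrees)
    T = np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])   # orthogonal rotation matrix
    return A @ T
```

Applying rotate_loadings to the unrotated component matrix with psi = -40.63 reproduces, to rounding, the rotated loadings below.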
Here is the loading matrix after rotation:
Rotated Component Matrix^a
Component
1 2
TASTE       .960    -.028
AROMA       .958     .010
COLOR       .952     .060
SIZE        .070     .947
ALCOHOL     .020     .942
COST       -.061     .916
REPUTAT -.512 -.533
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
A caution about rotation: when the data really contain only one factor, rotation can manufacture a second. In one simulation study, the authors extracted and rotated two factors (the same procedure I used with our example data). They found that the first unrotated factor had loadings close to those of the true factor, with only low loadings on the second factor.
However, after rotation, factor splitting took place – for some of the variables the
obtained solution grossly underestimated their loadings on the first factor and
overestimated them on the second factor. That is, the second factor was imaginary and
the first factor was corrupted. Interestingly, if there were unique variables in the data
set, such factor splitting was not a problem. The authors suggested that one include
unique variables in the data set to avoid this potential problem. I suppose one could do
this by including "filler" items on a questionnaire. The authors recommend using a
random number generator to create the unique variables or manually inserting into the
correlation matrix variables that have a zero correlation with all others. These unique
variables can be removed for the final analysis, after determining how many factors to
retain.
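A sketch of that suggestion using the random-filler approach, assuming the raw data are in a NumPy array; the filler columns are generated, used while deciding the number of factors, and then dropped:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((220, 7))              # stand-in for the real data matrix
filler = rng.standard_normal((X.shape[0], 2))  # two random "unique" variables
X_augmented = np.column_stack([X, filler])
# Decide how many factors to retain from X_augmented, then drop the two
# filler columns before running the final analysis.
```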
Explained Variance
The SPSS output also gives the variance explained by each component after the rotation. The variance explained is equal to the sum of squared loadings (SSL) across variables. For component 1 that is $(.76^2 + .74^2 + \dots + .67^2) = 3.31$, its eigenvalue before rotation, and $(.96^2 + .96^2 + \dots + (-.51)^2) = 3.02$ after rotation. For component 2 the SSLs are 2.62 and 2.91. After rotation component 1 accounted for 3.02/7 = 43% of the total variance and 3.02/(3.02 + 2.91) = 51% of the variance distributed between the two components. After rotation the two components together account for (3.02 + 2.91)/7 = 85% of the total variance.
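You can verify this arithmetic directly from the rotated component matrix; a minimal sketch:

```python
import numpy as np

A_rot = np.array([[ .960, -.028],    # TASTE
                  [ .958,  .010],    # AROMA
                  [ .952,  .060],    # COLOR
                  [ .070,  .947],    # SIZE
                  [ .020,  .942],    # ALCOHOL
                  [-.061,  .916],    # COST
                  [-.512, -.533]])   # REPUTAT
ssl = (A_rot ** 2).sum(axis=0)       # SSL for each rotated component
print(ssl)                           # approx. [3.02, 2.91]
print(ssl / 7)                       # proportions of the total variance
print(ssl / ssl.sum())               # proportions of the distributed variance
```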
The SSLs for components can be used to help decide how many components to retain. An after-rotation SSL is much like an eigenvalue. A rotated component with an SSL of 1 accounts for as much of the total variance as does a single variable. One may want to retain and rotate a few more components than indicated by the "eigenvalue of 1 or more" criterion. Inspection of the retained components' SSLs after rotation should tell
you whether or not they should be retained. Sometimes a component with an
eigenvalue > 1 will have a postrotation SSL < 1, in which case you may wish to drop it
and ask for a smaller number of retained components.
You also should look at the postrotation loadings to decide how well each
retained component is defined. If only one variable loads heavily on a component, that
component is not well defined. If only two variables load heavily on a component, the
component may be reliable if those two variables are highly correlated with one another
but not with the other variables.
Naming Components
Now let us look at the rotated loadings again and try to name the two
components. Component 1 has heavy loadings (>.4) on TASTE, AROMA, and COLOR
and a moderate negative loading on REPUTATION. I’d call this component
AESTHETIC QUALITY. Component 2 has heavy loadings on large SIZE, high
ALCOHOL content, and low COST and a moderate negative loading on REPUTATION.
I’d call this component CHEAP DRUNK.
Communalities
Let us also look at the SSL for each variable across components. Such an SSL is called a communality. This is the amount of the variable's variance that is accounted for by the components (since the loadings are correlations between variables and components and the components are orthogonal, a variable's communality represents the $R^2$ of the variable predicted from the components). For our beer data the
communalities are COST, .84; SIZE, .90; ALCOHOL, .89; REPUTAT, .55; COLOR, .91;
AROMA, .92; and TASTE, .92.
Communalities
Initial Extraction
COST 1.000 .842
SIZE 1.000 .901
ALCOHOL 1.000 .889
REPUTAT 1.000 .546
COLOR 1.000 .910
AROMA 1.000 .918
TASTE 1.000 .922
Extraction Method: Principal Component Analysis.
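The communalities can be verified the same way, as row sums of squared rotated loadings; a minimal sketch using the values from the rotated component matrix:

```python
import numpy as np

A_rot = np.array([[ .960, -.028],    # TASTE
                  [ .958,  .010],    # AROMA
                  [ .952,  .060],    # COLOR
                  [ .070,  .947],    # SIZE
                  [ .020,  .942],    # ALCOHOL
                  [-.061,  .916],    # COST
                  [-.512, -.533]])   # REPUTAT
print((A_rot ** 2).sum(axis=1))      # approx. [.92, .92, .91, .90, .89, .84, .55]
```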
It is also possible to employ oblique rotational methods. These methods do not
produce orthogonal components. Suppose you have done an orthogonal rotation and
you obtain a rotated loadings plot that looks like this:
[rotated loadings plot]