Outline
Data reduction
PCA vs. FA
Assumptions and other issues
Multivariate analysis in terms of eigenanalysis
PCA basics
Examples
Ockham's Razor
One of the hallmarks of good science is
parsimony and elegance of theory
However, in analysis it is also often desirable to
reduce the data in some fashion, in order to get a
better understanding of it or simply for ease of
analysis
In MR this was done more
implicitly
Reduce the predictors to a single
composite
The sum of the weighted variables
The first component captures most of the variance, the second the
next most, and so on until all the variance is accounted for
Since the first few components capture most of the variance, they are
typically the focus
PCA/FA
Principal Components Analysis
Extracts all the factors underlying a set of
variables
The number of factors = the number of variables
Completely explains the variance in each variable
Factor Analysis
Analyzes only the shared variance
Error is estimated apart from shared variance
[Diagram: in FA, a latent factor points to indicators I1, I2, I3, each with its own error term; in PCA, the component is formed directly from I1, I2, I3]
FA vs. PCA
PCA
PCA is mathematically precise in orthogonalizing
dimensions
PCA redistributes all variance into orthogonal components
PCA uses all variable variance and treats it as true variance
FA
FA distributes common variance into orthogonal factors
FA is conceptually realistic in identifying common factors
FA recognizes measurement error and true factor variance
FA vs. PCA
In some sense, PCA and FA are not conceptually so
different from what we have been doing
since multiple regression
Creating linear combinations
PCA especially falls along the lines of what we've
already been doing
FA analyzes covariance
(communality)
FA provides a close approximation
to the R matrix
In FA the goal is to explain as
much of the covariance as possible
with a minimum number of factors
that are tied specifically to
assumed constructs
FA can give multiple solutions
depending on the method and
the estimates of communality
Assumptions/Issues
Assumes reliable variables/correlations
Very much affected by missing data, outlying cases and
truncated data
Data screening methods (e.g. transformations, etc.) may
improve poor factor analytic results
Normality
Univariate - normally distributed variables make the
solution stronger, but normality is not necessary if we
are using the analysis in a purely descriptive manner
Multivariate - multivariate normality is assumed when
statistically assessing the number of factors
Assumptions/Issues
No outliers
Influence on correlations would bias results
Variables can themselves be outliers (e.g., a variable largely
uncorrelated with all the others)
Assumptions/Issues
Factorable R matrix
Need inter-item/variable correlations > .30 or PCA/FA isn't going to do
much for you
Large inter-item correlations do not guarantee a solution either
While two variables may be highly correlated with each other, they may not
be correlated with the others
Multicollinearity/Singularity
In traditional PCA it is not a problem; no matrix inversion is necessary
As such, PCA is one solution for dealing with collinearity in regression
Assumptions/Issues
Sample Size and Missing Data
True missing data are handled in the usual ways
Factor analysis via Maximum Likelihood needs large samples; this is one of
its few drawbacks
The more reliable the correlations are, the fewer subjects
needed
Need enough subjects for stable estimates
How many?
Depends on the nature of the data and the number of parameters to be
estimated
For example, a simple setting with few variables and clean data might not need as
many
Several hundred data points for a more complex solution, with messy data and
lower correlations among the variables, might not provide a meaningful result
(PCA) or even converge upon a solution (FA)
Other issues
No readily defined criteria by which to judge the
outcome
Before, we had R² for example
Extraction Methods
PCA
Extracts maximum variance with each component
First component is a linear combination of variables
that maximizes component score variance for the cases
The second (etc.) extracts the max. variance from the
residual matrix left over after extracting the first
component (therefore orthogonal to the first)
If all components retained, all variance explained
PCA
Components are linear combinations of variables
These combinations are based on weights (eigenvectors) derived by the analysis
Unlike our previous techniques, in PCA there are no "sides" of the equation;
you're not necessarily correlating the factors, components, variates, etc. with anything
PCA
With multivariate research we come to eigenvalues and
eigenvectors
Eigenvalues
Conceptually can be considered to measure the strength (relative length)
of an axis in N-dimensional space
Derived via eigenanalysis of a square symmetric matrix
The covariance or correlation matrix
Eigenvector
Each eigenvalue has an associated eigenvector. While an eigenvalue is
the length of an axis, the eigenvector determines its orientation in space.
The values in an eigenvector are not unique because any coordinates that
described the same orientation would be acceptable.
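To make this concrete, here is a minimal base-R sketch; the .70 correlation is a made-up value for illustration. Eigenanalysis of a 2 x 2 correlation matrix returns the eigenvalues and their eigenvectors.

    # Hypothetical correlation matrix for two variables
    R <- matrix(c(1, .70,
                  .70, 1), nrow = 2)
    e <- eigen(R)   # eigenanalysis of the square symmetric matrix
    e$values        # eigenvalues: 1.7 and 0.3 (they sum to the number of variables)
    e$vectors       # eigenvectors: each column gives an axis's orientation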
Data
Example data of women's height and weight
height  weight  Zheight  Zweight
  57      93    -1.77    -1.97
  58     110    -1.47    -0.87
  60      99    -0.86    -1.58
  59     111    -1.17    -0.81
  61     115    -0.56    -0.55
  60     122    -0.86    -0.10
  62     110    -0.26    -0.87
  61     116    -0.56    -0.49
  62     122    -0.26    -0.10
  63     128     0.05     0.28
  62     134    -0.26     0.67
  64     117     0.35    -0.42
  63     123     0.05    -0.04
  65     129     0.65     0.35
  64     135     0.35     0.73
  66     128     0.96     0.28
  67     135     1.26     0.73
  66     148     0.96     1.57
  68     142     1.56     1.18
  69     155     1.87     2.02
(z-scores rounded to two decimals)
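A base-R sketch that enters these data and standardizes them; the resulting z-scores should match the table above, up to rounding.

    # Women's height (inches) and weight (pounds) from the table above
    height <- c(57, 58, 60, 59, 61, 60, 62, 61, 62, 63,
                62, 64, 63, 65, 64, 66, 67, 66, 68, 69)
    weight <- c(93, 110, 99, 111, 115, 122, 110, 116, 122, 128,
                134, 117, 123, 129, 135, 128, 135, 148, 142, 155)
    X <- scale(cbind(height, weight))  # center and scale: columns of z-scores
    round(X, 2)                        # compare with Zheight/Zweight above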
Data transformation
Consider two variables height and weight
X would be our data matrix, w our eigenvector
(coefficients)
Multiplying our original data by these weights
results in a column vector of values
z1 = Xw
Data transformation
Consider a woman 5' (60 in.) and 122 pounds
She is -.86 sd from the mean height and -.10 sd
from the mean weight for these data
a'b = (a1 a2)(b1 b2)' = a1b1 + a2b2
The first eigenvector associated with the
normalized data is [.707, .707]; as such, the
resulting value for that data point is
(.707)(-.86) + (.707)(-.10) ≈ -.68
So with the top graph we have taken the
original data point and projected it onto a new
axis -.68 units from the origin
Now if we do this for all data points we will
have projected them onto a new
axis/component/dimension/factor/linear
combination
The length of the new axis is the eigenvalue
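Continuing the sketch above, the projection of that data point in R (note that eigen may flip the sign of the eigenvector; as noted earlier, it describes the same orientation):

    w <- eigen(cor(X))$vectors[, 1]  # first eigenvector, approximately (.707, .707)
    X[6, ]                           # the woman 60 in., 122 lb: z = (-.86, -.10)
    sum(X[6, ] * w)                  # a'b = a1b1 + a2b2, approximately -.68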
Data transformation
Suppose we have more than one
dimension/factor?
In our discussion of the techniques thus far,
we have said that each component or
dimension is independent of the previous
one
What does independent mean?
r = 0 (as opposed to, say, r = 1)
Data transformation
The other eigenvector associated with
the data is (-.707, .707)
Doing as we did before, we'd create that
second axis, and then could plot the data
points along these new axes
We now have two linear combinations,
each of which is interpretable as the
vector comprised of projections of
original data points onto a directed line
segment
Note how the basic shape of the original
data has been perfectly maintained
The effect has been to rotate the
configuration (45°) to a new orientation
while preserving its essential size and
shape
It is an orthogonal transformation
Note that we have been talking of
specifying/rotating axes, but rotating the
points themselves would give us the same
result
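A sketch of the whole transformation in R, continuing with the same data: projecting every point onto both eigenvectors rotates the configuration while leaving the new axes uncorrelated.

    V <- eigen(cor(X))$vectors  # both eigenvectors as columns
    Z <- X %*% V                # project all points onto the new axes
    round(cor(Z), 10)           # off-diagonal is 0: the components are independent
    plot(Z)                     # same shape as plot(X), rotated 45 degrees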
Application of PC analysis
Components analysis is a kind of data reduction
start with an inter-related set of measured variables
identify a smaller set of composite variables that can be constructed
from the measured variables and that carry as much of their
information as possible
PC Extraction
Extraction is the process of forming PCs as linear combinations of
the measured variables as we have done with our other techniques
PC1 = b11X1 + b21X2 + ... + bk1Xk
PC2 = b12X1 + b22X2 + ... + bk2Xk
PCf = b1fX1 + b2fX2 + ... + bkfXk
3 variable example
Consider 3 variables with
the correlations displayed
In a 3d sense we might
envision their
relationship as this, with
the shadows what the
scatterplots would
roughly look like for each
bivariate relationship
[Graphic: X1, X2, X3 plotted in three dimensions, with the bivariate scatterplots as "shadows"]
PCA
In principal components, we extract as many
components as there are variables
As mentioned previously, each component by default is
uncorrelated with the previous
If we save the component scores and graph them, the
plot resembles the rotated configuration seen earlier
Here is an example of
magazine readership from the
chapter handout
Underlined loadings are > .30
How might this be
interpreted?
Applied example
Six items
Three sadness, three relationship quality
N = 300
PCA
Matrix reproduction
Loadings on the first component: .609, .614, -.593, .728, .767, .764
Eigenvalue of factor 1 =
.609² + .614² + (-.593)² + .728² + .767² + .764² = 2.80
Note that an index of the quality of a factor analysis (as opposed to PCA) is the extent to
which the factor loadings can reproduce the correlation matrix. With PCA, the correlation
matrix is reproduced exactly if all components are retained; when we don't retain them all,
we can use a similar approach to assess fit.
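Both calculations as a base-R sketch, using the first-component loadings above; since only one component's loadings are used, the last line gives just the one-component approximation of R.

    L1 <- c(.609, .614, -.593, .728, .767, .764)  # loadings on component 1
    sum(L1^2)                                     # eigenvalue of factor 1 = 2.80
    round(outer(L1, L1), 2)                       # one-component approximation of R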
[Tables: the original correlation matrix and the correlations reproduced from the components]
χ² = -[(n - 1) - (2p + 5)/6] ln|R|,   df = (p² - p)/2

p = number of variables
n = number of observations
ln|R| = natural log of the determinant of R
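The statistic transcribed into R and applied to the height/weight data above; as reconstructed here, this is the form of Bartlett's test of sphericity.

    R <- cor(X); n <- nrow(X); p <- ncol(R)   # from the height/weight data above
    chisq <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
    df <- (p^2 - p) / 2
    pchisq(chisq, df, lower.tail = FALSE)     # small p: R is not an identity matrix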
Kaiser's Rule
Practical approach
Scree Plot
Chi-square
Horn's Procedure
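Kaiser's rule and the scree plot can both be read off the eigenvalues of the correlation matrix; a base-R sketch follows (Horn's procedure, i.e., parallel analysis, is available elsewhere, e.g., in the psych package).

    ev <- eigen(cor(X))$values    # eigenvalues of the correlation matrix
    sum(ev > 1)                   # Kaiser's rule: retain components with eigenvalue > 1
    plot(ev, type = "b", xlab = "Component", ylab = "Eigenvalue",
         main = "Scree plot")     # look for the "elbow"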
Rotation
Sometimes our loadings will be a little difficult
to interpret initially
Given such a case we can rotate the solution
such that the loadings perhaps make more sense
This is typically done in factor analysis but is possible
here too
Rotation
You can think of it as shifting the axes or
rotating the "egg" in our previous graphic
The gist is that the relations among the items are
maintained, while maximizing their more
natural loadings and minimizing off-loadings
Note that as PCA initially creates independent
components, orthogonal rotations that maintain this
independence are typically used
Loadings will be either large or small, little in between
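As a sketch, an orthogonal (varimax) rotation in base R; the 6 x 2 loading matrix here is hypothetical.

    # Hypothetical loadings: six items on two components
    L <- matrix(c(.61, .61, .59, .15, .10, .12,
                  .14, .08, .11, .73, .77, .76), ncol = 2)
    varimax(L)  # rotated loadings become large or small, with little in between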
Jackknife
With smaller samples, conduct the PCA multiple times, each time with a specific
case held out
Using the eigenvectors, calculate the component score for the held-out case
Compare the eigenvalues for the components across the fits
Bootstrap
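A minimal sketch of the jackknife idea in base R (a hand-rolled loop over the standardized data from earlier, not a canned routine; for simplicity the held-out case is scored using the full sample's z-scores).

    res <- t(sapply(seq_len(nrow(X)), function(i) {
      e <- eigen(cor(X[-i, ]))                # PCA with case i held out
      c(score  = X[i, ] %*% e$vectors[, 1],   # component score for the held-out case
        eigen1 = e$values[1])                 # first eigenvalue of this fit
    }))
    range(res[, "eigen1"])  # how stable is the first eigenvalue across fits?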
Other issues:
Factoring items vs. factoring scales
Items are often factored as part of the process of
scale development
Check if the items go together the way the scale's
author thinks
Scales (composites of items) are factored to
examine construct validity of new scales
test theory about what constructs are interrelated
Other issues:
Factoring items vs. factoring scales
The limited reliability and validity of items
means that they will be measured with less
precision, and so their intercorrelations for any
one sample will be fraught with error
Since factoring starts with R, factoring of items
is likely to yield spurious solutions -- replication
of item-level factoring is very important!
Is the issue really items vs. scales?
No -- it is really the reliability and validity of the things
being factored; scales have these properties more than
items
Other issues:
When is it appropriate to use PCA?
Another reason to use PCA, which obviously isn't a
great one, is that the maximum likelihood
estimation involved in an Exploratory Factor Analysis
does not converge
PCA will always give a result (it does not require
matrix inversion) and so can often be used in such a
situation
We'll talk more on this later, but in data reduction
situations EFA is typically to be preferred for social
scientists and others who use imprecise measures
Other issues:
Selecting Variables for Analysis
Sometimes a researcher has access to a data set that
someone else has collected -- an opportunistic data set
While this can be a real money/time saver, be sure to
recognize the possible limitations
Be sure the sample represents a population you want to
talk about
Carefully consider variables that arent included and
the possible effects their absence has on the resulting
factors
this is especially true if the data set was chosen to be "efficient",
i.e., variables chosen to cover several domains
Other issues:
Selecting the Sample for Analysis
How many?
Keep in mind that R (and so the factor solution) is
computed the same way no matter how many cases are used -- the
point of a larger sample is the representativeness and stability of the correlations
Advice about the subject/variable ratio varies pretty
dramatically
5-10 cases per variable
300 cases minimum (maybe plus the # of items)
PCA in R
Package: base -- Function: princomp
Package: psych -- Functions: principal, VSS
Package: pcaMethods -- Functions: pca, Q2 (for cross validation)
As the name implies, the pcaMethods package is all about PCA, and from a modern
approach. It will automatically estimate missing values (via traditional,
robust, or Bayesian methods) and is useful just for that for any analysis.
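For instance, with the height/weight data from earlier (the commented psych line assumes that package is installed):

    fit <- princomp(cbind(height, weight), cor = TRUE)  # base-R PCA on the correlation matrix
    summary(fit)    # proportion of variance for each component
    loadings(fit)   # the eigenvector weights
    # library(psych); principal(cbind(height, weight), nfactors = 2, rotate = "none")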