12.1 Introduction
The subject of multivariate analysis deals with the statistical analysis of data collected on more than one (response) variable. These variables may be correlated with each other, and their statistical dependence is often taken into account when analyzing such data. This consideration of statistical dependence makes multivariate analysis somewhat different in approach and considerably more complex than the corresponding univariate analysis (in which there is only one response variable under consideration).
When a research design involves a number of variables, it is often helpful to reduce them to a smaller set of factors. This is achieved through principal components analysis (PCA) and factor analysis (FA), which extract factors based on the total variance of the variables. PCA and FA are used to find a smaller number of components (factors) that explain most of the variance (or capture most of the information) in the original, larger set of variables: the first factor extracted explains the highest percentage of variance (or contains the highest percentage of information); the second factor explains the second highest percentage of variance, and so on.
A necessary condition for carrying out PCA and FA is that there be relationships (correlations) between the variables. If the original variables are nearly uncorrelated, then there is no point in carrying out PCA or FA.
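The ordering of extracted components by explained variance can be illustrated with a small numpy sketch (the data here are hypothetical, generated so that two underlying signals drive six observed variables):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Two hidden "signals" drive six observed variables (simulated data).
f = rng.standard_normal((n, 2))
loadings = rng.standard_normal((2, 6))
X = f @ loadings + 0.3 * rng.standard_normal((n, 6))

# PCA via eigendecomposition of the correlation matrix.
R = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues sorted in descending order
explained = eigvals / eigvals.sum()     # proportion of total variance per component
print(np.round(explained, 3))
```

The printed proportions are non-increasing: the first component explains the largest share of the total variance, the second the next largest, and so on, exactly as described above.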
Examples:
Correlations among a group of test scores in Math and Physics may suggest an underlying 'intelligence' factor.
Correlations among a group of test scores in swimming and track events (athletics) may suggest an underlying 'physical-fitness' factor.
Note: PCA and FA should never be done if the number of variables is greater than the number
of subjects (sample size).
Note that if we have P original variables, the number of factors that we can construct is again P. But retaining all P factors does not make sense, since the aim of FA is to reduce the original variables to a smaller set of factors. Thus, we retain only those factors whose variance (eigenvalue) is greater than unity (one); this is known as Kaiser's eigenvalue-greater-than-one criterion.
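The eigenvalue-greater-than-one rule can be sketched in a few lines of numpy (the correlation matrix below is hypothetical, chosen so that three variables share a common factor and a fourth is nearly independent):

```python
import numpy as np

def kaiser_retained(R):
    """Return the eigenvalues of correlation matrix R that exceed 1
    (the eigenvalue-greater-than-one retention rule)."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    return eigvals[eigvals > 1.0]

# Hypothetical 4-variable correlation matrix.
R = np.array([[1.0, 0.6, 0.5, 0.1],
              [0.6, 1.0, 0.4, 0.1],
              [0.5, 0.4, 1.0, 0.2],
              [0.1, 0.1, 0.2, 1.0]])
print(kaiser_retained(R))
```

Since the eigenvalues of a P-variable correlation matrix sum to P, an eigenvalue above 1 means the factor explains more variance than a single original variable does.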
12.2 The factor model
Factor model postulate: The observed variables X1, X2, ..., XP are linearly dependent upon a few unobservable random factors (components) F1, F2, ..., FM, called common factors, and P additional sources of variation ε1, ε2, ..., εP, called errors or specific factors. That is,

Xi = li1 F1 + li2 F2 + ... + liM FM + εi,  i = 1, 2, ..., P.

The coefficient lij is the loading of the ith variable Xi on the jth factor Fj (that is, the covariance between the observable variable Xi and the latent common factor Fj), i = 1, 2, ..., P, j = 1, 2, ..., M. Note that F1, F2, ..., FM are common to all observed variables X1, X2, ..., XP and hence appear in each of the equations, while ε1 is associated with only X1, ε2 with only X2, ..., εP with only XP.
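The model and the interpretation of the loadings can be checked by simulation. The sketch below (with hypothetical loadings, P = 4 variables and M = 2 factors) generates data from the factor model and verifies that, for uncorrelated unit-variance factors, the sample covariance between Xi and Fj recovers the loading lij:

```python
import numpy as np

rng = np.random.default_rng(42)
n, P, M = 20_000, 4, 2
# Hypothetical loading matrix: l_ij is the loading of X_i on F_j.
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])
F = rng.standard_normal((n, M))    # common factors: appear in every X_i
eps = rng.standard_normal((n, P))  # specific factors: one per variable
X = F @ L.T + eps                  # X_i = l_i1 F_1 + ... + l_iM F_M + eps_i

# Sample covariance between each X_i and each F_j; should be close to l_ij.
cov_XF = (X - X.mean(0)).T @ (F - F.mean(0)) / (n - 1)
print(np.round(cov_XF, 2))
```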
Remark: It is often the case that the original loadings lij do not have an easy interpretation. Thus, it is usual practice to rotate them until a 'simpler structure' is achieved. The rotation should be such that each variable loads high on a single factor and has small-to-moderate loadings on the remaining factors. The most widely used factor rotation is Kaiser's varimax rotation.
In summary, the main aim of FA is to reduce a large number of variables whose interrelationships are complex to a much smaller set of new variables whose interrelationships are simple and which, at the same time, contain most of the information in the original variables. The original set of P correlated variables X1, X2, ..., XP is transformed into a new set of uncorrelated variables in order to examine the relationships among them. The new uncorrelated variables are called factors (constructs or components). FA often reveals relationships that may not have been previously suspected and thus allows easy interpretation.
We will be using data obtained from a customer satisfaction survey on the services of 'Bus Company A'. The data set contains information on six variables that measure the quality of services.
Select the first six variables and then click on the arrow to move them under Variables.
Bartlett's test of sphericity tests whether the correlation matrix is an identity matrix or not
(that is, whether the variables are uncorrelated or not). Note that if the variables are
uncorrelated, then the factor model is inappropriate. Rejection of the null hypothesis
(correlation matrix is an identity matrix) implies that the factor model is appropriate.
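Bartlett's statistic is computed from the determinant of the sample correlation matrix and compared to a chi-square distribution with p(p-1)/2 degrees of freedom. A minimal sketch, assuming numpy and scipy are available (the simulated data are hypothetical, built to be strongly correlated so the test rejects the null hypothesis):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity.

    H0: the population correlation matrix is an identity matrix
    (the variables are uncorrelated).
    R : p x p sample correlation matrix; n : number of observations.
    Returns the chi-square statistic and its p-value.
    """
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return stat, chi2.sf(stat, df)

# Simulated correlated data: one common signal drives four variables.
rng = np.random.default_rng(1)
f = rng.standard_normal((150, 1))
X = f + 0.5 * rng.standard_normal((150, 4))
R = np.corrcoef(X, rowvar=False)
stat, pval = bartlett_sphericity(R, n=150)
print(round(stat, 2), pval)
```

A small p-value leads to rejection of H0, so the correlations are large enough for the factor model to be appropriate.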
Results of factor analysis: factor analysis has extracted two factors (components). The first
component explains 37.5% of the total variance in the original six variables, while component
2 explains 28.2% of the total variance. Thus, 65.7% of the total variance in the original six
variables can be explained by these two factors, that is, 65.7% of the information contained in
the original six variables is captured by these two factors.
The factor solution from the unrotated component matrix is shown below. We can see that the third variable (Bus is road-worthy) has similar loadings on components 1 and 2 (0.617 and 0.557), and hence it is difficult to determine whether it belongs to component 1 or component 2. The solution is to rotate the matrix until a 'simpler structure' is achieved.
Component Matrix

                                                                  Component
                                                                    1       2
Buses always arrive on time                                       .694    .295
Bus ticketing system is convenient                                .451    .570
Bus is road-worthy (does not frequently break down on the road)   .617    .557
Bus Company A (BCA) has adequate shed for passengers              .761   -.405
Bus has spacious seats, ample leg room & foot space               .800   -.395
Buses are neat                                                    .747   -.242
After varimax rotation, the first three variables load high on component 2. These variables are indicators of the ability to perform the promised service dependably and accurately. Thus, component 2 can be labeled 'reliability'.
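The varimax rotation applied above can be sketched in numpy using the standard SVD-based formulation, applied to the unrotated component matrix from the table. (This is an illustrative implementation, not the exact routine a statistical package uses; packages may report the columns in a different order or with flipped signs.)

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a loading matrix L (variables x factors),
    using the standard SVD-based iteration. Returns the rotated loadings."""
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag((Lr ** 2).sum(axis=0))))
        R = u @ vt
        crit_new = s.sum()
        if crit_new < crit * (1 + tol):  # stop when the criterion stops improving
            break
        crit = crit_new
    return L @ R

# Unrotated component matrix from the table above.
L = np.array([[0.694,  0.295],
              [0.451,  0.570],
              [0.617,  0.557],
              [0.761, -0.405],
              [0.800, -0.395],
              [0.747, -0.242]])
print(np.round(varimax(L), 3))
```

Because varimax is an orthogonal rotation, each variable's communality (the sum of its squared loadings across factors) is unchanged; only the distribution of loading across the factors changes, pushing each variable toward a single dominant factor.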