Chapter XII

Introduction to Multivariate Statistical Analysis

12.1 Introduction

The subject of multivariate analysis deals with the statistical analysis of data collected on
more than one (response) variable. These variables may be correlated with each other, and
their statistical dependence is often taken into account when analyzing such data. This
consideration of statistical dependence makes multivariate analysis somewhat different in
approach and considerably more complex than the corresponding univariate analysis (in
which there is only one response variable under consideration).

When a research design involves many variables, it is often helpful to reduce them to a
smaller set of factors. This is achieved through principal components analysis (PCA) and
factor analysis (FA), which extract factors based on the total variance of the variables. PCA
and FA are used to find a small number of components (factors) that explain most of the
variance (or capture most of the information) in the original, larger set of variables: the first
factor extracted explains the highest percentage of variance (or contains the highest
percentage of information), the second factor explains the second highest percentage of
variance, and so on.

A necessary condition for carrying out PCA and FA is that relationships exist between the
variables. If the original variables are nearly uncorrelated, there is no point in carrying out
PCA or FA.
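
A minimal sketch of both points in Python (the chapter itself uses SPSS): checking that the variables are correlated at all, then noting that PCA orders its components by explained variance. The data and the loadings here are hypothetical, simulated from a single latent factor:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 100 subjects, 4 correlated test scores driven by
# one latent "ability" factor plus noise.
ability = rng.normal(size=(100, 1))
scores = ability @ rng.uniform(0.6, 0.9, size=(1, 4)) + 0.5 * rng.normal(size=(100, 4))

# If the off-diagonal correlations were all near zero, PCA/FA would be pointless.
print(np.round(np.corrcoef(scores, rowvar=False), 2))

pca = PCA()
pca.fit(scores)
# Explained variance ratios come out sorted: the first component captures
# the largest share of the total variance, the second the next largest, etc.
print(np.round(pca.explained_variance_ratio_, 3))
```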

Examples:
• Correlations among a group of test scores in Math and Physics may suggest an underlying
'intelligence' factor.
• Correlations among a group of test scores in swimming and track events (athletics) may
suggest an underlying 'physical-fitness' factor.

Note: PCA and FA should never be done if the number of variables is greater than the number
of subjects (sample size).

The essential purpose of FA is to describe the covariance relationships among a set of P
correlated variables X1, X2, ..., XP in terms of a few underlying, but unobservable (latent),
random quantities called factors F1, F2, ..., FM, where M is less than or equal to P. The
observed variables are then modeled as linear combinations of these M factors, plus "error"
terms.

Note that if we have P original variables, the number of factors that we can construct is again
P. But retaining all P factors does not make sense since the aim of FA is to reduce the original
variables into a smaller set of factors. Thus, we retain only those factors whose variance
(eigenvalue) is greater than unity (one).
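
As a sketch of this retention rule (the Kaiser criterion) in Python, assuming `scores` is the hypothetical (n_subjects × p_variables) NumPy array from the earlier sketch:

```python
import numpy as np

# Eigenvalues of the correlation matrix, sorted largest first.
R = np.corrcoef(scores, rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Kaiser criterion: retain only the factors with eigenvalue > 1.
n_factors = int(np.sum(eigenvalues > 1))
print(eigenvalues.round(3), "-> retain", n_factors, "factor(s)")
```
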
12.2 The factor model

Factor model postulate: The observed variables X1, X2, ..., XP are linearly dependent upon a
few unobservable random factors (components) F1, F2, ..., FM, called common factors, and P
additional sources of variation ε1, ε2, ..., εP, called errors or specific factors.

The factor model is given by:

$$
\begin{aligned}
X_1 &= \mu_1 + l_{11}F_1 + l_{12}F_2 + \dots + l_{1M}F_M + \varepsilon_1 \\
X_2 &= \mu_2 + l_{21}F_1 + l_{22}F_2 + \dots + l_{2M}F_M + \varepsilon_2 \\
&\;\;\vdots \\
X_P &= \mu_P + l_{P1}F_1 + l_{P2}F_2 + \dots + l_{PM}F_M + \varepsilon_P
\end{aligned}
$$

where μi is the mean of the ith variable Xi.

The coefficient lij is the loading of the ith variable Xi on the jth factor Fj (that is, the
covariance between the observable variable Xi and the latent common factor Fj),
i = 1, 2, ..., P and j = 1, 2, ..., M. Note that F1, F2, ..., FM are common to all observed
variables X1, X2, ..., XP and hence appear in each of the equations, while ε1 is associated
only with X1, ε2 only with X2, ..., and εP only with XP.
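
For reference, the model can be written compactly in matrix notation:

$$
\mathbf{X} = \boldsymbol{\mu} + \mathbf{L}\mathbf{F} + \boldsymbol{\varepsilon},
$$

where X is the P × 1 vector of observed variables, μ is the P × 1 vector of means, L = (lij) is the P × M matrix of factor loadings, F is the M × 1 vector of common factors, and ε is the P × 1 vector of specific factors.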

Remark: It is often the case that the original loadings lij do not have an easy interpretation.
Thus, it is usual practice to rotate the factors until a 'simpler structure' is achieved. The
rotation should be such that each variable loads high on a single factor and has
small-to-moderate loadings on the remaining factors. The most widely used factor rotation is
Kaiser's varimax rotation.
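
For readers working outside SPSS, here is a minimal sketch of a varimax-rotated factor analysis in Python; it assumes a recent scikit-learn version (where FactorAnalysis accepts rotation="varimax") and reuses the hypothetical `scores` array from the earlier sketch:

```python
from sklearn.decomposition import FactorAnalysis

# Two-factor model with varimax rotation of the loadings.
fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(scores)

# components_ has one row per factor and one column per variable; after
# varimax rotation each variable should load high on only one factor.
print(fa.components_.round(3))
```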

In summary, the main aim of FA is the reduction of a large number of variables, whose
interrelationships are complex, to a much smaller set of new variables whose
interrelationships are simple and which at the same time contain most of the information in
the original variables. The original set of P correlated variables X1, X2, ..., XP is transformed
into a new set of uncorrelated variables in order to examine the relationships among them.
The new set of uncorrelated variables is called factors (constructs or components). FA often
reveals relationships that were not previously suspected, thus allowing easier interpretation.

12.3 Illustrative example

We will be using data obtained from a customer satisfaction survey on the services of 'Bus
Company A'. The data set contains information on six variables that measure the quality of
its services.

Procedures for factor analysis in SPSS

Click on Analyze, select Dimension Reduction and then Factor…

Select the first six variables and then click on the arrow to move them under Variables:

Click on Descriptives…, select KMO and Bartlett’s test of sphericity, and then click on
Continue

Click on Rotation…, select Varimax, and then click on Continue

When you click on OK, you will see a number of outputs.

Bartlett's test of sphericity tests whether the correlation matrix is an identity matrix (that is,
whether the variables are uncorrelated). Note that if the variables are uncorrelated, then the
factor model is inappropriate. Rejection of the null hypothesis (that the correlation matrix is
an identity matrix) implies that the factor model is appropriate.
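
As a cross-check outside SPSS, Bartlett's statistic can be computed directly from its standard chi-square approximation, χ² = −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom. A minimal sketch, reusing the hypothetical `scores` array from above:

```python
import numpy as np
from scipy.stats import chi2

n, p = scores.shape
R = np.corrcoef(scores, rowvar=False)

# Bartlett's test of sphericity: H0 is that R is an identity matrix.
chi_square = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
df = p * (p - 1) // 2
p_value = chi2.sf(chi_square, df)
print(f"chi2 = {chi_square:.3f}, df = {df}, p = {p_value:.4f}")
# p < 0.05: reject H0; the variables are correlated and FA can proceed.
```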

In our case, the p-value of the test is less than 0.05. Thus, we reject the null hypothesis and
conclude that we can proceed with factor analysis.

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy              .745
Bartlett's Test of Sphericity    Approx. Chi-Square       525.288
                                 df                            15
                                 Sig.                        .000

Results of the factor analysis: two factors (components) have been extracted. After rotation,
the first component explains 37.5% of the total variance in the original six variables, while
component 2 explains 28.2% of the total variance. Thus, 65.7% of the total variance in the
original six variables can be explained by these two factors; that is, 65.7% of the information
contained in the original six variables is captured by these two factors.

Total Variance Explained

            Initial Eigenvalues          Extraction Sums of           Rotation Sums of
                                         Squared Loadings             Squared Loadings
Component   Total  % of Var.   Cum. %    Total  % of Var.   Cum. %    Total  % of Var.   Cum. %
1           2.842  47.366      47.366    2.842  47.366      47.366    2.249  37.491      37.491
2           1.101  18.354      65.720    1.101  18.354      65.720    1.694  28.229      65.720
3            .772  12.861      78.581
4            .560   9.333      87.914
5            .440   7.330      95.245
6            .285   4.755     100.000
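
Since each standardized variable contributes one unit of variance, the total variance here is 6, and each "% of Variance" entry is simply the eigenvalue divided by 6. For the first two components:

$$
\frac{2.842}{6} \times 100 \approx 47.37\%, \qquad \frac{2.842 + 1.101}{6} \times 100 \approx 65.72\%.
$$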

The factor solution from the unrotated component matrix is shown below. We can see that
the third variable (Bus is road-worthy) has nearly equal loadings on components 1 and 2
(0.617 and 0.557), and hence it is difficult to determine whether it belongs to component 1 or
component 2. The solution is to rotate the matrix until a 'simpler structure' is achieved.

Component Matrix

                                                                  Component
                                                                    1       2
Buses always arrive on time                                       .694    .295
Bus ticketing system is convenient                                .451    .570
Bus is road-worthy (does not frequently break down on the road)   .617    .557
Bus Company A (BCA) has adequate shed for passengers              .761   -.405
Bus has spacious seats, ample leg room & foot space               .800   -.395
Buses are neat                                                    .747   -.242

The rotated component matrix (the final solution) is shown below. We can see that the last
three variables (indicators) load high on component 1. These variables are the 'visible'
aspects of the service (appearance of physical facilities, equipment, etc.) that are employed to
improve customer satisfaction. Thus, component 1 can be labeled 'tangibles'.

The first three variables load high on component 2. These variables are indicators of the
ability to perform the promised service dependably and accurately. Thus, component 2 can be
labeled 'reliability'.

Rotated Component Matrix

                                                                  Component
                                                                    1       2
Buses always arrive on time                                       .392    .644
Bus ticketing system is convenient                                .034    .726
Bus is road-worthy (does not frequently break down on the road)   .176    .812
Bus Company A (BCA) has adequate shed for passengers              .854    .115
Bus has spacious seats, ample leg room & foot space               .880    .146
Buses are neat                                                    .748    .239
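
If the rotated solution is accepted, factor scores for each subject can then be obtained as new, uncorrelated variables for later analyses; in the hypothetical Python sketch above this would be:

```python
# One row per subject, one column per rotated factor.
factor_scores = fa.transform(scores)
print(factor_scores.shape)
```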
