
Factor Analysis

By

Amir Iqbal


Introduction

• Factor analysis is a data reduction technique for identifying the internal structure of a set of variables.
• Factor analysis is a decompositional procedure that identifies the underlying relationships that exist within a set of variables.
• Factor analysis forms groups of metric variables (interval or ratio scaled); these groups are called factors.
• These factors can be thought of as underlying constructs that cannot be measured by a single variable (e.g. happiness).
• Common factors have effects shared in common with more than one observed variable.
• Unique factors have effects that are unique to a specific variable (a small simulation of this split is sketched below).
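To make the common/unique distinction concrete, here is a minimal simulation sketch (not from the slides; the loading value and sample size are invented for illustration) showing how one observed variable splits into a common and a unique part:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000        # hypothetical sample size
loading = 0.8     # illustrative loading of the item on the common factor

f = rng.normal(size=n)                                      # common factor (unit variance)
common_part = loading * f                                   # effect shared with other items
unique_part = np.sqrt(1 - loading**2) * rng.normal(size=n)  # item-specific effect
x1 = common_part + unique_part                              # the observed variable

print(round(np.var(common_part), 2))  # ~0.64 = loading**2 (common variance)
print(round(np.var(unique_part), 2))  # ~0.36 (unique variance)
```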

Objectives of Factor Analysis

• To determine how many factors are needed to explain the set of variables.
• To find the extent to which each variable is associated with each of a set of common factors.
• To determine the amount of each factor possessed by each observation (identified by the factor scores).
• To provide an interpretation of the common factors.

Assumptions

• Linear Relationship – The variables used in factor analysis should be linearly related to each other. This can be checked by looking at scatterplots of pairs of variables (a quick check is sketched below).
• Moderately Correlated – The variables must also be at least moderately correlated to each other; otherwise the number of factors will be almost the same as the number of original variables, which means that carrying out a factor analysis would be pointless.
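A quick way to run both checks in Python; a sketch only, using random placeholder data (substitute your own variables):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Placeholder data: 200 cases on four hypothetical variables.
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["v1", "v2", "v3", "v4"])

pd.plotting.scatter_matrix(df, figsize=(6, 6))  # eyeball linearity of each pair
plt.show()
print(df.corr().round(2))  # look for at least moderate inter-correlations
```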

Correlation Matrix

• It presents the inter-correlations between the studied variables.
• The dimensionality of this matrix can be reduced by "looking for variables that correlate highly with a group of other variables, but correlate very badly with variables outside of that group" (Field 2000).
• These variables with high inter-correlations could well measure one underlying variable, which is called a 'factor'.

A Hypothetical Correlation Matrix

• In this matrix two clusters of variables with high inter-correlations are represented.
• These clusters of variables could well be "manifestations of the same underlying variable" (Rietveld & Van Hout 1993: 255).
• The data of this matrix could then be reduced down into these two underlying variables or factors (the sketch below shows how such a structure arises).

[Table: hypothetical correlation matrix; within each of the two clusters the correlations are high (roughly 0.4-0.9), while correlations between the clusters are near zero (roughly 0.0-0.14).]
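A minimal sketch of how such a two-block correlation matrix arises (all loadings and cluster sizes here are illustrative assumptions, not the slide's data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000  # hypothetical sample size

# Two independent underlying factors, three indicators each.
f1, f2 = rng.normal(size=n), rng.normal(size=n)
lam = 0.8  # illustrative loading, identical for all six items

def indicator(factor):
    # observed item = loading * factor + scaled unique noise (unit variance)
    return lam * factor + np.sqrt(1 - lam**2) * rng.normal(size=n)

X = np.column_stack([indicator(f1) for _ in range(3)] +
                    [indicator(f2) for _ in range(3)])

# Two blocks of high correlations (~0.64) appear along the diagonal,
# with near-zero correlations between the blocks.
print(np.corrcoef(X, rowvar=False).round(2))
```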

Correlation Matrix

• Two things are important:
– The variables have to be inter-correlated.
– But there should be no extreme multi-collinearity, as this would cause difficulties in determining the unique contribution of the variables to a factor (Field 2000: 444).

Number of Factors to Be Retained

– Retain only those factors with an eigenvalue larger than 1 (Guttman-Kaiser rule).
– Keep the factors which, in total, account for about 70-80% of the variance.
– Make a scree plot; keep all factors before the breaking point or elbow.

How Many Factors to Include

Use one of the following methods (a code sketch of the first two rules follows the list):
• The factors account for a particular percentage (e.g. 75%) of the total variability in the original variables.
• Choose factors with eigenvalues over 1 (if using the correlation matrix).
• Use the scree plot of the eigenvalues. This will indicate whether there is an obvious cut-off between large and small eigenvalues.
– The second method, choosing eigenvalues over 1, is probably the most common one.
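A numpy sketch of the eigenvalue-based rules (the function name and the 75% target are placeholders, not part of the slides):

```python
import numpy as np

def retention_criteria(X, variance_target=0.75):
    """Count factors to retain under the Kaiser rule and the %-variance rule."""
    R = np.corrcoef(X, rowvar=False)             # correlation matrix of the data
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, descending

    kaiser = int((eig > 1.0).sum())              # Guttman-Kaiser rule: eigenvalue > 1
    cum = np.cumsum(eig) / eig.sum()             # cumulative proportion of variance
    pct_rule = int(np.searchsorted(cum, variance_target) + 1)

    return eig, kaiser, pct_rule

# The returned eigenvalues can also be plotted as a scree plot (third method).
```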

Interpreting Factor Loadings

• By one rule of thumb in confirmatory factor analysis, loadings should be .7 or higher to confirm that independent variables identified a priori are represented by a particular factor, on the rationale that the .7 level corresponds to about half of the variance in the indicator being explained by the factor.
• However, the .7 standard is a high one and real-life data may well not meet this criterion.
• A lower level may be used, such as .4 for the central factor and .25 for other factors, calling loadings above .6 "high" and those below .4 "low".
• In any event, factor loadings must be interpreted in the light of theory, not by arbitrary cutoff levels.

Kaiser Criterion

• The Kaiser rule is to drop all components with eigenvalues under 1.0.
• The Kaiser criterion is the default in SPSS and most computer programs.
• But it is not recommended when used as the sole cut-off criterion for estimating the number of factors.

Scree Plot

• The Cattell scree test plots the components as the X axis and the corresponding eigenvalues as the Y axis (a minimal plotting sketch follows the list).
• As one moves to the right, toward later components, the eigenvalues drop.
• When the drop ceases and the curve makes an elbow toward less steep decline, Cattell's scree test says to drop all further components after the one starting the elbow.
• This rule is sometimes criticised for being amenable to researcher-controlled "fudging", as picking the "elbow" can be subjective because the curve has multiple elbows or is a smooth curve.
– That is, the researcher may be tempted to set the cut-off at the number of factors desired by his or her research agenda.
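A minimal matplotlib sketch of such a plot (the data matrix here is a random placeholder; substitute your own):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))  # placeholder data

eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

plt.plot(range(1, len(eig) + 1), eig, "o-")  # components (X) vs eigenvalues (Y)
plt.axhline(1.0, linestyle="--")             # Kaiser cut-off, for reference only
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```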


Bartlett's Test of Sphericity

• If Bartlett's test of sphericity is significant, H0 (the inter-correlation matrix of these variables is an identity matrix) is rejected.
• Thus, from the perspective of Bartlett's test, factor analysis is feasible (a hand-rolled version of the test is sketched below).
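A sketch of the test using the standard statistic chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom (the function name is ours, not a library API):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Chi-square test of H0: the correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)  # test statistic and p-value

# A small p-value (e.g. < .05) rejects H0, so factor analysis is feasible.
```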

Kaiser-Meyer-Olkin (KMO) Measure

• As a rule of thumb, KMO should be 0.60 or higher in order to proceed with a factor analysis (a sketch of the computation follows).
• Kaiser suggests 0.50 as a cut-off value, and a desirable value of 0.8 or higher.
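The overall KMO compares ordinary correlations with partial (anti-image) correlations derived from the inverse correlation matrix. A minimal sketch, assuming complete numeric data (the function name is ours):

```python
import numpy as np

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (a sketch)."""
    R = np.corrcoef(X, rowvar=False)
    S = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(S), np.diag(S)))
    Q = -S / d                                 # partial (anti-image) correlations
    mask = ~np.eye(R.shape[0], dtype=bool)     # off-diagonal elements only
    r2, q2 = (R[mask] ** 2).sum(), (Q[mask] ** 2).sum()
    return r2 / (r2 + q2)   # 0.8+ desirable; below ~0.5-0.6 problematic
```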


40.2 cannot be considered => such items are omitted and the analysis must be recalculated 17 . the loading structure is likely to be random. The variables with the highest loading are the "marker variables". • If fewer than 10 variables have a loading of more than 0.60. • A factor can be interpreted if at least 10 variables have a loading of more than 0. • Normative: A factor loading of less than 0. The variables with the highest loading are the "marker variables".A loading must satisfy certain criteria • A factor can be interpreted if at least 4 variables have a loading of more than 0.40 and the sample size is less than 300.


Example

Variable v01: communality after extraction = 0.479
0.691² + 0.039² = 0.479 (i.e. 47.8% + 0.1% = 47.9%)
=> 47.9% of the variance of v01 is explained by Factors 1 and 2 (see the numpy check below).
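The same arithmetic in numpy, using the two loadings from the slide:

```python
import numpy as np

# Loadings of v01 on the two extracted factors, taken from the slide.
loadings_v01 = np.array([0.691, 0.039])

communality = (loadings_v01 ** 2).sum()  # communality = sum of squared loadings
print(round(communality, 3))             # 0.479 -> 47.9% explained by Factors 1 and 2
```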


THANKS