Term III
FACTOR ANALYSIS
Ankit Shah(251135)
Dhiraj Upadhyaya(251152)
Indrasish Mishra(251160)
Contents
INTRODUCTION
OBJECTIVE OF THE PROBLEMS
APPLICATIONS
HYPOTHESIS FORMULATION
Sample Problem 1
Sample Problem 2
Sample Problem 3
Sample Problem 4
Sample Problem 5
Sample Problem 6
INTRODUCTION
Factor analysis is a method of data reduction. It seeks underlying unobservable (latent)
variables that are reflected in the observed (manifest) variables. The technique requires a
large sample size because it is based on the correlation matrix of the variables involved, and
correlations usually need a large sample before they stabilize. Tabachnick and Fidell cite
Comrey and Lee's (1992) advice regarding sample size: 50 cases is very poor, 100 is poor,
200 is fair, 300 is good, 500 is very good, and 1000 or more is excellent. As a rule of thumb,
a bare minimum of 10 observations per variable is necessary to avoid computational
difficulties.
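The two sample-size guidelines above can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, the band lookup assumes a sample is labelled by the highest threshold it reaches, and the (variables, respondents) pairs are the ones stated later in this report for Sample Problems 2 to 6.

```python
# Comrey and Lee's (1992) sample-size bands as quoted in the text.
# Assumption: a sample gets the label of the highest threshold it reaches.
BANDS = [(1000, "excellent"), (500, "very good"), (300, "good"),
         (200, "fair"), (100, "poor"), (50, "very poor")]


def sample_size_rating(n_cases):
    """Return the band label for a given number of cases."""
    for threshold, label in BANDS:
        if n_cases >= threshold:
            return label
    return "below the scale"


def meets_ten_per_variable(n_cases, n_variables, ratio=10):
    """Rule of thumb from the text: at least `ratio` cases per variable."""
    return n_cases >= ratio * n_variables


# (number of variables, number of respondents) as stated in this report.
datasets = {
    "Sample Problem 2": (67, 1428),
    "Sample Problem 3": (23, 2571),
    "Sample Problem 4": (74, 107),
    "Sample Problem 5": (17, 1678),
    "Sample Problem 6": (44, 459),
}

for name, (n_variables, n_cases) in datasets.items():
    print(name, sample_size_rating(n_cases),
          meets_ten_per_variable(n_cases, n_variables))
```

Under these assumptions, only Sample Problem 4 (74 variables, 107 responses) falls short of the 10-observations-per-variable guideline.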
The objective of the problems is to reduce the number of variables and group them into
categories based on the correlations among them.
To attain this objective, factor analysis has been applied to 6 different datasets to combine
and reduce the number of variables on the basis of the correlations among the variables.
APPLICATIONS
2. Screening of Variables:
3. Summary:
4. Sampling of variables:
5. Clustering of objects:
● Helps us to put objects (people) into categories depending on their factor
scores.
HYPOTHESIS FORMULATION
The null hypothesis for factor analysis is that the correlation matrix is an identity matrix.
DATA ANALYSIS
SAMPLE PROBLEM 1
The KMO statistic varies between 0 and 1. A value of 0 indicates that the sum of partial
correlations is large relative to the sum of correlations, indicating diffusion in the pattern of
correlations. A value close to 1 indicates that the patterns of correlations are relatively
compact, so factor analysis should give distinct and reliable factors. The value should be at
least 0.5 and ideally greater than 0.6. Here the value is .755, which is acceptable.
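The ratio described above (squared correlations relative to squared correlations plus squared partial correlations) can be sketched as follows. This is an illustrative implementation of the standard overall KMO formula, not the SPSS routine used for this report; the function name `kmo` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure for a cases-by-variables array."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlation of each pair of variables given all the others,
    # obtained from the inverse of the correlation matrix.
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    sum_r2 = np.sum(corr[off_diag] ** 2)
    sum_p2 = np.sum(partial[off_diag] ** 2)
    return sum_r2 / (sum_r2 + sum_p2)


# Synthetic data (hypothetical): six variables driven by one common factor.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
data = common + 0.7 * rng.normal(size=(500, 6))
kmo_value = kmo(data)
print(round(kmo_value, 3))
```

Because the six synthetic variables share a single strong common factor, the partial correlations are small relative to the correlations and the measure comes out well above the 0.5 threshold.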
Bartlett's measure tests the null hypothesis that the original correlation matrix is an identity
matrix. For factor analysis to work we need some relationships between the variables; if the R
matrix were an identity matrix, all correlation coefficients would be zero. The significance
value (p value) should be less than 0.05. Here it is 0.000 as reported to three decimal places,
so the null hypothesis is rejected.
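For reference, the usual form of Bartlett's test of sphericity can be sketched as below. This is a minimal illustration rather than the SPSS computation behind the reported values; the function names and the synthetic data are assumptions. The degrees of freedom depend only on the number of variables p, as p(p-1)/2, which reproduces the df of 253 reported for the 23-variable dataset and 946 for the 44-variable dataset in this report.

```python
import numpy as np


def bartlett_df(n_variables):
    """Degrees of freedom of Bartlett's test: p * (p - 1) / 2."""
    return n_variables * (n_variables - 1) // 2


def bartlett_statistic(data):
    """Chi-square statistic and df for H0: correlation matrix is identity."""
    data = np.asarray(data, dtype=float)
    n_cases, n_variables = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet is numerically safer than log(det(corr)).
    _, log_det = np.linalg.slogdet(corr)
    chi_square = -(n_cases - 1 - (2 * n_variables + 5) / 6.0) * log_det
    return chi_square, bartlett_df(n_variables)


# Synthetic data (hypothetical): five variables sharing one common factor.
rng = np.random.default_rng(1)
shared = rng.normal(size=(300, 1))
correlated = shared + rng.normal(size=(300, 5))
chi_square, df = bartlett_statistic(correlated)
print(round(chi_square, 1), df)
# If SciPy is available, the p value would be scipy.stats.chi2.sf(chi_square, df).
```

A large statistic relative to its degrees of freedom gives a small p value, which is what the "Sig. .000" entries in the output tables indicate.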
2) Communalities
The initial communalities, before extraction, are all 1, implying that 100% of each variable's
variance is available to be explained. The extraction values reflect the common variance in
the data structure.
For example, 61.3% of the variance associated with 'Form of letter of application' is common,
or shared, variance. After extraction some of the factors are discarded, so some information
is lost. The amount of variance in each variable that can be explained by the retained factors
is represented by the communalities after extraction.
3) Total Variance Explained
This table shows the eigenvalues associated with each linear component (factor) before
extraction, after extraction and after rotation. Before extraction, 15 linear components are
identified, one per variable. The eigenvalue associated with each factor represents the
variance explained by that linear component, which is also shown as a percentage of the
variance explained; factor 1, for example, explains 43.618% of the total variance. The first
few factors explain relatively large amounts of variance (especially factor 1), while
subsequent factors explain progressively smaller amounts. All factors with eigenvalues
greater than 1 are extracted, which gives the four factors displayed under Extraction Sums of
Squared Loadings. The eigenvalues of the factors after rotation are displayed under Rotation
Sums of Squared Loadings. Rotation has the effect of optimizing the factor structure, and one
consequence for these data is that the relative importance of the four factors is more evenly
distributed. Before rotation, factor 1 accounted for considerably more variance than the
remaining three (43.618% compared to 11.678% and 8.713%), but after rotation it accounts
for only 32.800% of the variance (compared to 15.161% and 14.905% respectively).
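The relationship between eigenvalues, percentage of variance and the eigenvalue-greater-than-1 rule can be sketched as follows. This is an illustration under stated assumptions, not the SPSS output itself: the function names and the small 4-variable correlation matrix are hypothetical. In a principal component analysis of a correlation matrix, each eigenvalue divided by the number of variables gives that component's share of the total variance.

```python
import numpy as np


def pct_of_variance(eigenvalue, n_variables):
    """Share of total variance (in %) carried by one component."""
    return 100.0 * eigenvalue / n_variables


def variance_table(corr):
    """Eigenvalues (descending), % of variance, cumulative %, and the
    number of components retained by the eigenvalue > 1 rule."""
    corr = np.asarray(corr, dtype=float)
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
    pct = pct_of_variance(eigenvalues, corr.shape[0])
    cumulative = np.cumsum(pct)
    retained = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, pct, cumulative, retained


# Hypothetical correlation matrix: two pairs of strongly related variables.
example_corr = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])
eigenvalues, pct, cumulative, retained = variance_table(example_corr)
print(np.round(eigenvalues, 3), np.round(cumulative, 1), retained)
```

The same arithmetic agrees with rows reported later in this report: an eigenvalue of .660 among 44 variables corresponds to 1.500% of the variance, and 1.739 among 23 variables to about 7.56%.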
4) Scree Plot
The point of inflexion on the curve occurs at component number 4. The curve reaches a
stable plateau after that.
5) Component Matrix
This matrix contains the loadings of each variable onto each factor. 4 components are created.
Risk averseness in few parameters due to high correlation.
The idea of rotation is to reduce the number of factors on which the variables under
investigation have high loadings. Rotation does not change the total variance explained or
the communalities; it only makes the interpretation of the analysis easier. Looking at the
table above, we can see which variables load substantially on each factor (component). These
factors can be used as variables for further analysis. A cut-off value of 0.5 is used to compare
the loadings: only loadings above 0.5 are considered. A few variables are independent. The
factors are then named according to the characteristics of the variables that load highly on
them.
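The rotation and cut-off steps described above can be sketched as follows. This is a commonly used SVD-based varimax iteration, given here as an assumption-laden illustration: it omits the Kaiser normalization that SPSS applies by default, and the function name and the small loading matrix are hypothetical rather than taken from the report's datasets.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a variables-by-factors loading matrix.

    Returns the rotated loadings and the rotation matrix.
    """
    loadings = np.asarray(loadings, dtype=float)
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag(np.sum(rotated ** 2, axis=0))
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - (gamma / n_vars) * rotated @ col_ss)
        )
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop once the criterion no longer improves noticeably.
        if criterion != 0.0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


# Hypothetical unrotated loadings for four variables on two components.
unrotated = np.array([
    [0.70, 0.50],
    [0.75, 0.45],
    [0.65, -0.55],
    [0.60, -0.60],
])
rotated, rotation = varimax(unrotated)

# Keep only loadings whose absolute value reaches the 0.5 cut-off.
salient = np.abs(rotated) >= 0.5
print(np.round(rotated, 3))
print(salient)
```

Because the rotation matrix is orthogonal, each variable's communality (its row sum of squared loadings) is the same before and after rotation; only the distribution of loadings across factors changes.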
SAMPLE PROBLEM 2
Objective
A university conducted a survey of the opinions of its employees (faculty) on various
parameters of the university, such as student performance, caste, gender and race. A
questionnaire with 67 questions was prepared for this purpose and completed by 1428
respondents.
Data Analysis
Communalities
All values in the following table are greater than 0.5, ranging up to 0.985. This indicates
that the factors explain a substantial share of the variance of every variable.
KMO = 0.755
Bartlett Test
As the null hypothesis was rejected, it is concluded that the correlation matrix is not an
identity matrix. Correlations exist among the variables, and hence factor analysis can be
applied.
Total Variance Explained
This table shows 16 factors with eigenvalues greater than 1, which together explain about
67% of the variance of the variables, a data loss of approximately 33%.
Component Matrix
The table indicates the variation of each variable explained by each factor. For some
variables, none of the factors explained a substantial share of the variation, so the loadings
were rotated in the next table.
Rotated Component Matrix
The table gives a refined view of the previous table: after Varimax rotation, each variable is
substantially explained by one factor or another. The 67 variables were grouped into 16
factors.
This table shows variation of one factor explained by all other factors.
SAMPLE PROBLEM 3
Objective
A survey was conducted to understand how anxious a given individual would be about
learning to use SPSS. A questionnaire containing 23 variables was designed, and 2571
responses were obtained, to understand which latent variables contribute to anxiety about
SPSS.
KMO = 0.930
Bartlett Test
As the null hypothesis was rejected, it is concluded that the correlation matrix is not an
identity matrix. Correlations exist among the variables, and hence factor analysis can be
applied.
Bartlett's test: df = 253, Sig. = .000
Communalities
Not all values in the following table are greater than 0.5 (for example, 'Computers are out to
get me' has an extraction communality of .378). However, for all intents and purposes, we
proceed with the analysis.
Communalities (excerpt)
Item                          Initial   Extraction
Computers are out to get me   1.000     .378
This table shows 4 factors with eigenvalues greater than 1, which together explain about
51% of the variance of the variables, a data loss of approximately 49%.
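Total Variance Explained (excerpt)
            Initial Eigenvalues                      Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %     Total   % of Variance   Cumulative %
2           1.739   7.560           39.256           1.739   7.560           39.256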
Component Matrix
The table indicates the variation of each variable explained by each factor. For some
variables, none of the factors explained a substantial share of the variation, so the loadings
were rotated in the next table.
Component Matrix (excerpt)
Item                                                                                      1      2      3      4
My friends will think I'm stupid for not being able to cope with SPSS                     -.303  .548   .146   .010
People try to tell you that SPSS makes statistics easier to understand but it doesn't     .669   -.048  .051   .248
I slip into a coma whenever I see an equation                                             .643   .330   -.210  -.342
The table gives a refined view of the previous table: after Varimax rotation, each variable is
substantially explained by one factor or another. The 23 variables were grouped into 4
factors.
Rotated Component Matrix (excerpt)
Item                                                                                      1      2      3      4
I dream that Pearson is attacking me with correlation coefficients                        .320   .516   .314   .039
People try to tell you that SPSS makes statistics easier to understand but it doesn't     .473   .523   .095   -.084
Everybody looks at me when I use SPSS                                                     -.146  -.372  -.029  .428
SAMPLE PROBLEM 4
A survey was conducted to understand how aptitude and standardized tests form
performance dimensions. A questionnaire containing 74 variables was designed, and 107
responses were obtained, to understand which latent variables contribute to performance in
the tests.
Here, the KMO value is .792, which is acceptable: the data show good sampling adequacy.
Bartlett's measure tests the null hypothesis that the original correlation matrix is an identity
matrix. For factor analysis to work we need some relationships between the variables; if the R
matrix were an identity matrix, all correlation coefficients would be zero. The significance
value (p value) should be less than 0.05. Here it is 0.000 as reported to three decimal places,
so the null hypothesis is rejected.
2) Communalities
The initial communalities, before extraction, are all 1, implying that 100% of each variable's
variance is available to be explained. The extraction values reflect the common variance in
the data structure.
The following table contains values greater than 0.5, up to 0.998, as well as values less than
0.5. However, for all intents and purposes, we proceed with the analysis.
After extraction some of the factors are discarded and so some information is lost. The
amount of variance in each variable that can be explained by the retained factors is
represented by the communalities after extraction.
3) Total Variance Explained
This table shows the eigenvalues associated with each linear component (factor) before
extraction, after extraction and after rotation. The eigenvalue associated with each factor
represents the variance explained by that linear component, which is also shown as a
percentage of the variance explained; factor 1, for example, explains 31.867% of the total
variance. The first few factors explain relatively large amounts of variance (especially
factor 1), while subsequent factors explain progressively smaller amounts. All factors with
eigenvalues greater than 1 are extracted, which gives the seven factors displayed under
Extraction Sums of Squared Loadings. The eigenvalues of the factors after rotation are
displayed under Rotation Sums of Squared Loadings. Rotation has the effect of optimizing
the factor structure, and one consequence for these data is that the relative importance of the
seven factors is more evenly distributed. Before rotation, factor 1 accounted for considerably
more variance than the other factors (31.867% compared to 16.302% and 15.208%), but after
rotation it accounts for only 24.352% of the variance (compared to 13.911% and 13.672%
respectively). The table shows 7 factors with eigenvalues greater than 1, which together
explain about 94% of the variance of the variables, a data loss of approximately 6%.
4) Scree Plot
The point of inflexion on the curve occurs at component number 7. The curve reaches a
stable plateau after that.
5) Component Matrix
This matrix contains the loadings of each variable onto each factor. Seven components are
extracted. The table indicates the variation of each variable explained by each factor. For
some variables, none of the factors explained a substantial share of the variation, so the
loadings were rotated in the next table.
The table gives a refined view of the previous table: after Varimax rotation, each variable is
substantially explained by one factor or another. The 74 variables were grouped into 7
factors.
SAMPLE PROBLEM 5
Objective
A university conducted a survey for the National Merit Twin Study, covering various
parameters that can affect the score, such as twin pair number, sex, zygosity and NMSQT
subject tests. A questionnaire with 17 questions was prepared for this purpose and completed
by 1678 respondents.
KMO = 0.83
Bartlett Test
As the null hypothesis was rejected, it is concluded that the correlation matrix is not an
identity matrix. Correlations exist among the variables, and hence factor analysis can be
applied.
Communalities
The values in the following table range from 0.019 to 1. This indicates that the retained
factors do not explain the variance of every variable well, although the variance of some
variables is explained almost completely.
Total Variance Explained
This table shows 3 factors with eigenvalues greater than 1, which together explain about
75% of the variance of the variables, a data loss of approximately 25%.
SCREE PLOT
Component Matrix
The table indicates the variation of each variable explained by each factor. For some
variables, none of the factors explained a substantial share of the variation, so the loadings
were rotated in the next table.
Rotated Component Matrix
The table gives a refined view of the previous table: after Varimax rotation, each variable is
substantially explained by one factor or another. The 17 variables were grouped into 3
factors.
SAMPLE PROBLEM 6
Objective
A survey was conducted in which different people were asked to rate themselves on a scale
of 1 to 5 on various personality types. The dataset is formed from these scores. There were
44 personality types and 459 respondents in total. The aim of the factor analysis is to reduce
the variables to a smaller number of factors, each containing variables that are
inter-dependent.
The KMO value is 0.841. This means the sample of respondents is adequate. Any value
above 0.5 is considered adequate.
The significance value (Sig.) is 0.000, which means the p value is very small. Thus, we can
reject the null hypothesis that the correlation matrix is an identity matrix. This implies that
the variables are not all independent of each other.
Bartlett's test: df = 946, Sig. = .000
Communalities
This table gives an idea of how much of the variance of each variable can be explained by the
retained factors. In the output table, most of the variables have a value of over 0.5.
Communalities (excerpt)
Item                             Initial   Extraction
generates enthusiasm in others   1.000     .624
assertive                        1.000     .571
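The extraction communalities shown for these two items can be reproduced from the unrotated loadings: a variable's communality is the sum of its squared loadings on the retained components. The sketch below copies the two rows from the Component Matrix reported for this dataset; the function name is hypothetical.

```python
# Unrotated loadings on the 10 extracted components, copied from the
# Component Matrix of Sample Problem 6 in this report.
loadings = {
    "assertive": [.386, -.465, .229, .102, .328,
                  .108, -.035, .042, -.034, .139],
    "generates enthusiasm in others": [.459, -.334, .327, -.018, .100,
                                       .278, .007, -.026, -.299, .131],
}


def communality(row):
    """Sum of squared loadings of one variable across retained components."""
    return sum(value * value for value in row)


for name, row in loadings.items():
    print(name, round(communality(row), 3))
```

The results, .571 and .624, agree with the Extraction column to the three decimal places reported.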
This table lists the eigenvalue of each component. Components with eigenvalues less than 1
are dropped, which leaves 10 factors that together explain 58.2% of the variance. The table
also gives rotated values, which show how the variance explained is redistributed across the
10 factors after rotation.
Total Variance Explained (excerpt)
Column groups: Initial Eigenvalues, Extraction Sums of Squared Loadings, Rotation Sums of Squared Loadings
(each with Total, % of Variance, Cumulative %)
            Initial Eigenvalues
Component   Total   % of Variance   Cumulative %
20          .660    1.500           76.502
SCREE PLOT
Component Matrix
Component Matrix (a)
Component
1 2 3 4 5 6 7 8 9 10
talkative .551 -.285 .278 -.490 .035 -.017 .041 -.076 .181 -.067
finds fault -.185 -.195 .354 .081 .539 -.019 -.130 -.190 .112 -.053
does a thorough job .510 .434 .065 -.002 .316 .053 .100 -.056 .105 -.038
depressed -.516 .262 .329 -.089 .045 -.070 .104 .034 -.105 .357
original .568 -.083 .277 .226 -.119 -.179 .197 .211 .053 -.059
reserved -.369 .526 -.023 .380 .015 .200 .073 -.233 -.040 -.054
helpful .532 .299 -.001 -.216 -.193 .042 .108 .082 .222 .323
careless -.468 -.340 .200 -.079 -.225 .303 .039 .057 .000 -.077
relaxed .402 -.244 -.393 .363 -.068 .242 .112 .166 .136 .013
curious .313 -.006 .444 .165 -.002 .101 .156 -.185 -.186 -.280
full of energy .353 -.306 .159 .172 .147 .389 -.206 -.014 -.207 -.077
starts quarrels -.294 -.263 .185 .104 .279 .100 -.015 .249 .233 .201
reliable .556 .410 .036 -.101 .124 -.012 .093 -.047 .197 -.277
tense -.379 .310 .499 -.241 .117 .070 -.006 .003 -.176 .093
ingenious .166 .160 .366 .403 -.047 -.045 .132 .477 -.108 .063
generates enthusiasm in others .459 -.334 .327 -.018 .100 .278 .007 -.026 -.299 .131
forgiving .252 -.012 -.015 -.114 -.275 .397 .038 -.199 -.019 .007
disorganized -.382 -.370 .059 -.072 -.457 .185 .050 .043 -.012 -.102
worries -.189 .475 .419 -.211 -.055 -.046 -.110 .244 -.092 -.116
imaginative .301 -.052 .506 .125 -.144 -.060 .230 -.045 -.015 -.256
quiet -.435 .552 -.143 .399 -.082 .228 .047 -.088 -.019 -.039
trusting .448 .163 -.006 -.343 -.266 .298 -.031 .047 .107 .009
lazy -.562 -.388 .123 -.017 -.203 .047 .101 .025 .107 .154
emotionally stable .419 -.310 -.301 .309 -.034 .196 .060 .070 .172 -.151
inventive .302 -.215 .364 .349 -.099 -.118 .232 -.248 -.239 .059
assertive .386 -.465 .229 .102 .328 .108 -.035 .042 -.034 .139
cold and aloof -.416 .005 .118 .255 .305 .162 -.028 .025 .380 .028
perseveres .559 .390 .031 -.033 .221 -.029 .034 -.030 .124 -.098
moody -.346 .123 .448 -.088 .172 .085 .159 -.069 .386 .061
values artistic experiences .206 .079 .327 .392 -.325 -.013 -.429 -.104 .171 .002
shy -.364 .495 -.047 .245 -.186 .277 .109 -.169 .210 -.079
considerate .517 .344 .036 -.262 -.189 .252 .063 -.028 .043 .364
efficient .390 .296 -.031 .123 .479 .126 .038 -.174 -.177 .308
calm in tense situations .299 -.263 -.358 .364 .071 .208 .213 -.136 -.041 .243
prefers routine work -.185 .158 -.133 -.007 .160 .369 -.420 .369 -.129 -.105
outgoing .494 -.460 .162 -.285 .050 .154 -.209 -.050 .039 -.052
sometimes rude -.386 -.417 .245 .103 .235 .020 .107 .085 .288 -.043
sticks to plans .481 .241 .058 .098 .295 .123 -.222 .080 .058 -.110
nervous -.465 .386 .290 -.082 -.001 .309 -.028 .015 -.141 -.085
reflective .333 .141 .343 .355 -.278 -.035 .200 .315 .025 .055
few artistic interests -.293 -.053 -.203 -.194 .281 .311 .351 .342 -.137 -.107
co-operative .530 .259 .064 -.291 -.199 .153 .015 .093 .103 .003
ditractable -.404 -.310 .281 -.142 -.162 .264 .054 -.244 .111 .032
a. 10 components extracted.
Varimax rotation reduces the number of factors on which each variable has a high loading.
This makes the output easier to interpret.
Component
1 2 3 4 5 6 7 8 9 10
talkative .142 -.658 .019 .388 .185 .081 -.056 -.015 -.207 .241
finds fault .112 -.121 .179 -.377 .270 .504 -.177 .051 -.038 .019
does a thorough job .697 .042 .005 .246 .112 .055 .088 -.049 -.063 .063
depressed -.172 .226 .584 -.067 -.111 .190 .074 -.067 -.079 -.342
original .214 -.267 -.193 .105 .127 -.046 .568 .142 -.220 .135
reserved .088 .785 .137 -.101 .047 .053 -.040 .034 -.004 -.026
helpful .314 -.126 -.043 .644 -.166 -.065 .168 .082 -.118 -.131
careless -.654 .051 .112 -.003 .146 .198 -.025 -.091 .106 .101
relaxed .056 -.003 -.747 .130 .058 -.020 .183 -.023 .108 -.040
curious .137 .001 .072 .018 .543 -.043 .239 .065 -.231 .263
full of energy .051 -.191 -.234 .027 .610 .032 .039 .113 .235 -.007
starts quarrels -.183 -.100 .012 -.174 -.028 .547 .101 -.053 .161 -.180
reliable .624 -.012 -.024 .311 .018 -.052 .085 -.021 -.089 .355
tense -.058 .141 .724 .012 .125 .159 .023 -.080 .041 -.087
ingenious .114 .065 .066 -.052 .095 .047 .746 .088 .090 -.081
generates enthusiasm in others .035 -.375 -.075 .170 .632 -.014 .145 .013 -.010 -.163
forgiving -.092 .060 -.141 .461 .271 -.125 -.106 .034 -.030 .061
disorganized -.723 .027 -.011 .045 .026 -.016 -.001 -.019 .025 .149
worries .074 .119 .659 .085 -.074 -.018 .225 .029 .193 .157
imaginative .041 -.102 .085 .065 .330 .014 .396 .103 -.319 .329
quiet .007 .841 .081 -.074 -.085 .008 .011 .018 .123 -.045
trusting .116 -.131 -.032 .658 .052 -.150 .000 .021 .099 .136
lazy -.671 .016 .074 -.123 -.093 .268 -.031 -.040 -.079 -.099
emotionally stable .066 -.085 -.701 .072 .125 .008 .107 .020 .061 .138
inventive -.003 -.072 -.095 -.108 .453 -.092 .306 .184 -.466 -.078
assertive .114 -.463 -.233 -.059 .428 .249 .112 .037 -.003 -.161
cold and aloof -.054 .271 .012 -.208 -.061 .615 -.045 .019 .115 -.005
perseveres .667 -.039 -.028 .247 .046 -.029 .091 .021 -.054 .143
moody -.068 .142 .375 .064 -.035 .598 .008 -.039 -.160 .085
values artistic experiences .017 .093 -.029 .076 .132 .030 .191 .744 .023 .123
shy -.052 .752 .080 .140 -.110 .134 -.038 .046 -.004 .118
considerate .285 -.039 .038 .730 .052 -.133 .064 .056 -.050 -.215
efficient .626 .073 -.040 .076 .286 .029 -.053 -.061 -.034 -.404
calm in tense situations .061 .061 -.644 .033 .205 -.016 .000 -.073 -.129 -.322
prefers routine work .016 .134 .079 -.034 .038 .015 -.054 -.036 .740 -.016
outgoing .018 -.619 -.163 .239 .350 .041 -.125 .112 .054 .131
sometimes rude -.316 -.107 -.009 -.302 .023 .576 .035 -.097 -.040 .074
sticks to plans .568 -.068 -.083 .115 .189 .050 .081 .136 .235 .095
nervous -.139 .434 .528 .048 .134 .109 -.016 -.102 .217 .051
reflective .076 .053 -.051 .175 .076 -.020 .707 .190 -.104 .025
few artistic interests -.130 .077 .000 -.044 .002 .119 .022 -.705 .277 -.034
co-operative .272 -.153 .022 .599 .013 -.158 .126 .037 .030 .140
ditractable -.550 .041 .163 .078 .197 .299 -.211 .006 -.134 .057
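One way to read the rotated matrix above is to assign each variable to the component on which it has its largest absolute loading. The sketch below does this for six rows copied from the table. Applying the 0.5 cut-off from Sample Problem 1 to this dataset is an assumption made for illustration, and the function name is hypothetical.

```python
CUTOFF = 0.5  # cut-off used in Sample Problem 1; assumed to apply here too

# Rotated loadings on components 1 to 10, copied from the table above.
rotated_loadings = {
    "quiet": [.007, .841, .081, -.074, -.085, .008, .011, .018, .123, -.045],
    "talkative": [.142, -.658, .019, .388, .185,
                  .081, -.056, -.015, -.207, .241],
    "tense": [-.058, .141, .724, .012, .125, .159, .023, -.080, .041, -.087],
    "does a thorough job": [.697, .042, .005, .246, .112,
                            .055, .088, -.049, -.063, .063],
    "considerate": [.285, -.039, .038, .730, .052,
                    -.133, .064, .056, -.050, -.215],
    "imaginative": [.041, -.102, .085, .065, .330,
                    .014, .396, .103, -.319, .329],
}


def dominant_component(row, cutoff=CUTOFF):
    """1-based index of the component with the largest absolute loading,
    or None when no loading reaches the cut-off."""
    index = max(range(len(row)), key=lambda i: abs(row[i]))
    if abs(row[index]) < cutoff:
        return None
    return index + 1


groups = {}
for name, row in rotated_loadings.items():
    groups.setdefault(dominant_component(row), []).append(name)

for component, names in groups.items():
    print(component, names)
```

In this subset, 'quiet' and 'talkative' both fall on component 2 with opposite signs, while 'imaginative' has no loading of 0.5 or more on any component and is left unassigned.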
MANAGERIAL IMPLICATIONS
The factor analysis of the 6 datasets has a number of managerial implications, as follows:
INTERVIEW DATASET: The interviewer evaluated candidates on 15 variables. It was
found that as few as 4 factors are sufficient to cover the entire set of 15 variables. This makes
it easier for the decision maker to select the best-suited person for the post. So, the 15
variables in this dataset can be reduced to 4 factors on the basis of the correlations among the
variables.
In this dataset, a principal component analysis with rotation was carried out. This resulted in four
correlated factors, constituting several aspects of 'features needed for interviews'. It turned out that
the 15 variables, based on the candidate's qualifications, appearance and so on, can be reduced to 4
different factors, which could indicate the different kinds of qualities that are considered important
by organizations.
OPINION OF EMPLOYEES: The factor analysis technique helped to reduce the number of
variables from 67 by grouping them into 16 factors, with 33% data loss. These factors can
now be analysed using any statistical tool.
SELF RATING: The factor analysis grouped the initial 44 variables into a total of 10
factors. Each factor is composed of loadings from each variable. Thus, factor analysis
helps to reduce the dataset so that it can be easily studied and interpreted, and so that further
tools can be applied to it easily.
CONCLUSIONS
Factor analysis is used to identify latent constructs or factors. It is commonly used to reduce variables
into a smaller set to save time and facilitate easier interpretation. In all the datasets considered, there
were correlations among the variables, and hence factor analysis helped to reduce the number of
variables and arrive at a better result.