
Vishal Mishra (IBS, Hyderabad)

Factor Analysis- Exploratory Factor Analysis


FACTOR ANALYSIS is a multivariate technique to:

• Study interdependence between variables.

• Identify an underlying factor (latent dimension) that is manifested in observed variables.

It can sometimes be argued that interdependent variables are manifestations of the same underlying cause (factor/dimension).

Note: Exploratory Factor Analysis is also considered a data reduction technique.




Example: Academic performance of a student


Example: Inter-personal relationships (people skills) of an individual


Factor Analysis (FA)

• FA uses correlations between variables to identify similarity/interdependence.

• FA combines similar variables into a factor (similar variables are represented predominantly in the same factor).

Note:

• Each factor is a combination of all the variables.

• FA initially extracts as many factors as there are variables; only some of them are retained.




Factor Analysis (FA):

Each variable is first standardized in FA, so that its mean becomes 0 and its S.D. (and hence its variance)
becomes 1.
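As an illustration of standardization, the following minimal Python sketch converts the v1 responses of the ten respondents printed in the motorcycle example to z-scores. The function name `standardize` is hypothetical, and using the sample S.D. (n - 1 divisor) is an assumption; the slides do not say which divisor is used.

```python
from statistics import mean, stdev

# Responses to statement v1 from the ten respondents printed in the example
# (rows 11-20 are not shown in the slides, so they are not used here).
v1 = [1, 2, 2, 5, 1, 3, 2, 4, 2, 1]

def standardize(values):
    """Return z-scores: subtract the mean, then divide by the sample S.D."""
    m = mean(values)
    s = stdev(values)  # sample S.D. (n - 1 divisor); an assumption
    return [(x - m) / s for x in values]

z = standardize(v1)
z_mean = mean(z)   # mean of the standardized variable
z_sd = stdev(z)    # S.D. of the standardized variable
print(round(z_mean, 6), round(z_sd, 6))
```

After standardization the mean is 0 and the S.D. is 1 (up to floating-point error), so every variable contributes the same amount of variance.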

Only a factor that carries more information than a single variable is retained. This
information is the variance captured by the factor (also called the Eigen Value of that factor).

Note: Variance is considered as a measure of information.


Since the variance of a standardized variable is 1, this gives us the criterion for
identifying the number of factors to retain, i.e. Eigen Value > 1.

Note: Since we retain only the factors with Eigen Value > 1, Exploratory Factor Analysis is also
considered a data reduction technique.


Example*:
A company that manufactures motorcycles is interested in identifying the variables that
influence buying behavior for motorcycles. Based on a detailed literature survey, focus group
discussions and interviews with current riders, the company identified 10 variables that were
considered important. Its research team then conducted a survey of 20 motorcycle
enthusiasts, with each identified variable converted to a statement. The
respondents were asked to indicate, on a 7-point scale (1 = completely agree, 7 = completely
disagree), their agreement or disagreement with the 10 statements, which relate to their perceptions
of motorcycles and their views on some attributes of motorcycles.

* (Adapted from Nargundkar, R. (2003). Marketing research-Text & cases 2E. Tata McGraw-Hill Education.)


Example:
The ten statements are as follows:
1. I use a motorcycle because I can afford it.
2. It gives me a sense of liberty to own a motorcycle.
3. Low ownership cost makes a motorcycle economical in the long run.
4. A motorcycle is a man’s vehicle.
5. I feel powerful when I ride my motorcycle.
6. My acquaintances who don’t have a motorcycle envy me.
7. I feel good whenever I see an advertisement of my motorcycle.
8. My motorcycle is comfortable to ride.
9. In my opinion motorcycles are safe.
10. Law makers should permit more than two people on a motorcycle.


Example: The responses from 20 respondents are as follows:
S.No. v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
1 1 4 1 6 5 6 5 2 3 2
2 2 3 2 4 3 3 3 5 5 2
3 2 2 2 1 2 1 1 7 6 2
4 5 1 4 2 2 2 2 3 2 3
5 1 2 2 5 4 4 4 1 1 2
6 3 2 3 3 3 3 3 6 5 3
7 2 2 5 1 2 1 2 4 4 5
8 4 4 3 4 4 5 3 2 3 3
9 2 3 2 6 5 6 5 1 4 1
10 1 4 2 2 1 2 1 4 4 1
11 - - - - - - - - - -
12 - - - - - - - - - -
13 - - - - - - - - - -
14 - - - - - - - - - -
15 - - - - - - - - - -
16 - - - - - - - - - -
17 - - - - - - - - - -
18 - - - - - - - - - -
19 - - - - - - - - - -
20 - - - - - - - - - -


Example: Correlations between variables: Correlation Matrix (R output)
In this output, each diagonal element is the correlation of a variable with itself (always 1) and each
off-diagonal element is the correlation between two different variables. Values above and below the
diagonal are mirror images. The matrix is printed in two blocks of five columns.

v1 v2 v3 v4 v5
v1 1.00000000 -0.18310493 0.54848220 0.07357736 0.17067213
v2 -0.18310493 1.00000000 -0.54780886 -0.08859109 -0.32965368
v3 0.54848220 -0.54780886 1.00000000 -0.14356106 0.00000000
v4 0.07357736 -0.08859109 -0.14356106 1.00000000 0.89801144
v5 0.17067213 -0.32965368 0.00000000 0.89801144 1.00000000
v6 0.16659801 0.06083144 -0.18330397 0.94361921 0.85244790
v7 0.19627719 -0.20748068 -0.04854521 0.91486619 0.95505454
v8 0.30923952 -0.32150944 0.43863109 -0.27727914 -0.14291756
v9 0.28267571 -0.25901360 0.40585385 -0.02709517 0.08796296
v10 0.47514388 0.02411144 0.31968303 0.08719866 0.12325736

v6 v7 v8 v9 v10
v1 0.16659801 0.19627719 0.30923952 0.28267571 0.47514388
v2 0.06083144 -0.20748068 -0.32150944 -0.25901360 0.02411144
v3 -0.18330397 -0.04854521 0.43863109 0.40585385 0.31968303
v4 0.94361921 0.91486619 -0.27727914 -0.02709517 0.08719866
v5 0.85244790 0.95505454 -0.14291756 0.08796296 0.12325736
v6 1.00000000 0.88593769 -0.32790761 -0.04784146 0.07125659
v7 0.88593769 1.00000000 -0.22223154 0.02879561 0.16708878
v8 -0.32790761 -0.22223154 1.00000000 0.82085983 0.06003911
v9 -0.04784146 0.02879561 0.82085983 1.00000000 -0.09955402
v10 0.07125659 0.16708878 0.06003911 -0.09955402 1.00000000
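To show how a single entry of such a matrix is obtained, here is a minimal Python sketch of the Pearson correlation between v4 and v5. It uses only the ten respondents whose answers are printed in the example, so it will not reproduce the 0.898 in the matrix, which is based on all 20 respondents. The function name `pearson` is hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Responses of respondents 1-10 only (rows 11-20 are not printed in the example).
v4 = [6, 4, 1, 2, 5, 3, 1, 4, 6, 2]
v5 = [5, 3, 2, 2, 4, 3, 2, 4, 5, 1]

r_45 = pearson(v4, v5)
r_54 = pearson(v5, v4)  # same value: the matrix is symmetric
r_44 = pearson(v4, v4)  # a variable correlates perfectly with itself
print(round(r_45, 4), round(r_54, 4), round(r_44, 4))
```

The two off-diagonal values agree and the diagonal value is 1, which is why the matrix is a mirror image about its diagonal.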


Example: Correlations between variables

What value of correlation should be regarded as high?

How many of the correlations should satisfy this condition (of being greater than a specific
value)?

Example: With 3 variables A, B and C, the number of bivariate correlations is 3 (AB, BC, CA).
In general, the number of correlations is the number of ways of selecting 2 variables
out of n, i.e. nC2 = n(n - 1)/2.

With 10 variables, the number of correlations is 45.
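The count can be checked with a short Python sketch; `n_correlations` is a hypothetical helper name, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def n_correlations(n_variables):
    """Number of distinct pairs of variables, i.e. nC2."""
    return comb(n_variables, 2)

pairs_3 = n_correlations(3)    # AB, BC, CA
pairs_10 = n_correlations(10)  # the motorcycle example
print(pairs_3, pairs_10)
```

With so many pairs, judging each correlation one at a time is impractical, which is the motivation for a single overall measure.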




Assessing the overall picture (similarity/correlation between variables):

Kaiser-Meyer-Olkin’s Measure of Sampling Adequacy (K-M-O’s MSA)

It gives the proportion of variance in the data set that is common to the variables (calculated using correlations).

It lies between 0 and 1; a higher value indicates more common variance, i.e. adequate
correlation between variables.

As a rule of thumb, a value >= 0.5 indicates that the data are suitable for factor analysis.
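The slides do not spell out the formula. The sketch below assumes the usual definition of the overall MSA: the sum of squared correlations divided by the sum of squared correlations plus the sum of squared partial correlations, with the partial correlations taken from the inverse of the correlation matrix. The 3 x 3 matrix is hypothetical, and `invert` and `kmo` are illustrative helper names, not part of any library.

```python
def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kmo(corr):
    """Overall MSA under the assumed definition described above."""
    inv = invert(corr)
    n = len(corr)
    sum_r2 = sum_p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                partial = -inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5
                sum_r2 += corr[i][j] ** 2
                sum_p2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_p2)

# Hypothetical correlation matrix of three variables (not from the example).
corr = [[1.0, 0.6, 0.5],
        [0.6, 1.0, 0.4],
        [0.5, 0.4, 1.0]]
overall_msa = kmo(corr)
print(round(overall_msa, 3))
```

When the partial correlations are small relative to the ordinary correlations, the ratio approaches 1, which is read as the variables sharing a large amount of common variance.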

Kaiser-Meyer-Olkin’s Measure of Sampling Adequacy (K-M-O’s MSA): R Output

Kaiser-Meyer-Olkin factor adequacy


Call: KMO(r = FAData)

Overall MSA = 0.62


MSA for each item =
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
0.51 0.34 0.58 0.76 0.75 0.67 0.81 0.55 0.45 0.28

Bartlett’s test of sphericity: Tests the null hypothesis that the correlation matrix is an identity
matrix (Chi-square Test)

H0: Correlation matrix is an Identity matrix

H1: Correlation matrix is not an Identity matrix

Example: Identity matrix for 4 variables A, B, C, D

  A B C D
A 1 0 0 0
B 0 1 0 0
C 0 0 1 0
D 0 0 0 1
Bartlett’s test of sphericity: R Output

$chisq
[1] 164.0985

$p.value
[1] 1.868594e-15

$df
[1] 45

Since the p-value is far below 0.05, H0 is rejected: the correlation matrix is not an identity
matrix, so the variables are correlated enough for factor analysis.
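The chi-square value and degrees of freedom can be traced with a minimal Python sketch. It assumes the standard form of Bartlett's statistic, chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, which the slides do not state. It also uses the fact that the determinant of the correlation matrix R equals the product of its eigenvalues, and takes those from the "Proportion of Variance" row of the principal component output of this example (each proportion multiplied by 10).

```python
import math

n_obs = 20    # respondents in the example
n_vars = 10   # statements v1..v10

# "Proportion of Variance" for Comp.1 to Comp.10, copied from the R output
# of this example.
proportions = [0.3882816, 0.2777014, 0.1374747, 0.09449386, 0.04792614,
               0.02923158, 0.01165577, 0.006798681, 0.003737674, 0.002698584]

# Each eigenvalue is its proportion times the total variance (10 variables).
eigenvalues = [p * n_vars for p in proportions]

# ln|R| is the sum of the logs of the eigenvalues.
log_det = sum(math.log(ev) for ev in eigenvalues)

chisq = -(n_obs - 1 - (2 * n_vars + 5) / 6) * log_det
df = n_vars * (n_vars - 1) // 2
print(round(chisq, 2), df)
```

Under these assumptions the result should come out very close to the reported chi-square of 164.0985 on 45 degrees of freedom. The p-value is not computed here because the Python standard library has no chi-square distribution function.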


Deciding on the number of factors: The criterion of Eigen Value

Eigen Value: Variance captured by a given factor out of the total variance in the data set.

Once variables are standardized, each has mean 0 and standard deviation (and variance) 1.

Therefore, in our data set the total variance (after standardization) = 10, i.e. 1 for each of the 10 variables.


The criterion is to retain a factor whose Eigen Value is > 1 (or >= 1): R Output
Method: Principal Component Analysis
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4
Standard deviation 1.9704863 1.6664376 1.1724960 0.97207952
Proportion of Variance 0.3882816 0.2777014 0.1374747 0.09449386
Cumulative Proportion 0.3882816 0.6659830 0.8034577 0.89795157
Comp.5 Comp.6 Comp.7 Comp.8
Standard deviation 0.69228707 0.54066239 0.34140546 0.260742802
Proportion of Variance 0.04792614 0.02923158 0.01165577 0.006798681
Cumulative Proportion 0.94587771 0.97510929 0.98676506 0.993563742
Comp.9 Comp.10
Standard deviation 0.193330646 0.164273664
Proportion of Variance 0.003737674 0.002698584
Cumulative Proportion 0.997301416 1.000000000
Eigen Values (the square of each component's standard deviation):
Comp.1 = 3.8828
Comp.2 = 2.7770
Comp.3 = 1.3747

Only the first three components have an Eigen Value greater than 1, so three factors are retained.
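The step from the R output to the number of factors can be sketched in Python. The standard deviations are copied from the "Importance of components" output of this example; squaring them gives the Eigen Values, and the Eigen Value > 1 rule is then applied. Variable names are illustrative.

```python
# Standard deviations of Comp.1 to Comp.10, copied from the R output.
sdev = [1.9704863, 1.6664376, 1.1724960, 0.97207952, 0.69228707,
        0.54066239, 0.34140546, 0.260742802, 0.193330646, 0.164273664]

# Eigen Value of a component = its variance = standard deviation squared.
eigenvalues = [s ** 2 for s in sdev]

# Total variance of 10 standardized variables.
total_variance = sum(eigenvalues)

# Retain the components whose Eigen Value exceeds 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1]

# Share of the total variance captured by the retained components.
cumulative = sum(eigenvalues[:len(retained)]) / total_variance

print(retained)
print([round(ev, 4) for ev in eigenvalues[:3]])
print(round(total_variance, 4), round(cumulative, 4))
```

The fourth component, with an Eigen Value of about 0.94, carries slightly less information than a single standardized variable and is therefore dropped.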


Association between factors and the variables (Factor Loadings): R Output

Principal Components Analysis


Call: principal(r = FAData, nfactors = 3, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 PC3 h2 u2 com
v1 0.18 0.67 0.49 0.72 0.278 2.0
v2 -0.14 -0.61 0.25 0.45 0.548 1.5
v3 -0.11 0.82 0.22 0.73 0.269 1.2
v4 0.97 -0.04 -0.10 0.94 0.055 1.0
v5 0.95 0.17 -0.14 0.95 0.050 1.1
v6 0.95 -0.08 -0.03 0.91 0.086 1.0
v7 0.97 0.10 -0.05 0.95 0.045 1.0
v8 -0.32 0.77 -0.31 0.80 0.201 1.7
v9 -0.07 0.74 -0.48 0.78 0.223 1.7
v10 0.16 0.32 0.81 0.79 0.211 1.4
Computations from the loadings table:

Eigen Value (the amount of variance captured by a factor): sum of squared loadings for a factor

Communality, h2 (the amount of variance of a variable captured by the extracted factors): sum of squared loadings for a variable
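Both computations can be reproduced from the unrotated loadings with a minimal Python sketch. The loadings and the reported h2 values are copied from the R output of this example; because the printed loadings are rounded to two decimals, the results agree with the reported figures only to within about 0.02.

```python
# Unrotated loadings (PC1, PC2, PC3), copied from the R output.
loadings = {
    "v1":  (0.18, 0.67, 0.49),
    "v2":  (-0.14, -0.61, 0.25),
    "v3":  (-0.11, 0.82, 0.22),
    "v4":  (0.97, -0.04, -0.10),
    "v5":  (0.95, 0.17, -0.14),
    "v6":  (0.95, -0.08, -0.03),
    "v7":  (0.97, 0.10, -0.05),
    "v8":  (-0.32, 0.77, -0.31),
    "v9":  (-0.07, 0.74, -0.48),
    "v10": (0.16, 0.32, 0.81),
}

# h2 column as reported in the same output, for comparison.
reported_h2 = {"v1": 0.72, "v2": 0.45, "v3": 0.73, "v4": 0.94, "v5": 0.95,
               "v6": 0.91, "v7": 0.95, "v8": 0.80, "v9": 0.78, "v10": 0.79}

# Eigen Value of a factor: sum of squared loadings down a column.
ss_loadings = [sum(row[j] ** 2 for row in loadings.values()) for j in range(3)]

# Communality of a variable: sum of squared loadings across a row.
communality = {v: sum(x ** 2 for x in row) for v, row in loadings.items()}

print([round(s, 2) for s in ss_loadings])
print({v: round(h, 2) for v, h in communality.items()})
```

The column sums correspond to the Eigen Values of the three retained components, and the row sums correspond to the h2 column.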

PC1 PC2 PC3


SS loadings 3.88 2.78 1.37
Proportion Var 0.39 0.28 0.14
Cumulative Var 0.39 0.67 0.80
Proportion Explained 0.48 0.35 0.17
Cumulative Proportion 0.48 0.83 1.00

Mean item complexity = 1.4


Test of the hypothesis that 3 components are sufficient.

The root mean square of the residuals (RMSR) is 0.08


with the empirical chi square 12.59 with prob < 0.82

Fit based upon off diagonal values = 0.96


The problem of cross-loading (an issue of interpretation): in the unrotated loadings above, some variables load appreciably on more than one component. For example, v1 loads 0.67 on PC2 and 0.49 on PC3, and v9 loads 0.74 on PC2 and -0.48 on PC3. This makes it hard to say which variables a component represents.


Association between factors and the variables (Rotated Factor Loadings)

Solution to the problem of cross-loadings: Factor Rotation

It is a mathematical transformation (rotation of axes) that makes the relationship
between factors and variables easier to interpret. The aim is to ensure that each variable loads
(correlates) highly on only one factor and weakly on the others. A factor can then be
understood/interpreted in terms of the variables that load highly on it.

Rotations are of two types: Oblique and Orthogonal

(We use an orthogonal rotation, Varimax. This ensures that the extracted factors are not
correlated with each other; they are then said to be orthogonal to each other.)
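To make the idea of an orthogonal rotation concrete, here is a minimal Python sketch for the two-factor case. It is not the algorithm used by the R function in the slides: it simply tries rotation angles in 1-degree steps and keeps the one that maximizes the raw varimax criterion (the variance of the squared loadings within each factor, summed over factors). The loadings are hypothetical, and the names `rotate` and `varimax_criterion` are illustrative.

```python
import math

def rotate(loadings, theta):
    """Rotate a two-factor loading matrix by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def varimax_criterion(loadings):
    """Sum over the two factors of the variance of the squared loadings."""
    total = 0.0
    for j in range(2):
        sq = [row[j] ** 2 for row in loadings]
        mean_sq = sum(sq) / len(sq)
        total += sum((x - mean_sq) ** 2 for x in sq) / len(sq)
    return total

# Hypothetical unrotated loadings: every variable cross-loads on both factors.
unrotated = [(0.6, 0.6), (0.7, 0.5), (0.6, -0.6), (0.5, -0.7)]

# Brute-force search over rotation angles from 0 to 89 degrees.
best_theta = max((math.radians(d) for d in range(90)),
                 key=lambda t: varimax_criterion(rotate(unrotated, t)))
rotated = rotate(unrotated, best_theta)

# An orthogonal rotation leaves each variable's communality unchanged.
communality_before = [a * a + b * b for a, b in unrotated]
communality_after = [a * a + b * b for a, b in rotated]

for row in rotated:
    print(tuple(round(x, 2) for x in row))
```

After the rotation each hypothetical variable loads mainly on one factor, while its communality stays the same because the axes are only turned, not stretched.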
Association between factors and the variables (Rotated Factor Loadings): R Output

Interpretation of Factors

Principal Components Analysis
Call: principal(r = FAData, nfactors = 3, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
RC1 RC2 RC3 h2 u2 com
v1 0.13 0.31 0.78 0.72 0.278 1.4
v2 -0.18 -0.64 -0.11 0.45 0.548 1.2
v3 -0.12 0.60 0.59 0.73 0.269 2.1
v4 0.97 -0.06 -0.01 0.94 0.055 1.0
v5 0.96 0.13 0.06 0.95 0.050 1.0
v6 0.95 -0.14 0.03 0.91 0.086 1.0
v7 0.97 0.02 0.11 0.95 0.045 1.0
v8 -0.26 0.85 0.10 0.80 0.201 1.2
v9 0.01 0.88 -0.04 0.78 0.223 1.0
v10 0.06 -0.15 0.87 0.79 0.211 1.1
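A first step in interpreting the rotated factors is to group each variable with the factor on which it has the largest absolute loading. The Python sketch below does this with the rotated loadings copied from the R output above; the grouping rule itself is a simple illustrative choice, not something prescribed by the slides.

```python
# Rotated loadings (RC1, RC2, RC3), copied from the R output.
rotated = {
    "v1":  (0.13, 0.31, 0.78),
    "v2":  (-0.18, -0.64, -0.11),
    "v3":  (-0.12, 0.60, 0.59),
    "v4":  (0.97, -0.06, -0.01),
    "v5":  (0.96, 0.13, 0.06),
    "v6":  (0.95, -0.14, 0.03),
    "v7":  (0.97, 0.02, 0.11),
    "v8":  (-0.26, 0.85, 0.10),
    "v9":  (0.01, 0.88, -0.04),
    "v10": (0.06, -0.15, 0.87),
}

names = ("RC1", "RC2", "RC3")
groups = {name: [] for name in names}

for var, row in rotated.items():
    # Index of the factor with the largest absolute loading for this variable.
    dominant = max(range(3), key=lambda k: abs(row[k]))
    groups[names[dominant]].append(var)

for name in names:
    print(name, groups[name])
```

Two cautions apply when reading the groups: v3 loads almost equally on RC2 (0.60) and RC3 (0.59), so it still cross-loads after rotation, and v2 loads negatively on RC2.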
Impact of rotation on Eigen Value and Communality:

RC1 RC2 RC3
SS loadings 3.84 2.43 1.76
Proportion Var 0.38 0.24 0.18
Cumulative Var 0.38 0.63 0.80
Proportion Explained 0.48 0.30 0.22
Cumulative Proportion 0.48 0.78 1.00

Mean item complexity = 1.2

Rotation redistributes the variance among the factors: the Eigen Values (SS loadings) change from 3.88, 2.78 and 1.37 to 3.84, 2.43 and 1.76. The communalities (h2) and the cumulative variance explained (0.80) do not change.
Test of the hypothesis that 3 components are sufficient.

The root mean square of the residuals (RMSR) is 0.08


with the empirical chi square 12.59 with prob < 0.82

Fit based upon off diagonal values = 0.96
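The effect of rotation on Eigen Values and communalities can be checked from the two loadings tables of this example. The Python sketch below copies the unrotated and the varimax-rotated loadings from the R output and compares column sums and row sums of squared loadings. Because the printed loadings are rounded to two decimals, agreement is only approximate.

```python
# Loadings copied from the R output: rows are v1..v10.
unrotated = [(0.18, 0.67, 0.49), (-0.14, -0.61, 0.25), (-0.11, 0.82, 0.22),
             (0.97, -0.04, -0.10), (0.95, 0.17, -0.14), (0.95, -0.08, -0.03),
             (0.97, 0.10, -0.05), (-0.32, 0.77, -0.31), (-0.07, 0.74, -0.48),
             (0.16, 0.32, 0.81)]
rotated = [(0.13, 0.31, 0.78), (-0.18, -0.64, -0.11), (-0.12, 0.60, 0.59),
           (0.97, -0.06, -0.01), (0.96, 0.13, 0.06), (0.95, -0.14, 0.03),
           (0.97, 0.02, 0.11), (-0.26, 0.85, 0.10), (0.01, 0.88, -0.04),
           (0.06, -0.15, 0.87)]

def column_ss(table):
    """Sum of squared loadings per factor (the Eigen Value of each factor)."""
    return [sum(row[j] ** 2 for row in table) for j in range(3)]

def row_ss(table):
    """Sum of squared loadings per variable (the communality h2)."""
    return [sum(x ** 2 for x in row) for row in table]

ss_before, ss_after = column_ss(unrotated), column_ss(rotated)
h2_before, h2_after = row_ss(unrotated), row_ss(rotated)

# Largest change in any variable's communality caused by the rotation.
max_h2_change = max(abs(a - b) for a, b in zip(h2_before, h2_after))

print([round(s, 2) for s in ss_before], round(sum(ss_before), 2))
print([round(s, 2) for s in ss_after], round(sum(ss_after), 2))
print(round(max_h2_change, 3))
```

The individual Eigen Values shift, most visibly for the second and third factors, while their total and every communality stay the same up to rounding.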


Factors as substitute for variables (Factor Scores): R Output

$scores
RC1 RC2 RC3
[1,] 1.68498941 -0.40883592 -1.10007932
[2,] 0.23413311 0.98275950 -0.77439118
[3,] -0.99919009 1.76950803 -0.89370776
[4,] -0.56421453 0.35204784 1.50798998
[5,] 0.92295965 -0.62889268 -0.61124487
[6,] 0.05442922 1.39461768 0.16137540
[7,] -0.97169979 0.74217256 1.34455035
[8,] 0.64412946 -0.24776795 0.83929447
[9,] 1.76282589 0.17188581 -1.01006652
[10,] -0.91150445 0.33515935 -1.25122170
[11,] -0.27342801 -0.87615600 -1.23328916
[12,] -1.24611228 -1.41824582 -0.72313602
[13,] 0.37233153 1.79509893 0.34304829
[14,] -0.44860354 -0.17411842 -0.40117163
[15,] -0.40275116 -1.81627045 1.13807409
[16,] -0.76376112 0.22642548 1.40836014
[17,] -0.94760644 -1.31369165 -0.05849328
[18,] -0.66786395 -0.41270609 -0.55202665
[19,] 0.42308529 0.01603588 0.23697567
[20,] 2.09785181 -0.48902609 1.62915969
$weights
RC1 RC2 RC3
v1 0.003993844 0.022975660 0.433896541
v2 -0.063221506 -0.278140315 0.042550434
v3 -0.041176547 0.176140798 0.283432456
v4 0.255619330 0.003455988 -0.042407756
v5 0.256862668 0.080498530 -0.029581569
v6 0.244738110 -0.037589131 -0.006735316
v7 0.253104824 0.025568865 0.013745003
v8 -0.047225499 0.359600883 -0.057068289
v9 0.033361048 0.405662823 -0.166395091
v10 -0.032469842 -0.202811755 0.568262124

$r.scores
RC1 RC2 RC3
RC1 1.000000e+00 2.109424e-15 -1.026956e-15
RC2 2.123302e-15 1.000000e+00 9.436896e-16
RC3 -1.016548e-15 9.159340e-16 1.000000e+00
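The $r.scores matrix says that the three sets of factor scores are uncorrelated with each other. This can be checked directly from the 20 factor scores printed in the R output, as in the minimal Python sketch below; the helper name `pearson` is hypothetical.

```python
from math import sqrt

# Factor scores (RC1, RC2, RC3) of the 20 respondents, copied from the R output.
scores = [
    (1.68498941, -0.40883592, -1.10007932), (0.23413311, 0.98275950, -0.77439118),
    (-0.99919009, 1.76950803, -0.89370776), (-0.56421453, 0.35204784, 1.50798998),
    (0.92295965, -0.62889268, -0.61124487), (0.05442922, 1.39461768, 0.16137540),
    (-0.97169979, 0.74217256, 1.34455035), (0.64412946, -0.24776795, 0.83929447),
    (1.76282589, 0.17188581, -1.01006652), (-0.91150445, 0.33515935, -1.25122170),
    (-0.27342801, -0.87615600, -1.23328916), (-1.24611228, -1.41824582, -0.72313602),
    (0.37233153, 1.79509893, 0.34304829), (-0.44860354, -0.17411842, -0.40117163),
    (-0.40275116, -1.81627045, 1.13807409), (-0.76376112, 0.22642548, 1.40836014),
    (-0.94760644, -1.31369165, -0.05849328), (-0.66786395, -0.41270609, -0.55202665),
    (0.42308529, 0.01603588, 0.23697567), (2.09785181, -0.48902609, 1.62915969),
]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rc1, rc2, rc3 = zip(*scores)                     # one tuple per factor
means = [sum(col) / len(col) for col in (rc1, rc2, rc3)]
r12, r13, r23 = pearson(rc1, rc2), pearson(rc1, rc3), pearson(rc2, rc3)

print([round(m, 6) for m in means])
print(round(r12, 6), round(r13, 6), round(r23, 6))
```

Each respondent is now described by three uncorrelated scores instead of ten correlated responses, which is the sense in which the factors substitute for the variables.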
