
Factor Analysis

Dr. R. RAVANAN
Joint Director of Collegiate Education, Chennai Region
Former Principal, Presidency College, Chennai
Former Head, Dept of Statistics, Presidency College

E-mail: ravananstat@gmail.com

Mobile: 98403 75672 / 94442 21627


Introduction
• Factor analysis is a statistical technique for studying the
inter-relationships among variables in an effort to
find a new set of factors, fewer in number than the
original variables, such that the factors are common
among the original variables.
• There is a difference between factor analysis and
principal component analysis.
• In principal component analysis the components are
so selected that they can explain maximum variation
in the original data set.
• In factor analysis a small number of common factors
are extracted so that these common factors are
sufficient to study the relationships of original
variables.
Aims of Factor Analysis
• Factor analysis helps the researcher to reduce the number of variables to
be analyzed, thereby making the analysis easier.
• For example, consider a market researcher at a credit card company who
wants to evaluate the credit card usage and behaviour of customers, using
various variables. The variables include age, gender, marital status, income
level, education, employment status, credit history and family background.
• Analysis based on a wide range of variables can be tedious and time
consuming.
• Using Factor Analysis, the researcher can reduce the large number of
variables into a few dimensions called factors that summarize the available
data.
• It aims at grouping the original input variables into the factors that
underlie them.
• For example, age, gender, marital status can be combined under a factor
called demographic characteristics. The income level, education,
employment status can be combined under a factor called socio-economic
status. Credit history and family background can be combined under a
factor called background status.
Benefits of Factor Analysis
• To identify the hidden dimensions or constructs which
may not be apparent from direct analysis
• To identify relationships between variables
• It helps in data reduction
• It helps the researcher to cluster the product and
population being analyzed.
Terminology in Factor Analysis
• Factor: A factor is an underlying construct or dimension that
represents a set of observed variables. In the credit card
company example, the demographic characteristics,
socio-economic status and background status each represent a
set of variables.
• Factor Loadings: Factor loadings help in interpreting and
labeling the factors. They measure how closely the variables in
a factor are associated with it. They are also called
factor-variable correlations: factor loadings are correlation
coefficients between the variables and the factors.
• Eigen Values: An eigen value measures the variance in all the
variables that is accounted for by a factor. It is calculated by
adding the squared factor loadings of all the variables on that
factor, and it helps explain the importance of the factor with
respect to the variables. Generally, factors with eigen values
of more than 1.0 are considered stable. Factors with low eigen
values (<1.0) may not explain much of the variance in the
variables related to that factor.
• Communalities: Communalities, denoted by h², measure the
proportion of variance in each variable explained by the
factors extracted. A communality ranges from 0 to 1. A high
communality value indicates that most of the variance in
the variable is explained by the factors extracted from the
factor analysis.
• Total Variance explained: The total variance explained is the
percentage of the total variance of the variables explained by
all the extracted factors together. It is calculated by adding
the communality values of all the variables and dividing the
sum by the number of variables.
• Factor Variance explained: The factor variance explained is
the percentage of the total variance of the variables explained
by a given factor. It is calculated by adding the squared factor
loadings of all the variables on that factor and dividing the
sum by the number of variables.
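The definitions above can be illustrated with a small numerical sketch. The loading matrix below is hypothetical (it is not taken from the two-wheeler example), and the variable names are chosen only for illustration.

```python
import numpy as np

# Hypothetical loadings: rows are variables, columns are factors.
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])
n_vars = loadings.shape[0]

# Communality h^2 of each variable: sum of its squared loadings across factors.
communalities = (loadings ** 2).sum(axis=1)

# Eigen value of each factor: sum of squared loadings of all variables on it.
eigen_values = (loadings ** 2).sum(axis=0)

# Factor variance explained: eigen value divided by the number of variables.
factor_variance_explained = eigen_values / n_vars

# Total variance explained: sum of communalities divided by the number of variables.
total_variance_explained = communalities.sum() / n_vars

print(communalities)              # approximately [0.65 0.53 0.82 0.40]
print(eigen_values)               # approximately [1.18 1.22]
print(factor_variance_explained)  # approximately [0.295 0.305]
print(total_variance_explained)   # approximately 0.60
```

Note that the factor variances explained add up to the total variance explained, because both are built from the same squared loadings.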
Procedure followed for Factor Analysis

• Define the problem
• Construct the correlation matrix that measures the
relationships among the variables.
• Select an appropriate factor analysis method
• Determine the number of factors
• Rotation of factors
• Interpret the factors
• Determine the factor scores
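The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming principal-components extraction and the "eigen value greater than 1" rule for choosing the number of factors; the function name `extract_factors` and the synthetic data are assumptions made for the example, not part of the original material. Rotation is left out here.

```python
import numpy as np

def extract_factors(data):
    """Principal-components extraction on the correlation matrix (illustrative)."""
    # Correlation matrix of the variables (columns of `data`).
    corr = np.corrcoef(data, rowvar=False)

    # Eigen decomposition, sorted from largest to smallest eigen value.
    eigen_values, eigen_vectors = np.linalg.eigh(corr)
    order = np.argsort(eigen_values)[::-1]
    eigen_values, eigen_vectors = eigen_values[order], eigen_vectors[:, order]

    # Number of factors: keep those with eigen values above 1.0.
    n_factors = int((eigen_values > 1.0).sum())

    # Unrotated loadings: eigenvectors scaled by the square root of the eigen values.
    loadings = eigen_vectors[:, :n_factors] * np.sqrt(eigen_values[:n_factors])

    # Factor scores: standardized data projected onto the retained components.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    scores = z @ eigen_vectors[:, :n_factors] / np.sqrt(eigen_values[:n_factors])
    return corr, eigen_values, loadings, scores

# Synthetic data: 200 respondents, 6 variables driven by 2 hidden factors.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 2))
pattern = np.array([[0.9, 0.0], [0.8, 0.0], [0.85, 0.0],
                    [0.0, 0.9], [0.0, 0.8], [0.0, 0.85]])
data = hidden @ pattern.T + 0.4 * rng.normal(size=(200, 6))

corr, eigen_values, loadings, scores = extract_factors(data)
print(np.round(eigen_values, 3))
print(np.round(loadings, 3))
```

The eigen values of a correlation matrix always add up to the number of variables, which is why 1.0 (the variance of a single standardized variable) is a natural cutoff.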
Application Areas/Example

1. In marketing research, a common application area of Factor Analysis is to
understand the underlying motives of consumers who buy a product category or a
brand.

2. The worked-out example in the chapter will help clarify the use of Factor
Analysis in Marketing Research.

3. In this example, we assume that a two-wheeler manufacturer is interested in
determining which variables his potential customers think about when they
consider his product.

4. Let us assume that twenty two-wheeler owners were surveyed by this
manufacturer (or by a marketing research company on his behalf). They were
asked to indicate on a seven-point scale (1=Completely Agree, 7=Completely
Disagree) their agreement or disagreement with a set of ten statements
relating to their perceptions and some attributes of the two-wheelers.

5. The objective of doing Factor Analysis is to find underlying "factors" which
would be fewer than 10 in number, but would be linear combinations of some
of the original 10 variables.
The research design for data collection can be stated as follows:

Twenty 2-wheeler users were surveyed about their perceptions and image
attributes of the vehicles they owned. Ten questions were asked to each of
them, all answered on a scale of 1 to 7 (1= completely agree, 7= completely
disagree).

1. I use a 2-wheeler because it is affordable.
2. It gives me a sense of freedom to own a 2-wheeler.
3. Low maintenance cost makes a 2-wheeler very economical in the long
run.
4. A 2-wheeler is essentially a man’s vehicle.
5. I feel very powerful when I am on my 2-wheeler.
6. Some of my friends who don’t have their own vehicle are jealous of me.
7. I feel good whenever I see the ad for 2-wheeler on T.V., in a magazine or
on a hoarding.
8. My vehicle gives me a comfortable ride.
9. I think 2-wheelers are a safe way to travel.
10. Three people should be legally allowed to travel on a 2-wheeler.
Now we will attempt to interpret factor 2. We look in fig. 4,
down the column for Factor 2, and find that variables 8 and 9
have high loadings of 0.85203 and 0.87772, respectively. This
indicates that factor 2 is a combination of these two variables.

But if we look at fig. 2, the unrotated factor matrix, a slightly
different picture emerges. Here, variable 3 also has a high
loading on factor 2, along with variables 8 and 9. It is left to
the researcher which interpretation he wants to use, as there are
no hard and fast rules. Assuming we decide to use all three
variables, the related statements are “low maintenance”,
“comfort” and “safety” (from statements 3, 8 and 9). We may
combine these variables into a factor called “utility” or
“functional features” or any other similar word or phrase which
captures the essence of these three statements / variables.
For interpreting Factor 3, we look at the column labelled factor 3
in fig. 4 and find that variables 1 and 10 are loaded high on factor
3. According to the unrotated factor matrix of fig. 2, only
variable 10 loads high on factor 3. Supposing we stick to fig. 4,
then the combination of “affordability” and “cost saving by 3
people legally riding on a 2-wheeler” gives the impression that
factor 3 could be “economy” or “low cost”.

We have now completed interpretation of the 3 factors with
eigen values of 1 or more. We will now look at some additional
issues which may be of importance in using factor analysis.
Additional Issues in Interpreting Solutions

We must guard against the possibility that a variable may load
highly on more than one factor. Strictly speaking, a variable should
load close to 1.00 on one and only one factor, and load close to 0 on
the other factors. If this is not the case, it indicates either that the
respondents in the sample hold more than one opinion about the
variable, or that the question/variable may be unclear in its
phrasing.

The other issue that is important in the practical use of factor analysis
is the answer to the question “what should be considered a high loading
and what is not a high loading?” Here, unfortunately, there is no
clear-cut guideline, and many a time we must look at relative
values in the factor matrix. Sometimes 0.7 may be treated as a high
value, while sometimes 0.9 could be the cutoff for high values.
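As a rough illustration of both issues, the sketch below flags variables that load highly on more than one factor, and variables that load highly on none, for a chosen cutoff. The loading matrix, the helper name `classify_loadings` and the cutoffs used are hypothetical; as noted above, there is no clear-cut rule for the cutoff.

```python
import numpy as np

def classify_loadings(loadings, cutoff):
    """Return indices of variables with several / no high loadings (illustrative)."""
    # A loading counts as "high" when its absolute value reaches the cutoff.
    high = np.abs(loadings) >= cutoff
    n_high = high.sum(axis=1)
    cross_loading = np.flatnonzero(n_high > 1)   # high on more than one factor
    unassigned = np.flatnonzero(n_high == 0)     # high on no factor
    return cross_loading, unassigned

# Hypothetical rotated factor matrix: 4 variables, 3 factors.
loadings = np.array([
    [0.85, 0.10, 0.05],
    [0.75, 0.72, 0.10],
    [0.10, 0.88, 0.15],
    [0.30, 0.20, 0.40],
])

for cutoff in (0.7, 0.9):
    cross, unassigned = classify_loadings(loadings, cutoff)
    print(cutoff, cross.tolist(), unassigned.tolist())
```

With a cutoff of 0.7 the second variable (index 1) is flagged as loading on two factors; with a cutoff of 0.9 no loading qualifies as high at all, which shows how strongly the interpretation depends on the chosen value.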
The proportion of variance in any one of the original variables which is
captured by the extracted factors is known as Communality. For example,
fig. 3 tells us that after 3 factors were extracted and retained, the
communality is 0.72243 for variable 1, 0.45214 for variable 2 and so on
(from the column labelled communality in fig. 3).

This means that 0.72243 or 72.24 percent of the variance (information
content) of variable 1 is being captured by our 3 extracted factors together.
Variable 2 exhibits a low communality value of 0.45214. This implies that
only 45.21 percent of the variance in variable 2 is captured by our extracted
factors. This may also partially explain why variable 2 is not appearing in our
final interpretation of the factors (in the earlier section). It is possible that
variable 2 is an independent variable which is not combining well with any
other variable, and therefore should be further investigated separately.
“Freedom” could be a different concept in the minds of our target audience.

As a final comment, it is again the author’s recommendation that we use the
rotated factor matrix (rather than unrotated factor matrix) for interpreting
factors, particularly when we use the principal components method for
extraction of factors in stage 1.
Orthogonal Rotations
• Varimax: Minimizes the complexity of the
components by making the large loadings larger
and the small loadings smaller within each
component.
• Quartimax: Makes large loadings larger and small
loadings smaller within each variable.
• Equamax: A compromise between these two.
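The three rotations can be sketched as members of the orthomax family, which differ only in a weight `gamma`. The code below is a minimal sketch using the common SVD-based iteration, assuming `gamma = 1` for varimax, `gamma = 0` for quartimax and `gamma = (number of factors) / 2` for equamax; the function name `orthomax` and the example loadings are hypothetical, and statistical packages may add refinements such as Kaiser normalization.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal rotation of a loading matrix (illustrative sketch)."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Direction that increases the orthomax criterion.
        target = rotated ** 3 - (gamma / n_vars) * rotated * (rotated ** 2).sum(axis=0)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt               # closest orthogonal matrix
        total = s.sum()
        if previous > 0 and total < previous * (1 + tol):
            break                       # no meaningful improvement
        previous = total
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: a clean two-factor pattern turned by 30 degrees.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.85], [0.0, 0.7]])
angle = np.pi / 6
turn = np.array([[np.cos(angle), -np.sin(angle)],
                 [np.sin(angle),  np.cos(angle)]])
unrotated = simple @ turn

varimax_loadings, varimax_rotation = orthomax(unrotated, gamma=1.0)
quartimax_loadings, _ = orthomax(unrotated, gamma=0.0)
equamax_loadings, _ = orthomax(unrotated, gamma=unrotated.shape[1] / 2)
print(np.round(varimax_loadings, 3))
```

Because the rotation matrix is orthogonal, the communality of every variable is the same before and after rotation; only the distribution of the loadings across factors changes.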
