
Factor Analysis as a Tool for Survey Analysis

*Karan Khulbe, **Pradeep Kumar, #Prof. Yashwant Singh Thakur, #Course Coordinator, Shubham Dadariya,

*Department of Business Management, Dr. Harisingh Gour Vishwavidyalaya, Sagar, MP, India.

Abstract Factor analysis is particularly suitable for reducing a large number of related variables to a smaller, more manageable number of factors before using them in other analyses such as multiple regression or multivariate analysis of variance. It can be beneficial in developing a questionnaire. Sometimes adding more statements to the questionnaire fails to give a clear understanding of the variables. With the help of factor analysis, irrelevant questions can be removed from the final questionnaire. This study proposes a factor analysis to identify the factors underlying the variables of a questionnaire that measures tourist satisfaction. In this study, the Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett's test of sphericity are used to assess the factorability of the data. The determinant of the correlation matrix is calculated to examine multicollinearity among the variables. To determine the number of factors to be extracted, Kaiser's criterion and the scree test are examined. The varimax orthogonal rotation method is applied to minimize the number of variables that have high loadings on each factor. Internal consistency is confirmed by calculating Cronbach's alpha and composite reliability to test the accuracy of the instrument. Convergent validity is established when the average variance extracted is greater than or equal to 0.5. The results reveal that factor analysis not only allows irrelevant items to be detected but also allows the valuable factors to be extracted from the data set of a questionnaire survey. Applying factor analysis to questionnaire evaluation provides valuable input to decision makers, who can focus on a few important factors rather than a large number of parameters.

KEYWORDS : factor analysis, types, method, steps, advantages, limitations.

1.Introduction
Factor analysis is a multivariate statistical technique applied to a single set of variables when the investigator is interested in determining which variables in the set form logical subsets that are relatively independent of one another[1]. In other words, factor analysis is particularly useful for identifying the factors underlying the variables by grouping related variables into the same factor[2]. In this paper, the main focus is on the application of factor analysis to reduce a large number of inter-correlated measures to a few representative constructs or factors that can be used for subsequent analysis[4]. The goal of the present work is to examine the application of factor analysis to questionnaire items that measure tourist satisfaction. Therefore, in order to identify the factors, it is necessary to understand the concept of factor analysis and the steps for applying it to a questionnaire survey.

Factor analysis is based on the assumption that all variables correlate to some degree. The variables should be measured at least at the ordinal level. The sample size for factor analysis should be large; a commonly accepted guideline is a ten-to-one ratio of respondents to variables[3,4]. There are two main approaches to factor analysis: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Exploratory factor analysis is used for checking dimensionality and is often used in the early stages of research to gather information about the interrelationships among a set of variables[5].
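
The assumption that the variables correlate to some degree can be checked numerically before any factors are extracted. The following is a minimal sketch, assuming NumPy is available, of Bartlett's test of sphericity, the determinant of the correlation matrix and the Kaiser-Meyer-Olkin measure named in the abstract. The function names and the synthetic `responses` matrix are illustrative assumptions, not the study's data or code.

```python
import numpy as np

def bartlett_sphericity(data):
    """Return (chi-square statistic, degrees of freedom, determinant of R)."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    det = np.linalg.det(corr)
    # Standard chi-square approximation for H0: R is an identity matrix
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(det)
    dof = p * (p - 1) // 2
    return statistic, dof, det

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations are obtained from the inverse correlation matrix
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diagonal] ** 2)
    a2 = np.sum(partial[off_diagonal] ** 2)
    return r2 / (r2 + a2)

# Hypothetical data: 200 respondents, 6 items driven by 2 latent factors
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                     [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
responses = factors @ loadings.T + 0.5 * rng.normal(size=(200, 6))

statistic, dof, det = bartlett_sphericity(responses)
print("Bartlett chi-square:", statistic, "with", dof, "degrees of freedom")
print("Determinant of R:", det)
print("KMO:", kmo(responses))
```

The chi-square statistic would then be compared with a chi-square distribution on `dof` degrees of freedom; a large statistic indicates that the correlation matrix differs from an identity matrix, which is what factor analysis requires.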
The confirmatory factor analysis is a more complex and sophisticated set of techniques used in the research process to test specific hypotheses or theories concerning the structure underlying a set of variables[6,7].

What is Factor Analysis?
Factor analysis refers to a method that reduces a large number of variables to a smaller number of factors. This technique extracts the maximum common variance from all the variables and puts it into a common score.

Moreover, it is part of the General Linear Model (GLM) and rests on several assumptions: the relationships are linear, there is no multicollinearity, the relevant variables are included in the analysis, and there is true correlation between the variables and the factors.

Types of Factor Analysis
There are different methods that we use to extract factors from the data set:

1. Principal component analysis
It is the most common method used by researchers. It extracts the maximum variance and puts it into the first factor. Subsequently, it removes the variance explained by the first factor and extracts the second factor. This process continues until the last factor.

Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in data analysis, visualization and data preprocessing. The data are linearly transformed onto a new coordinate system (the principal components) such that the directions capturing the largest variation in the data can be easily identified. Many studies use the first two principal components in order to plot the data in two dimensions and to visually identify clusters of closely related data points. Principal component analysis has applications in many fields such as population genetics, microbiome studies, and atmospheric science[8].

Applications Of PCA :
1.Intelligence
2.Residential Differentiation
3.Development Indexes
4.Population Genetics
5.Market Research And Index Of Attitudes
6.Quantitative Finance

Limitations of PCA
PCA is a powerful and versatile technique, but it also has some limitations that you should be aware of. For example, PCA is sensitive to the choice of variables and to their scaling, meaning that different selections or units of measurement may lead to different results. Therefore, you should always justify your choice of variables and put them on a comparable scale (for example, by standardizing them) before applying PCA. Another limitation is that PCA is not robust to missing values, meaning that if your data has gaps or errors, PCA may not work properly or may produce inaccurate results. Therefore, you should always handle missing values properly before applying PCA, such as by imputing them, deleting the affected observations, or using methods that can deal with them, such as probabilistic PCA or multiple imputation.

A Closer Look at PCA (Principal Component Analysis)
Let's examine the model to help us further describe PCA. It assumes that we have a high-dimensional representation of data that is, in fact, embedded in a low-dimensional space. We assume that L ≈ XV, where L is some low-rank matrix, X is the original data and V is a projection operator.
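
The model L ≈ XV can be illustrated with a short sketch. It assumes NumPy, centers the data first, and uses the singular value decomposition as one common way to obtain V; the function name `pca_project` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def pca_project(X, k):
    """Project centered data onto its first k principal components."""
    Xc = X - X.mean(axis=0)                    # PCA works on centered data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                               # projection matrix, orthonormal columns
    L = Xc @ V                                 # low-dimensional representation
    explained = s ** 2 / np.sum(s ** 2)        # share of variance per component
    return L, V, explained

# Hypothetical 2-dimensional data lying close to a line
rng = np.random.default_rng(1)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=100)])

L, V, explained = pca_project(X, k=1)
Xc = X - X.mean(axis=0)
# Squared reconstruction error after mapping back to the original space
error = np.linalg.norm(Xc - Xc @ V @ V.T, "fro") ** 2
print("Explained variance ratios:", explained)
print("Squared reconstruction error:", error)
```

With k = 1 the squared reconstruction error equals the squared singular value of the discarded component, which is why keeping the leading components loses the least variance.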
Our original data has n features, and we wish to reduce them to k features, where k << n. To do this, we need to project the data onto another vector space [2].

PCA has many interesting representations, but we'll look at a formulation that we've found to be the most natural:

• To find the principal components (PCs), we wish to minimize the squared reconstruction error between our original data points and their projection onto some k-dimensional vector space.
• Mathematically, we need to find a projection matrix V that solves the following optimization problem:
$$\arg\min_{V^{T}V = I_k} \lVert X - XVV^{T} \rVert_F^{2}$$
• X is our original data, XV is the projection to the k-dimensional space, and XVV^T is the reconstruction back to the original space.
• Projection from a 2-dimensional space to a 1-dimensional space can be illustrated by the following image:

[Figure: projection of 2-dimensional data points onto a 1-dimensional line]

In the image above:

• The blue dots are our original data vectors.
• The red dots are their projections.
• The blue lines are the reconstruction errors.
• The dashed black line is the optimal projection V that we found when solving our optimization problem.

We have now successfully projected the data from a 2-dimensional space (x, y axes) to a 1-dimensional space (a line).

2.Exploratory Factor Analysis :
Exploratory factor analysis (EFA) is a classical formal measurement model that is used when both observed and latent variables are assumed to be measured at the interval level. A characteristic of EFA is that the observed variables are first standardized (mean of zero and standard deviation of 1). EFA is executed on the correlation matrix between the items. In EFA, a latent variable is called a factor and the associations between latent and observed variables are called factor loadings. Factor loadings are standardized regression weights. Since EFA is an exploratory technique, there is no expected distribution of loadings; hence, it is not possible to test statistically whether or not factor loadings are the same across cultural groups. However, congruence measures, such as Tucker's ϕ, have been developed to indicate whether the pattern of factor loadings across items on a factor is the same across cultural groups. Sufficient congruence for structural equivalence is usually taken to be found if Tucker's ϕ exceeds 0.95. Values below 0.90 are taken to indicate that one or more items show deviant factor loadings and thus show bias. Bootstrap procedures have been developed to test the identity of factor loadings in EFA.

EFA is used to investigate structural equivalence. However, since it works on standardized variables (mean of zero and standard deviation of 1), this model is not suited to detect nonuniform and especially uniform item bias. EFA is often used in the multidimensional situation where more than one latent variable is measured at the same time. Before evaluating congruence in this case, the factor structures should be rotated toward a target structure.

Exploratory factor analysis can be performed by the following methods :

1.R-type factor analysis : when factors are calculated from the correlation matrix, it is called R-type factor analysis.

2.Q-type factor analysis : when factors are calculated from the individual respondents, it is called Q-type factor analysis.

Exploratory factor analysis has three basic decision points: (1) deciding the number of factors, (2) choosing an extraction method, (3) choosing a rotation method.
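
As a rough sketch of the first decision point, the eigenvalues of the item correlation matrix can be computed and inspected. The code below assumes NumPy; `eigen_summary` and the synthetic 12-item data set are illustrative assumptions. It reports the eigenvalues in descending order, the proportion of variance each one accounts for (eigenvalue divided by the number of items), and the number of eigenvalues greater than 1.0.

```python
import numpy as np

def eigen_summary(data):
    """Eigenvalues of the correlation matrix, variance shares, count above 1.0."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted in descending order
    proportion = eigenvalues / eigenvalues.sum()   # the sum equals the item count
    n_above_one = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, proportion, n_above_one

# Hypothetical 12-item scale answered by 300 respondents, built from 2 factors
rng = np.random.default_rng(2)
factors = rng.normal(size=(300, 2))
loadings = np.zeros((12, 2))
loadings[:6, 0] = 0.7      # items 1 to 6 load on the first factor
loadings[6:, 1] = 0.7      # items 7 to 12 load on the second factor
items = factors @ loadings.T + 0.7 * rng.normal(size=(300, 12))

eigenvalues, proportion, n_above_one = eigen_summary(items)
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Proportion of variance:", np.round(proportion, 3))
print("Eigenvalues greater than 1.0:", n_above_one)
```

Plotting `eigenvalues` against their rank gives a scree plot; the count of eigenvalues above 1.0 is only one heuristic and is best read together with that plot.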
EXPLORATORY FACTOR ANALYSIS: DECIDING THE NUMBER OF FACTORS

The most common approach to deciding the number of factors is to generate a scree plot. The scree plot is a two-dimensional graph with factors on the x-axis and eigenvalues on the y-axis. Eigenvalues are produced by a process called principal components analysis (PCA) and represent the variance accounted for by each underlying factor. They are not expressed as percentages but as scores that total to the number of items. A 12-item scale will theoretically have 12 possible underlying factors, and each factor will have an eigenvalue that indicates the amount of variation in the items accounted for by that factor. If the first factor has an eigenvalue of 3.0, it accounts for 25% of the variance (3/12 = .25). The total of all the eigenvalues will be 12 if there are 12 items, so some factors will have smaller eigenvalues. They are typically arranged in a scree plot in descending order like the following:

[Figure: scree plot, with factors on the x-axis and eigenvalues on the y-axis]

From the scree plot you can see that the first couple of factors account for most of the variance, then the remaining factors all have small eigenvalues. The term “scree” is taken from the word for the rubble at the bottom of a mountain. A researcher might examine this plot and decide there are 2 underlying factors and the remainder of the factors are just “scree” or error variation. So, this approach to selecting the number of factors involves a certain amount of subjective judgment. Another approach is called the Kaiser-Guttman rule and simply states that the number of factors is equal to the number of factors with eigenvalues greater than 1.0. I tend to recommend the scree plot approach because the Kaiser-Guttman approach seems to produce many factors.

EXPLORATORY FACTOR ANALYSIS: FACTOR EXTRACTION

Once the number of factors is decided, the researcher runs another factor analysis to get the loadings for each of the factors. To do this, one has to decide which mathematical solution to use to find the loadings. There are about five basic extraction methods: (1) PCA, which is the default in most packages. PCA assumes there is no measurement error and is considered not to be a true exploratory factor analysis; (2) maximum likelihood (a.k.a. canonical factoring); (3) alpha factoring; (4) image factoring; (5) principal axis factoring with iterated communalities (a.k.a. least squares). Without getting into the details of each of these, I think the best evidence supports the use of principal axis factoring and maximum likelihood approaches. I typically use the former. Gorsuch (1989) recommends the latter if only a few iterations are performed (not really possible in most packages). Snook and Gorsuch (1989) show that PCA can give poor estimates of the population loadings in small samples. With larger samples, most approaches will have similar results. The extraction method will produce factor loadings for every item on every extracted factor. Researchers hope their results will show what is called simple structure, with most items having a large loading on one factor but small loadings on other factors.

EXPLORATORY FACTOR ANALYSIS: ROTATION

Once an initial solution is obtained, the loadings are rotated. Rotation is a way of maximizing high loadings and minimizing low loadings so that the simplest possible structure is achieved. There are two basic types of rotation: orthogonal and oblique. Orthogonal means the factors are assumed to be uncorrelated with one another. This is the default setting in all statistical packages but is rarely a logical assumption about factors in the social sciences. Not all researchers using EFA realize that orthogonal rotations imply an assumption that they probably would not consciously make. Oblique rotation derives factor loadings based on the assumption that the factors are correlated, and this is most likely the case for most measures. So, oblique rotation gives the correlation between the factors in addition to the loadings. Here are some common algorithms
for orthogonal and oblique rotation:
Orthogonal rotation: varimax, quartimax, equamax.
Oblique rotation: oblimin, promax, direct quartimin.
I am not an expert on the advantages and disadvantages of each of these rotation algorithms, and they reportedly produce fairly similar results under most circumstances (although orthogonal and oblique rotations will be rather different). I tend to use promax rotation because it is known to be relatively efficient at achieving simple oblique structure.

Advantages of EFA :

Exploratory factor analysis (EFA) is generally used to discover the factor structure of a measure and to examine its internal reliability. EFA is often recommended when researchers have no hypotheses about the nature of the underlying factor structure of their measure.

Limitation of EFA :

The major limitation of exploratory factor analysis is its simplicity, so the researcher may not obtain a reliable inference. Therefore, exploratory factor analysis is used less often than confirmatory factor analysis.

3.Confirmatory Factor Analysis:

Understanding the relationships between different variables is an important part of statistical analysis. Confirmatory factor analysis is a procedure researchers use to determine if their theories about data relationships are accurate. If you're interested in social research or statistics, understanding how to apply this technique can help you learn essential insights about your data.

What is confirmatory factor analysis?

Confirmatory factor analysis (CFA) is a statistical modeling method that assesses how accurately different systems measure and evaluate a concept. With this method, researchers use their background knowledge to develop a hypothesis about how to measure it, then apply CFA to test the accuracy of their ideas. Researchers use structural equation modeling software to conduct confirmatory factor analysis because it requires processing complex data sets with advanced mathematical models and equations.

CFA is a popular research and data analysis procedure in the social sciences, particularly psychology, because it can address theoretical models and concepts that are difficult to measure, such as emotions and psychological symptoms. In the social sciences, these measurement systems are usually survey questions, rating scales and other inventories. For example, a researcher may use CFA to determine how well each mental health survey question reflects anxiety disorder symptoms.

Key terms of CFA :
Here are some of the basic terms to know when conducting a confirmatory factor analysis:[9]

Observed Variable

An observed variable is a quantity that you use to measure a concept. Observed variables include the data you record during your research. Questions on surveys often address different observed variables.

For example, consider a mental health professional using a survey to assess anxiety symptoms. One survey question asks the respondent to rate their stress level from one to five. Because the respondent's stress levels may show anxiety, and the survey provides a quantifiable system to measure stress, it's an observed variable.

Latent Variable

The latent variable, also known as the construct, is the shared concept that different measurement systems assess. Latent variables are difficult to observe directly but can influence the outcome of the observed variables in an experiment. For example, the latent variable of anxiety may affect
the outcome of someone's reported stress levels. Someone with anxiety may rate their stress at level five, while someone without anxiety is likely to choose a lower score. Although asking about stress doesn't directly measure anxiety, it can still provide researchers with insight into the relationship between stress and anxiety.

CFA exists to assess the indirect relationship between latent and observed variables.

Factor Loading

Factor loading is a number that describes how closely an observed variable corresponds with a latent variable. It is usually between zero and one, although some data sets can produce factor loadings higher than one when computing multiple variables. Observed variables with higher factor loadings have a stronger correlation with the latent variable.

For example, suppose the data analysis on a survey for the latent variable of anxiety produces a factor loading of 0.85 for question one and 0.33 for question two. Because the factor loading of question one is higher than that of question two, question one is likely better at identifying people with anxiety than question two.

CFA VS EFA

Confirmatory factor analysis and exploratory factor analysis are two complementary techniques for reviewing research data. Exploratory factor analysis identifies possible relationships between variables, while confirmatory factor analysis tests those relationships. Researchers with an extensive background in a subject area often use confirmatory factor analysis because they can predict possible relationships in their data. They use exploratory factor analysis to learn about new patterns and identify innovative trends.

Steps of CFA :

1.Specify the latent variable :
Start by determining what concept you want to analyze and establishing its theoretical definition. Establishing a baseline to describe the latent variable makes it possible for you to evaluate the accuracy of the observed variables. You can define the latent variable by listing characteristics or collecting additional data. For example, if you want to use CFA to determine if an intake survey is a good assessment of self-esteem, start by defining self-esteem. You may use your professional knowledge to determine that self-esteem involves traits like confidence, sociability, adaptability and goals.

2.Determine measurement methods :

Next, identify the measurement method you want to test and which observed variables to include. These variables are usually survey questions. You may include multiple questions from the same survey or choose questions from different surveys depending on the type of analysis you want to conduct.

Here are some example observed variables from a survey assessing self-esteem:

Rank your confidence from one to five.

Agree or disagree: I’m uncomfortable accepting compliments from others.

Rate your adaptability from one to five.

3.Collect the data :

Gather the information you want to use in your confirmatory factor analysis. Decide if you'll collect responses from your own research or use outside data from other sources. Try to secure a large sample size to ensure an accurate analysis. Once you have enough quality information, input it into statistical modeling software.

4.Establish consistent parameters :
Using your statistical modeling software, establish standardized parameters to evaluate the latent and observed variables. Decide what measurement system you want to use as the standard and allow the software to convert all other values to that measurement.

For example, if you want to use a rating system of one to five, with one being "disagree" and five being "agree," you first need to convert all other types of questions to that format. For questions that ask respondents to agree or disagree with a statement, assign a value of one to "disagree" and five to "agree." This allows the software to process all the data consistently.

5.Compute the data :

Use your statistical modeling software to compute the factor loadings for your data. Follow the prompts for your specific software interface to produce your results. Most factor analysis software puts this information in a table, although some generate graphs and tables to express the same information.

6.Interpretation :
Review the factor loading column of the factor analysis table to determine how well each observed variable relates to the latent variable. Decide what factor loading value shows a significant relationship, and use that to guide your interpretation. For example, you may decide that any variable with a factor loading of 0.75 or higher is valid for assessing self-esteem. If all the survey questions have a factor loading over 0.75, you can conclude that your survey is a good overall measurement.

Advantage of CFA :
The main advantage of CFA lies in its ability to aid researchers in bridging the often-observed gap between theory and observation. For example, an instrument might be developed by creating multiple items for each of several specific theoretical constructs.

Limitations of CFA
Complexity: CFA requires a complex mathematical model, which can be difficult to construct and interpret, even for experienced researchers.

Assumptions: CFA assumes that the data collected is accurate and that the proposed model is correct.

1. One Factor Confirmatory Factor Analysis
[Figure: One Factor Confirmatory Factor Analysis]

2. Two Factor Confirmatory Factor Analysis
[Figure: Two Factor Confirmatory Factor Analysis]
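
Returning to the loading-based interpretation in step 6, the standardized loadings of one construct can also be summarized by the average variance extracted (AVE) and the composite reliability named in the abstract. The sketch below uses the usual formulas for standardized loadings; the three loading values are hypothetical and are not results from this study.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings of one construct."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)  # error variance of each item
    return total ** 2 / (total ** 2 + error)

# Hypothetical standardized loadings for three survey questions
loadings = [0.80, 0.75, 0.70]
ave = average_variance_extracted(loadings)
cr = composite_reliability(loadings)
print("AVE:", round(ave, 3))
print("Composite reliability:", round(cr, 3))
print("AVE meets the 0.5 threshold:", ave >= 0.5)
```

Under the criterion stated in the abstract, convergent validity is supported when the AVE is at least 0.5, as it is for these illustrative loadings.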
Review Matrix

S.No. | Author | Year | Organisation | Key Factors of Factor Analysis
1 | M Tavakol | 2020 | | Factor Analysis simplifies a matrix of correlations.
2 | Syed Mohammad Ather | 2009 | | Factor Analysis (FA) attempts to simplify complex and diverse relationships that exist among a set of observed variables by uncovering common.
3 | Abbas F.M. Alkarkhi, Wasin A.A. Alqaraghuli | 2019 | | Factor analysis (FA) is a multivariate technique that is used to describe the relationships between different variables under study (observable variables) with new variables called factors, where the number of factors is less than the number of original variables.
4 | Christof Schuster, Ke-Hai Yuan | 2005 | | The results from a factor analysis of a correlation matrix and the corresponding covariance matrix are not identical.
5 | Johnny R.J. Fontaine | 2005 | | CFA offers a measurement model based on structural equation modeling.
6 | S.E. Richards, E. Holmes | 2015 | | Reviews different chemometrics methods for the analysis of genomics, transcriptomics, proteomics, metabolomics, and metagenomics datasets. It discusses a range of statistical data integration techniques.
7 | R. Wehrens | 2009 | | FA is closely related to PCA. Rather than a mapping into lower dimensions, or, equivalently, a rotation, that is PCA, FA aims to fit an explicit model.
8 | Michael C. Ashton | 2013 | | Factor analysis allows the researcher to reduce many specific traits into a few more general “factors” or groups of traits, each of which includes several of the specific traits.
9 | Nerea Martín-Calvo, Miguel Ángel Martínez-González | 2019 | | Factor analysis is a multivariable method that uses the observed data to define one or several vectors (one or several dietary patterns) grouping the food or food groups according to their degree of correlation.
10 | Luisa Cutillo | 2019 | | Factor Analysis (FA) is an independence technique, in which there is no dependent variable.

Objective of the study :

To find out the different types of factor analysis in detail.

Advantages of factor analysis :

1. Both objective and subjective attributes can be used.

2. It can be used to identify the hidden dimensions or constructs which may or may not be apparent from direct analysis.

3. It is not extremely difficult to do, and at the same time it is inexpensive and gives accurate results.

4. There is flexibility in naming and using dimensions.[10]

Disadvantages of factor analysis :

1. The usefulness depends on the researcher’s ability to develop a complete and accurate set of product attributes. If important attributes are missed, the value of the procedure is reduced accordingly.

2. Naming of the factors can be difficult; multiple attributes can be highly correlated with no apparent reason.

3. If the observed variables are completely unrelated, factor analysis is unable to produce a meaningful pattern.

4. It is not possible to know what the factors actually represent; only theory can help inform the researcher on this.

Conclusion :
In short, factor analysis brings simplicity by reducing the number of variables. Factor analysis, including principal component analysis, is also often used along with segmentation studies. In this paper, we reviewed the factor analysis method and the assumptions made before working with the method.

1.PCA :

Principal component analysis is a technique to summarize data, and is highly flexible depending on your use case. It can be valuable in both displaying and analyzing a large number of possibly dependent variables. Techniques for performing principal component analysis range from arbitrarily selecting the number of principal components to automatically adding components until a target proportion of variance is reached.

2.EFA :

EFA is a complex multivariate statistical method involving many linear and sequential steps. Exploratory factor analysis (EFA) is generally used to discover the factor structure of a measure and to examine its internal reliability. EFA is often recommended when researchers have no hypotheses about the nature of the underlying factor structure of their measure.

3.CFA :

The main advantage of CFA lies in its ability to aid researchers in bridging the often-observed gap between theory and observation. For example, an instrument might be developed by creating multiple items for each of several specific theoretical constructs.

REFERENCES :
[1] Tabachnick, B.G. and Fidell, L.S., Using multivariate statistics (6th ed.), Pearson, 2013.

[2] Verma, J. and Abdel-Salam, A., Testing statistical assumptions in research, John Wiley & Sons Inc., 2019.

[3] Ho, R., Handbook of univariate and multivariate data analysis and interpretation with SPSS, Chapman & Hall/CRC, Boca Raton, 2006.

[4] Hair, J.F., Anderson, R.E., Tatham, R.L., and Black, W.C., Multivariate data analysis (5th ed.), Prentice-Hall, Upper Saddle River, NJ, 1998.

[5] Pituch, K.A. and Stevens, J., Applied multivariate statistics for the social sciences: Analyses with SAS and IBM’s SPSS (6th ed.), Taylor & Francis, New York, 2016.

[6] Hair, J.J., Black, W.C., Babin, B.J., Anderson, R.R., Tatham, R.L., Multivariate data analysis, Upper Saddle River, New Jersey, 2006.

[7] Pallant, J., SPSS survival manual: a step by step guide to data analysis using SPSS, Open University Press/McGraw-Hill, Maidenhead, 2010.

[8] Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Principal_component_analysis

[9] (2023, February 4). Retrieved from Indeed: https://www.indeed.com/career-advice/career-development/confirmatory-factor-analysis

[10] https://www.du.ac.in/du/uploads/departments/Operational%20Research/24042020_Lect-8%20Factor%20Analysis.pdf
