CHAPTER 7: Multi-Variate Technique 1: Factor Analysis
FACTOR ANALYSIS
• Factor analysis aims to simplify complex datasets by identifying a smaller set of unobserved factors that explain the
variance in a larger number of observed variables. These latent factors are not directly measurable but play a crucial
role in shaping the observed data.
• Factor Analysis (FA) is also known as exploratory factor analysis (EFA). It is called EFA to differentiate it from
confirmatory factor analysis (CFA).
SOCIO-ECONOMIC STATUS
• Socio-economic status is a classic example of a latent factor: it cannot be measured directly, but it can be inferred from observed variables such as income, education, and occupation.
• CFA is a technique based on covariance-based structural equation modeling (CB-SEM), while FA or EFA is a major technique in multivariate statistics. The foundations of CB-SEM are FA (EFA) and multiple regression analysis (MRA), so a student who wants to learn CB-SEM should first learn and understand FA and MRA.
• FA and CFA are interdependence techniques, in which the variables are not classified as independent, dependent, or moderating.
Factor Analysis (FA) is like assembling a jigsaw puzzle.
• Factor analysis proceeds by grouping the variables into structures called factors.
• These variables come from the item-indicators (item-questions) that describe a factor, which is initially called a pseudo-factor.
• Pseudo-factor – a temporary label for the variables (item-questions) to be grouped in the survey questionnaire.
• Once the structure is formed through FA, the factors are renamed based on the attributes of the variables that comprise each factor.
Why Is It Useful?
• Simplification – reduces a large set of variables to a few interpretable factors.
• Understanding – reveals the latent structure underlying the observed responses.
• Research and Surveys – condenses questionnaire items into meaningful dimensions.
Factor Analysis (FA) is like discovering hidden treasures in a forest.
Based on variance, FA can be further categorized into principal component analysis and common factor analysis.
1. PRINCIPAL COMPONENT ANALYSIS – considers the total variance and derives factors that contain small proportions of unique variance and, in some instances, error variance.
2. COMMON FACTOR ANALYSIS – considers only the common (shared) variance, assuming that the unique and error variance are not of interest in defining the structure of the variables.
Regarding the sample size in Factor Analysis, 100 observations or more is preferable (although FA can be run on samples as small as 50), and at least 200 if you will split the sample for validation purposes.
GENERAL RULE: Have at least five times as many observations as the number of variables to be analyzed; a more acceptable sample size has a 10:1 ratio. In other words, the ideal scenario is ten observations for every variable you are analyzing.
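The cases-to-variables rule above can be expressed as a quick check. This is a minimal Python sketch; the function name and the 34-item questionnaire are illustrative assumptions, not from the chapter:

```python
def minimum_sample_size(n_variables: int, ratio: int = 5) -> int:
    """Minimum observations under the cases-to-variables rule of thumb.

    ratio=5 is the bare-minimum 5:1 rule; ratio=10 is the preferred 10:1.
    The absolute floors above (100, or 200 when splitting the sample for
    validation) still apply on top of this rule.
    """
    return n_variables * ratio

# e.g. a questionnaire with 34 item-questions (an illustrative count):
print(minimum_sample_size(34))        # 5:1 minimum: 170 observations
print(minimum_sample_size(34, 10))    # 10:1 preferred: 340 observations
```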
ANALYSIS AND INTERPRETATIONS OF EFA RESULTS
The EFA results will be analyzed and interpreted based on the following output:
1. Measure of sampling adequacy (MSA) through Kaiser-Meyer-Olkin (KMO) Test and Bartlett’s test of sphericity.
Kaiser-Meyer-Olkin (KMO) Test – measures sampling adequacy. The measure can be interpreted with the following guidelines: 0.90 or above, marvelous; 0.80 or above, meritorious; 0.70 or above, middling; 0.60 or above, mediocre; 0.50 or above, miserable; below 0.50, unacceptable.
Bartlett’s Test – should be significant (p ≤ 0.05) for the factor analysis to be considered reliable.
2. Total variance explained
The cumulative variance explained should be 60% or higher for the solution to be considered valid and reliable. Rotation (rotation sums of squared loadings) should be applied for easier interpretation.
3. Communalities of variables
Each observed variable’s communality is the proportion of its variance that can be explained by the common factors; it is calculated by summing the squared factor loadings for that variable. A communality below 0.50 means the factors explain less than half of the variable’s variance (Burns and Burns, 2008).
4. Scree plot
A graphical presentation of the factors plotted against their eigenvalues. For a factor to be considered significant, it should ideally have an eigenvalue greater than 1 (the Kaiser criterion).
5. Unrotated or rotated factor loadings
It is better to use rotated factor loadings as it would make the interpretation easier. Factor Loadings should be
interpreted in terms of their significance, which is also linked to the sample size.
When checking which variables load significantly in the rotated output, note that a variable can cross-load on more than one factor; in that case, assign the variable to the factor on which it loads highest.
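The eigenvalue-based outputs above (items 2 to 5) can be reproduced outside SPSS. The following Python sketch uses numpy with a small made-up 4-variable correlation matrix (not the mall data) to extract unrotated principal-component loadings, apply the Kaiser criterion, and compute communalities:

```python
import numpy as np

# Toy correlation matrix for four observed variables (illustrative values):
R = np.array([
    [1.0, 0.7, 0.2, 0.1],
    [0.7, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
])

eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending order
order = np.argsort(eigvals)[::-1]               # largest first, as in a scree plot
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

retained = eigvals > 1.0                        # Kaiser criterion
# Unrotated PC loadings: eigenvectors scaled by sqrt(eigenvalue)
loadings = eigvecs[:, retained] * np.sqrt(eigvals[retained])

# Communality: sum of squared loadings per variable (row)
communalities = (loadings ** 2).sum(axis=1)

# Two components pass the Kaiser criterion here, and all four
# communalities are above the 0.50 cutoff.
print(int(np.sum(retained)), communalities.round(2))
```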
APPLICATION OF FACTOR ANALYSIS USING PRINCIPAL COMPONENT ANALYSIS
An example of how to use exploratory factor analysis:
Main Problem: What attracts shoppers to Davao City malls?
Subproblem: What are the factors influencing mall choice? (to be measured using EFA)
STEPS IN PROCESSING FACTOR ANALYSIS
1. Encode the data in an Excel template (although this can be done directly in SPSS, it is easier in Excel), assuming you have designed an encoding template.
2. Import the Excel data into the SPSS environment or interface through:
a. Open the SPSS software; on the command menu: File > Open > Data > Look in: (where the file is located, e.g., drive C, D, or E, or the desktop); in Files of type, change SPSS (.sav) to Excel (.xls, .xlsx, .xlsm).
b. Press Open > Worksheet, select the sheet where your data are stored, then press Ok.
c. On the SPSS interface, accept or change each variable’s name, type, width, decimals, label, values, and measure in the Variable View.
3. Click Analyze > Dimension Reduction > Factor (the Factor Analysis dialog box appears)
4. Enter all variables (scales) into the Variables box
5. Then press the Descriptives button
6. Tick Statistics (Initial solution) and Correlation Matrix (Coefficients, Significance levels, Determinant, KMO and Bartlett’s test of sphericity); tick more options if you want to see the full output. Then press the Continue button.
7. Press the Extraction button and tick the Unrotated factor solution (however, you can omit this if you just want the
rotated factor solution) and Scree plot.
8. Press Continue and then Rotation button. Tick Varimax (none is the default). Tick the Rotated solution and Loading
plot. Note that the Maximum Iterations for Convergence is 25 (default). If you cannot find significant convergence,
then increase it to, say 50 or 75.
9. If you need to save the variables for other applications, press Continue and then Scores. Tick Save as variables.
Otherwise, skip this part.
10. Press Continue and then press the Options button. The default is Exclude cases listwise (which means that it will remove the data with missing values). If you want the data with missing values to be replaced with mean values, tick Replace with mean. Tick Sorted by size (so that the variables will be arranged from the greatest to least loading).
Tick Suppress small coefficients and enter the value of significant factor loading based on sample size.
11. When everything is included, you can press the Ok button to start FA processing.
FA OUTPUT AND INTERPRETATION
• The KMO measure of sampling adequacy is a measure of the proportion of variance in the data that might be caused
by underlying factors. It ranges from 0 to 1, with higher values indicating a better fit for factor analysis. A KMO value of
0.5 or higher is generally considered to be acceptable, and a value of 0.8 or higher is considered to be good.
• Bartlett’s test of sphericity is a statistical test that tests the null hypothesis that the correlation matrix is an identity
matrix. An identity correlation matrix indicates that the variables are unrelated, which would make them unsuitable
for factor analysis. A significant result from Bartlett’s test (p-value < 0.05) indicates that the null hypothesis can be
rejected and that the variables are likely to be correlated, which is a necessary condition for factor analysis.
The KMO value is 0.832, which is above the recommended threshold of 0.8. This suggests that the data is well-suited for factor
analysis.
The p-value for Bartlett's test is 0.000, which is less than 0.05. This indicates that the null hypothesis can be rejected and that
the variables are likely to be correlated.
Overall, the results of the KMO and Bartlett's tests suggest that the data are well-suited for factor analysis.
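As a cross-check on what SPSS reports, both the KMO measure and Bartlett's chi-square statistic can be computed directly from a correlation matrix using the standard formulas. This Python sketch uses numpy; the 3-variable matrix with equal correlations of 0.6 and the sample size n = 100 are made-up values:

```python
import numpy as np

def kmo(R: np.ndarray) -> float:
    """Kaiser-Meyer-Olkin measure from a correlation matrix R.

    KMO = (sum of squared correlations) / (that sum + sum of squared
    partial correlations), over the off-diagonal elements.
    """
    inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d                          # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)       # off-diagonal mask
    r2, p2 = (R[off] ** 2).sum(), (partial[off] ** 2).sum()
    return r2 / (r2 + p2)

def bartlett_statistic(R: np.ndarray, n: int):
    """Bartlett's test of sphericity: chi-square statistic and df.

    The p-value is then read from a chi-square distribution (SPSS does
    this for you; in Python, scipy.stats.chi2.sf would give it).
    """
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return stat, df

# Three equally correlated items (r = 0.6) from n = 100 respondents:
R = np.array([[1.0, 0.6, 0.6],
              [0.6, 1.0, 0.6],
              [0.6, 0.6, 1.0]])
print(round(kmo(R), 3))               # 0.719: "middling" on the KMO guidelines
stat, df = bartlett_statistic(R, n=100)
print(df)                             # large chi-square on 3 df, so p < 0.05
```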
COMMUNALITIES
• To interpret communalities, remember that a communality is the proportion of a variable’s total variance that is explained by the common factors. A communality of 0.5 therefore means that the common factors account for half of the variance in that variable; the remainder is unique variance (specific and error variance).
• A communality below 0.50 means the common factors explain less than half of the variable’s variance, making the variable a candidate for deletion, especially if it also has a low factor loading.
TOTAL VARIANCE EXPLAINED
• Factor analysis (FA) is considered reliable when its total variance explained (TVE) is at least 0.60. This means that the
common factors should be able to explain at least 60% of the variance in the observed variables.
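The 60% rule can be checked with simple arithmetic: each eigenvalue divided by the number of variables gives that factor's share of the total variance. A short Python sketch with hypothetical eigenvalues (not from the mall data):

```python
# Hypothetical eigenvalues from a 4-variable analysis; as eigenvalues of
# a correlation matrix, they sum to p, the number of variables.
eigenvalues = [1.95, 1.35, 0.46, 0.24]
p = len(eigenvalues)

cumulative = 0.0
for k, ev in enumerate(eigenvalues, start=1):
    cumulative += ev / p                      # share of variance = eigenvalue / p
    print(f"factor {k}: cumulative {cumulative * 100:.1f}%")

# After two factors the cumulative variance explained is 82.5%,
# which clears the 60% rule of thumb.
```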
Scree Plot
• A scree plot is a graphical representation of the eigenvalues of the factors extracted in a factor analysis.
• Kaiser’s rule is a heuristic rule for determining the number of significant factors to retain in a factor analysis.
• To interpret a scree plot and Kaiser’s rule, look for the elbow in the scree plot. The factors before the elbow are
generally considered to be significant, while the factors after the elbow are generally considered to be insignificant.
• Kaiser’s rule states that only factors with eigenvalues greater than 1 should be considered as significant.
Rotated Component Matrix
• A rotated component matrix is a table of factor loadings that have been rotated to make them easier to interpret.
• The rotation is done in such a way that the factor loadings for each variable are concentrated on a single factor.
• This makes it easier to identify which variables are most strongly associated with which factors.
• The rotated component matrix in the previous examples shows nine factors.
• The factor loading for each variable is shaded to indicate which factor it loads the highest on.
• The rotated component matrix can be used to interpret the results of a factor analysis.
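The varimax rotation that SPSS applies can be sketched in a few lines of numpy; the two-factor loading matrix below is hypothetical. A useful sanity check on any implementation is that rotation only redistributes loadings across factors, so each variable's communality is unchanged:

```python
import numpy as np

def varimax(L: np.ndarray, max_iter: int = 25, tol: float = 1e-6) -> np.ndarray:
    """Varimax rotation of a loading matrix L (variables x factors).

    max_iter=25 mirrors SPSS's default Maximum Iterations for
    Convergence; raise it if the rotation has not converged.
    """
    p, k = L.shape
    R = np.eye(k)                     # accumulated orthogonal rotation
    var = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Standard varimax update via SVD of the gradient-like matrix
        u, s, vt = np.linalg.svd(
            L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p)
        )
        R = u @ vt
        new_var = s.sum()
        if new_var - var < tol:       # stop when the criterion stops improving
            break
        var = new_var
    return L @ R

# Hypothetical unrotated loadings for four items on two factors:
L = np.array([[0.75, 0.53],
              [0.75, 0.53],
              [0.64, -0.63],
              [0.64, -0.63]])
rotated = varimax(L)
# Communalities are preserved: same as (L ** 2).sum(axis=1)
print((rotated ** 2).sum(axis=1).round(2))
```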
What Interpretation of Factor Analysis can be Drawn Based on the Previous Example?
• A factor loading of 0.50 or higher is considered practically significant.
• The following variables have practically significant factor loadings:
• Mall Reasonable Prices: Med GS Price, High GS Price, Med DS Price, Low DS Price, Low GS Price, High DS Price.
• Mall Promotion and Entertainment: Entertain Live, Entertain Facilities, Promo VM, Promo Ads, Promo Billboard,
Entertain Social, Promo Sale
• Mall Atmospherics and Comfort: Atmos Lighting, Atmos CR Location, Atmos Color, Atmos Escalator, Atmos Benches,
Atmos Music, Atmos Aircon
• Mall Product Variety: Product Variety DS, Product Assort GS, Product Assort DS, Product Variety GS.
• Mall Service: Service Guards, Service Clerks, Service CAC.
• Mall Density: Atmos GS Crowded, Atmos DS Crowded.
• Mall Accessibility to Public Vehicles: Acce Publictrans, Access Location.
• Mall Accessibility to Private Vehicles: Access Near Park, Access Private Park, Atmos CR Clean
• Mall Smell: Atmos Odor, Atmos Supermarket
VALIDATING FA USING SPLIT SAMPLES THROUGH SPSS
1. Open the main SPSS file (the one used for the factor analysis)
2. Go to Data menu and click Select Cases
3. On the select cases box, click Random sample of cases. Set the parameter by clicking Sample...
4. Select Exactly. On the first box, input the number of desired random cases to be extracted. On the second box, input
the total number of cases in your dataset. In this example, tick Exactly 100 cases (for split 1) from the first 200 cases.
Then click Continue.
5. On the Output options, select Copy selected cases to a new dataset. Enter the desired name of the split sample 1.
Then press Ok.
6. A new window, Mall Split, opens. Save the data file for Split 1 (Malls Split1).
7. For the second split sample (Split2), go to Data menu and click Select Cases. Just change the Dataset name (in this
example – Malls_Split2). Click Ok.
8. A new window, Malls Split2, opens. Save the data file for Split 2 (Malls_Split2).
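The random split performed through the Select Cases dialog above can be mimicked in a few lines of Python (standard library only; the 200 case IDs below are placeholders for the real rows of the main file):

```python
import random

# 200 case IDs standing in for the rows of the main data file:
cases = list(range(1, 201))

random.seed(7)                        # fixed seed so the split is reproducible
split1 = random.sample(cases, 100)    # "Exactly 100 cases from the first 200"
split2 = [c for c in cases if c not in set(split1)]  # the remaining cases

print(len(split1), len(split2))       # 100 100
```

Each half can then be factor-analyzed separately and the two solutions compared, as in the next section.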
COMPARISON OF MAIN FA WITH SPLIT 1 AND SPLIT 2
• Based on the validation with split samples, the results are stable across the two samples (do not expect them to be identical; the only way to obtain identical results is to draw the split samples directly from the population, which is tedious when the population is large). The data structure can therefore be generalized across the population.
Factor analysis is a multivariate technique that is used to identify underlying factors that explain the variation in a set of
observed variables. It is a powerful tool for data reduction and dimensionality reduction.
Key points:
• Factor analysis is based on the interdependence technique, where the variables are not labeled as dependent,
independent, or moderating.
• The relationships between the variables are represented by lines with two-headed arrows.
• Factor analysis is processed by forming the variables into structures called factors.
• The variables come from the item-indicators that describe the factors, which initially are called pseudo factors.
• Once the structure is formed into factors, the factors will be renamed based on the attributes of the variables that
comprise each factor.