

Factor analysis

Two Use Cases

1. Reducing data dimensionality

Goals
• Reduce the number of observed variables to a smaller number of uncorrelated factors (artificial variables) that account for most of the variance in the initial observed variables.
• The factors are manifest (artificial) variables, which are linear combinations of the original variables (no assumption about an underlying causal model).
• The factors are uncorrelated with each other and are ordered so that the first few factors retain most of the variance of the original variables.

2. Identifying latent dimensions

Goals
• Identify latent structures (factors) among many observable variables; these factors are the cause of the correlations between the variables.
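Use case 1 can be sketched with principal components of the correlation matrix. This is a minimal sketch using numpy only; the data are simulated and the choice of keeping `k = 2` components is an arbitrary assumption for illustration. It shows that each component is a linear combination of the standardized variables, that the components are uncorrelated, and that they are ordered by the variance they retain.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 300 observations of 5 correlated variables
base = rng.standard_normal((300, 2))
X = base @ rng.standard_normal((2, 5)) + 0.3 * rng.standard_normal((300, 5))

Z = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize each variable
eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigenvalues)[::-1]              # largest variance first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

k = 2                                              # number of components kept (illustrative)
components = Z @ eigenvectors[:, :k]               # each column is a linear combination of the variables
retained = eigenvalues[:k].sum() / eigenvalues.sum()

print(f"{k} components retain {retained:.1%} of the variance")
print(np.round(np.corrcoef(components, rowvar=False), 6))  # off-diagonal ~0: uncorrelated
```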

Part-time MBA in Business Innovation 2021-2023 Customer Analytics


Theoretical foundation
• Factor analysis assumes that the total variance can be partitioned into common and unique variance. The total common variance equals the total variance explained by the factors, but it is smaller than the total variance, because each variable also has a unique variance that is specific to it and includes an error component.

• Each observed variable reflects one or more of the latent constructs to be measured; this measurement may contain some imprecision (error).
• Every variable can be related to each factor with a larger or smaller weight (its factor loading).
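The variance partition can be checked with a small simulation. This is a minimal sketch assuming a common factor model with two uncorrelated, standard-normal factors, made-up loadings, and unique variances chosen so that every observed variable has total variance 1; none of these values come from a real data set.

```python
import numpy as np

# Hypothetical loadings: 4 observed variables, 2 uncorrelated factors (illustrative values)
loadings = np.array([[0.8, 0.1],
                     [0.7, 0.2],
                     [0.1, 0.9],
                     [0.2, 0.6]])
n = 100_000
rng = np.random.default_rng(0)

common_var = (loadings ** 2).sum(axis=1)   # variance each variable shares with the factors
unique_var = 1.0 - common_var              # assumed so that each variable has total variance 1

F = rng.standard_normal((n, loadings.shape[1]))                        # latent factors
E = rng.standard_normal((n, loadings.shape[0])) * np.sqrt(unique_var)  # unique part incl. error
common_part = F @ loadings.T
X = common_part + E                                                    # observed variables

print("total variance :", X.var(axis=0).round(2))
print("common variance:", common_part.var(axis=0).round(2))
print("unique variance:", E.var(axis=0).round(2))
```

Up to sampling noise, the common and unique variances of each variable add up to its total variance.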



• Communality. The proportion of each variable's variance that can be explained by the factors. It is denoted h² and is defined as the sum of the variable's squared loadings on all extracted factors.

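o Communality of variable $X_k$: $h_k^2 = l_{k1}^2 + l_{k2}^2 + \dots + l_{km}^2$, where $l_{kj}$ is the loading of $X_k$ on factor $j$ and $m$ is the number of extracted factors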

o Communality + Specificity = 1 for every variable $X_k$
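The two formulas above can be applied directly to a factor matrix. This is a minimal sketch; the loading values are hypothetical and assume standardized variables, so that communality and specificity are proportions of a variance of 1.

```python
import numpy as np

# Hypothetical factor matrix: rows = variables X1..X4, columns = factors 1..2
loadings = np.array([[0.8, 0.1],
                     [0.7, 0.2],
                     [0.1, 0.9],
                     [0.2, 0.6]])

# Communality h_k^2: sum of squared loadings across the m extracted factors (row sums)
communality = (loadings ** 2).sum(axis=1)
# Specificity: the remaining share of each variable's variance
specificity = 1.0 - communality

for k, (h2, s) in enumerate(zip(communality, specificity), start=1):
    print(f"X{k}: communality = {h2:.2f}, specificity = {s:.2f}")
```

For example, X1 has $h_1^2 = 0.8^2 + 0.1^2 = 0.65$, leaving a specificity of 0.35.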

• Eigenvalues. A factor's eigenvalue is the total variance explained by that factor. If we conduct a factor analysis on the correlation matrix, the variables are standardized, which means that each variable has a variance of 1 and the total variance equals the number of variables used in the analysis. Any factor with an eigenvalue >1 therefore explains more variance than a single observed variable. The first factor always accounts for the most variance (and hence has the highest eigenvalue), the next factor accounts for as much of the left-over variance as it can, and so on. Hence, each successive factor accounts for less and less variance.

o Determination of the number of factors based on eigenvalues: extract only the factors with eigenvalues >1 (the Kaiser criterion).
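A minimal sketch of the eigenvalue rule, assuming a made-up correlation matrix for four standardized variables. `np.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix in ascending order, so they are reversed to match the order of extraction.

```python
import numpy as np

# Hypothetical correlation matrix of 4 standardized variables (illustrative values)
R = np.array([[1.0, 0.6, 0.5, 0.1],
              [0.6, 1.0, 0.4, 0.2],
              [0.5, 0.4, 1.0, 0.1],
              [0.1, 0.2, 0.1, 1.0]])

# eigvalsh handles symmetric matrices and returns ascending eigenvalues; reverse them
eigenvalues = np.linalg.eigvalsh(R)[::-1]

print("eigenvalues:", eigenvalues.round(3))
print("total variance:", round(float(eigenvalues.sum()), 6))  # equals the number of variables

# Eigenvalue > 1 rule: keep factors that explain more than one standardized variable
n_factors = int((eigenvalues > 1).sum())
print("factors to extract:", n_factors)
```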

• Factor. An underlying dimension that explains the correlations among a set of variables.

• Factor loadings. The factor loading $l_{kj}$ of variable $X_k$ on factor $j$ quantifies the extent to which the variable is related to that factor. For standardized variables and uncorrelated (orthogonal) factors, the loadings are simple correlations between the variables and the factors. Factor loadings identify the variables with the strongest association to the underlying latent variable.
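To see that loadings are correlations, the sketch below extracts one factor with the principal component method (one possible extraction method, assumed here for simplicity) from simulated data and compares each loading with the correlation between the variable and the factor scores. The data, the single latent trait and `m = 1` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 500 respondents, 4 variables driven by one shared latent trait
latent = rng.standard_normal((500, 1))
data = latent @ np.array([[0.9, 0.8, 0.7, 0.3]]) + 0.5 * rng.standard_normal((500, 4))

Z = (data - data.mean(axis=0)) / data.std(axis=0)   # standardize each variable
R = np.corrcoef(Z, rowvar=False)

# Principal component extraction, factors sorted by eigenvalue
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

m = 1                                                # number of factors kept
loadings = eigenvectors[:, :m] * np.sqrt(eigenvalues[:m])

# Standardized factor scores, then correlation of each variable with the factor
scores = Z @ eigenvectors[:, :m] / np.sqrt(eigenvalues[:m])
corr = [np.corrcoef(Z[:, k], scores[:, 0])[0, 1] for k in range(Z.shape[1])]

print(np.round(loadings[:, 0], 3))
print(np.round(corr, 3))   # same values as the loadings
```

The sign of an eigenvector is arbitrary, so all loadings may come out negative; only their relative sizes and signs matter for interpretation.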

• Factor loading plot. A factor loading plot is a plot of the original variables using the factor loadings as coordinates.

• Factor matrix. A factor matrix contains the factor loadings of all the variables on all the factors extracted.

• Factor scores. These are composite scores estimated for each participant on the derived factors.
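Factor scores can be estimated in several ways; the sketch below assumes the regression (Thurstone) method, one common option, where the score weights are $W = R^{-1}L$ and the scores are $ZW$. The survey data are simulated and the loadings come from a principal component extraction with two factors, both assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical survey data: 200 participants x 4 correlated variables
mixing = np.array([[1.0, 0.5, 0.3, 0.0],
                   [0.0, 1.0, 0.4, 0.2],
                   [0.0, 0.0, 1.0, 0.6],
                   [0.0, 0.0, 0.0, 1.0]])
data = rng.standard_normal((200, 4)) @ mixing
Z = (data - data.mean(axis=0)) / data.std(axis=0)
R = np.corrcoef(Z, rowvar=False)

# Loadings from principal component extraction with m = 2 factors
eigenvalues, eigenvectors = np.linalg.eigh(R)
top = np.argsort(eigenvalues)[::-1][:2]
loadings = eigenvectors[:, top] * np.sqrt(eigenvalues[top])

# Regression method: weights W = R^{-1} L (solved without forming the inverse), scores = Z W
weights = np.linalg.solve(R, loadings)
scores = Z @ weights   # one row of factor scores per participant

print(scores.shape)    # (200, 2)
print(scores[:3].round(2))
```

With principal component loadings, the regression scores are simply the standardized component scores, so each column has mean 0 and variance 1.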

• Percentage of variance. This is the percentage of the total variance attributed to each factor.
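For an analysis of the correlation matrix, the percentage of variance of a factor is its eigenvalue divided by the number of variables. A minimal sketch, assuming hypothetical eigenvalues for six standardized variables:

```python
import numpy as np

# Hypothetical eigenvalues from an analysis of 6 standardized variables
eigenvalues = np.array([2.7, 1.5, 0.8, 0.5, 0.3, 0.2])
n_variables = 6                                  # total variance = number of variables

pct = 100 * eigenvalues / n_variables            # share of total variance per factor
cumulative = np.cumsum(pct)

for j, (p, c) in enumerate(zip(pct, cumulative), start=1):
    print(f"Factor {j}: {p:5.1f}%  (cumulative {c:5.1f}%)")
```

Here the first two factors would together account for 70% of the total variance.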



• Rotation. The goal of factor rotation is to make some of the loadings very large (near ±1) and the remaining loadings very small (near 0), which makes the factors easier to interpret.

o An orthogonal rotation keeps the axes at right angles, so the factors remain uncorrelated (orthogonality among factors is preserved).

• Varimax rotation. An orthogonal method of factor rotation that maximizes the variance of the squared loadings within each factor; this minimizes the number of variables with high loadings on a factor, thereby enhancing the interpretability of the factors.

o An oblique rotation does not keep the axes at right angles and allows the factors to be correlated. This may increase interpretability by simplifying the factor matrix even further.

o Any rotation changes the loadings, but:

• The total amount of variance explained stays constant
• The communalities stay constant
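A minimal varimax sketch, assuming the widely used SVD-based iteration (Kaiser's criterion with gamma = 1); the function name `varimax` and the unrotated loadings are hypothetical. Because the rotation matrix is orthogonal, the usage lines confirm that the communalities and the total variance explained do not change.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a p x m loading matrix (SVD-based iteration)."""
    p, m = loadings.shape
    rotation = np.eye(m)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion (variance of squared loadings per factor)
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                  # nearest orthogonal matrix: axes stay at right angles
        new_criterion = s.sum()
        if criterion != 0 and new_criterion < criterion * (1 + tol):
            break                          # criterion no longer improving
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings (4 variables, 2 factors)
unrotated = np.array([[0.7, 0.5],
                      [0.7, 0.4],
                      [0.6, -0.5],
                      [0.5, -0.6]])
rotated, rot = varimax(unrotated)

print(rotated.round(2))
print("communalities before:", (unrotated ** 2).sum(axis=1).round(3))
print("communalities after :", (rotated ** 2).sum(axis=1).round(3))
```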

• Scree plot. A scree plot is a plot of the eigenvalues against the number of factors in order of extraction.

o Determination of the number of factors based on scree plots. Plot the eigenvalues against the number of factors; the kink (elbow) where the curve flattens out indicates the number of factors to be extracted.
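A crude text-based scree plot can be produced without any plotting library. This is a minimal sketch; the helper `text_scree` and the eigenvalues are hypothetical, and printing the drop between successive eigenvalues is only an aid for spotting the kink by eye, not a formal rule.

```python
import numpy as np

def text_scree(eigenvalues, width=40):
    """Crude text scree plot: one row per factor, bar length proportional to its eigenvalue."""
    scale = width / float(max(eigenvalues))
    rows = []
    for j, ev in enumerate(eigenvalues, start=1):
        bar = "#" * int(round(float(ev) * scale))
        rows.append(f"{j:>2} | {bar:<{width}} {float(ev):.2f}")
    return rows

# Hypothetical eigenvalues, already in order of extraction (descending)
eigenvalues = np.array([2.7, 1.5, 0.8, 0.5, 0.3, 0.2])
for row in text_scree(eigenvalues):
    print(row)

# Drop from each eigenvalue to the next; the curve flattens where the drops become small
print("drops:", np.round(-np.diff(eigenvalues), 2))
```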

• Specificity. The part of a variable's variance that is not explained by the extracted factors (1 − communality).

