CONFIRMATORY FACTOR ANALYSIS: ONE FACTOR MODELS

PSYC 5130 Week 4 September 22, 2009


Conceptual Nature of Latent Variables

• Latent variables correspond to some type of hypothetical construct
• Not measured directly
• Require a specific operational definition
• Indicators of the construct need to be selected
• Data from the indicators must be consistent with certain predictions (e.g., moderately correlated with one another)
Why Use Latent Variables?
Brown 1-2, Kline 70-74

• Measurement error is inevitable and we must deal with it.
  ◦ Latent variables are advantageous because they are less influenced by measurement error.
• Latent variables are the “real” variables, not the measured variables.
• Factor scores are not a substitute because they still contain measurement error.
• The ability to use latent variables is the primary strength of SEM.
Multi-Indicator Approach

• Reduces the overall effect of measurement error of any individual observed variable on the accuracy of the results
• We distinguish between observed variables (indicators) and underlying latent variables or factors (constructs)
• Measurement model: the relations between the observed variables and the latent variables
[Path diagram: three observed variables (Vocabulary Test, Analogies Test, Writing Sample), each with its own error term, loading on a single latent variable, Verbal Abilities — the construct of interest.]
Exploratory Factor Analysis (EFA)

• EFA also has to do with the “latent” structure of a set of variables
  ◦ A set of variables is entered into the analysis
  ◦ Based on the correlations among these variables, factors are extracted
  ◦ Each factor represents a linear combination meant to account for as much of the variance in the set of entered variables as possible
• In EFA, there is no a priori specification of how many latent factors there are or of how the measures relate to these factors
Exploratory Factor Analysis (EFA)

• Problems
  ◦ Communality: must know the communality before estimation, but communality is a function of the loadings
  ◦ Number of factors
  ◦ Rotation: when there are two or more factors, the solution is not unique
Exploratory Factor Analysis (EFA)

• Principal Components
  ◦ Communality is set to 1
  ◦ Factor is defined as a weighted sum of the variables
  ◦ Loadings chosen to maximize the explanation of the variances of the measures
  ◦ Loadings are usually too “high,” in that the predicted correlations are larger than the observed correlations
Exploratory Factor Analysis (EFA)

• Principal Factors
  ◦ Communality is iteratively estimated
  ◦ Factor is a latent variable
  ◦ Loadings chosen to:
    - maximize the explanation of the correlations between the measures
    - minimize the sum of squared residuals (residual = observed correlation minus predicted correlation)
Exploratory Factor Analysis (EFA)

• Maximum Likelihood
  ◦ Solution is iteratively estimated
  ◦ Factor is a latent variable
  ◦ Loadings chosen to maximize the explanation of the correlations between the measures
    - tries harder to explain the larger correlations
  ◦ Statistical tests available
Exploratory Factor Analysis (EFA)

• Example: Bollen.sps

Variable   Principal Components   Principal Axis   Maximum Likelihood
Overall    .905                   .877             .816
Clear      .865                   .811             .862
Color      .921                   .918             .938
Odor       .805                   .710             .684

Note that the Principal Components loadings are generally larger than those from the other methods; PA and ML are fairly similar.
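A minimal Python sketch of the same comparison, using simulated stand-ins for the four Bollen.sps indicators (the data, sample size, and loading values here are assumptions and will not reproduce the table above):

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
n = 300
g = rng.normal(size=n)                          # simulated common factor
X = np.column_stack([0.85 * g + 0.5 * rng.normal(size=n) for _ in range(4)])
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize the indicators

pca = PCA(n_components=1).fit(X)
# component "loadings" = unit eigenvector scaled by sqrt(eigenvalue)
pc_load = pca.components_[0] * np.sqrt(pca.explained_variance_[0])

fa = FactorAnalysis(n_components=1).fit(X)      # maximum-likelihood FA
ml_load = fa.components_[0]

print("PC loadings:", np.round(np.abs(pc_load), 3))
print("ML loadings:", np.round(np.abs(ml_load), 3))
# The PC loadings come out larger, consistent with the pattern in the table.
```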
EFA vs. CFA

• EFA is useful when the researcher does not know how many factors there are, or when it is uncertain which measures load on which factors
• EFA is typically used as a “data reduction” strategy
• Both EFA and CFA reduce a larger number of observed variables to a smaller number of latent factors
• However, EFA is done with little a priori hypothesis; CFA requires a priori specification based on hypothesis
Assumptions of CFA

• Multivariate normality
• Sufficient sample size
• Correct model specification
• Sampling assumption: a simple random sample
Confirmatory Factor Analysis

• Representation in SEM
  ◦ Latent variable represented by a circle
  ◦ Measured variables (indicators) represented by squares
  ◦ Each indicator variable has an error term
• CFA initial specification
  ◦ Each variable loads on one and only one factor
  ◦ Factors can be (and typically are) correlated
  ◦ Errors across indicator variables are independent
• Assumptions
  ◦ The factors are uncorrelated with the measurement errors
  ◦ Most (if not all) of the errors of different indicators are uncorrelated with each other

[Path diagram: Observed Variables 1–3 in squares, each with an error term (error paths fixed to 1), loading on one latent variable in a circle.]
Residuals in CFA

• Item-level residuals are represented as latent variables. They are not called disturbances. They represent measurement error in the EFA/CTT (classical test theory) sense.
• This is a tremendous advantage in hybrid models, which combine CFA and path models, because it separates measurement error from error in the model.
Errors vs. Disturbances

• Both are residuals.
• Both are necessary in their respective roles.
• Errors always represent measurement error.
• Disturbances
  ◦ Represent omitted variables (in a hybrid model).
  ◦ If there are no error terms, measurement error ends up in the disturbance (in a path model).
“Standard” Specification of CFA

• One indicator is treated as a marker or reference variable (Brown p. 61, Kline p. 170): its loading is fixed to one
  ◦ Which variable should you choose?
    - closest in meaning to the factor
    - most interpretable units of measurement
    - empirical: strongest correlations with the other indicators
  ◦ No test of statistical significance for the fixed loading
• Factor variance is freely estimated
• Error paths are set to one
• Error variances are freely estimated
Standard Specification

[Path diagram: factor G with indicators c, e, m, and p; error terms e1–e4 with error paths fixed to 1; the loading of the marker indicator fixed to 1.]
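A minimal sketch of this specification in Python using the semopy package (lavaan-style syntax); the data here are simulated, and by default semopy, like lavaan, identifies the factor by fixing the first indicator's loading to 1 and freeing the factor variance:

```python
import numpy as np
import pandas as pd
import semopy

# Simulated data standing in for the four indicators c, e, m, p
rng = np.random.default_rng(1)
g = rng.normal(size=200)
df = pd.DataFrame({v: g + rng.normal(scale=0.6, size=200)
                   for v in ["c", "e", "m", "p"]})

desc = "G =~ c + e + m + p"   # one factor G; first loading (c) fixed to 1 by default
model = semopy.Model(desc)
model.fit(df)                 # ML estimation
print(model.inspect())        # loadings, variances, SEs, p-values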
Identification

• Identification in CFA is largely determined by the number of indicator variables used in the model (more later).
• Number of indicators
  ◦ 2 is the minimum
  ◦ 3 is safer, especially if factor correlations are weak
  ◦ 4 provides safety
  ◦ 5 or more is more than enough (if there are too many indicators, combine indicators into sets or parcels)
Identification

Overidentified model: knowns > unknowns

Number of knowns = the number of variances and covariances of the observed variables, computed as k(k+1)/2, where k is the number of observed variables.

Number of unknowns (free parameters) is based on the specified model. It is typically the sum of the number of:
• exogenous variables (one variance estimated for each)
• endogenous variables (one error variance each)
• correlations between variables (one covariance for each pairing)
• regression paths (arrows linking exogenous variables to endogenous variables)

Latent variables add:
• indicator variables (one error variance for each)
• paths from latent variables to indicator variables (excluding those fixed to 1)
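A small Python sketch of this counting for a one-factor model under the standard specification (the function name and the marker-variable assumption are mine):

```python
# Degrees of freedom for a one-factor CFA with k indicators,
# standard (marker-variable) specification.
def one_factor_df(k: int) -> int:
    knowns = k * (k + 1) // 2     # variances and covariances: k(k+1)/2
    unknowns = 1 + (k - 1) + k    # factor variance + free loadings + error variances
    return knowns - unknowns

for k in (2, 3, 4, 7):
    print(k, "indicators: df =", one_factor_df(k))
# 2: -1 (underidentified without constraints), 3: 0 (just identified),
# 4: 2, 7: 14 (matching the BSI example below)
```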
Parameters in a CFA Model

• Factor loadings: the effect of the latent variable on an observed measure
  ◦ Can be unstandardized or standardized
  ◦ If a measure loads on only one factor, the standardized factor loading is the correlation between the measure and the factor (and the square root of the measure’s reliability)
• Factor covariances or correlations: the association between each pair of latent variables
• Error variance: variance in the observed measure that is not explained by the latent variable
  ◦ Error variance is variance not explained by the factor (but not necessarily random or meaningless variance)
• Correlated errors: the correlation between a pair of error terms
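A short sketch of what standardized loadings imply: in a one-factor model, the model-implied correlation between two measures is the product of their standardized loadings (the loading values below are hypothetical):

```python
import numpy as np

lam = np.array([0.9, 0.8, 0.7])   # hypothetical standardized loadings
implied = np.outer(lam, lam)      # implied corr of measures i, j = lam_i * lam_j
np.fill_diagonal(implied, 1.0)
print(np.round(implied, 2))
# Residuals (observed minus implied correlations) are what the fitting minimizes.
```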
Parameters in a CFA Model

[Path diagram: one-factor model of BSI Depression with seven indicators (BSI2, BSI5, BSI8, BSI10, BSI14, BSI16, BSI18), each with an error term (e2–e18, error paths fixed to 1); one marker loading fixed to 1, the six free loadings labeled W1–W6, the factor variance labeled V1, and the error variances labeled V2–V8.]

• Number of knowns = 28: (7 × 8)/2 = 28
• Number of unknowns = 14:
  ◦ variance of the latent factor (1)
  ◦ free factor loadings (6)
  ◦ variances of the error terms (7)

Confirmatory factor analysis: one-factor model of BSI Depression with all parameters labeled.
If a researcher’s measurement model is reasonably correct, then… (Kline)

1. Items (indicators) specified to measure a common underlying factor should have relatively high loadings on that factor (convergent validity)
2. Estimated correlations between the factors should not be excessively high (> .85) (discriminant validity)
   • Discriminant validity refers to the distinctiveness of the factors measured by different sets of indicators.
What to examine

• Parameter estimates (all should be statistically significantly different from zero)
  ◦ loadings
  ◦ factor variance (one-tailed test)
  ◦ error variances (one-tailed test)
  ◦ error correlations (two-tailed test)
• Check for Heywood cases (negative error variances)!
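A trivial sketch of the Heywood-case check, given a vector of estimated error variances (the values are made up):

```python
import numpy as np

err_var = np.array([0.42, 0.31, -0.05, 0.56])  # hypothetical error-variance estimates
heywood = np.flatnonzero(err_var < 0)
if heywood.size:
    print("Heywood case(s) at indicator index:", heywood)  # here: index 2
```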
Problems in Estimation: Heywood Cases

• Heywood cases: a negative error variance (or a standardized loading larger than 1)
• Why?
  ◦ Misspecification
  ◦ Outliers
  ◦ Small sample combined with only 2 indicators per factor
  ◦ Empirical under-identification
Problems in Estimation: Heywood Cases

• How to eliminate Heywood cases:
  ◦ Search for specification error
    - Sometimes a measure has correlated error
    - Create 2 factors
  ◦ Fix the error variance to zero
    - Creates an extra df, as one parameter is not estimated
    - Need to adjust the chi-square and fit indices
  ◦ Impose non-linear constraints so that error variances cannot be negative (always done in EQS)
  ◦ Set loadings equal (must use the covariance matrix)
  ◦ Use an alternative estimation method besides ML
  ◦ Empirical underidentification: make sure correlations are not weak
AMOS Bollen Example

Respecification
• Simpler model
  ◦ Set loadings equal: use the covariance matrix, and the variables must be in the same metric
• More complex model
  ◦ Correlated errors: require a priori justification
  ◦ Second factor
• Use diagnostics: residuals

These are nested:
• If the two-headed arrow in model b is set to 1, that would be saying there is only one latent trait.
• Thus model b has one more path than model a.
Proposed 1 Factor Structure
Testing: Comparison of Nested Models

• Two models
  ◦ Base model
  ◦ More complex model (e.g., the base model with additional paths; fewer df)
• If the base model is good fitting, then the more complex model must also be good fitting.
• Chi-squares and degrees of freedom are subtracted to test the constraints made in the base model.
• The more complex model should be a “good fitting” model; otherwise, the conclusion is only that one model is less poor than another.
Testing: Comparison of Nested Models

• Chi-square difference test
  ◦ Run both models
  ◦ Subtract the chi-squares and dfs: simpler minus complex
    - Complex: more parameterized, less parsimonious
    - Simpler: less parameterized, more parsimonious
  ◦ χ²diff not significant → favor the parsimonious model
  ◦ χ²diff significant → favor the more parameterized model
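A minimal sketch of the difference test in Python (the chi-square and df values are made up):

```python
from scipy.stats import chi2

chisq_simple, df_simple = 24.1, 9     # hypothetical simpler (constrained) model
chisq_complex, df_complex = 12.3, 8   # hypothetical more complex model

diff = chisq_simple - chisq_complex   # 11.8
df_diff = df_simple - df_complex      # 1
p = chi2.sf(diff, df_diff)            # upper-tail p-value
print(f"chi-square diff = {diff:.1f}, df = {df_diff}, p = {p:.4f}")
# Significant here, so the more parameterized model would be favored.
```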
Notes about Nested Models

• Models must have the same variables
• Models will not be nested if your respecification involves deleting a variable (or variables)
• You can add parameters or delete parameters, but you cannot do both
Nested Models?
“Path Analytic” Specification

• All loadings are freely estimated
• Factor variance is set to one
• Error paths are freely estimated
  ◦ The (standardized) error path equals the square root of one minus the squared standardized factor loading
• Error variances are set to one
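A one-line check of the error-path relation above, with a hypothetical standardized loading:

```python
import math

loading = 0.8                           # hypothetical standardized loading
error_path = math.sqrt(1 - loading**2)  # standardized error path
print(round(error_path, 2))             # 0.6
```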
Path Analytic Specification

[Path diagram: factor G (variance fixed to 1) with indicators c, e, m, and p; all loadings free; error terms e1–e4 with their variances fixed to 1 and their error paths free.]
4 Basic Steps for CFA
1. Define the factor model.
   • Select the number of factors and determine the nature of the paths between the factors and the measures.
   • Paths can be fixed at zero, fixed at another constant value, allowed to vary freely, or allowed to vary under specified constraints (such as being equal to each other).
2. Fit the model to the data.
4 Basic Steps for CFA
3. Evaluate model adequacy.
   • When the factor model is fit to the data, the factor loadings are chosen to minimize the discrepancy between the correlation matrix implied by the model and the actual observed matrix.
   • The amount of discrepancy after the best parameters are chosen can be used as a measure of how consistent the model is with the data.
   • Fit statistics
4 Basic Steps for CFA
4. Compare with other models.
   • To compare two nested models, examine the difference between their χ² statistics.
   • Most tests of individual factor loadings can be made as comparisons of full and reduced factor models.
   • For non-nested models, you can compare the root mean square error of approximation (RMSEA), an estimate of discrepancy per degree of freedom in the model, other fit indices, and the AIC and BIC.
