
CFA SEM:

Modeling Causal Processes


BUSI6280
The term structural equation modeling
conveys two key aspects of the procedure:
(a) that the causal processes under study are
represented by a series of structural (i.e.,
regression) equations, and
(b) that these structural relations can be
modeled pictorially to enable a clearer
conceptualization of the theory under
study.
Why Use SEM?
SEM lends itself well to the analysis of data for
inferential purposes.
Whereas traditional multivariate procedures are
incapable of either assessing or correcting for
measurement error, SEM provides explicit
estimates of these error variance parameters.
SEM procedures can incorporate both unobserved
(i.e., latent) and observed variables.
Purpose of Factor Analysis
The factor analytic model (EFA or CFA) focuses solely
on how the observed variables are linked to their
underlying latent factors.
Factor analysis is concerned with the extent to which
the observed variables are generated by the underlying
latent constructs; thus, the strength of the regression paths
from the factors to the observed variables (the factor
loadings) is of primary interest.
Although inter-factor relations are also of interest, any
regression structure among them is not considered in the
factor analytic model.
Types of Models
Measurement model: the latent variables and their observed
measures (i.e., the CFA model).
Structural model: the links among the latent variables.
Full (complete) model: a measurement model together with a
structural model.
Recursive model: the direction of cause runs one way only.
Non-recursive model: allows reciprocal or feedback effects
(often different from one another).
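The recursive versus non-recursive distinction can be illustrated with a small sketch. It assumes the direct effects among the endogenous factors are stored in a square matrix `beta`, where a nonzero `beta[i][j]` means factor j has a direct effect on factor i. The function name `is_recursive` and the example values are hypothetical, and the check uses only the direction-of-effects criterion given in the definitions.

```python
def is_recursive(beta):
    """Return True if the effects in beta contain no reciprocal or feedback loop."""
    m = len(beta)
    # beta[i][j] != 0 means factor j directly affects factor i (edge j -> i).
    children = {j: [i for i in range(m) if beta[i][j] != 0] for j in range(m)}
    state = [0] * m  # 0 = unvisited, 1 = on the current path, 2 = finished

    def has_cycle(node):
        state[node] = 1
        for nxt in children[node]:
            if state[nxt] == 1:
                return True  # reached a factor already on the path: feedback
            if state[nxt] == 0 and has_cycle(nxt):
                return True
        state[node] = 2
        return False

    return not any(state[j] == 0 and has_cycle(j) for j in range(m))


# Hypothetical two-factor examples.
one_way = [[0.0, 0.0],
           [0.5, 0.0]]   # factor 1 -> factor 2 only
feedback = [[0.0, 0.3],
            [0.5, 0.0]]  # factor 1 -> factor 2 and factor 2 -> factor 1

print(is_recursive(one_way))
print(is_recursive(feedback))
```

In this sketch a model is flagged as non-recursive as soon as any chain of effects leads back to its starting factor.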
Three different scenarios or models
(Jöreskog, 1993):
Strictly Confirmatory (SC)
Alternative Models (AM)
Model Generating (MG)
The SC Scenario
The researcher postulates a single model
based on theory, collects the appropriate
data, and then tests the fit of the
hypothesized model to the sample data.
From the results of this test, the researcher
either rejects or fails to reject the model. No
further modifications to the model are made.
The AM Scenario
The researcher proposes several alternative
(i.e., competing) models, all of which are
grounded in theory.

Following analysis of a single set of empirical
data, the researcher selects one model as
most appropriate in representing the sample
data.
The MG Scenario
The researcher, having postulated and rejected a
theoretically derived model on the basis of its poor fit to
the sample data, proceeds in an exploratory (rather than
confirmatory) fashion to modify and re-estimate the
model.
The primary focus here is to locate the source of misfit
in the model. Jöreskog noted that, although
respecification may be either theory- or data-driven, the
ultimate objective is to find a model that is both substantively
meaningful and statistically well fitting.
SEM Procedures: Alternative Computer Programs
AMOS (Arbuckle, 1995)
EQS (Bentler, 1995)
LISCOMP (Muthén, 1998)
CALIS (SAS Institute, 1992)
RAMONA (Browne, Mels, & Coward, 1994)
SEPATH (Steiger, 1994)
LISREL, in use since the 1970s, is the most widely used program.
SEM - Language
Exogenous latent variables are synonymous with independent
variables; they cause fluctuations in the values of other
latent variables in the model.
Changes in the values of exogenous variables are not
explained by the model. Rather, they are considered to be
influenced by other factors external to the model.
Endogenous latent variables are synonymous with dependent
variables and, as such, are influenced by the exogenous
variables in the model, either directly or indirectly.
By convention, observed measures are represented by Roman
letters and latent constructs by Greek letters.

Observed measures that are exogenous are termed X-variables.
Observed measures that are endogenous are termed Y-variables.
The measurement model may be specified either in terms
of LISREL exogenous notation (i.e., X-variables), or in
terms of its endogenous notation (i.e., Y-variables).
The exogenous latent constructs are termed $\xi$ (xi).
The endogenous latent constructs are termed $\eta$ (eta).
SEM Language
The Measurement Model
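$x$ is a $q \times 1$ vector of observed exogenous variables.
$y$ is a $p \times 1$ vector of observed endogenous variables.
$\xi$ is an $n \times 1$ vector of latent exogenous variables.
$\eta$ is an $m \times 1$ vector of latent endogenous variables.
$\Lambda_x$ is a $q \times n$ matrix of coefficients ($\lambda_{ij}$) linking $x$ and $\xi$.
$\Lambda_y$ is a $p \times m$ matrix of coefficients ($\lambda_{ij}$) linking $y$ and $\eta$.
$\delta$ is a $q \times 1$ vector of random disturbance terms
(errors of measurement) associated with the $x$ vector.
$\epsilon$ is a $p \times 1$ vector of random disturbance terms
(errors of measurement) associated with the $y$ vector.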
SEM Language
The Structural Model
$\Gamma$ (gamma) is an $m \times n$ matrix of coefficients ($\gamma_{ij}$) that
relates the $n$ exogenous factors to the $m$ endogenous
factors.
$B$ (beta) is an $m \times m$ matrix of coefficients ($\beta_{ij}$) that relates
the $m$ endogenous factors to one another.
$\zeta$ (zeta) is an $m \times 1$ vector of residuals ($\zeta_i$) representing
errors in the equation relating $\eta$ and $\xi$.
$\Phi$ (phi) is an $n \times n$ matrix of coefficients ($\phi_{ij}$) that
captures the variances and covariances among the $\xi$s.
$\Psi$ (psi) is the $m \times m$ matrix of variances and covariances among the $\zeta$s.
The Structural Model
Measurement Model for the X-variables (1):
$$x = \Lambda_x \xi + \delta$$

Measurement Model for the Y-variables (2):
$$y = \Lambda_y \eta + \epsilon$$

Structural Equation Model (3):
$$\eta = B\eta + \Gamma\xi + \zeta$$
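To make the three equations concrete, the following minimal sketch generates data from them for the simplest case: one exogenous factor with three X-indicators and one endogenous factor with two Y-indicators, so that B = 0. All numeric values, the function name `simulate_case`, and the choice of normally distributed terms are illustrative assumptions, not estimates from any data set.

```python
import random


def simulate_case(rng, gamma=0.6, lam_x=(1.0, 0.8, 0.7), lam_y=(1.0, 0.9)):
    """Draw one case from equations (1)-(3) with a single xi and a single eta."""
    xi = rng.gauss(0.0, 1.0)       # exogenous latent factor
    zeta = rng.gauss(0.0, 0.5)     # residual in the structural equation
    eta = gamma * xi + zeta        # equation (3) with B = 0
    # Equation (1): each X is its loading times xi plus measurement error delta.
    x = [lam * xi + rng.gauss(0.0, 0.4) for lam in lam_x]
    # Equation (2): each Y is its loading times eta plus measurement error epsilon.
    y = [lam * eta + rng.gauss(0.0, 0.4) for lam in lam_y]
    return x, y


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [simulate_case(rng) for _ in range(1000)]
print(len(sample), len(sample[0][0]), len(sample[0][1]))
```

Only the `x` and `y` values would be available to an analyst; `xi`, `eta`, and the error terms stay unobserved, which is why the model parameters must be estimated from the observed variances and covariances.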

The following minimal assumptions are
presumed to hold for the system of equations:
$\epsilon$ is uncorrelated with $\eta$ (construct).
$\delta$ is uncorrelated with $\xi$ (construct).
$\zeta$ is uncorrelated with $\xi$ (construct).
$\zeta$, $\epsilon$, and $\delta$ are mutually uncorrelated.
$E(\xi) = 0$
$E(\eta) = 0$
$E(\zeta) = 0$
$E(\delta) = 0$
$E(\epsilon) = 0$
$(I - B)$ is nonsingular, so that $(I - B)^{-1}$ exists. This allows
equation 3 to be written in reduced form.
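As a short worked step, the reduced form follows from equation 3 by collecting the $\eta$ terms on the left and premultiplying by the inverse; nothing beyond the nonsingularity assumption is used:

```latex
\begin{aligned}
\eta &= B\eta + \Gamma\xi + \zeta \\
(I - B)\,\eta &= \Gamma\xi + \zeta \\
\eta &= (I - B)^{-1}(\Gamma\xi + \zeta)
\end{aligned}
```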

Symbol                            Representation
Circle or ellipse                 Unobserved (latent) factor
Square or rectangle               Observed variable
Arrow from ellipse to rectangle   Path coefficient for regression of
                                  observed variable on unobserved factor
Arrow from ellipse to ellipse     Path coefficient for regression of
                                  one factor on another
Arrow pointing into ellipse       Residual error (disturbance) in
                                  prediction of unobserved factor
Arrow pointing into rectangle     Measurement error associated
                                  with observed variable
Summary of Matrices, Greek Notation, and Program Codes

Greek Letter     Matrix              Matrix Element        Program Code   Matrix Type

Measurement Model
Lambda-X         $\Lambda_x$         $\lambda_x$           LX             Regression
Lambda-Y         $\Lambda_y$         $\lambda_y$           LY             Regression
Theta delta      $\Theta_\delta$     $\theta_\delta$       TD             Var/cov
Theta epsilon    $\Theta_\epsilon$   $\theta_\epsilon$     TE             Var/cov

Structural Model
Gamma            $\Gamma$            $\gamma$              GA             Regression
Beta             $B$                 $\beta$               BE             Regression
Phi              $\Phi$              $\phi$                PH             Var/cov
Psi              $\Psi$              $\psi$                PS             Var/cov
Xi (or Ksi)      $\xi$               ---                   ---            Vector
Eta              $\eta$              ---                   ---            Vector
Zeta             $\zeta$             ---                   ---            Vector

Var/cov = variance-covariance

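The Structural Model - $\eta_1$ predicted by $\xi_1$
[Path diagram: $\xi_1$ is measured by $X_1$, $X_2$, $X_3$ (loadings $\lambda^x_{11}$, $\lambda^x_{21}$, $\lambda^x_{31}$) and predicts $\eta_1$, which is measured by $Y_1$, $Y_2$ (loadings $\lambda^y_{11}$, $\lambda^y_{21}$).]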
An important corollary of SEM is that the
variances and covariances of dependent (or
endogenous) variables, whether they be
observed or unobserved, are never
parameters of the model; these are
explained by the exogenous variables.

In contrast, the variances and covariances of
independent variables are important
parameters that need to be estimated.
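A brief sketch of why this holds for the latent endogenous variables, assuming $\eta = B\eta + \Gamma\xi + \zeta$ with $(I - B)$ nonsingular, $\zeta$ uncorrelated with $\xi$, $\operatorname{Cov}(\xi) = \Phi$, and $\operatorname{Cov}(\zeta) = \Psi$: the covariance matrix of $\eta$ is fully determined by the other parameter matrices, so it is derived rather than estimated.

```latex
\begin{aligned}
\eta &= (I - B)^{-1}(\Gamma\xi + \zeta) \\
\operatorname{Cov}(\eta) &= (I - B)^{-1}\left(\Gamma\Phi\Gamma' + \Psi\right)\left[(I - B)^{-1}\right]'
\end{aligned}
```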
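MEASUREMENT (CFA) MODELS
[Figure, CFA part for the X-variables: $\xi_1$ is measured by $X_1$, $X_2$, $X_3$ with loadings $\lambda_{11}$, $\lambda_{21}$, $\lambda_{31}$ and measurement errors $\delta_1$, $\delta_2$, $\delta_3$.]
[Figure, CFA part for the Y-variables: $\eta_1$ is measured by $Y_1$, $Y_2$ with loadings $\lambda_{11}$, $\lambda_{21}$ and measurement errors $\epsilon_1$, $\epsilon_2$.]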

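CFA Model
[Figure: two-factor CFA model. Factor ASC has indicators ReadSC and WriteSC; factor SSC has indicators TalkSC and InteractSC. Each indicator has its own error term.]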
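CFA with Greek Notation
[Figure: two-factor CFA model. $\xi_1$ is measured by $x_1$ and $x_2$ (loadings $\lambda_{11}$, $\lambda_{21}$; errors $\delta_1$, $\delta_2$); $\xi_2$ is measured by $x_3$ and $x_4$ (loadings $\lambda_{32}$, $\lambda_{42}$; errors $\delta_3$, $\delta_4$).]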
Regression Equations (Xs)
1. $x_1 = \lambda_{11}\xi_1 + \delta_1$
2. $x_2 = \lambda_{21}\xi_1 + \delta_2$
3. $x_3 = \lambda_{32}\xi_2 + \delta_3$
4. $x_4 = \lambda_{42}\xi_2 + \delta_4$
Or in matrix form:
$$x = \Lambda_x \xi + \delta$$
The parameters of this model are $\Lambda_x$, $\Phi$, and $\Theta_\delta$, where:

$\Lambda_x$ represents the matrix of regression
coefficients related to the $\xi$s (described earlier).
$\Phi$ (phi) is an $n \times n$ symmetrical variance-
covariance matrix among the $n$ exogenous
factors.
$\Theta_\delta$ (theta-delta) is a symmetrical $q \times q$ variance-
covariance matrix among the errors of
measurement for the $q$ exogenous observed
variables.
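These three matrices are the parameters because, together, they reproduce the covariance matrix of the observed X-variables. A sketch of the step, assuming zero means and $\delta$ uncorrelated with $\xi$ (the label $\Sigma_{xx}$ is introduced here for the covariance matrix of $x$):

```latex
\begin{aligned}
\Sigma_{xx} &= E(xx') = E\left[(\Lambda_x\xi + \delta)(\Lambda_x\xi + \delta)'\right] \\
            &= \Lambda_x\,E(\xi\xi')\,\Lambda_x' + E(\delta\delta') \\
            &= \Lambda_x \Phi \Lambda_x' + \Theta_\delta
\end{aligned}
```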



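The general factor analytic model
can be expanded as:
$$x = \Lambda_x \xi + \delta$$
$$
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ 0 & \lambda_{32} \\ 0 & \lambda_{42} \end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}
+
\begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \end{pmatrix}
$$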
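The expansion can be checked numerically. The sketch below, using made-up loadings, factor scores, and errors, computes the x vector once through the matrix form and once through the four separate regression equations, and confirms that the two agree. The variable names are hypothetical.

```python
import math

# Hypothetical loadings arranged as the 4 x 2 loading matrix:
# x1 and x2 load on the first factor, x3 and x4 on the second.
lambda_x = [
    [1.0, 0.0],
    [0.8, 0.0],
    [0.0, 1.0],
    [0.0, 0.9],
]
xi = [2.0, -1.0]               # illustrative factor scores (xi_1, xi_2)
delta = [0.1, -0.2, 0.0, 0.3]  # illustrative measurement errors

# Matrix form: each x_i is row i of the loading matrix times xi, plus delta_i.
x_matrix = [
    sum(lam * f for lam, f in zip(row, xi)) + d
    for row, d in zip(lambda_x, delta)
]

# The same four values written out equation by equation.
x_scalar = [
    lambda_x[0][0] * xi[0] + delta[0],  # x1 = lambda_11 * xi_1 + delta_1
    lambda_x[1][0] * xi[0] + delta[1],  # x2 = lambda_21 * xi_1 + delta_2
    lambda_x[2][1] * xi[1] + delta[2],  # x3 = lambda_32 * xi_2 + delta_3
    lambda_x[3][1] * xi[1] + delta[3],  # x4 = lambda_42 * xi_2 + delta_4
]

agree = all(math.isclose(a, b) for a, b in zip(x_matrix, x_scalar))
print(agree)
```

The zeros in the loading matrix are what encode the hypothesis that each observed variable is linked to only one factor.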


$\Lambda$ is the Loadings Matrix
The $\Lambda$ matrix is often termed the factor-
loading matrix because it portrays the
pattern by which each observed variable is
linked to its respective factor.
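The Ys
1. $y_1 = \lambda_{11}\eta_1 + \epsilon_1$
2. $y_2 = \lambda_{21}\eta_1 + \epsilon_2$
3. $y_3 = \lambda_{32}\eta_2 + \epsilon_3$
4. $y_4 = \lambda_{42}\eta_2 + \epsilon_4$

Or in matrix form:
$$y = \Lambda_y \eta + \epsilon$$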
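Matrix Notation for Loadings
with Regression Model
$$y = \Lambda_y \eta + \epsilon$$
$$
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}
=
\begin{pmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ 0 & \lambda_{32} \\ 0 & \lambda_{42} \end{pmatrix}
\begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix}
+
\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{pmatrix}
$$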

A just-identified model is one in which
there is a one-to-one correspondence between the
data and the structural parameters.
The number of data variances and covariances equals
the number of parameters to be estimated.
However, despite the capability of the model to
yield a unique solution for all parameters, the just-
identified model is not scientifically interesting
because it has no degrees of freedom and
therefore can never be rejected.
Overidentified Model
An overidentified model is one in which the
number of estimable parameters is less than the
number of data points (i.e., variances and covariances of
the observed variables).
This situation results in positive degrees of
freedom that allow for rejection of the model,
thereby rendering it of scientific use. The aim in
SEM, then, is to specify a model such that it
meets the criterion of overidentification.
Underidentified Model
An underidentified model is one in which the
number of parameters to be estimated exceeds the
number of variances and covariances.
As such, the model contains insufficient
information (from the input data) for the
purpose of attaining a determinate solution of
parameter estimation; that is, an infinite number
of solutions are possible for an underidentified
model.
Suppose there are 12 observed variables; this means that
we have 12(12 + 1)/2 = 78 data points.

Suppose that there are 30 unknown parameters.
Thus, with 78 data points and 30 parameters to be
estimated, we have an overidentified model with 48
degrees of freedom.

It is important to point out, however, that the
specification of an overidentified model is a
necessary, but not sufficient, condition to resolve the
identification problem. Indeed, the imposition of
constraints on particular parameters can sometimes be
beneficial in helping the researcher to attain an
overidentified model.
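The counting in this example can be sketched as a small helper. The function name `identification_status` is hypothetical, and the check covers only the counting rule (data points versus free parameters), which is a necessary condition rather than a full test of identification.

```python
def identification_status(n_observed, n_free_parameters):
    """Apply the counting rule: compare data points with free parameters."""
    # Distinct variances and covariances among the observed variables.
    data_points = n_observed * (n_observed + 1) // 2
    degrees_of_freedom = data_points - n_free_parameters
    if degrees_of_freedom > 0:
        status = "overidentified"
    elif degrees_of_freedom == 0:
        status = "just-identified"
    else:
        status = "underidentified"
    return data_points, degrees_of_freedom, status


# The example from the text: 12 observed variables, 30 unknown parameters.
print(identification_status(12, 30))  # (78, 48, 'overidentified')
```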
No Scale Set for Constructs
Linked to the issue of identification is the
requirement that every latent variable have
its scale determined. This requirement
arises because these variables are unobserved
and therefore have no definite metric scale.
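A small numeric sketch of why the scale must be fixed: for a single factor, the covariance it implies between two indicators is the product of their loadings and the factor variance. Dividing every loading by a constant c while multiplying the factor variance by c squared leaves those implied covariances unchanged, so the data cannot tell the two solutions apart. The values and the function name `implied_common_part` are illustrative assumptions, and error variances are left out because they are unaffected by the rescaling.

```python
import math


def implied_common_part(loadings, factor_variance):
    """Covariances implied by one factor: loading_i * loading_j * factor variance."""
    return [[li * lj * factor_variance for lj in loadings] for li in loadings]


loadings = [1.2, 0.9, 0.6]  # hypothetical loadings
factor_variance = 2.0       # hypothetical factor variance
c = 3.0                     # arbitrary rescaling constant

original = implied_common_part(loadings, factor_variance)
rescaled = implied_common_part([lam / c for lam in loadings],
                               factor_variance * c ** 2)

same = all(
    math.isclose(a, b)
    for row_a, row_b in zip(original, rescaled)
    for a, b in zip(row_a, row_b)
)
print(same)
```

Fixing either the factor variance or one loading per factor removes this arbitrary constant.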
Assume a CFA model with 12 variables (items) and
4 factors (3 items per factor).
There are 12 regression coefficients ($\lambda$s).
There are 12 error variances ($\delta$s).
There are 4 factor variances (which may be standardized
and therefore set to 1).
There are 6 covariances between factors.

If the factor variances are not set to 1, then one of the
loadings for each factor can be fixed to a value of
1.00 (these are therefore not estimated). The rationale
underlying this constraint is tied to the issue of statistical
identification. In total, then, there are 30 parameters to be
estimated for this CFA model.
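The count of 30 can be reproduced under either way of setting the factor scales. The sketch below assumes that every item loads on exactly one factor, that all factors covary, and that measurement errors are uncorrelated; the function name `cfa_free_parameters` and the labels `"unit_variance"` and `"marker_loading"` are hypothetical.

```python
def cfa_free_parameters(n_items, n_factors, scaling):
    """Count the free parameters of a simple CFA model under a scaling choice."""
    loadings = n_items                # one loading per item
    error_variances = n_items         # one error variance per item
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2
    if scaling == "unit_variance":
        factor_variances = 0          # all factor variances fixed to 1
    elif scaling == "marker_loading":
        loadings -= n_factors         # one loading per factor fixed to 1.00
    else:
        raise ValueError("scaling must be 'unit_variance' or 'marker_loading'")
    return loadings + error_variances + factor_variances + factor_covariances


print(cfa_free_parameters(12, 4, "unit_variance"))   # 12 + 12 + 0 + 6 = 30
print(cfa_free_parameters(12, 4, "marker_loading"))  # 8 + 12 + 4 + 6 = 30
```

Both choices give the same number of free parameters and therefore the same degrees of freedom.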