An Introduction To Structural Equation Modeling (SEM) : Using Multivariate Statistics

An Introduction to Structural Equation Modeling (SEM)
SEM is a combination of factor analysis and multiple regression. It also goes by
the aliases causal modeling and analysis of covariance structure. Special cases of
SEM include confirmatory factor analysis and path analysis. You are already familiar
with path analysis, which is SEM with no latent variables.
The variables in SEM are measured (observed, manifest) variables (indicators)
and factors (latent variables). I think of factors as weighted linear combinations that we
have created/invented. Those who are fond of SEM tend to think of them as underlying
constructs that we have discovered.
Even though no variables may have been manipulated, variables and factors in
SEM may be classified as independent variables or dependent variables. Such
classification is made on the basis of a theoretical causal model, formal or informal.
The causal model is presented in a diagram where the names of measured variables
are within rectangles and the names of factors in ellipses. Rectangles and ellipses are
connected with lines having an arrowhead on one (unidirectional causation) or two (no
specification of direction of causality) ends.
Dependent variables are those which have one-way arrows pointing to them and
independent variables are those which do not. Dependent variables have residuals (are
not perfectly related to the other variables in the model) indicated by e's (errors) pointing
to measured variables and d's (disturbances) pointing to latent variables.
The SEM can be divided into two parts. The measurement model is the part
which relates measured variables to latent variables. The structural model is the part
that relates latent variables to one another.
Statistically, the model is evaluated by comparing two variance/covariance
matrices. From the data a sample variance/covariance matrix is calculated. From this
matrix and the model an estimated population variance/covariance matrix is computed.
If the estimated population variance/covariance matrix is very similar to the known
sample variance/covariance matrix, then the model is said to fit the data well. A
Chi-square statistic is computed to test the null hypothesis that the model does fit the data
well. There are also numerous goodness of fit estimators designed to estimate how
well the model fits the data.
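As a rough illustration of the comparison just described, the following Python sketch builds the model-implied variance/covariance matrix for a hypothetical one-factor model with three indicators and subtracts it from a sample matrix. The loadings, variances, and sample matrix are made-up numbers, not taken from any data set, and the form Lambda Phi Lambda' + Theta is the usual factor-analytic decomposition, assumed here for the sake of the example.

```python
import numpy as np

# Hypothetical parameter values for a one-factor, three-indicator model.
loadings = np.array([[0.8], [0.7], [0.6]])    # Lambda: indicator loadings
factor_var = np.array([[1.0]])                # Phi: variance of the factor
error_var = np.diag([0.36, 0.51, 0.64])       # Theta: error variances


def implied_covariance(lam, phi, theta):
    """Model-implied covariance matrix: Lambda Phi Lambda' + Theta."""
    return lam @ phi @ lam.T + theta


# Made-up sample covariance matrix for the same three indicators.
sample_cov = np.array([
    [1.00, 0.58, 0.45],
    [0.58, 1.00, 0.44],
    [0.45, 0.44, 1.00],
])

sigma_hat = implied_covariance(loadings, factor_var, error_var)
residuals = sample_cov - sigma_hat   # residuals near zero suggest good fit

print(np.round(sigma_hat, 2))
print(np.round(residuals, 2))
```

In a real analysis the parameter values are estimated by the SEM program rather than typed in; the sketch only shows how a set of parameter values implies a covariance matrix that can be set beside the sample matrix.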
Sample Size. As with factor analysis, you should have lots of data when
evaluating a SEM. As usual, there are several rules of thumb. For a simple model, 200
cases might be adequate. When relationships among components of the model are
strong, 10 cases per estimated parameter may be adequate.
Assumptions. Multivariate normality is generally assumed. It is also assumed
that relationships between variables are linear, but powers of variables may be included
in the model to test polynomial relationships.
Problems. If one of the variables is a perfect linear combination of the other
variables, a singular matrix (which cannot be inverted) will cause the analysis to crash.
Multicollinearity can also be a problem.
An Example. Consider the model presented in Figure 14.4 of Tabachnick and
Fidell [Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.).
Boston: Allyn & Bacon.]. There are five measurement variables (in rectangles) and two
latent variables (in ellipses). Two of the variables are considered independent (and
shaded), the others are considered dependent. From each latent variable there is a
path pointing to two indicators. From one measured variable (SenSeek) there is a path
pointing to a latent variable (SkiSat). Each measured variable has an error path leading
to it. Each latent variable has a disturbance path leading to it.
Parameters. The parameters of the model are regression coefficients for paths
between variables and variances/covariances of independent variables. Parameters
may be fixed to a certain value (usually 0 or 1) or may be estimated. In the
diagram, an asterisk (*) represents a parameter to be estimated. A 1 indicates that the
parameter has been fixed to the value 1. When two variables are not connected by a
path, the coefficient for that path is fixed at 0.
Tabachnick and Fidell used EQS to arrive at the final model displayed in their
Figure 14.5.
Model Identification. An identified model is one for which each of the
estimated parameters has a unique solution. To determine whether the model is
identified or not, compare the number of data points to the number of parameters to be
estimated. Since the input data set is the sample variance/covariance matrix, the
number of data points is the number of variances and covariances in that matrix, which
can be calculated as m(m + 1)/2, where m is the number of measured variables. For
T&F's example the number of data points is 5(6)/2 = 15.

If the number of data points equals the number of parameters to be estimated,
then the model is just identified or saturated. Such a model will fit the data
perfectly, and thus is of little use, although it can be used to estimate the values of the
coefficients for the paths.
If there are fewer data points than parameters to be estimated then the model is
under identified. In this case the parameters cannot be estimated, and the
researcher needs to reduce the number of parameters to be estimated by deleting or
fixing some of them.
When the number of data points is greater than the number of parameters to be
estimated then the model is over identified, and the analysis can proceed.
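The counting rule above can be sketched in a few lines of Python. The function names are hypothetical, and the parameter count of 11 in the usage lines is an invented number used only for illustration. Note that this comparison is a necessary check only; a model that passes it can still fail to be identified for other reasons.

```python
def count_data_points(m):
    """Number of distinct variances and covariances among m measured variables."""
    return m * (m + 1) // 2


def identification_status(m, n_parameters):
    """Classify a model by comparing data points with parameters to be estimated."""
    data_points = count_data_points(m)
    if data_points == n_parameters:
        return "just identified"
    if data_points < n_parameters:
        return "under identified"
    return "over identified"


print(count_data_points(5))           # five measured variables: 5(6)/2
print(identification_status(5, 11))   # 11 is a hypothetical parameter count
```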
Identification of the Measurement Model. The scale of each independent
variable must be fixed to a constant (typically to 1, as in z scores) or to that of one of the
measured variables (a marker variable, one that is thought to be exceptionally well
related to this latent variable and not to other latent variables in the model). To fix
the scale to that of a measured variable one simply fixes to 1 the regression coefficient
for the path from the latent variable to the measured variable. Most often the scale of
dependent latent variables is set to that of a measured variable. The scale of
independent latent variables may be set to 1 or to the variance of a measured variable.
The measurement portion of the model will probably be identified if:
- There is only one latent variable, it has at least three indicators that load on it,
  and the errors of these indicators are not correlated with one another.
- There are two or more latent variables, each has at least three indicators that
  load on it, the errors of these indicators are not correlated, each indicator
  loads on only one factor, and the factors are allowed to covary.
- There are two or more latent variables, but there is a latent variable on which
  only two indicators load, the errors of the indicators are not correlated, each
  indicator loads on only one factor, and none of the variances or covariances between
  factors is zero.
Identification of the Structural Model. This portion of the model may be identified
if:
- None of the latent dependent variables predicts another latent dependent
  variable.
- When a latent dependent variable does predict another latent dependent
  variable, the relationship is recursive, and the disturbances are not correlated.
  A relationship is recursive if the causal relationship is unidirectional (one line
  pointing from the one latent variable to the other). In a nonrecursive
  relationship there are two lines between a pair of variables, one pointing from
  A to B and the other from B to A. Correlated disturbances are indicated by
  being connected with a single line with an arrowhead on each end.
- When there is a nonrecursive relationship between latent dependent variables
  or disturbances, spend some time with: Bollen, K. A. (1989). Structural
  equations with latent variables. New York: John Wiley & Sons -- or hire an
  expert in SEM.
If your model is not identified, the SEM program will throw an error and then you
must tinker with the model until it is identified.
Estimation. The analysis uses an iterative procedure to minimize the
differences between the sample variance/covariance matrix and the estimated
population variance/covariance matrix. Maximum Likelihood (ML) estimation is the
method most frequently employed. Among the techniques available in the software used in this
course (SAS and AMOS), the ML and Generalized Least Squares (GLS) techniques
have fared well in Monte Carlo comparisons of techniques.
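To make the quantity being minimized a little more concrete, here is a minimal Python sketch of the usual ML discrepancy function, F = ln|Sigma| - ln|S| + trace(S Sigma^-1) - p, where S is the sample matrix, Sigma the estimated population matrix, and p the number of measured variables. The function names and the two small matrices are invented for illustration, and the (N - 1) multiplier for the Chi-square is one common convention (some programs use N). The sketch only evaluates the function for a given pair of matrices; it does not perform the iterative search over parameter values.

```python
import numpy as np


def ml_discrepancy(sample_cov, implied_cov):
    """ML discrepancy: ln|Sigma| - ln|S| + trace(S Sigma^-1) - p."""
    s = np.asarray(sample_cov, dtype=float)
    sigma = np.asarray(implied_cov, dtype=float)
    p = s.shape[0]
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_s = np.linalg.slogdet(s)
    # trace(S Sigma^-1) equals trace(Sigma^-1 S); solve() avoids an explicit inverse.
    trace_term = np.trace(np.linalg.solve(sigma, s))
    return logdet_sigma - logdet_s + trace_term - p


def model_chi_square(sample_cov, implied_cov, n_cases):
    """Chi-square fit statistic, assuming the (N - 1) * F convention."""
    return (n_cases - 1) * ml_discrepancy(sample_cov, implied_cov)


# Made-up 2 x 2 matrices, for illustration only.
s = np.array([[1.0, 0.5], [0.5, 1.0]])
sigma_hat = np.array([[1.0, 0.4], [0.4, 1.0]])

print(ml_discrepancy(s, s))                 # essentially zero: identical matrices
print(ml_discrepancy(s, sigma_hat))         # positive: the matrices differ
print(model_chi_square(s, sigma_hat, 200))  # 200 cases is a hypothetical sample size
```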
Fit. With large sample sizes, the Chi-square testing the null that the model fits
the data well may be significant even when the fit is good. Accordingly there has
been great interest in developing estimates of fit that do not rely on tests of
significance. In fact, there has been so much interest that there are dozens of such
indices of fit. Tabachnick and Fidell discuss many of these fit indices. You can also
find some discussion of them in my document Conducting a Path Analysis With
SPSS/AMOS.
Model Modification and Comparison. You may wish to evaluate two nested
models. Model R is nested within Model F if Model R can be created by deleting
one or more of the parameters from Model F. The significance of the difference in fit
can be tested with a simple Chi-square statistic. Because Model R estimates fewer
parameters, it has the larger Chi-square and the larger degrees of freedom. The value
of the difference Chi-square equals the Chi-square fit statistic for Model R minus the
Chi-square statistic for Model F. The degrees of freedom equal degrees of freedom for
Model R minus degrees of freedom for Model F. A nonsignificant Chi-square indicates
that removal of the parameters that are estimated in Model F but not in Model R did
not significantly reduce the fit of the model to the data.
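A minimal Python sketch of the nested-model test follows. The reduced model (R) has the larger Chi-square and more degrees of freedom, so the sketch subtracts the full model's values from the reduced model's. The fit statistics in the usage lines are hypothetical. To stay dependency-free, the upper-tail probability is computed with a small hand-written helper (chi2_sf, a name chosen here); in practice one would more likely call a statistics library such as scipy.stats.chi2.sf.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # closed form for df = 1
    # Raise df two at a time: Q(a + 1, x) = Q(a, x) + x^a e^-x / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_square_difference(chi2_reduced, df_reduced, chi2_full, df_full):
    """Difference test for Model R (reduced) nested within Model F (full)."""
    diff = chi2_reduced - chi2_full
    df_diff = df_reduced - df_full
    return diff, df_diff, chi2_sf(diff, df_diff)


# Hypothetical fit statistics for a reduced and a full model.
diff, df_diff, p = chi_square_difference(18.4, 6, 12.1, 4)
print(round(diff, 2), df_diff, round(p, 4))
```

A p value below the chosen alpha would indicate that dropping the parameters significantly worsened the fit.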
The Lagrange Multiplier Test (LM) can be used to determine whether or not the
model fit would be significantly improved by estimating (rather than fixing) an
additional parameter. The Wald Test can be used to determine whether or not
deleting a parameter would significantly reduce the fit of the model. The Wald test is
available in SAS Calis, but not in AMOS. One should keep in mind that adding or
deleting a parameter will likely change the effect of adding or deleting other
parameters, so parameters should be added or deleted one at a time. It is
recommended that one add parameters before deleting parameters.
Reliability of Measured Variables. The variance in each measured variable is
assumed to stem from variance in the underlying latent variable. Classically, the
variance of a measured variable can be partitioned into true variance (that related to
the true variable) and (random) error variance. The reliability of a measured variable
is the ratio of true variance to total (true + error) variance. In SEM the reliability of a
measured variable is estimated by a squared correlation coefficient, which is the
proportion of variance in the measured variable that is explained by variance in the
latent variable(s).
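The ratio described above can be sketched in Python for the simple case of a measured variable that loads on a single latent variable. In that case the true variance is the squared loading times the variance of the latent variable. The function name and the numeric values are hypothetical.

```python
def indicator_reliability(loading, factor_variance, error_variance):
    """Proportion of a measured variable's variance explained by its latent variable."""
    true_variance = loading ** 2 * factor_variance
    return true_variance / (true_variance + error_variance)


# Hypothetical loading, factor variance, and error variance.
print(round(indicator_reliability(0.8, 1.0, 0.36), 2))
```

With a standardized solution (factor variance and total variance both 1) this reduces to the squared loading.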
SEM with AMOS
SEM with SAS Proc Calis
SEM Fit: short introduction, David A. Kenny.
Karl L. Wuensch
Dept. of Psychology, East Carolina University, Greenville, NC USA
December, 2014
