Research Training Series
Structural Equation Modeling using AMOS
WAQAR AKBAR
ASSISTANT PROFESSOR

RESEARCH TRAINING SERIES, FACULTY OF MANAGEMENT SCIENCE, SZABIST
SEM- Introduction
SEM is a statistical methodology that takes a confirmatory
approach to the analysis of a structural theory bearing on
some phenomenon.
Typically, this theory represents “causal” processes that
generate observations on multiple variables (Bentler, 1988).
The term “structural equation modeling” conveys two
important aspects of the procedure:
◦ (a) that the causal processes under study are represented by a series
of structural (i.e., regression) equations, and
◦ (b) that these structural relations can be modeled pictorially to
enable a clearer conceptualization of the theory under study.

The hypothesized model can then be tested
statistically in a simultaneous analysis of the entire
system of variables to determine the extent to
which it is consistent with the data.
◦ If goodness-of-fit is adequate, the model argues for the
plausibility of postulated relations among variables;
◦ if it is inadequate, the tenability of such relations is
rejected.

SEM - Basic Concepts
Latent versus Observed Variables
In the behavioral sciences, researchers are often interested
in studying theoretical constructs that cannot be observed
directly.
These abstract phenomena are termed latent variables, or
factors.
Examples of latent variables in psychology are self-concept
and motivation; in sociology, powerlessness and anomie; in
education, verbal ability and teacher expectancy; in
economics, capitalism and social class.

Because latent variables are not observed directly, it follows
that they cannot be measured directly.
Thus, the researcher must operationally define the latent
variable of interest in terms of behavior believed to
represent it.
As such, the unobserved variable is linked to one that is
observable, thereby making its measurement possible.
Assessment of the behavior, then, constitutes the direct
measurement of an observed variable, albeit the indirect
measurement of an unobserved variable (i.e., the
underlying construct).
Exogenous versus Endogenous
Latent Variables
Exogenous latent variables are synonymous with
independent variables:
◦ they “cause” fluctuations in the values of other latent
variables in the model. Changes in the values of
exogenous variables are not explained by the model.
◦ Rather, they are considered to be influenced by other
factors external to the model.
◦ Background variables such as gender, age, and
socioeconomic status are examples of such external
factors.

Endogenous latent variables are synonymous with
dependent variables and, as such, are influenced by
the exogenous variables in the model, either directly or
indirectly.
◦ Fluctuation in the values of endogenous variables is
said to be explained by the model because all latent
variables that influence them are included in the
model specification.

The Factor Analytic Model
The oldest and best-known statistical procedure for
investigating relations between sets of observed and
latent variables is that of factor analysis.
In using this approach to data analysis, the researcher
examines the covariation among a set of observed
variables in order to gather information on their
underlying latent constructs (i.e., factors).
There are two basic types of factor analyses:
exploratory factor analysis (EFA) and confirmatory
factor analysis (CFA).

Exploratory factor analysis (EFA) is designed for the situation where links
between the observed and latent variables are unknown or uncertain.
◦ The analysis thus proceeds in an exploratory mode to determine how, and to
what extent, the observed variables are linked to their underlying factors.

Confirmatory factor analysis (CFA) is appropriately used when the
researcher has some knowledge of the underlying latent variable structure.
◦ Based on knowledge of the theory, empirical research, or both, he or she
postulates relations between the observed measures and the underlying factors
a priori and then tests this hypothesized structure statistically.

Because the CFA model focuses solely on the link between factors and
their measured variables, within the framework of SEM, it represents what
has been termed a measurement model.

The Full Latent Variable
Model
The full latent variable (LV) model allows for the
specification of regression structure among the latent
variables.
The researcher can hypothesize the impact of one latent
construct on another in the modeling of causal direction.
It comprises both a measurement model and a structural
model:
◦ the measurement model depicting the links between the latent
variables and their observed measures (i.e., the CFA model),
and
◦ the structural model depicting the links among the latent
variables themselves.
The General Structural
Equation Model

Researchers use these symbols within the framework of
four basic configurations, each of which represents an
important component in the analytic process. These
configurations, each accompanied by a brief description, are
as follows:

A general structural equation
model

Using the Amos Program
The program name, Amos, is actually an acronym for
“Analysis of Moment Structures” or, in other words, the
analysis of mean and covariance structures, the essence of
SEM analyses.

Model Specification Using
Amos Graphics
Amos Modeling Tools
Amos provides you with all the tools that you will ever need
in creating and working with SEM path diagrams. Each tool
is represented by an icon (or button) and performs one
particular function; there are 42 icons from which to
choose.
Let’s practice
(Example 1)

Understanding the Basic
Components of Model 1
One extremely important caveat in working with structural
equation models is to always tally the number of
parameters in the model being estimated prior to running
the analyses.
This information is critical to your knowledge of whether or
not the model that you are testing is statistically identified.
Thus, as a prerequisite to the discussion of identification,
let’s count the number of parameters to be estimated for
the model portrayed in Figure (example 1).

From a review of the figure, we can ascertain that
◦ there are 12 regression coefficients (factor loadings), of which 4 are
fixed and 8 are freely estimated,
◦ 16 variances (12 error variances and 4 factor variances), and
◦ 6 factor covariances.
◦ The 1s assigned to one of each set of regression path parameters
represent a fixed value of 1.00; as such, these parameters are not
estimated.
◦ In total, then, there are 30 (8 + 16 + 6) parameters to be estimated
for the CFA model depicted in Example 1.
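As a sketch of this bookkeeping (not Amos output), the free parameters of a CFA model of this form can be tallied programmatically; `count_cfa_parameters` is a hypothetical helper:

```python
# Tally free parameters for a CFA model in which each factor's first
# loading is fixed to 1.00 for scaling and all factors are allowed to covary.
def count_cfa_parameters(n_indicators, n_factors):
    loadings = n_indicators - n_factors             # one loading per factor is fixed
    variances = n_indicators + n_factors            # error variances + factor variances
    covariances = n_factors * (n_factors - 1) // 2  # all pairwise factor covariances
    return loadings + variances + covariances

# Example 1: 12 observed variables, 4 factors
print(count_cfa_parameters(12, 4))  # 8 loadings + 16 variances + 6 covariances = 30
```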

Let’s now turn to a brief discussion of the critically
important concept of model (or statistical) identification.

The Concept of Model
Identification
The issue of identification focuses on whether or not there is a unique
set of parameters consistent with the data.
This question bears directly on the transposition of the variance–
covariance matrix of observed variables (the data) into the structural
parameters of the model under study.
If a unique solution for the values of the structural parameter can be
found, the model is considered to be identified.
◦ As a consequence, the parameters are considered to be estimable and the
model therefore testable.
◦ If, on the other hand, a model cannot be identified, it
cannot be evaluated empirically.

Just-Identified, Overidentified,
or Underidentified
A just-identified model is one in which there is a
one-to-one correspondence between the data
and the structural parameters.
◦ That is to say, the number of data variances and
covariances equals the number of parameters to be
estimated. However, despite the capability of the
model to yield a unique solution for all parameters, the
just-identified model is not scientifically interesting
because it has no degrees of freedom and therefore
can never be rejected.

An overidentified model is one in which the
number of estimable parameters is less than
the number of data points (i.e., variances
and covariances of the observed variables).
◦ The situation results in positive degrees of
freedom that allow for rejection of the model,
thereby rendering it of scientific use. The aim in
SEM, then, is to specify a model such that it
meets the criterion of overidentification.

An underidentified model is one in which the
number of parameters to be estimated exceeds the
number of variances and covariances (i.e., data
points).
◦ As such, the model contains insufficient information
(from the input data) for the purpose of attaining a
determinate solution of parameter estimation; that is, an
infinite number of solutions are possible for an
underidentified model.
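These three identification states can be summarized in a short sketch; `identification_status` is an illustrative helper built on the p(p + 1)/2 data-point count used in these slides:

```python
# Classify model identification by comparing the number of data points
# (variances and covariances of the observed variables) with the number
# of free parameters to be estimated.
def identification_status(p_observed, n_free_params):
    data_points = p_observed * (p_observed + 1) // 2
    df = data_points - n_free_params
    if df > 0:
        return "overidentified"
    if df == 0:
        return "just-identified"
    return "underidentified"

print(identification_status(12, 30))  # 78 data points, 30 parameters -> overidentified
```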

Reviewing the CFA model in
Example 1
Let’s now determine how many data points we have to work with
(i.e., how much information do we have with respect to our
data?).
As noted above, these constitute the variances and covariances
of the observed variables; with p variables, there are p(p + 1)/2
such elements.
Given that there are 12 observed variables, this means that we
have 12(12 + 1)/2 = 78 data points. Prior to this discussion of
identification, we determined a total of 30 unknown parameters.
Thus, with 78 data points and 30 parameters to be estimated, we
have an overidentified model with 48 degrees of freedom.
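The arithmetic on this slide can be checked directly (a minimal sketch):

```python
p = 12                          # observed variables in Example 1
data_points = p * (p + 1) // 2  # variances and covariances: 78
free_params = 30                # counted earlier from the path diagram
df = data_points - free_params
print(data_points, df)          # 78 48 -> overidentified
```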

Example 2
Let’s again determine if we have an identified model.
Given that we have 11 observed measures, we know that we
have 66 (11[11 + 1]/2) pieces of information from which to derive
the parameters of the model.
Counting up the unknown parameters in the model, we see that
we have 26 parameters to be estimated: 7 measurement
regression paths; 3 structural regression paths; 2 factor variances;
11 error variances; 2 residual error variances; and 1 covariance.
We therefore have 40 (66−26) degrees of freedom and, thus, an
overidentified model.

How Large a Sample Size Do I Need?

Rules of Thumb
◦ Ratio of Sample Size to the Number of Free Parameters
◦ Tanaka (1987): 20 to 1 (Most analysts now think that is
unrealistically high.)
◦ Goal: Bentler & Chou (1987): 5 to 1 
◦ Several published studies do not meet this goal.
◦ Sample Size 200 is seen as a goal for SEM research
◦ Lower sample sizes can be used for
◦ Models with no latent variables
◦ Models where all loadings are fixed (usually to one)
◦ Models with strong correlations
◦ Simpler models 
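These heuristics can be sketched as follows; `recommended_n` is a hypothetical helper, and treating 200 as a floor is one reading of the guidelines above, not a strict rule:

```python
# Rough sample-size guideline: a minimum ratio of cases to free parameters,
# with 200 as the commonly cited general target for SEM research.
def recommended_n(n_free_params, ratio=5):
    # ratio=5 follows Bentler & Chou (1987); ratio=20 follows Tanaka (1987),
    # which most analysts now consider unrealistically high.
    return max(ratio * n_free_params, 200)

print(recommended_n(30))            # 5 * 30 = 150, but the 200 floor applies
print(recommended_n(30, ratio=20))  # 600
```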

Application1: Testing the Factorial
Validity of a Theoretical Construct
Step 1: Model Specification
Step 2: Data Specification
Step 3: Calculation of Estimates
Note
◦ Check Standardized estimates and Modification indices
◦ Put 10 in the box “threshold for Modification indices”

Amos Text Output:
Hypothesized 4 Factor Model
Model Summary
See output : Notes for Model
Model Evaluation
◦ Of primary interest in structural equation modeling is the
extent to which a hypothesized model “fits” or, in other
words, adequately describes the sample data.
◦ Two Evaluation Criteria:
◦ Parameter Estimates
◦ The model as a whole

Parameter Estimates
Feasibility of parameter estimates
◦ The initial step in assessing the fit of individual
parameters in a model is to determine the viability of
their estimated values.
◦ Any estimates falling outside the admissible range are
a clear indication that either the model is wrong or the
input matrix lacks sufficient information.
◦ Examples of parameters exhibiting unreasonable
estimates are correlations >1.00, negative variances, and
covariance or correlation matrices that are not positive
definite.

Appropriateness of standard errors.
◦ Standard errors reflect the precision with which a
parameter has been estimated, with small values
suggesting accurate estimation.
◦ Thus, another indicator of poor model fit is the presence
of standard errors that are excessively large or small.

Statistical significance of parameter estimates.
◦ The test statistic as reported in the Amos output is the critical ratio
(C.R.), which represents the parameter estimate divided by its
standard error; as such, it operates as a z-statistic in testing that the
estimate is statistically different from zero.
◦ Based on a probability level of .05, then, the test statistic needs to be
>±1.96 before the hypothesis (that the estimate equals 0.0) can be
rejected.
◦ Nonsignificant parameters, with the exception of error variances, can
be considered unimportant to the model.
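The C.R. computation can be sketched as below; `critical_ratio` is an illustrative helper, and the estimate and standard error shown are made-up values, not output from a real Amos run:

```python
import math

# Critical ratio (C.R.) as reported by Amos: estimate / standard error,
# treated as a z-statistic for testing that the parameter equals zero.
def critical_ratio(estimate, std_error):
    cr = estimate / std_error
    p_two_sided = math.erfc(abs(cr) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return cr, p_two_sided

cr, p = critical_ratio(0.82, 0.10)
print(round(cr, 2), p < 0.05)  # 8.2 True -> |C.R.| > 1.96, significant at .05
```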

Model as a Whole: Goodness-
of-Fit Statistics
CMIN: the chi-square test
◦ a probability of < .05 suggests that the fit of the data
to the hypothesized model is not entirely adequate.
◦ We desire not to reject this test; we need a p-value
greater than .05.
◦ Alternatively, the normed chi-square (cmin/df) can be used:
◦ < 3 good, < 5 permissible
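The cmin/df rule of thumb can be expressed as a small sketch (`cmin_df_verdict` is a hypothetical helper; the chi-square values are illustrative):

```python
# Apply the normed chi-square (CMIN/DF) rule of thumb from the slide:
# < 3 good, < 5 permissible, otherwise poor.
def cmin_df_verdict(cmin, df):
    ratio = cmin / df
    if ratio < 3:
        return "good"
    if ratio < 5:
        return "permissible"
    return "poor"

print(cmin_df_verdict(120.0, 48))  # ratio 2.5 -> good
```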

NFI & CFI
Bentler and Bonett’s (1980) Normed Fit Index (NFI) has long been the
practical criterion of choice. However, addressing evidence that the NFI
has shown a tendency to underestimate fit in small samples, Bentler
(1990) revised the NFI to take sample size into account and proposed
the Comparative Fit Index (CFI).
A value >.90 was originally considered representative of a well-fitting
model (see Bentler, 1992); however, a revised cutoff value close to .95
has more recently been advised (Hu & Bentler, 1999).
Both indices of fit are reported in the Amos output; however, Bentler
(1990) has suggested that, of the two, the CFI should be the index of
choice.
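Assuming the standard formulas for these indices, their computation can be sketched as follows (the chi-square values for the hypothesized and independence models are illustrative, not from a real analysis):

```python
# NFI: proportional improvement of the hypothesized model's chi-square
# over the independence (null) model's chi-square.
def nfi(chi2_model, chi2_null):
    return (chi2_null - chi2_model) / chi2_null

# CFI: like the NFI, but based on noncentrality (chi-square minus df),
# which is what incorporates sample-size information.
def cfi(chi2_model, df_model, chi2_null, df_null):
    d_model = max(chi2_model - df_model, 0)
    d_null = max(chi2_null - df_null, d_model)
    return 1 - d_model / d_null

print(round(nfi(120.0, 1800.0), 3))          # 0.933
print(round(cfi(120.0, 48, 1800.0, 66), 3))  # 0.958
```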

RFI, IFI, TLI
The Relative Fit Index (RFI; Bollen, 1986) represents a derivative of the NFI;
as with both the NFI and CFI, the RFI coefficient values range from zero to
1.00, with values close to .95 indicating superior fit.
The Incremental Index of Fit (IFI) was developed by Bollen (1989b) to
address the issues of parsimony and sample size which were known to be
associated with the NFI.
◦ As such, its computation is basically the same as the NFI, with the exception that
degrees of freedom are taken into account. Thus, it is not surprising that findings
for the IFI should be consistent with those for the CFI in reflecting a well-fitting
model.

For the Tucker–Lewis Index (TLI; Tucker & Lewis, 1973), values close to .95
(for large samples) are indicative of good fit.
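A sketch of the TLI under its standard formula, using illustrative chi-square values (not from a real analysis):

```python
# TLI compares the chi-square/df ratios of the independence (null) model
# and the hypothesized model; unlike the CFI, it can exceed 1.00.
def tli(chi2_model, df_model, chi2_null, df_null):
    null_ratio = chi2_null / df_null
    model_ratio = chi2_model / df_model
    return (null_ratio - model_ratio) / (null_ratio - 1)

print(round(tli(120.0, 48, 1800.0, 66), 3))  # 0.943
```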

GFI & AGFI
The Goodness-of-Fit Index (GFI) is a measure of the relative amount of
variance and covariance in the sample data that is jointly explained by
the hypothesized model.
The AGFI differs from the GFI only in the fact that it adjusts for the
number of degrees of freedom in the specified model.
Values close to 1.00 are indicative of good fit. However, the AGFI may
fall closer to .80 (though it should remain above .80) because of its
degrees-of-freedom penalty.

SRMR
The Standardized Root Mean Square Residual (SRMR) represents the
average value across all standardized residuals.
It ranges from zero to 1.00; in a well-fitting model this value will be
small, say, .05 or less.
◦ SRMR is not reported directly in the Amos output; use the Plugins menu to obtain it.

RMSEA (most important!!)
Root Mean Square Error of Approximation (RMSEA) takes into
account the error of approximation in the population and asks the
question, “how well would the model, with unknown but optimally
chosen parameter values, fit the population covariance matrix if it
were available?” (Browne & Cudeck, 1993, pp. 137–8).
Values less than .05 indicate good fit, and values as high as .08
represent reasonable errors of approximation in the population
(Browne & Cudeck, 1993).
MacCallum et al. (1996) have recently elaborated on these
cutpoints and noted that RMSEA values ranging from .08 to .10
indicate mediocre fit, and those greater than .10 indicate poor fit.
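A sketch of the RMSEA point estimate, under one common form of the formula that divides by df·(N − 1); the chi-square, df, and N shown are illustrative:

```python
import math

# RMSEA point estimate from the model chi-square, its degrees of freedom,
# and the sample size. Negative noncentrality is truncated to zero.
def rmsea(chi2, df, n):
    return math.sqrt(max(chi2 - df, 0) / (df * (n - 1)))

value = rmsea(120.0, 48, 300)
print(round(value, 3))  # good fit if < .05, reasonable if <= .08
```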

MacCallum and Austin (2000) have strongly recommended routine use
of the RMSEA for at least three reasons:
◦ (a) it appears to be adequately sensitive to model misspecification (Hu &
Bentler, 1998);
◦ (b) commonly used interpretative guidelines appear to yield appropriate
conclusions regarding model quality (Hu & Bentler, 1998, 1999); and
◦ (c) it is possible to build confidence intervals around RMSEA values.

Confidence Interval for RMSEA
Addressing Steiger’s (1990) call for the use of confidence intervals to
assess the precision of RMSEA estimates,
Amos reports a 90% interval around the RMSEA value.
Presented with a small RMSEA, albeit a wide confidence interval, a
researcher would conclude that the estimated discrepancy value is
quite imprecise, negating any possibility to determine accurately the
degree of fit in the population.
In contrast, a very narrow confidence interval would argue for good
precision of the RMSEA value in reflecting model fit in the population
(MacCallum et al., 1996).

P-Close
In addition to reporting a confidence interval around the RMSEA value,
Amos tests for closeness of fit; that is, it tests the hypothesis that
the RMSEA is “good” (i.e., ≤ .05) in the population.
Jöreskog and Sörbom (1996a) have suggested that the p-value for this
test should be > .50.

In a nutshell

Model Misspecification:
Modification Indices

