structural equation modeling

Hans Baumgartner

Wiley International Encyclopedia of Marketing, edited by Jagdish N. Sheth and Naresh K. Malhotra.
Copyright © 2010 John Wiley & Sons Ltd

INTRODUCTION

Structural equation modeling (SEM) with latent variables, also known as latent variable path analysis, is a technique for investigating relationships between latent (unobserved) variables or constructs that are measured by multiple manifest (observed) variables or indicators. Special cases are CONFIRMATORY FACTOR ANALYSIS (see EXPLORATORY FACTOR ANALYSIS), in which no directional relationships between latent variables (factors) are specified, and manifest variable path analysis, in which directional relationships between observed variables are modeled. As an illustration, consider the model shown in Figure 1. This model is an attempt to explain consumers’ usage of coupons for grocery shopping via a number of antecedent constructs. According to the model, actual coupon usage is a function of consumers’ intentions to use coupons, which in turn depend on their attitudes toward using coupons. Attitudes are influenced by three types of beliefs about the consequences of using coupons: rewards (positive consequences associated with coupon usage such as saving money on the grocery bill); inconveniences (one type of negative consequence associated with coupon usage such as the time and effort required to use coupons); and encumbrances (another type of negative consequence associated with coupon usage such as the need to buy nonpreferred brands in order to take advantage of coupon offers).

[Figure 1: path diagram. Rewards (ξ1), inconveniences (ξ2), and encumbrances (ξ3) → attitudes (η1) → intentions (η2) → coupon usage (η3).]

Figure 1 Theoretical model of the antecedents of coupon usage.

This simple model illustrates two of the major advantages of SEM. First, SEM makes it possible to study complex patterns of relationships among the constructs in one’s model in an integrative fashion. If regression analysis (see MULTIPLE REGRESSION) were used to analyze the model in Figure 1, one would have to estimate three different regressions, and it would be quite cumbersome to show that attitudes and intentions mediate the effects of beliefs on coupon usage (i.e., that there are no direct effects of beliefs on intentions and coupon usage). In contrast, all relationships among model constructs can be investigated simultaneously using SEM, and it is straightforward to examine different forms of mediation in complex multistage models. Furthermore, tests of overall model fit are available, which indicate how well the specified model represents the data.

Second, constructs such as beliefs, attitudes, and intentions are not directly observable. Using single-item measures to assess the constructs of interest (e.g., asking people whether their attitude toward using coupons is favorable or unfavorable) does not do justice to even relatively simple constructs such as coupon attitudes (the situation becomes more problematic when a construct is more abstract and when it has many facets), and combining multiple measures into averages is ad hoc and makes assumptions that cannot be verified. Furthermore, observed variables always contain error, both random and systematic, and this measurement error has to be taken into account in the analysis because it can have serious distorting influences on investigations of relationships between the constructs in the model. Although SEM is not a panacea for these problems, it encourages researchers to think more explicitly about measurement error, enables assessment of the quality of construct measurement (see VALIDITY AND RELIABILITY), and facilitates the study of structural relationships between constructs in the presence of measurement error.

In this article, we provide an overview of SEM. The presentation is organized around the steps involved in using SEM for theory testing: model specification; preliminary data analysis; model estimation; overall model evaluation; model modification; and model interpretation (Bagozzi and Baumgartner, 1994; Baumgartner and Homburg, 1996). The article concludes with a brief discussion of more advanced uses of SEM and recent developments.

MODEL SPECIFICATION

Graphical model specification. Models can be specified graphically or algebraically, but we focus on graphical specification, because many researchers find it more revealing. Figure 2 shows the graphical specification of a particular version of the model in Figure 1, assuming that two indicators each are available to measure rewards, inconveniences, and intentions, three indicators to assess encumbrances, and four indicators to capture attitudes. Coupon usage, although shown as a latent variable, is actually treated as an observed variable. Latent variables that are of theoretical interest are shown as ellipses (or sometimes as circles). Attitudes, intentions, and coupon usage are referred to as endogenous latent variables (denoted by the Greek symbol η) because the model is designed to explain the variation in these variables. Rewards, inconveniences, and encumbrances are called exogenous latent variables (denoted by ξ) because they are not explained within the context of the model. The exogenous latent variables are usually allowed to covary freely, and these covariances are shown as double-headed arrows (denoted by ϕij). The relationships of primary theoretical interest are the effects of the exogenous variables on the endogenous variables (denoted by γ) and the effects of the endogenous variables on each other (denoted by β). For example, γ11 refers to the effect of rewards on attitudes, and β21 to the effect of attitudes on intentions. Associated with each endogenous latent variable is an error term (error in equation or equation disturbance, denoted by ζ), because it cannot be expected that the antecedents of the latent variable can explain it completely. Arrows that emanate from and point to the same variable are variances, and the variances of the ξ’s and ζ’s are called ϕii and ψii, respectively (the ζ’s are assumed to be uncorrelated in the present case, although that is not always necessary). The relationships among the exogenous and endogenous latent variables shown as ellipses constitute the so-called latent variable model (sometimes called the structural model), which represents the theoretical model studied in the research.

To examine the theoretical model empirically, each latent variable of interest has to have at least one observed measure. The indicators of the exogenous (endogenous) variables are called x (y), and by convention they are enclosed in rectangles (or squares). The coefficients relating the observed variables to their underlying latent variables (so-called factor loadings) are denoted by λx and λy. The error terms (errors in variables, unique factors) associated with the x’s and y’s are called δ and ε, respectively, and their variances are denoted by θδ and θε. In the present case, the error terms were specified to be uncorrelated, but this assumption can be relaxed. The model linking the exogenous and endogenous latent variables to their observed measures is called the measurement model.

The model in Figure 2 is a complete specification of the structural equation model of interest, assuming that all relationships between variables are linear. The model contains five types of parameters: the factor loadings (λx and λy); the measurement error (unique factor) variances (θδ and θε); the variances and covariances of the exogenous latent variables (ϕ); the latent variable model coefficients (γ and β); and the (co)variances of the errors in equations (ψ).
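Although the article focuses on the graphical specification, the same model can also be written algebraically. The following is a sketch in the usual matrix notation, using the symbols defined above; which indicator has its loading fixed to 1 is read off Figure 2 and should be treated as an assumption of this illustration.

```latex
% Latent variable (structural) model
\begin{aligned}
\eta_1 &= \gamma_{11}\xi_1 + \gamma_{12}\xi_2 + \gamma_{13}\xi_3 + \zeta_1 \\
\eta_2 &= \beta_{21}\eta_1 + \zeta_2 \\
\eta_3 &= \beta_{32}\eta_2 + \zeta_3
\end{aligned}

% Measurement model (one loading per latent variable fixed to 1)
\begin{aligned}
x_1 &= \xi_1 + \delta_1, & x_2 &= \lambda^x_{21}\xi_1 + \delta_2, \\
x_3 &= \xi_2 + \delta_3, & x_4 &= \lambda^x_{42}\xi_2 + \delta_4, \\
x_5 &= \xi_3 + \delta_5, & x_6 &= \lambda^x_{63}\xi_3 + \delta_6, & x_7 &= \lambda^x_{73}\xi_3 + \delta_7, \\
y_1 &= \eta_1 + \varepsilon_1, & y_2 &= \lambda^y_{21}\eta_1 + \varepsilon_2, & y_3 &= \lambda^y_{31}\eta_1 + \varepsilon_3, \\
y_4 &= \lambda^y_{41}\eta_1 + \varepsilon_4, & y_5 &= \eta_2 + \varepsilon_5, & y_6 &= \lambda^y_{62}\eta_2 + \varepsilon_6, \\
y_7 &= \eta_3 && \text{(single indicator, no measurement error)}
\end{aligned}

% Compact matrix form
\eta = B\eta + \Gamma\xi + \zeta, \qquad
x = \Lambda_x \xi + \delta, \qquad
y = \Lambda_y \eta + \varepsilon
```

In the compact form, B collects the β coefficients, Γ the γ coefficients, and Λx and Λy the factor loadings, so each of the five parameter types listed above appears either as a coefficient matrix or as the covariance matrix of ξ, ζ, δ, or ε.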
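[Figure 2: path diagram. Measurement model: x1–x2 load on ξ1 (rewards), x3–x4 on ξ2 (inconveniences), x5–x7 on ξ3 (encumbrances), y1–y4 on η1 (attitudes), y5–y6 on η2 (intentions), and y7 on η3 (coupon usage), with the loadings of x1, x3, x5, y1, y5, and y7 fixed to 1; error variances θδ11–θδ77 and θε11–θε66. Latent variable model: paths γ11, γ12, γ13 from ξ1–ξ3 to η1, β21 from η1 to η2, and β32 from η2 to η3; (co)variances ϕ11–ϕ33 among the ξ’s; variances ψ11–ψ33 of ζ1–ζ3.]
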
Figure 2 A structural equation model of coupon usage.
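As a rough check on the size of the model in Figure 2, the free parameters can be tallied by type and compared with the number of unique variances and covariances among the observed variables. The Python sketch below is illustrative only: the variable names are hypothetical, and the counts assume the specification described in the text (one loading per latent variable fixed to 1, the error variance of the single coupon usage indicator fixed to 0, and uncorrelated error terms).

```python
# Number of indicators per latent variable, as described for Figure 2.
indicators = {
    "rewards": 2, "inconveniences": 2, "encumbrances": 3,   # exogenous (xi)
    "attitudes": 4, "intentions": 2, "coupon_usage": 1,     # endogenous (eta)
}
n_exogenous = 3
n_observed = sum(indicators.values())  # 14 observed variables

free = {
    # one loading per latent variable is fixed to 1 to set its scale
    "factor_loadings": sum(k - 1 for k in indicators.values()),
    # the error variance of the single coupon usage indicator is fixed to 0
    "error_variances": n_observed - 1,
    # variances and covariances of the exogenous latent variables (phi)
    "phi": n_exogenous * (n_exogenous + 1) // 2,
    # gamma_11, gamma_12, gamma_13 plus beta_21, beta_32
    "gamma_and_beta": 3 + 2,
    # variances of the three errors in equations (assumed uncorrelated)
    "psi": 3,
}

n_parameters = sum(free.values())
n_moments = n_observed * (n_observed + 1) // 2   # unique (co)variances
degrees_of_freedom = n_moments - n_parameters

print(n_parameters, n_moments, degrees_of_freedom)  # 35 105 70
```

The totals (35 parameters, 105 unique elements, 70 degrees of freedom) match the figures given for this model in the discussion of identification.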


Model identification. For the model to be meaningful, it has to be identified. This means that all parameters in the model are uniquely determined so that the conclusions derived from the analysis are not arbitrary. As a first step, the scale of the latent variables has to be fixed (since the scale in which they are measured cannot be observed directly). One way to do this is to set one loading per latent variable to 1, as shown in Figure 2. In addition, the coefficients relating the error terms (both measurement errors and errors in equations) to observed or latent variables are set to 1 (not shown explicitly in Figure 2 because it is always done by convention). Furthermore, if there is only a single indicator, as in the case of coupon usage, measurement error is usually ignored and the error (unique) variance is set to 0 (another possibility is to assume a certain amount of error variance, preferably based on prior research).

A necessary requirement for identification is that the number of parameters to be estimated should not be greater than the number of unique variances and covariances among the observed variables. For the model in Figure 2, 35 parameters have to be estimated, and there are 105 unique elements in the observed covariance matrix. Therefore, the model has 70 degrees of freedom and the necessary condition for identification is satisfied (since the number of degrees of freedom is nonnegative).

Unfortunately, this simple rule does not guarantee identification. For certain types of models, it is easy to check identification (see Bollen (1989), for an extended discussion), but in general identification is a nontrivial issue. For general structural equation models consisting of both measurement and latent variable models the so-called two-step rule (which is sufficient but not necessary) is often applied. In the first step, the researcher tries to ascertain that a measurement model in which no restrictions on the latent variable model are imposed and all covariances between (exogenous, endogenous) latent variables are freely estimated is identified. If there are at least two indicators per construct, each indicator loads on one and only one construct, the measurement error terms are uncorrelated, and the constructs are allowed to freely correlate, the measurement model is identified. Even if a construct is measured by a single indicator, as long as the error variance is fixed to a certain value (usually 0, in which case the latent variable is really an observed variable), the model is identified. Using these guidelines, the measurement model corresponding to Figure 2 is identified (with 63 degrees of freedom).

Once the model in the first step is shown to be identified, the exogenous and endogenous latent variables can be treated as observed and the task in the second step is to show that the latent variable model is identified. If the model is recursive (i.e., the matrix containing the β coefficients is subdiagonal, which means there are no reciprocal paths or feedback loops so that latent variables later in the sequence do not influence latent variables earlier in the sequence, and the errors in equations are uncorrelated), the latent variable model is identified. It can be checked that the model in Figure 2 is recursive and therefore identified. If the model is nonrecursive, other rules, such as the rank rule sometimes used to identify systems of simultaneous equations, can be employed.

At times, it is too difficult to show identification explicitly and in that case one may have to rely on the computer program used to flag a nonidentified model. A useful strategy is to start with a simpler model that is known to be identified and to then complicate it by introducing, one by one, the additional parameters to arrive at the desired specification, provided the modification indices associated with the added parameters are nonzero (see the discussion of model modification later in the chapter).

Measurement model specification issues. In Figure 2, it is assumed that the observed variables are functions (or manifestations) of the latent variables. This is called a reflective measurement model specification. If an indicator is specified as reflective (i.e., as an effect indicator), one assumes that the different indicators of a construct are interchangeable, correlated with each other, and contaminated by measurement error. It is not always meaningful to regard indicators as manifestations of an underlying latent variable. Sometimes, it is more appropriate to conceptualize a given set of indicators as defining characteristics of a construct, in which case the latent variable depends on its indicators. This is called a formative measurement model specification. Formative indicators (also called cause indicators) are not interchangeable, need not be correlated, and are usually assumed to contain no measurement error. Since statistical means of trying to distinguish between reflective and formative measurement models are of limited usefulness, the decision whether indicators should be modeled as reflective or formative has to be made based on a conceptual analysis of the items in question. The issue is important because research shows that incorrect specification of the measurement model can lead to significant bias in model parameters (see Diamantopoulos, Riefler, and Roth (2008), for a recent review and additional references). Although indicators should not be specified as reflective when they are not (e.g., satisfaction with different aspects of a job as measures of overall job satisfaction), there are many unresolved issues in formative measurement models, which makes their routine use problematic.

Another issue that has to be considered carefully is how many indicators of each latent variable should be included in the model and how these indicators should be related to the latent variable. In general, it is better to have more rather than fewer indicators per construct, but there are practical constraints on how many items can be put in a questionnaire. Furthermore, including too many individual indicators in a model might make the model too complex, and it is also difficult to get acceptable overall model fits when there are too many indicators per construct. An alternative is to use item parcels, where subsets of individual items are averaged and then used as (multiple) indicators in the model. This is acceptable if the items that are combined are known to be unidimensional (i.e., form a homogenous set), or if items measuring the same facet of a multidimensional construct are combined. If the factor structure of a set of measures is poorly understood, parceling is not a good idea (see Bandalos and Finney (2001), for additional discussion of item parceling).

Latent variable model specification issues. It is generally much more informative to propose alternative latent variable models that, based on prior research or new theorizing, could be plausible representations of the relationships among the constructs of interest than to present a single structural model that supposedly best represents the data. In other words, it is desirable that researchers adopt a model comparison approach to SEM (see MacKenzie, Lutz and Belch (1986), for a good example). Furthermore, researchers have to keep in mind that different specifications that can have very different substantive implications may fit the data equally or nearly equally well (MacCallum et al., 1993).

Sample size. Researchers should make sure prior to data collection that the sample size will be adequate to avoid estimation problems and to get reliable fit statistics, parameter estimates, and estimates of standard errors. Although many kinds of factors can be expected to influence the required sample size, two kinds of heuristics for sample size determination have been proposed in the literature. Absolute guidelines are based on the notion that the sample size should be greater than a certain minimum number (e.g., there should be at least 200 observations). Relative guidelines specify that the number of observations per parameter estimated should exceed a certain minimum (e.g., ratios of 5:1 to as high as 20:1 have been mentioned). If sufficient prior knowledge is available, sample size determination can be based on desired levels of power (MacCallum, Browne, and Sugawara, 1996; see also STATISTICAL APPROACHES TO DETERMINING SAMPLE SIZES).

PRELIMINARY DATA ANALYSIS

One common mistake in applications of SEM seems to be that researchers fail to carefully examine their raw data before calculating the sample covariance matrix on which subsequent analyses are usually based. Space constraints prohibit a detailed discussion of the issues involved (Hair et al., 2006), but researchers should ensure that the data have been coded appropriately, that missing values are dealt with using modern methods such as full information maximum likelihood or Bayesian multiple imputation, that outliers do not distort the sample covariances, and that the necessary distributional assumptions (e.g., multivariate normality) are adequately satisfied.

For the purpose of illustration, we use a dataset collected from 262 female staff members
at two American universities to study the model shown in Figure 2 (see Bagozzi, Baumgartner, and Yi, 1992, for additional information). Respondents completed a questionnaire measuring beliefs, attitudes, and intentions about using coupons for grocery shopping during the upcoming week (on the basis of 7-point scales, except for the second intention item, which was measured on an 11-point scale but linearly transformed to a range of 1–7), and one week later indicated how many coupons from 6 different sources in 21 product categories (plus an “other” category) they had used during the previous week. For simplicity, respondents with missing values were eliminated, leaving an effective sample size of 250, which, combined with a ratio of data points per parameter estimated of about 7, seemed adequate. Not surprisingly, coupon usage was strongly skewed to the right, so a square root transformation was used to normalize the data. Although the data are not multivariate normal (the data are discrete, the univariate skewnesses ranged from −1.35 to 0.83, the univariate kurtoses from −1.43 to 1.91, the relative multivariate kurtosis was 1.17, and tests of univariate and multivariate normality suggested rejection of normality), it appears that the assumption is not too seriously violated. Robustness checks are reported later.

MODEL ESTIMATION

The goal of estimation is to find values for the five types of unknown parameters, based on the observed covariance matrix, such that the covariance matrix implied by the estimated model parameters is as close as possible to the sample covariance matrix. Model estimation also yields various goodness-of-fit statistics and standard errors for all parameters. A variety of estimation procedures have been proposed, but by far the most frequently used method is maximum likelihood assuming that the data have a multivariate normal distribution. Simulations show that the estimates tend to be robust to violations of normality, but the chi-square test of overall model fit and the estimates of the standard errors may not be. Estimation procedures that do not require multivariate normality are available, but adjustments to the normal-theory-based methods, which are easy to use, seem to work well in practice (see Bentler and Dudgeon, 1996, for more details).

Potential estimation problems are nonconvergence (a solution cannot be found within a given number of iterations or within a given time limit) and improper solutions (the values of sample estimates are not possible in the population, such as negative variance estimates). Common causes are poorly specified models, small sample sizes, and few indicators per factor.

There were no estimation problems with the present data set and LISREL 8.80 converged to a proper solution in less than one second (using a normal-theory-based fitting function). Ideally, more than two indicators should be available for all of the constructs, but this was not possible in the present case. Fortunately, the small number of indicators did not result in estimation problems.

OVERALL MODEL EVALUATION

Before the estimated model is interpreted in detail, researchers should ascertain that the specification is reasonably consistent with the data. If there are serious misspecifications and they are not attended to, the conclusions derived from the model can be seriously misleading. Of course, not all model misspecifications can be detected based on global goodness-of-fit indices, because alternative specifications that may have very different substantive interpretations can be equally consistent with the data (MacCallum et al., 1993). However, if the fit of the model is found to be deficient, it has to be dealt with.

Global fit assessment is based on a summary measure of the discrepancy between the sample and model-implied covariance matrices. Theoretically, the fit of the model can be assessed using a statistic T which, under appropriate assumptions, has a central chi-square distribution under the null hypothesis that the specified model fits perfectly in the population. Depending on the particular assumptions made, there are different T statistics, but ideally they will lead to similar results. On the basis of the likelihood ratio criterion, one compares the likelihood of the hypothesized model to the likelihood of a model with perfect fit (the saturated model) and hopes that T will not be significantly greater than the number of overidentifying restrictions (i.e., the degrees of freedom of the model).

Unfortunately, there are practical problems with the chi-square test of overall model fit. First, there is evidence that the test is not robust to violations of assumptions such as multivariate normality, although promising corrections to the traditional chi-square test, such as the Satorra–Bentler scaled (robust) test statistic, have been proposed in the literature. Second, the test is based on the accept-support logic, meaning that failure to reject the null hypothesis provides support for the researcher’s model. On the one hand, this implies that a model is more likely to be supported when sample size and power are low, even though the chi-square test is only asymptotically valid. On the other hand, since most models are unlikely to be literally true in the population, larger sample sizes will ultimately lead to the rejection of a model even when the misspecification is relatively minor.

Because of these problems, many alternative (mostly descriptive) fit indices have been suggested. They can be classified on the basis of (i) whether they are goodness- or badness-of-fit indices (depending on whether an increase or decrease in the index signals a better fit), (ii) whether they adjudge fit in an absolute or relative sense (stand-alone vs. incremental fit indices, where the model of complete independence of all observed variables is generally used as the baseline model for the incremental indices), (iii) whether they are normed, approximately normed, or nonnormed (i.e., always or usually constrained to fall within a 0–1 interval in sample data, or unconstrained), and (iv) whether or not the fit index imposes a penalty for fitting additional parameters (correction for model parsimony or not). A detailed discussion is beyond the scope of this article. Some researchers question the value of even those indices which have traditionally been recommended in the literature (including the cutoff values associated with their use; see the recent exchange in the May 2007 issue of Personality and Individual Differences), but there is some consensus that a few indices based on different conceptual rationales should be used to assess overall model fit, and we briefly mention some of the more promising indices.

An intuitively appealing index is the standardized root mean square residual (SRMR), which summarizes the average size of the standardized residuals. It is a stand-alone badness-of-fit index, normed to fall between 0 and 1, and does not take into account model parsimony. Values up to 0.05 or maybe 0.10 are often considered to reflect satisfactory fit.

Another index is the root mean square error of approximation (RMSEA), which estimates how well the fitted model approximates the population covariance matrix per degree of freedom, using the noncentrality parameter to index error of approximation. It is computed as $\sqrt{(T - df)/((N - 1)\,df)}$, where T is the test statistic, df the degrees of freedom of the model, and N the sample size. A confidence interval for RMSEA is available, which provides information about the precision of the point estimate. RMSEA is a nonnormed, stand-alone badness-of-fit index which imposes a penalty for fitting additional parameters. Models with RMSEA values below 0.05 are assumed to have close fit, and values up to 0.08 or maybe 0.10 are considered acceptable.

The comparative fit index (CFI) is a normed, incremental goodness-of-fit index without a correction for model parsimony. It is also based on the idea of noncentrality and can be computed as $1 - \frac{\max[(T_t - df_t),\, 0]}{\max[(T_b - df_b),\, (T_t - df_t),\, 0]}$, where the subscripts refer to the target and baseline models, respectively. CFI should be at least 0.9 and maybe 0.95 or higher for a well-fitting model.

Finally, the nonnormed fit index (NNFI, the extension of the original Tucker–Lewis index in exploratory factor analysis to SEM) is an approximately normed incremental goodness-of-fit index which penalizes models containing more parameters. It is given by $1 - \frac{(T_t - df_t)/((N - 1)\,df_t)}{(T_b - df_b)/((N - 1)\,df_b)}$; the recommended cutoff levels are the same as for CFI.

A special class of fit indices are those based on information theory, such as the Akaike information criterion (AIC). They can be used to compare (even non-nested) models and take into account model parsimony. The model with the smallest value of the fit index (which with some definitions may be negative) is selected.
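The RMSEA, CFI, and NNFI formulas above translate into a few lines of code. The Python sketch below is illustrative only: the function names are hypothetical, RMSEA is truncated at zero when T is smaller than df (a common convention that is an assumption here, not part of the formula as stated), and the baseline-model statistic in the example is invented purely for demonstration. The target-model values (T = 92.60, df = 70, N = 250) are those reported in this article for the illustrative coupon usage model.

```python
import math


def rmsea(T, df, N):
    """Root mean square error of approximation: sqrt((T - df) / ((N - 1) * df))."""
    # Assumption: truncate at zero so the square root is defined when T < df.
    return math.sqrt(max(T - df, 0.0) / ((N - 1) * df))


def cfi(T_t, df_t, T_b, df_b):
    """Comparative fit index for a target (t) and a baseline (b) model."""
    numerator = max(T_t - df_t, 0.0)
    denominator = max(T_b - df_b, T_t - df_t, 0.0)
    # If neither model shows any noncentrality, treat the fit as perfect.
    return 1.0 if denominator == 0 else 1.0 - numerator / denominator


def nnfi(T_t, df_t, T_b, df_b, N):
    """Nonnormed fit index (Tucker-Lewis index) as given in the text."""
    target = (T_t - df_t) / ((N - 1) * df_t)
    baseline = (T_b - df_b) / ((N - 1) * df_b)
    return 1.0 - target / baseline


# Target model: values reported for the illustrative model.
T_target, df_target, N = 92.60, 70, 250
# Baseline (independence) model: df = 105 - 14 = 91 for 14 observed variables;
# the T value of 2000.0 is HYPOTHETICAL and not taken from the article.
T_base, df_base = 2000.0, 91

print(round(rmsea(T_target, df_target, N), 3))  # 0.036
print(cfi(T_target, df_target, T_base, df_base))
print(nnfi(T_target, df_target, T_base, df_base, N))
```

Because the baseline statistic is made up, only the RMSEA value can be compared with the result reported for the illustrative model; the CFI and NNFI lines merely show how the functions are called.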
One common problem often observed in the application of many of these fit indices is that, after having to conclude that the specified model does not fit based on the chi-square test, researchers marshal evidence based on some alternative fit indices that the model is a good enough approximation. The goal seems to be to justify the initially proposed structure rather than to learn something new from the data. Furthermore, it is often unclear how much the original model was modified to bring the implied covariance matrix into reasonable congruence with the sample covariance matrix, in which case the overall model evaluation may not be trustworthy because of the dangers of capitalization on chance.

The fit statistics for the illustrative model are as follows. The T statistic based on normal-theory weighted least squares (the minimum of the normal-theory fitting function) was 92.60 (93.63) at 70 degrees of freedom, yielding a p-value of 0.04 (0.03). Thus, the fit based on the conventional chi-square test is borderline at an α-level of 0.05. When the normal-theory-based T was corrected for nonnormality using the Satorra–Bentler scaled test statistic, the resulting value was 83.57 (p = 0.13). The SRMR was 0.05, RMSEA 0.036 (with a confidence interval ranging from 0.0096 to 0.054), and both the CFI and NNFI were 0.99. All these indices suggest a very good fit of the model. This is somewhat unusual, but there are several reasons for this result. The model underlying Figure 1 is well established, the assumed belief structure is based on prior research, and the measurement of the remaining constructs is standard. Even when the model seems to fit well, it is advisable to inspect the results in greater detail to evaluate whether modifications are warranted, as described next.

MODEL MODIFICATION

Model modifications are usually motivated by mediocre overall fits of the initially specified model. Two primary tools are used for this purpose: modification indices (Lagrange multiplier tests) and residuals (the differences between the sample and implied covariance matrices). A modification index is reported for each fixed or constrained parameter and it estimates the predicted decrease in the T statistic when a fixed parameter is freely estimated or an equality constraint is relaxed. Associated with each modification index is an expected parameter change (in original and various standardized scale units), which shows the predicted value of the freely estimated parameter. Assuming a chi-square distribution for T, a modification index exceeding 3.84 suggests a significant improvement of the model and a significant parameter estimate when the parameter in question is freely estimated (at an α-level of 0.05).

Residuals show the elements in the observed covariance matrix that are over- or underfitted, and this may also alert the researcher to components of the model that require attention. The raw residuals depend on the scale in which the variables are measured and sampling fluctuations, apart from possible model misspecification, but dividing each residual by the square root of the estimated asymptotic variance corrects for these confounds. These so-called “standardized” residuals (not to be confused with the standardized residuals on which SRMR is based) can be interpreted as z-values and indicate which residuals are larger than expected.

Although modification indices and residuals can be very useful, there are two potential dangers. First, if model modifications are primarily driven by the goal to improve the fit of the model (i.e., additional parameters are added based on the size of the associated modification indices), models that are hard to interpret substantively or theoretically may result. Furthermore, research has shown that data-based model modifications are not always able to recover the “true” underlying structure, and often the values of different modification indices are of similar magnitude, making it difficult to decide which parameter to free. It is therefore crucial that data-based model modifications be tempered by knowledge of the substantive area and theoretical considerations. Second, data-based model modifications may overemphasize idiosyncrasies of the particular data set and therefore might not hold up in future studies (the problem of capitalization on chance). Ideally, data-based model modifications should be reevaluated via more confirmatory follow-up research, or a cross-validation strategy
should be used in which the full data set is split into calibration and validation samples and the generalizability of modifications introduced in the former is examined in the latter.

Some authors (Anderson and Gerbing, 1988) believe that structural equation models should be evaluated in a two-step process (as many as four steps have been proposed), where in the first step no restrictions are imposed on the latent variable model (i.e., all covariances among the latent variables of substantive interest are freely estimated) and attention is focused on assessing the adequacy of the measurement model (e.g., whether indicators load on the “right” constructs, nontarget loadings are small and nonsignificant, the items individually and as sets are sufficiently reliable, and the constructs are discriminant). In the second step, the more constrained structural specification of interest is imposed on the latent variable model and the target model is compared to alternative specifications that are either more or less restrictive than the target model (the endpoints being the null and saturated latent variable models). If this detailed evaluation of the measurement and latent variable models reveals serious problems, appropriate model modifications to either model will be required (see Anderson and Gerbing (1988) for details).

For simplicity, we did not conduct separate analyses for the measurement and structural models in our illustrative example (the measurement model corresponding to Figure 2 had a T statistic based on normal-theory weighted least squares of 62.90 with 63 degrees of freedom). For the model in Figure 2 (which, as can be recalled, had a T statistic of 92.60 with 70 degrees of freedom), there were 11 significant modification indices (out of 131) and 9 significant standardized residuals (out of 105). The largest modification index (12.67) suggested a direct path from rewards to behavior, not mediated by attitudes and intentions. This was also supported by significant positive residuals from the two reward indicators to coupon usage (i.e., the initial model does not fully account for the observed covariance between the two reward indicators and coupon usage). When the path in question was freely estimated ($\gamma_{31} = 0.30$, z-value of 3.6), the T statistic based on normal-theory weighted least squares was 79.21 (p = 0.19). Most parameter estimates and z-values in the revised model were very close to those in the initial solution, except that the magnitude of the coefficient from intentions to behavior decreased from 0.49 to 0.41. The variance accounted for in coupon usage increased from 0.34 to 0.37 in the revised model. There were still 10 significant modification indices and several significant negative residuals in the revised model, but the suggested changes were hard to interpret. Therefore, no additional model modifications were considered. Ideally, the added path from rewards to coupon usage should be validated in subsequent research, but since there is evidence from at least one prior study that rewards can influence behavior directly, there is precedent for this modification.

MODEL INTERPRETATION

A detailed evaluation of the measurement model should involve information about the estimated factor loadings and measurement error (unique) variances (including the variability of the estimates and T-values), evidence about measurement reliability (both for individual items and all indicators of a given construct combined), and some indication that the constructs in the model have discriminant validity (see also VALIDITY AND RELIABILITY). Individual-item reliability is simply the squared correlation between a construct $\xi_j$ and an indicator $x_i$. It can be computed as $\rho_{ii} = \lambda_{ij}^2 \,\mathrm{var}(\xi_j) / [\lambda_{ij}^2 \,\mathrm{var}(\xi_j) + \theta_{ii}]$. It would be desirable if at least half of the variance of an observed variable were substantive variance rather than measurement error (unique variance), but this is often not the case.

Two summary measures of reliability for all indicators of a construct are in common use. Composite reliability is the squared correlation between a construct and an unweighted sum of its indicators. It can be obtained as $\rho_c = (\sum_i \lambda_{ij})^2 \,\mathrm{var}(\xi_j) / [(\sum_i \lambda_{ij})^2 \,\mathrm{var}(\xi_j) + \sum_i \theta_{ii}]$, where the sums run over the indicators of construct $\xi_j$. Average variance extracted (AVE) is the proportion of the total variance in all indicators of a construct accounted for by the construct, and it is calculated as the average of the individual-item reliabilities. Composite reliability is a generalization of coefficient alpha and therefore the same guidelines apply. Values below 0.6 indicate poor
reliability, and values above 0.8 are desirable. For AVE, values greater than 0.5 are deemed satisfactory.

Discriminant validity is commonly assessed in the following two ways. First, the correlation between two constructs should be significantly different from unity. This can be tested by constructing a confidence interval around the estimated correlation and checking whether the interval excludes 1. A more stringent test is based on AVE. Specifically, the AVE of each of the two constructs involved in a correlation should be greater than the squared correlation. This is based on the intuitive notion that a construct should have more in common with its own indicators than with other constructs.

In the illustrative example, all freely estimated loadings were highly significant, and the individual-item reliabilities were mostly 0.5 or higher (the two exceptions were the second indicator of rewards with a reliability of 0.48 and the first indicator of encumbrances with a reliability of 0.24). In future research, the latter indicator should probably be modified. The composite reliabilities and AVEs were 0.76 and 0.61 for rewards; 0.88 and 0.78 for inconveniences; 0.70 and 0.44 for encumbrances; 0.88 and 0.65 for attitudes; and 0.92 and 0.85 for intentions. With the possible exception of encumbrances, all constructs seem to have been measured adequately. The largest (disattenuated) correlation is between attitudes and intentions (0.69), and discriminant validity is satisfied based on both criteria.

The latent variable model is what researchers will generally be most interested in. The sign, magnitude, and significance of the relationships between the constructs test the hypotheses that the research was designed to investigate. In addition, researchers should report the variance accounted for in each of the endogenous variables so that readers can get an impression of the size of the effects (given that the estimated coefficients often do not have a natural interpretation).

Summary information about the latent variable model in the illustrative example is shown in Table 1. Reward beliefs have a positive influence on attitudes and inconvenience beliefs affect attitudes negatively. Encumbrance beliefs do not have a significant effect on attitudes. More favorable attitudes lead to higher intentions, and more positive intentions encourage greater coupon usage. In addition to the indirect effect of reward beliefs on behavior via attitudes and intentions (point estimate of 0.20, with a standard error of 0.04), there is also a direct effect (point estimate of 0.30, with a standard error of 0.08). Thus, the total effect of 0.50 is composed of a direct effect (61% of the total effect) and an indirect effect (39%). In other words, attitudes and intentions only partially mediate the effect of reward beliefs on coupon usage.

The final model was also analyzed using the Satorra–Bentler robust standard errors for the coefficients in the latent variable model and the results were almost identical. In addition, 100 bootstrap samples were drawn and analyzed and again the results were very similar and the substantive conclusions remained exactly the same.

It should be noted that any causal interpretations implied in the foregoing description are based on theoretical grounds and facilitated

Table 1 Summary information for the latent variable model.

Path                          Parameter       Parameter estimate    Standardized    z-value
                                              (standard error)      estimate
Rewards → attitudes           $\gamma_{11}$    0.44 (0.08)           0.47            5.59
Inconveniences → attitudes    $\gamma_{12}$   -0.28 (0.06)          -0.38           -4.85
Encumbrances → attitudes      $\gamma_{13}$   -0.04 (0.10)          -0.03           -0.41
Rewards → coupon usage        $\gamma_{31}$    0.30 (0.08)           0.24            3.60
Attitudes → intentions        $\beta_{21}$     1.09 (0.11)           0.69            9.88
Intentions → coupon usage     $\beta_{32}$     0.41 (0.05)           0.48            7.96
$R^2$ (attitudes)             —                0.42                  —               —
$R^2$ (intentions)            —                0.48                  —               —
$R^2$ (behavior)              —                0.37                  —               —
to some extent by the design of the study (see also CAUSAL RESEARCH; CONCEPT OF CAUSALITY AND CONDITIONS FOR CAUSALITY). SEM per se does not warrant these conclusions. For example, coupon usage was measured one week after respondents indicated their beliefs, attitudes, and intentions. Although the self-report of coupon usage may have conceivably been influenced by people’s memory of their previous responses, it is difficult to imagine that coupon usage (measured at a later point) could have influenced the other constructs in the model (measured at an earlier point). On the other hand, beliefs, attitudes, and intentions were all measured in the same questionnaire and the specification of the direction of the effects is based on one well-established theory (the theory of reasoned action), although one could probably adduce other theories to justify different patterns of effects.

OTHER USES OF SEM AND RECENT DEVELOPMENTS

In this final section we briefly mention some other uses of SEM and recent developments, which cannot be dealt with fully because of space constraints (see the edited book by Hancock and Mueller (2006) for review chapters on many of the topics in this section).

The models described so far are covariance structure models that ignore the means of the observed and latent variables. Furthermore, they are models for a single population. It is possible to extend the models to multiple known populations, in which case the means of the observed and latent variables can be incorporated as well. If latent means and path coefficients are to be compared across different populations, it is necessary to first establish measurement invariance. Depending on the type of comparison to be conducted, different degrees of measurement invariance are necessary (see Steenkamp and Baumgartner (1998) for details). For example, the data in the illustrative application were collected at two different universities and it might be of interest to establish whether the means of the different constructs or the strength of the relationships between constructs are the same for the two groups of respondents.

Traditional multigroup SEM assumes that each observation in the sample can be assigned to a population of interest a priori. Sometimes, however, the researcher may believe that the observations in a sample come from multiple populations, but the population membership of individual observations is not known. In this case, latent variable mixture modeling is applicable. The goal is to determine whether a mixture of multiple populations gave rise to the mean and covariance structure from which the sample under investigation was drawn, and to recover the separate model parameters for each of the multiple populations as well as estimate the mixing proportions. Such analyses may be useful for market segmentation and other investigations of unobserved heterogeneity (Jedidi, Jagpal, and DeSarbo, 1997; see also UNOBSERVED HETEROGENEITY).

Structural equation models usually involve only linear relationships (e.g., LISREL stands for linear structural relationships). However, sometimes it is of interest to investigate multiplicative or quadratic relationships of latent variables. For example, theories often specify moderator effects of one variable on the relationships between other variables. If such a theory is to be tested, an interaction model has to be specified. Although moderator hypotheses can be tested with multigroup structural equation models when the moderator is binary or the number of levels of the moderator is small, a different approach is needed when the moderator is continuous. Several approaches have been proposed to specify nonlinear effects in SEM. One key advantage of these models is that measurement error, which has even more damaging effects when interactions and quadratic effects are present, is taken into account explicitly. However, these models are also quite complex and they frequently require assumptions that might offset their perceived advantages.

The basic structural equation model assumes that the data are a simple random sample from a single underlying population. However, complex survey designs are sometimes used, where observations are sampled in multiple stages (e.g., first companies are sampled, and then salespeople within companies), the total population is stratified (e.g., by male and female
salespeople), and selection probabilities are unequal across strata (e.g., female salespeople have a higher probability of being included in the sample; see also SAMPLING TECHNIQUES; PROBABILITY SAMPLING). Multilevel SEM and sampling weights can be used to deal with these complications. For example, a researcher could specify a multilevel structural equation model and study (i) whether more motivated salespeople in a company are more likely to attain higher sales (within-company analysis) and (ii) whether companies whose salespeople are more motivated on average tend to have higher sales (between-company analysis).

Finally, SEM can be used for longitudinal analyses in which a researcher wants to study the trajectory of change in some variable(s) of interest and explain various aspects of the change trajectory (e.g., the linear rate of change) based on other variables, or use the change trajectory as an antecedent variable. In such an analysis, which is referred to as latent curve or latent growth modeling, the latent variables indicated by the repeated measurements (e.g., amount of cola consumed per year over a number of years) are individual-level curve parameters (e.g., intercepts and slopes if a linear trajectory is assumed, although other specifications are possible), which can subsequently serve as endogenous or exogenous factors in more detailed investigations of the change process.

CONCLUSION

SEM has become a valuable addition to the methodological toolbox of researchers in social sciences in general and marketing in particular. It combines a concern with measurement, which takes into account the inherent fallibility of single indicators of constructs, with the opportunity to model complex patterns of relationships among constructs. The scope of SEM is constantly being expanded and it can be expected that SEM will continue to flourish in empirical research in marketing.

Bibliography

Anderson, J.C. and Gerbing, D.W. (1988) Structural equation modeling in practice: a review and recommended two-step approach. Psychological Bulletin, 103, 411–423.
Bagozzi, R.P. and Baumgartner, H. (1994) The evaluation of structural equation models and hypothesis testing, in Principles of Marketing Research (ed R.P. Bagozzi), Blackwell Publishers, Cambridge, pp. 386–422.
Bagozzi, R.P., Baumgartner, H., and Yi, Y. (1992) State versus action orientation and the theory of reasoned action: an application to coupon usage. Journal of Consumer Research, 18, 505–518.
Bandalos, D.L. and Finney, S.J. (2001) Item parceling issues in structural equation modeling, in New Developments and Techniques in Structural Equation Modeling (eds G.A. Marcoulides and R.E. Schumacker), Lawrence Erlbaum, Mahwah, pp. 269–296.
Baumgartner, H. and Homburg, C. (1996) Applications of structural equation modeling in marketing and consumer research: a review. International Journal of Research in Marketing, 13, 139–161.
Bentler, P.M. and Dudgeon, P. (1996) Covariance structure analysis: statistical practice, theory, and directions. Annual Review of Psychology, 47, 563–592.
Bollen, K.A. (1989) Structural Equations with Latent Variables, John Wiley & Sons, Inc., New York.
Diamantopoulos, A., Riefler, P., and Roth, K.P. (2008) Advancing formative measurement models. Journal of Business Research, 61, 1203–1218.
Hair, J.F. Jr, Black, W.C., Babin, B.J. et al. (2006) Multivariate Data Analysis, 6th edn, Pearson Prentice Hall, Upper Saddle River.
Hancock, G.R. and Mueller, R.O. (eds) (2006) Structural Equation Modeling: A Second Course, Information Age Publishing, Greenwich.
Jedidi, K., Jagpal, H.S., and DeSarbo, W.S. (1997) Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Marketing Science, 16, 39–59.
MacCallum, R.C., Browne, M.W., and Sugawara, H.M. (1996) Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
MacCallum, R.C., Wegener, D.T., Uchino, B.N., and Fabrigar, L.R. (1993) The problem of equivalent models in applications of covariance structure models. Psychological Bulletin, 114, 185–199.
MacKenzie, S.B., Lutz, R.J., and Belch, G.E. (1986) The role of attitude toward the ad as a mediator of advertising effectiveness: a test of competing explanations. Journal of Marketing Research, 23, 130–143.
Steenkamp, J.-B.E.M. and Baumgartner, H. (1998) Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25, 78–90.
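
As a worked illustration of the reliability and discriminant validity measures discussed under MODEL INTERPRETATION, the following Python sketch computes individual-item reliability, composite reliability, and AVE from loadings and unique variances, and applies the AVE-based discriminant validity check. It is a minimal sketch rather than the procedure used for the illustrative example: the function names are our own, the two loadings (0.8 and 0.6) are hypothetical, and the factor variance is assumed to be fixed at 1. Only the AVEs (0.65 and 0.85) and the correlation (0.69) passed to the final check are values reported in the article.

```python
def item_reliability(loading, error_var, factor_var=1.0):
    """Individual-item reliability: lambda^2 var(xi) / (lambda^2 var(xi) + theta)."""
    true_var = loading ** 2 * factor_var
    return true_var / (true_var + error_var)


def composite_reliability(loadings, error_vars, factor_var=1.0):
    """Composite reliability: (sum lambda)^2 var(xi) / ((sum lambda)^2 var(xi) + sum theta)."""
    true_var = sum(loadings) ** 2 * factor_var
    return true_var / (true_var + sum(error_vars))


def average_variance_extracted(loadings, error_vars, factor_var=1.0):
    """AVE, computed as the average of the individual-item reliabilities."""
    reliabilities = [
        item_reliability(lam, theta, factor_var)
        for lam, theta in zip(loadings, error_vars)
    ]
    return sum(reliabilities) / len(reliabilities)


def ave_exceeds_squared_correlation(ave_a, ave_b, correlation):
    """AVE-based check: both AVEs must exceed the squared construct correlation."""
    return min(ave_a, ave_b) > correlation ** 2


# Hypothetical standardized loadings for a two-indicator construct
# (assumption: indicator variances of 1, so theta = 1 - lambda^2).
loadings = [0.8, 0.6]
error_vars = [1 - lam ** 2 for lam in loadings]

item_rels = [item_reliability(l, t) for l, t in zip(loadings, error_vars)]
cr = composite_reliability(loadings, error_vars)
ave = average_variance_extracted(loadings, error_vars)

# Reported values: AVE of attitudes (0.65) and intentions (0.85),
# disattenuated correlation between the two constructs (0.69).
attitudes_vs_intentions_ok = ave_exceeds_squared_correlation(0.65, 0.85, 0.69)

print(item_rels, cr, ave, attitudes_vs_intentions_ok)
```

With the reported values the squared correlation (about 0.48) is smaller than both AVEs, which agrees with the conclusion that discriminant validity is satisfied for attitudes and intentions.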

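
The decomposition of the effect of reward beliefs on coupon usage into direct and indirect components can be approximated from the unstandardized estimates in Table 1. The sketch below assumes the usual rule that an indirect effect is the product of the coefficients along the mediating path; the variable names are our own. Because the inputs are the rounded estimates from the table, the shares obtained this way (about 60% and 40%) differ slightly from the 61% and 39% reported in the text, presumably because the latter were computed from unrounded estimates.

```python
# Unstandardized estimates taken from Table 1.
gamma_11 = 0.44  # rewards -> attitudes
beta_21 = 1.09   # attitudes -> intentions
beta_32 = 0.41   # intentions -> coupon usage
gamma_31 = 0.30  # rewards -> coupon usage (direct path)

# Indirect effect via attitudes and intentions: product of the three paths.
indirect_effect = gamma_11 * beta_21 * beta_32   # about 0.197
direct_effect = gamma_31
total_effect = direct_effect + indirect_effect   # about 0.497

# Shares of the total effect attributable to each component.
direct_share = direct_effect / total_effect
indirect_share = indirect_effect / total_effect

print(f"indirect = {indirect_effect:.2f}, total = {total_effect:.2f}")
print(f"direct share = {direct_share:.0%}, indirect share = {indirect_share:.0%}")
```

Rounded to two decimals, the indirect and total effects match the reported point estimates of 0.20 and 0.50. The standard error of the indirect effect (0.04 in the text) is not reproduced here, since it depends on the covariances among the parameter estimates, which are not reported.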