
Journal of Operations Management 24 (2006) 148–169

www.elsevier.com/locate/dsw

Use of structural equation modeling in operations management research: Looking back and forward§
Rachna Shah *, Susan Meyer Goldstein 1
Operations and Management Science Department, Carlson School of Management,
321, 19th Avenue South, University of Minnesota, Minneapolis, MN 55455, USA
Received 10 October 2003; received in revised form 28 March 2005; accepted 3 May 2005
Available online 5 July 2005

Abstract

This paper reviews applications of structural equation modeling (SEM) in four major Operations Management journals
(Management Science, Journal of Operations Management, Decision Sciences, and Journal of Production and Operations
Management Society) and provides guidelines for improving the use of SEM in operations management (OM) research. We
review 93 articles from the earliest application of SEM in these journals in 1984 through August 2003. We document and assess
these published applications and identify methodological issues gleaned from the SEM literature. The implications of
overlooking fundamental assumptions of SEM and ignoring serious methodological issues are presented along with guidelines
for improving future applications of SEM in OM research. We find that while SEM is a valuable tool for testing and advancing
OM theory, OM researchers need to pay greater attention to these highlighted issues to take full advantage of its potential.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Empirical research methods; Structural equation modeling; Operations management

§ Note: List of reviewed articles is available upon request from the authors.
* Corresponding author. Tel.: +1 612 624 4432. E-mail addresses: rshah@csom.umn.edu (R. Shah), smeyer@csom.umn.edu (S.M. Goldstein).
1 Tel.: +1 612 626 0271.

0272-6963/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.jom.2005.05.001

1. Introduction

Structural equation modeling as a method for measuring relationships among latent variables has been around since early in the 20th century, originating in Sewall Wright's 1916 work (Bollen, 1989). Despite a slow but steady increase in its use, it was not until the monograph by Bagozzi in 1980 that the technique was brought to the attention of a much wider audience of marketing and consumer behavior researchers. While Operations Management (OM) researchers were slow to use this new statistical approach, structural equation modeling (SEM) has more recently become one of the preferred data analysis methods among empirical OM researchers, and articles that employ SEM as the primary data analytic tool now routinely appear in major OM journals.

Despite its regular and frequent application in the OM literature, there are few guidelines for the application of SEM and even fewer standards that researchers adhere to in conducting analyses and presenting and interpreting results, resulting in a large variance across articles that use SEM. To the best of our knowledge, there are no reviews of the applications of SEM in the OM literature, while there are regular reviews in other research areas that use this technique. For instance, focused reviews have appeared periodically in psychology (Hershberger, 2003), marketing (Baumgartner and Homburg, 1996), MIS (Chin and Todd, 1995; Gefen et al., 2000), strategic management (Shook et al., 2004), logistics (Garver and Mentzer, 1999), and organizational research (Medsker et al., 1994). These reviews have revealed vast discrepancies and serious flaws in the use of SEM. Steiger (2001) notes that even SEM textbooks ignore many important issues, suggesting that researchers may not have sufficient guidance to use SEM appropriately.

Due to the complexities involved in using SEM and problems uncovered in its use in other fields, a review specific to the OM literature seems timely and warranted. Our objectives in conducting this review are threefold. First, we characterize published OM research in terms of relevant criteria such as software used, sample size, parameters estimated, purpose for using SEM (e.g. measurement model development, structural model evaluation), and fit measures used. In using SEM, researchers have to make subjective choices on complex elements that are highly interdependent in order to align research objectives with analytical requirements. Therefore, our second objective is to highlight these interdependencies, identify problem areas, and discuss their implications. Third, we provide guidelines to improve analysis and reporting of SEM applications. Our goal is to promote improved usage of SEM, standardize terminology, and help prevent some common pitfalls in future OM research.

2. Overview of structural equation modeling

To provide a basis for subsequent discussion, we present a brief overview of structural equation modeling along with two special cases frequently used in the OM literature. The overview is intended to be a brief synopsis rather than a comprehensive detailing of mathematical model specification. There are a number of books (Maruyama, 1998; Bollen, 1989) and articles dealing with mathematical specification (Anderson and Gerbing, 1988), key assumptions underlying model specification (Bagozzi and Yi, 1988; Fornell, 1983), and other methodological issues of evaluation and fit (MacCallum, 1986; MacCallum et al., 1992).

At the outset, we point to a distinction in the use of two terms that are often used interchangeably in OM: covariance structure modeling (CSM) and structural equation modeling (SEM). CSM represents a general class of models that includes ARMA (autoregressive and moving average) time series models, multiplicative models for multi-faceted data, and circumplex models, as well as all SEM models (Long, 1983). Thus, SEM models are a subset of CSM models. We restrict the current review to SEM models because other types of CSM models are rarely used in OM research.

Structural equation modeling is a technique to specify, estimate, and evaluate models of linear relationships among a set of observed variables in terms of a generally smaller number of unobserved variables (see Appendix A for detail). SEM models consist of observed variables (also called manifest or measured, MV for short) and unobserved variables (also called underlying or latent, LV for short) that can be independent (exogenous) or dependent (endogenous) in nature. LVs are hypothetical constructs that cannot be directly measured, and in SEM are typically represented by multiple MVs that serve as indicators of the underlying constructs. The SEM model is an a priori hypothesis about a pattern of linear relationships among a set of observed and unobserved variables. The objective in using SEM is to determine whether the a priori model is valid, rather than to 'find' a suitable model (Gefen et al., 2000).

Path analysis and confirmatory factor analysis are two special cases of SEM that are regularly used in OM. Path analysis (PA) models specify patterns of directional and non-directional relationships among MVs. The only LVs in such models are error terms (Hair et al., 1998). Thus, PA provides for the testing of structural relationships among MVs when the MVs are of primary interest or when multiple indicators for LVs are not available. Confirmatory factor analysis (CFA) requires that LVs and their associated MVs be specified before analyzing the data. This is accomplished by restricting the MVs to load on specific LVs and by designating which LVs are allowed to correlate.
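To make these restrictions concrete, the following sketch computes the model-implied covariance matrix of a small CFA, Σ = ΛΦΛ′ + Θ, the quantity that estimation later compares against the sample covariance matrix. The two-construct structure and all numerical values are hypothetical illustrations, not figures from any reviewed study.

```python
import numpy as np

# Hypothetical two-LV, six-MV CFA. Zeros in Lambda encode the a priori
# restriction that each MV loads on exactly one LV.
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.9],
                   [0.0, 0.8],
                   [0.0, 0.7]])
Phi = np.array([[1.0, 0.3],    # LV covariance matrix; the off-diagonal
                [0.3, 1.0]])   # term is the (allowed) LV correlation
Theta = np.diag([0.36, 0.51, 0.64, 0.19, 0.36, 0.51])  # error variances

Sigma = Lambda @ Phi @ Lambda.T + Theta  # model-implied covariance matrix
```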

Fig. 1. Illustrations of PA, CFA, and SEM models.

A CFA model allows for directional influences between LVs and their MVs and (only) non-directional (correlational) relationships between LVs. Long (1983) provides a detailed (mathematical) treatment of each of these techniques. Fig. 1 shows graphical illustrations of SEM, PA and CFA models. Throughout this paper, we use the term SEM to refer to all three model types (SEM, PA, CFA) and note any exceptions to this.

3. Review of published SEM research

Our review focuses on empirical applications of SEM which include: (1) CFA models alone, such as in measurement or validation research; (2) PA models (provided they are estimated using software which allows latent variable modeling); and (3) SEM models that combine both measurement and structural components. We exclude theoretical papers,

papers using simulation, conventional exploratory factor analysis (EFA), structural models estimated by regression models (e.g. models estimated by two-stage least squares), and partial least squares (PLS) models. EFA models are not included because the measurement model is not specified a priori (MVs are not restricted to load on a specific LV and a MV can load on multiple LVs),1 whereas in SEM, the model is explicitly defined a priori. The main objective of regression and PLS models is prediction (variance explanation) in the dependent variable(s), compared to theory development and testing in the form of structural relationships (i.e. parameter estimation) in SEM. This philosophical distinction between these approaches is critical in deciding whether to use PLS or SEM (Anderson and Gerbing, 1988). In addition, because the assumptions underlying PLS and regression are less constraining than those of SEM, the problems and concerns in conducting these analyses are significantly different. Therefore, we do not include regression and PLS models in our review.

1 Target rotation, rarely used in OM research, is an instance of EFA in which the model is specified a priori.

3.1. Journal selection

We considered all OM journals that are recognized as publishing high quality and relevant empirical OM research. Recently, Barman et al. (2001) ranked Management Science (MS), Operations Research (OR), Journal of Operations Management (JOM), Decision Sciences (DS), and Journal of Production and Operations Management Society (POMS) as the top OM journals in terms of quality. In the past decade, several additional reviews have examined the quality and/or relevance of OM journals and have consistently ranked these journals in the top tier (Vokurka, 1996; Goh et al., 1997; Soteriou et al., 1998; Malhotra and Grover, 1998). We do not include OR in our review as its mission does not include publishing empirical research. We selected MS, JOM, DS, and POMS as the journals most representative of high quality and relevant empirical research in OM. In our review, we include articles from these four journals that meet our methodology criteria and do not exclude articles due to topic of research.

3.2. Time horizon and article selection

Rather than use specific search terms for selecting articles, we manually checked each article of the reviewed journals. Although more time consuming, the manual search gave us more control and better coverage than a "keyword" based search because there is no widely accepted terminology for research methods in OM to conduct such a search. In selecting an appropriate time horizon, we started with the most recent issue of each journal available until August 2003 and moved backwards in time. Using this approach, we reviewed all published issues of JOM from 1982 (Volume 1, Number 1) to 2003 (Volume 21, Number 4) and POM from 1992 (Volume 1, Number 1) to 2003 (Volume 12, Number 1). For MS and DS, we moved backward in time until we no longer found applications of SEM. The earliest application of SEM in DS was found in 1984 (Volume 15, Number 2) and the most recent issue reviewed is Volume 34, Number 1 (2003). The incidence of SEM in MS began in 1987 (Volume 34, Number 6) and we reviewed all issues through Volume 49, Number 8 (2003). The earliest publication in these two journals corresponds with our knowledge of the field and seems to have face validity as such because it coincides with the general timeframe when SEM was beginning to gain the attention of a wider audience in other literature streams.

In total, we found 93 research articles that satisfied our selection criteria. Fig. 2 shows the number of articles stacked by journal for the years we reviewed.

Fig. 2. Number of articles by journal and year.

This figure is very informative: overall, it is clear that the number of SEM articles has increased significantly over the past 20 years in the four journals individually and cumulatively. To assess the growth trend in the use of SEM, we regress the number of articles on an index of year of publication (beginning with 1984). We use both linear and quadratic effects of time in the regression model.

The regression model is significant (F(2, 17) = 39.93, p = 0.000) and indicates that 82% of the variance in the number of SEM publications is explained by the linear and quadratic effects of time. Further, the linear trend is not significant (t = 0.850, p = 0.41), whereas the quadratic effect is significant (t = 2.94, p = 0.009). So the use of SEM has not grown linearly as a function of time; rather, it has accelerated over time. In contrast, the use of SEM in marketing and psychology grew steadily over time and there is no indication of its accelerated use in more recent years (Baumgartner and Homburg, 1996; Hershberger, 2003).
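The mechanics of this trend test are simple to reproduce. In the sketch below, the annual counts are placeholders (the actual counts underlying Fig. 2 are not reproduced here); only the design matrix with linear and quadratic time terms and the R² computation are of interest.

```python
import numpy as np

# Placeholder annual counts of SEM articles, 1984-2003 (illustrative only).
counts = np.array([1., 0., 1., 2., 1., 2., 2., 3., 3., 4.,
                   4., 5., 6., 7., 8., 9., 11., 12., 14., 16.])
t = np.arange(1, counts.size + 1, dtype=float)  # year index, 1984 = 1

X = np.column_stack([np.ones_like(t), t, t**2])  # intercept, linear, quadratic
beta, *_ = np.linalg.lstsq(X, counts, rcond=None)  # OLS fit

fitted = X @ beta
r_squared = 1 - np.sum((counts - fitted) ** 2) / np.sum((counts - counts.mean()) ** 2)
```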

There are several software programs available for conducting SEM analysis, and each has idiosyncrasies and fundamental requirements for conducting analysis. In our database, 19.6% of the articles did not report the software used. Of the articles that reported the software, LISREL accounted for 48.3%, followed by EQS (18.9%), SAS (9.1%), AMOS (2.8%), RAMONA (0.7%) and SPSS (0.7%). LISREL was the first software developed to solve structural equation models and seems to have capitalized on its first mover advantage not only in psychology (MacCallum and Austin, 2000) and marketing (Baumgartner and Homburg, 1996) but also in OM.

3.3. Unit of analysis

In our review we found that multiple models were sometimes presented in one article. Therefore, the unit of analysis from this point forward (unless specified otherwise) is the actual applications (one or more models for each article). A single model is included in our data set in the following situations: (1) when a single model is proposed and evaluated using a single sample; (2) when multiple alternative or nested models are evaluated using a single sample, only the final model is included in our analysis; (3) when a single model is evaluated with either multiple samples or by splitting a sample, only the model tested with the verification sample is included in our analysis. Thus, in these three cases, each article contributed only one model to the analysis. When more than one model is evaluated (using single, multiple, or split samples), each distinct model is included in our analysis. In this situation, each article contributed more than one model to the analysis. A total of 143 models were drawn from the 93 research articles; thus the overall sample size for the remainder of the paper is 143. Of the 143 models, we could not determine the method used for four models. Of the remaining 139 models, 26 are PAs, 38 are CFAs, and 75 are SEMs. There are a small number of articles that reported models that never achieved adequate fit (by the authors' descriptions), and while we include these articles in our review, the fit measures are omitted from our analysis to avoid inclusion of data related to models with inadequate fit.

4. Critical issues in the application of SEM

There are many important issues to consider when using SEM, whether for evaluating a measurement model or examining the fit of structural relationships, separately or simultaneously. Our discussion of issues is organized into three groups: (1) issues to consider or address prior to analysis are categorized under the "pre-analysis" stage; (2) issues and concerns to address during analysis; and (3) issues related to the post-analysis stage, which includes issues related to evaluation, interpretation and presentation of results. Decisions made at each stage are highly interdependent and significantly impact the quality of results, and we cross-reference and discuss these interdependencies whenever possible.

4.1. Issues related to pre-analysis stage

Issues related to the pre-analysis stage need to be considered prior to conducting SEM analysis and include conceptual issues, sample size issues, measurement model specification, latent model specification, and degrees of freedom issues. A summary of pre-analysis data from the reviewed OM studies is presented in Table 1.

Table 1
Issues related to pre-analysis stage

                                     Path analysis   CFA models     SEM models     All models^a
                                     models
Number of models reviewed^a          26              38             75             143

Sample size
  Median                             125.0           141.0          202.0          176.0
  Mean                               251.2           245.4          246.4          243.3
  Range                              (18, 2338)      (63, 902)      (52, 840)      (16, 2338)

Number of parameters estimated
  Median                             10.0            31.0           34.0           26.0
  Mean                               11.3            38.3           37.5           31.9
  Range                              (2, 34)         (8, 98)        (11, 101)      (2, 101)

Sample size/parameters estimated
  Median                             9.6             6.2            5.6            6.4
  Mean                               33.5            8.8            7.4            13.2
  Range                              (2.9, 389.7)    (2.3, 36.1)    (1.6, 25.4)    (1.6, 389.7)

Number of manifest variables
  Median                             6.0             12.5           12.0           11.0
  Mean                               6.3             13.5           16.3           14.0
  Range                              (3, 10)         (4, 32)        (5, 80)        (3, 80)

Number of latent variables
  Median                             Not relevant    3.0            4.0            4.0
  Mean                               Not relevant    3.66           4.7            4.4
  Range                              Not relevant    (1, 10)        (1, 12)        (1, 12)

Manifest variables/latent variable
  Median                             Not relevant    4.0            3.3            3.6
  Mean                               Not relevant    5.2            4.1            4.5
  Range                              Not relevant    (1.3, 16.0)    (1.3, 9.0)     (1.3, 16.0)

Number of single indicator           Not relevant    Reported for   Reported for   Reported for
latent variables^b                                   1 model        25 models      28 models

Correlated measurement               1 model         11 models      8 models       19 models
errors (CMEs)                        unknown^c       (28.9%)        (10.7%), 4     (13.3%), 6
                                                                    models         models
                                                                    unknown^c      unknown^c

Theoretical justification            Not relevant    0 (0% of CFA   4 (50% of SEM  4 (21% of all
for CMEs                                             models with    models with    models with
                                                     CMEs)          CMEs)          CMEs)

Recursiveness (all models): 127 (88.8%) recursive; 13 (9.1%) non-recursive; not reported or could not be determined from model description for 3 (2.1%) models.

Evidence of model identification     Reported by     Reported by    Reported by    Reported by
                                     3.8%            26.3%          5.3%           10.5%

Degrees of freedom (d.f.)
  Median                             4.5             62.0           52.5           48.0
  Mean                               4.6             90.1           124.5          99.7
  Range                              (1, 11)         (5, 367)       (4, 690)       (1, 690)
  Proportion reporting               53.8%           52.6%          88.0%          71.3%

a The type of analysis performed could not be determined for 4 of 143 models published in 93 articles.
b The number of latent variables modeled using a single measured variable (i.e. single indicator).
c Presence of CMEs could not be determined due to inadequate model description.

4.1.1. Conceptual issues

An underlying assumption of SEM analysis is that the items or indicators used to measure a LV are reflective (i.e. caused by the same underlying LV) in nature. Yet researchers frequently apply SEM to formative indicators. Formative (also called causal) indicators are measures that form or cause the creation of a LV (MacCallum and Browne, 1993; Bollen, 1989). An example of formative measures is the amount of beer, wine and hard liquor consumed to indicate level of mental inebriation (Chin, 1998). It can hardly be argued that mental inebriation causes the amount of beer, wine and hard liquor consumption. On the contrary, the amount of each type of alcoholic beverage affects the level of mental inebriation. Formative indicators do not need to be highly correlated or have high internal consistency (Bollen, 1989). In this example, an increase in beer consumption does not imply an increase in wine or hard liquor consumption. Measurement of formative indicators requires an index (as opposed to developing a scale when using reflective indicators), and can be modeled using SEM, but requires additional constraints (Bollen, 1989; MacCallum and Browne, 1993). Using SEM without additional constraints makes the resulting estimates invalid (Fornell et al., 1991) and the model statistically unidentified (Bollen and Lennox, 1991).

Another underlying assumption of SEM is that the theoretical relationships hypothesized in the models being tested represent actual relationships in the studied population. SEM assesses how closely the observed data correspond to the expected patterns and requires that the relationships represented by the model are well established and amenable to accurate measurement in the population. SEM is not recommended for exploratory research when the measurement structure is not well defined or when the theory that underlies patterns of relationships among LVs is not well established (Brannick, 1995; Hurley et al., 1997).

Thus, researchers need to carefully consider: (1) the type of items, (2) the state of the underlying theory, and (3) the stage of development of the measurement instrument, prior to using SEM. For formative measurement items, researchers should consider alternative techniques such as SEM using formative indicators (MacCallum and Browne, 1993) and components-based approaches such as partial least squares (Cohen et al., 1990). When the underlying theory or the measurement structure is not well developed, simpler data analytic techniques such as EFA and regression analysis may be more appropriate (Hurley et al., 1997).

4.1.2. Sample size issues

Adequacy of sample size has a significant impact on the reliability of parameter estimates, model fit, and statistical power. Using a simulation experiment to examine the effect of varying sample size to parameter estimate ratios, Jackson (2003) reports that smaller sample sizes are generally characterized by parameter estimates with low reliability, greater bias in χ² and RMSEA fit statistics, and greater uncertainty in future replication. How large a sample should be for SEM is deceptively difficult to determine because it is dependent upon several characteristics such as the number of MVs per LV (MacCallum et al., 1996), degree of multivariate normality (West et al., 1995), and estimation method (Tanaka, 1987). Suggested approaches for determining sample size include establishing a minimum (e.g., 200), having a certain number of observations per MV, having a certain number of observations per parameter estimated (Bentler and Chou, 1987; Bollen, 1989; Marsh et al., 1988), and conducting power analysis (MacCallum et al., 1996). While the first two approaches are simply rules of thumb, the latter two have been studied extensively.

Table 1 reports the results of analysis of SEM applications in the OM literature related to sample size and number of parameters estimated. The smallest sample sizes for PA (n = 18), CFA (n = 63), and SEM (n = 52) are significantly smaller than established guidelines for models with even minimal complexity (MacCallum et al., 1996; Marsh et al., 1988). Additionally, 67.9% of all models have ratios of sample size to parameters estimated of less than 10:1 and 35.7% of models have ratios of less than 5:1. The lower end of both sample size and sample size to parameter estimate ratios is significantly smaller in the reviewed OM research than those studied by Jackson (2003), indicating that the OM literature may be highly susceptible to the negative outcomes reported in his study.
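The rules of thumb above are easy to turn into a screening step. The following sketch mirrors the 5:1 and 10:1 ratio guidance and the minimum-of-200 floor cited in the text; it is a convenience check, not a substitute for the power analysis discussed next, and the function name is our own.

```python
def sample_size_flags(n: int, q: int) -> dict:
    """Screen a planned model against common sample-size heuristics:
    n observations, q free parameters to be estimated."""
    ratio = n / q
    return {"n_to_q_ratio": round(ratio, 1),
            "below_5_to_1": ratio < 5,       # ratios flagged in our review
            "below_10_to_1": ratio < 10,
            "below_minimum_200": n < 200}    # the simple floor noted above

# The median reviewed model (n = 176, q = 26 from Table 1) sits between
# the two ratio guidelines:
print(sample_size_flags(176, 26))  # ratio about 6.8:1
```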

Statistical power (i.e. the ability to detect and reject a poor model) is critical to SEM analysis because, in contrast to traditional hypothesis testing, the goal in SEM analysis is to produce a non-significant result between the sample data and the implied covariance matrix derived from the model parameter estimates. Yet a non-significant result may also be due to a lack of ability (i.e. power) to detect model misspecification. Few studies in our review mentioned power and none estimated power explicitly. Therefore, we employed MacCallum et al. (1996), who define the minimum sample size as a function of degrees of freedom that is needed for adequate power (0.80) to detect close model fit, to assess the power of models in our sample. (We were not able to assess power for 41 of 143 models due to insufficient information.) Our analysis indicates that 37% of the models have adequate power and 63% do not. These proportions are consistent with similar analyses in psychology (MacCallum and Austin, 2000), MIS (Chin and Todd, 1995), and strategy (Shook et al., 2004), and have not changed since 1960 (Sedlmeier and Gigerenzer, 1989). We recommend that future researchers use MacCallum et al. (1996) to calculate the minimum sample size needed to ensure adequate statistical power.

4.1.3. Degrees of freedom and model identification

Degrees of freedom are calculated as follows: d.f. = (1/2)p(p + 1) − q, where p is the number of MVs, (1/2)p(p + 1) is the number of equations (or alternately, the number of distinct elements in the input matrix "S"), and q is the effective number of free (unknown) parameters to be estimated minus the number of implied variances. As the formula indicates, degrees of freedom is a function of model specification in terms of the number of equations and the effective number of free parameters that need to be estimated.

When the effective number of free parameters is exactly equal to the number of equations (that is, the degrees of freedom are zero), the model is said to be "just-identified" or "saturated". Just-identified models provide an exact solution for parameters (i.e. point estimates with no confidence intervals). When the effective number of free parameters is greater than the number of equations (degrees of freedom are less than zero), the model is "under-identified" and sufficient information is not available to uniquely estimate the parameters. Under-identified models may not converge during model estimation, and when they do, the parameter estimates they provide are not reliable and overall fit statistics cannot be interpreted (Rigdon, 1995). For models in which there are fewer unknowns than equations (degrees of freedom are one or greater), the model is "over-identified". An over-identified model is highly desirable because more than one equation is used to estimate at least some of the parameters, significantly enhancing the reliability of the estimates (Bollen, 1989).

Model identification is a complex issue, and while non-negative degrees of freedom is a necessary condition, additional conditions such as establishing a scale for each LV are frequently required (for a detailed discourse on sufficiency conditions, see Long, 1983; Bollen, 1989). In our review, degrees of freedom were not reported for 41 (28.7%) models (see Table 1). We recalculated the degrees of freedom independently for each reviewed model to assess discrepancies between the reported and our calculated degrees of freedom. We were not able to reproduce the degrees of freedom for 18 applications based on the authors' descriptions of their models. This lack of reproducibility may be due in part to poor model description or to correlated errors in the measurement or latent variable models that are not stated in the text. We also examined whether the issue of identification was explicitly addressed for each model. One author reported that the estimated model was not identified, and only 10.5% mentioned anything about model identification. Perhaps the issue of identification was considered implicitly because many software programs provide a warning message if a model is not identified.

Model identification has a significant impact on parameter estimates: in an unidentified model, more than one set of parameter estimates could generate the observed data, and a researcher has no way to choose among the various solutions because each is equally valid (or invalid, if you wish). Degrees of freedom are critically linked to the minimum sample size required for adequate model fit; the greater the degrees of freedom, the smaller the sample size needed for a given level of model fit (MacCallum et al., 1996). Calculating and reporting the degrees of freedom are fundamental to understanding the specified model, its identification, and its fit. Thus, we recommend that degrees of freedom and model identification be reported for every tested model.
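Both calculations recommended above can be sketched in a few lines. The first function restates the degrees-of-freedom formula from this section; the second implements the MacCallum et al. (1996) test of close fit via the noncentral chi-square distribution, with the RMSEA values of 0.05 (close fit) and 0.08 (alternative) following that article's convention. The example values are the medians from Table 1.

```python
from scipy.stats import ncx2

def model_df(p: int, q: int) -> int:
    # d.f. = (1/2) p (p + 1) - q: distinct elements of the input matrix S
    # minus the effective number of free parameters.
    return p * (p + 1) // 2 - q

def power_close_fit(n: int, df: int, eps0: float = 0.05,
                    eps1: float = 0.08, alpha: float = 0.05) -> float:
    """Power to reject close fit (RMSEA = eps0) when the true RMSEA is
    eps1, per MacCallum et al. (1996)."""
    nc0 = (n - 1) * df * eps0 ** 2       # noncentrality under the null
    nc1 = (n - 1) * df * eps1 ** 2       # noncentrality under the alternative
    crit = ncx2.ppf(1 - alpha, df, nc0)  # rejection threshold under the null
    return float(ncx2.sf(crit, df, nc1))

# Example: the median reviewed model (n = 176, d.f. = 48 from Table 1)
print(power_close_fit(n=176, df=48))
```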

4.1.4. Measurement model specification

4.1.4.1. Number of items (MVs) per LV. It is generally accepted that multiple MVs should measure each LV, but the number of MVs that should be used is less clear. A ratio of fewer than three MVs per LV is of concern because the model is statistically unidentified in the absence of additional constraints (Long, 1983). A large number of MVs per LV is advantageous as it helps to compensate for a small sample (Marsh et al., 1988) but disadvantageous as it means more parameters to estimate, requiring a larger sample size for adequate power. A large number of MVs per LV also makes it difficult to parsimoniously represent the measurement structure constituting the set of MVs (Anderson and Gerbing, 1984). In cases where a large number of MVs are needed to represent a LV, Bagozzi and Heatherton (1994) suggest four methods to reduce the number of MVs per LV. In our review, 24% of CFA models (9 of 38) and 39% of SEM models (29 of 75) had a MV:LV ratio of less than 3. Generally, these applications did not explicitly discuss identification issues or additional constraints. The number of MVs per LV characteristic is not applicable to PA models.

4.1.4.2. Single indicator constructs. We identified LVs represented by a single indicator in 2.6% of CFA models and 33.3% of SEM models in our sample (not applicable to PA models). The low occurrence of single indicator variables for CFA is not surprising because the central objective of CFA is construct measurement. However, the relatively high occurrence of single indicator constructs in SEM models is troublesome because single indicators ignore measurement reliability, one of the challenges SEM is designed to circumvent (Bentler and Chou, 1987). The single indicator issue is also tied to model identification, as discussed above. Single indicators are sufficient only when one measure perfectly represents a concept, a rare situation, or when measurement reliability is not an issue. Generally, single MVs should be modeled as MVs rather than LVs.

4.1.4.3. Correlated measurement errors. Measurement errors should sometimes be modeled as correlated, for instance, in a longitudinal study when the same item is measured at two points in time (Bollen, 1989, p. 232). The statistical effect of correlated error terms is the same as double loading, but the substantive meaning is significantly different. Double loading implies that each MV is affected by two underlying LVs. Fundamental to LV unidimensionality is that each MV load on one LV, with loadings on all other LVs restricted to zero. Because adding correlated measurement errors to SEM models nearly always improves model fit, they are often used post hoc without improving the substantive interpretation of the model (Fornell, 1983; Gerbing and Anderson, 1984) and making reliability estimates ambiguous (Bollen, 1989, p. 222).

To the best of our knowledge, our sample contains no instances of double loading MVs, but we found a number of models with correlated measurement errors: 3.8% of PA, 28.9% of CFA, and 10.7% of SEM models. We read the text of each article carefully to determine whether the authors provided any theoretical justification for using correlated errors or whether they were introduced simply to improve model fit. In more than half of the applications, no justification was provided. Correlated measurement errors should be used only when warranted on theoretical or methodological grounds (Fornell, 1983), and their statistical and substantive impact should be explicitly discussed.

4.1.5. Latent model specification

4.1.5.1. Recursive/non-recursive models. Models are non-recursive when they contain reciprocal causation, feedback loops, or correlated error terms (Bollen, 1989, p. 83). In such models, the matrix representing relationships among the latent endogenous variables (B; see Appendix A for more detail) has non-zero elements both above and below the diagonal. If B is lower triangular and the errors in equations are uncorrelated, then the model is called recursive (Hair et al., 1998). Non-recursive models require additional restrictions for the model to be identified, for the stability of estimated reciprocal effects, and for the interpretation of measures of variation accounted for in the endogenous variables (for a more detailed treatment of non-recursive models, see Long, 1983; Teel et al., 1986). In our review, we examined each application for recursive and non-recursive models due to either simultaneous effects or correlated errors in equations. While we did not observe any instances of simultaneous effects, we found that in 9.1% of the models, either the authors defined their model as non-recursive or a careful reading of the article led to such a conclusion. However, even when authors explicitly stated that they were testing a non-recursive model, we saw little if any explanation of issues such as model identification in the text. We recommend that if non-recursive models are specified, the additional restrictions and implications for model identification be explicitly stated in the paper.

4.2. Issues related to data analysis

Data analysis issues comprise examining sample data for distributional characteristics and generating an input matrix. Distributional characteristics of the data impact researchers' choices of estimation method, and the type of input matrix impacts the selection of software used for analysis.

4.2.1. Data screening

Data screening is critical to prepare data for SEM analysis (Hair et al., 1998). Screening through exploratory data analysis includes investigating for missing data, influential outliers, and distributional characteristics. Significant missing data result in convergence failures, biased parameter estimates, and inflated fit indices (Brown, 1994; Muthen et al., 1987). Influential outliers are linked to normality and skewness issues with MVs. Assessing data normality (along with skewness and kurtosis) is important because many model estimation methods are based on an assumption of normality. Non-normal data may result in inflated goodness of fit statistics and underestimated standard errors (MacCallum et al., 1992), although these effects are lessened with larger sample sizes (Lei and Lomax, 2005).

In our review, only a handful of applications discussed missing data. In the psychology literature, listwise deletion, pairwise deletion, data imputation and full information maximum likelihood (FIML) methods are commonly used to manage missing data (Marsh, 1998). Results from Monte Carlo simulation examining the performance of these four methods indicate the superiority of FIML, leading to the lowest rate of convergence failures, least bias in parameter estimates, and lowest inflation in goodness of fit statistics (Enders and Bandalos, 2001; Brown, 1994). The FIML method is currently available in LISREL (version 8.50 and above), SYSTAT (RAMONA) and AMOS.

We found that for 26.6% of applications, normality was discussed qualitatively in the text of the reviewed articles. Estimation methods such as maximum likelihood ratio and generalized least square assume normality, although some non-normality can be accommodated (Hu and Bentler, 1998; Lei and Lomax, 2005). Weighted least square, ordinary least square, and asymptotically distribution free estimation methods do not require normality. Additionally, "ML, Robust" in EQS software adjusts model fit and parameter estimates for non-normality. Finally, researchers can transform non-normal data, although serious problems have been noted with data transformation (cf. Satorra, 2001). We suggest that some discussion of data screening methods be included generally, and that normality be discussed specifically in relation to the choice of estimation method.

4.2.2. Type of input matrix

While raw data can be used as input for SEM analysis, a covariance (S) or correlation (R) matrix is generally used. In our review of the OM literature, no papers report using raw data, 30.8% report using S, and 25.2% report using R (44.1% of applications did not report the type of matrix used to conduct analysis). Seven of 44 applications using S and 25 of 36 applications using R provide the input matrix in the paper. Not providing the input matrix makes it impossible to replicate the results reported by the author(s).

While conventional estimation methods in SEM are based on statistical distribution theory that is appropriate for S but not for R, there are interpretational advantages to using R: if MVs are standardized and the model is fit to R, then parameter estimates can be interpreted in terms of standardized variables. However, it is not correct to fit a model to R while treating R as if it were a covariance matrix. Cudeck (1989) conducted an exhaustive analysis of the implications of treating R as if it were S and concludes that the consequences depend on the properties of the model being fitted: standard errors, confidence intervals and test statistics for the parameter estimates are incorrect in all cases. In some cases, parameter estimates and values of fit indices are also incorrect.
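The screening quantities and the S versus R distinction above are easy to illustrate on synthetic data (the seed and dimensions below are arbitrary assumptions for the example):

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Synthetic raw data: n = 200 observations on p = 6 manifest variables.
rng = np.random.default_rng(seed=1)
X = rng.normal(size=(200, 6))

# Screening: univariate skewness and excess kurtosis for each MV; large
# absolute values flag non-normality relevant to the estimator choice.
print("skewness:", np.round(skew(X, axis=0), 2))
print("excess kurtosis:", np.round(kurtosis(X, axis=0), 2))

S = np.cov(X, rowvar=False)        # covariance input matrix (S)
R = np.corrcoef(X, rowvar=False)   # correlation input matrix (R)
# R is S rescaled by the MV standard deviations, which is why treating
# R as if it were S distorts standard errors and test statistics:
D = np.diag(1.0 / np.sqrt(np.diag(S)))
assert np.allclose(R, D @ S @ D)
```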

Software programs commonly used to conduct SEM deal with this issue in different ways. Correct estimation of a correlation matrix can be done in LISREL (Jöreskog and Sörbom, 1996) but requires the user to introduce specific parameter constraints. Although not widely used in OM, RAMONA (Browne and Mels, 1998), EQS (Bentler, 1989) and SEPATH (Steiger, 1999) automatically provide correct estimation with a correlation matrix. Currently, AMOS cannot analyze correlation matrices. In our review, we found 24 instances where authors reported using a correlation matrix with LISREL (out of 69 models run with LISREL), but most did not mention the necessary additional constraints. We found one instance of using AMOS with a correlation matrix.

Given the lack of awareness among users about the treatment of R versus S by various software programs, we direct readers' attention to a test devised by MacCallum and Austin (2000) to help users determine whether a particular SEM program provides correct estimation of a model fit to a correlation matrix. Otherwise, it is preferable to fit models to covariance matrices, thus ensuring correct results.

4.2.3. Estimation methods

A variety of estimation methods such as maximum likelihood ratio (ML), generalized least square (GLS), weighted and unweighted least square (WLS and ULS), asymptotically distribution free (ADF), and ordinary least square (OLS) are available. Their use depends upon the distributional properties of the MVs, and each has computational advantages and disadvantages relative to the others. For instance, ML assumes data are univariate and multivariate normal and requires that the input data matrix be positive definite, but it is relatively unbiased under moderate violations of normality (Bollen, 1989). GLS assumes normality but does not impose the restriction of a positive definite input matrix. ADF has few distributional assumptions but requires very large sample sizes for accurate estimates. OLS, the simplest method, has no distributional assumptions and is computationally the most robust, but it is scale dependent and does not provide fit indices or standard errors for estimates.

Forty-eight percent of the applications in our review did not report the estimation method used. Of the applications that reported the estimation method, a majority (68.9%) used ML. Estimation method, data normality, sample size, and model specification are inextricably linked and must be considered simultaneously by the researcher. We suggest that authors explicitly state the estimation method used and link it to the properties of the observed variables.
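For reference, the discrepancy function minimized under ML can be written compactly. The sketch below assumes positive definite matrices (the ML requirement noted above) and is illustrative rather than a full estimator:

```python
import numpy as np

def f_ml(S: np.ndarray, Sigma: np.ndarray) -> float:
    """ML discrepancy between the sample covariance matrix S and a
    model-implied matrix Sigma with p manifest variables:
        F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p.
    Minimizing F_ML over the free parameters gives the ML estimates;
    (n - 1) times the minimized value is the model chi-square."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return float(logdet_sigma + np.trace(S @ np.linalg.inv(Sigma))
                 - logdet_s - p)
```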

4.3. Issues related to post-analysis

Post-analysis issues include evaluating the solution achieved from model estimation, model fit, and respecification of the model. Reports of these data from the studied sample are summarized in Tables 2a and 2b.

Table 2a
Issues related to data analysis for structural model

                              Number of models      Proportion       Results:        Range
                              reporting (n = 143)   reporting (%)    mean; median
χ²                            107                   74.8             204.0; 64.2     (0.0, 1270.0)
χ², p-value                   76                    53.1             0.21; 0.13      (0.0, 0.94)
GFI                           84                    58.7             0.93; 0.94      (0.75, 0.99)
AGFI                          59                    41.3             0.89; 0.90      (0.63, 0.97)
RMR (or RMSR)                 51                    35.7             0.052; 0.050    (0.01, 0.14)^a
RMSEA                         51                    35.7             0.058; 0.060    (0.00, 0.13)
NFI                           49                    34.3             0.91; 0.92      (0.72, 0.99)
NNFI (or TLI)                 62                    43.4             0.95; 0.95      (0.73, 1.07)
CFI                           73                    51.0             0.96; 0.96      (0.88, 1.00)
IFI (or BL89)                 16                    11.2             0.94; 0.95      (0.88, 0.98)
Normed χ² (χ²/d.f.) reported  52                    36.4             1.82; 1.59      (0.02, 4.80)
Normed χ² calculated          98^b                  68.5             2.17; 1.62      (0.01, 21.71)

a One model reported RMR = 145.4; this data point omitted as an outlier relative to other reported RMRs.
b Data not available to calculate others.

Table 2b
Issues related to data analysis for measurement model

                                              Number of models      Proportion
                                              reporting (n = 143)   reporting (%)
Reliability assessment                        123                   86.0
Unidimensionality assessment                  94                    65.7
Discriminant validity addressed               99                    69.2
Validity issues addressed                     76                    53.1
  (R²; variance explained)
Path coefficients (confidence intervals)      138 (3)               96.5 (2.1)
Path t-statistics (standard errors)           90 (21)               62.9 (14.7)
Residual information/analysis provided        19                    13.3
Specification search conducted                20                    14.0
  for model respecification
Modification indices used                     21                    14.7
  for model respecification
Alternative models compared                   29                    20.3
Inconsistency between described               31                    21.7
  and tested models
Cross-validation sample used                  22                    15.4
Split sample approach used                    27                    18.9

4.3.1. Evaluation of solution

We have organized our discussion of the evaluation of solutions into overall model fit, measurement model fit, and structural model fit. Focusing solely on the overall fit of the model while overlooking important information about parameters is a common error that we encountered in our review. A model with good overall fit but yielding nonsensical parameter estimates is not a useful model.

4.3.1.1. Overall model fit. Assessing a model's fit is one of the more complicated aspects of SEM because, unlike traditional statistical methods, it relies on non-significance. Historically, the most popular index used to assess the overall goodness of fit has been the χ²-statistic, although its conclusions regarding model significance are generally ignored. The χ²-statistic is inherently biased when the sample size is large but is dependent on distributional assumptions associated with large samples. Additionally, a χ²-test offers a dichotomous decision strategy (accept/reject) for assessing the adequacy of fit implied by a statistical decision rule (Bollen, 1989). In light of these issues, numerous alternative fit indices have been developed to quantify the degree of fit along a continuum (see Jöreskog, 1993; Tanaka, 1993; Bollen, 1989, pp. 256–289; and Mulaik et al., 1989 for comprehensive reviews). Fit indices are commonly distinguished as either absolute or incremental (Bollen, 1989). In general, absolute fit indices indicate the degree to which the hypothesized model reproduces the sample data, and incremental fit indices measure the proportional improvement in fit when the hypothesized model is compared with a restricted, nested baseline model (Hu and Bentler, 1998).

Absolute measures of fit: The most basic measure of absolute fit is the χ²-statistic. Other commonly used measures include the root mean square error of approximation (RMSEA), root mean square residual (RMR or SRMR), goodness-of-fit index (GFI) and adjusted goodness of fit (AGFI). GFI and AGFI increase as goodness of fit increases and are bounded above by 1.00, while RMSEA and RMR decrease as goodness of fit increases and are bounded below by zero (Browne and Cudeck, 1989). Ninety-four percent of the applications we reviewed report at least one of these measures (Table 2a). Although the frequency of use and the magnitude of each of these measures are similar to those reported in marketing by Baumgartner and Homburg (1996), the ranges in our sample are much wider, indicating greater variability in empirical OM research. The variability may be an indication of more complex models and/or a less established theory base.

Incremental fit measures: Incremental fit measures compare the model under study to two reference models: (1) a worst case or null model, and (2) an ideal model that perfectly represents the modeled phenomena in the studied population. While there are many incremental fit indices, some of the most popular are the normed fit index (NFI), non-normed fit index (NNFI or TLI), comparative fit index (CFI) and incremental fit index (IFI or BL89). Sixty-nine percent of the reviewed studies report at least one of the four measures (Table 2a). An additional fit index that is frequently used is the normed χ², which is reported for 36.4% of models. Because the χ²-statistic by itself is beset with problems, the ratio of χ² to degrees of freedom (χ²/d.f.) is informative because it corrects for model size. Additionally, we calculated the normed χ² for all models that reported χ² and either reported degrees of freedom or enough model specification information to allow us to ascertain the degrees of freedom (68.5% of all applications) and found a median of 1.62 (range 0.01, 21.71). Small values of normed χ² (<1.0) can indicate an over-fitted model and higher values (>3.0–5.0) can indicate an under-parameterized model (Jöreskog, 1969).
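Most of these indices are simple functions of the model and null-model chi-square statistics. The sketch below uses the standard formulas (as in, e.g., Bollen, 1989; Hu and Bentler, 1998); the example's null-model values are hypothetical, while the model values echo the medians in Table 2a.

```python
def fit_indices(chisq: float, df: int, n: int,
                chisq_null: float, df_null: int) -> dict:
    """Normed chi-square, RMSEA, NFI, and CFI from model and null-model
    chi-square values."""
    d = max(chisq - df, 0.0)                 # noncentrality, tested model
    d_null = max(chisq_null - df_null, 0.0)  # noncentrality, null model
    denom = max(d, d_null)
    return {"normed_chisq": chisq / df,
            "rmsea": (d / (df * (n - 1))) ** 0.5,
            "nfi": (chisq_null - chisq) / chisq_null,
            "cfi": 1.0 - d / denom if denom > 0 else 1.0}

# Example: chisq = 64.2 on 48 d.f. with n = 176 (medians from our review);
# the null-model values are hypothetical.
print(fit_indices(64.2, 48, 176, chisq_null=900.0, df_null=66))
```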

A brief summary of the effects on fit indices of small samples, normality violations, model misspecification, and estimation method is reported in Table 3.

Table 3
Influence of sample and estimation characteristics on model fit indices

Absolute indices
  χ²: small-sample bias established^f; estimation method: no preference.
  GFI: poor for small n^e but can be used^f; normality violations problematic with ADF^e; misspecifications not identified by ADF^e; ML preferred; use of index not recommended^e.
  AGFI: poor for small n^e,f; normality violations problematic with ADF^e; misspecifications not identified by ADF^e; ML preferred; use of index not recommended^e.
  RMR (or SRMR): ML preferred for small n^e; misspecifications identified; ML preferred; recommended for all analyses^e.
  RMSEA: tends to over-reject the model for small n^e; misspecifications identified; estimation method: no preference; use with ADF not recommended^e.

Incremental indices
  NFI: poor for small n^e; some misspecifications identified; ML preferred; use of index not recommended^e.
  NNFI (or TLI): best index for small n^f but tends to over-reject the model^e; misspecifications identified; ML preferred.
  CFI: ML preferred for small n^e; misspecifications identified; ML preferred.
  IFI (or BL89): ML preferred for small n^e; misspecifications identified; ML preferred.
  Normed χ²: small-sample bias established^f; estimation method: no preference.

a While all fit indices listed suffer small-sample bias (approximately n < 250), we consolidate findings by leading researchers.
b Most normality violations have insignificant effects on fit indices, except those noted.
c Identifying model misspecification is a positive characteristic; fit indices that do not identify misspecification are considered poor choices.
d The following estimation methods were investigated: maximum likelihood ratio (ML), generalized least square (GLS), asymptotically distribution free (ADF).^e,f
e Hu and Bentler (1998).
f Marsh et al. (1988).

An ongoing debate about the superiority or even appropriateness of one index over another makes the issue of selecting which to use in assessing fit very complex. For instance, Hu and Bentler (1998) advise against using GFI and AGFI because they are significantly influenced by sample size and are insufficiently sensitive to model misspecification. Most fit indices are influenced by sample size and should not be interpreted independently of sample size (Hu and Bentler, 1998; Marsh et al., 1988). Therefore, no consistent criteria (i.e. cut-offs) can be defined to apply in all (or most) instances (Marsh et al., 1988).

Until definitive fit indices are developed, researchers should report multiple measures of fit so that reviewers and readers have the opportunity to evaluate the underlying fit of the data to the model from multiple perspectives. χ² should be reported with its corresponding degrees of freedom in order to be insightful. RMR and RMSEA, two measures that reflect the residual differences between the input and implied (reproduced) matrices, indicate how well matrix covariance terms are predicted by the tested model. RMR in particular performs well under many conditions (Hu and Bentler, 1998; Marsh et al., 1988). Researchers might also report a summary of standardized (correlation) residuals because when most or all are "quite small" relative to correlations in the tested sample (Browne et al., 2002, p. 418), they indicate good model fit (Bollen, 1989, p. 258).
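The residual summary recommended above can be computed directly from the input and implied matrices; a minimal sketch:

```python
import numpy as np

def correlation_residuals(S: np.ndarray, Sigma_hat: np.ndarray) -> np.ndarray:
    """Standardized (correlation-metric) residuals between the input matrix
    S and the model-implied matrix Sigma_hat; uniformly small entries
    relative to the sample correlations indicate good fit."""
    d = np.sqrt(np.diag(S))
    return (S - Sigma_hat) / np.outer(d, d)
```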

4.3.1.2. Measurement model fit. Measurement model fit can be evaluated in two ways: first, by assessing constructs' reliability and convergent and discriminant validity, and second, by examining the individual path (parameter) estimates (Bollen, 1989).

Various indices of reliability can be computed to summarize how well LVs are measured by their MVs individually or jointly (individual item reliability, composite reliability, and average variance extracted; cf. Bagozzi and Yi, 1988; Fornell and Larcker, 1981). Our initial attempt to report the reliability measures used by the authors proved difficult due to the diversity of methods used. Therefore, we limit our review to whether authors report at least one of the various measures. Overall, 86.0% of the applications describe some form of reliability assessment. We recommend that authors report at least one measure of construct reliability based on estimated model parameters (e.g. composite reliability or average variance extracted) (Bollen, 1989).

Cronbach alpha is an inferior measure of reliability because in most cases it is only a lower bound on reliability (Bollen, 1989). In our review we found that Cronbach alpha was frequently presented as proof to establish unidimensionality. It is not sufficient for this purpose because a scale may not be unidimensional even if it has high reliability (Gerbing and Anderson, 1984). Our review also examined how published research dealt with the issue of discriminant validity. We found that 69.2% of all applications included evidence of discriminant validity. Our review indicates that despite a lack of standardization in the reported measures, most published research in OM includes some measure of reliability, unidimensionality and validity.

Another way to assess measurement model fit is to evaluate path estimates. In evaluating path estimates, the sign (positive or negative), strength, and significance should be aligned with theory. The magnitude of standard errors associated with path estimates should be small; a large standard error indicates an unstable parameter estimate that is subject to sampling error. Although recommended, but rarely used in practice, the 90% confidence interval (CI) around each path estimate is very useful (Browne and Cudeck, 1993). The CI provides an explicit indication of the degree of parameter estimate precision. Additionally, the statistical significance of path estimates can be inferred from the 90% CI: if the 90% CI includes zero, then the path estimate is not significantly different from zero (at α = 0.05). Overall, confidence intervals are very informative and we recommend their use in future studies. In our review, we found that 96.5% of the applications report path coefficients, 62.9% provide t-statistics, 14.7% provide standard errors, and 2.1% report confidence intervals.
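The measures recommended above are straightforward to compute from estimated model parameters. The sketch below uses the usual composite reliability and average variance extracted formulas (which assume standardized LVs; cf. Fornell and Larcker, 1981) together with the 90% CI convention discussed in the text; the example loadings, error variances, and path estimate are hypothetical.

```python
import numpy as np

def composite_reliability(loadings, error_vars) -> float:
    # CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
    s = np.sum(loadings) ** 2
    return float(s / (s + np.sum(error_vars)))

def average_variance_extracted(loadings, error_vars) -> float:
    # AVE = sum of squared loadings / (same + sum of error variances)
    s2 = np.sum(np.square(loadings))
    return float(s2 / (s2 + np.sum(error_vars)))

def ci90(estimate: float, se: float) -> tuple:
    # 90% confidence interval; per the convention above, a path whose 90% CI
    # excludes zero differs from zero at alpha = 0.05.
    return (estimate - 1.645 * se, estimate + 1.645 * se)

# Hypothetical three-indicator construct and one path estimate:
print(composite_reliability([0.8, 0.7, 0.6], [0.36, 0.51, 0.64]))
print(average_variance_extracted([0.8, 0.7, 0.6], [0.36, 0.51, 0.64]))
print(ci90(0.42, se=0.11))
```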

4.3.1.3. Structural model fit. In SEM models, the latent variable model represents the structural model fit and, generally, the hypotheses of interest. In PA models that do not have LVs, the hypotheses of interest are generally represented by the paths between MVs. Like measurement model fit, the sign, magnitude and statistical significance of the structural path coefficients are examined in testing the hypotheses. Researchers should recognize the important distinction between variance fit (explained variance in endogenous variables as measured by R² for each structural equation) and covariance fit (overall goodness of fit, such as that tested by a χ²-test). Authors emphasize covariance fit a great deal more than variance fit; in our review, 53.1% of the models presented evidence of variance fit compared to 96% that presented at least one index of overall fit. It is important to distinguish between these two types of fit because a model might fit well but not explain a significant amount of variation in endogenous variables or, conversely, fit poorly and explain a large amount of variation in endogenous variables (Fornell, 1983).

In summary, we suggest that fit indices should not be regarded as measures of the usefulness of a model. They each contain some information about model fit but none about model plausibility (Browne and Cudeck, 1993). Rather than establishing that fit indices meet arbitrarily established cut-offs, future research should report a variety of absolute and incremental fit indices for measurement, structural, and overall models and include a discussion of the interpretation of fit indices relative to the study design. We found many instances in which authors conclude that a particular model had better fit than alternative models based on comparing fit indices. While some fit indices can be useful for such comparisons, most commonly employed fit indices cannot be compared across models in this manner (e.g. a model with a lower RMSEA does not indicate better fit than a model with a higher RMSEA). For nested alternate models, the χ² difference test or Target Coefficient can be used (Marsh and Hocevar, 1985). For alternate models that are not nested, parsimony fit measures such as the Parsimonious NFI, Parsimonious GFI, Akaike information criterion (AIC) and normed χ² can be used (Hair et al., 1998).

4.3.2. Model respecification

Although no model fits the real world exactly, a desirable outcome in SEM analysis is to show that a hypothesized model provides a good approximation of real world phenomena, as represented by an observed set of data. When an initial model of interest does not satisfy this objective, researchers often alter the model to improve its fit to the data. Modification of a hypothesized model to improve its parsimony and/or fit to the data is termed a "specification search" (Leamer, 1978; Long, 1983). A specification search is designed to identify and eliminate errors from the original specification of the hypothesized model.

Jöreskog and Sörbom (1996) describe three strategies in model specification (and evaluation): (1) strictly confirmatory, where a single a priori model is studied; (2) model generation, where an initial model is fit to data and then modified (frequently with the use of modification indices) until it fits adequately; and (3) alternative models, where multiple a priori models are studied. Although not improper, the "strictly confirmatory" approach is highly restrictive and does not leave the researcher any latitude if the model does not work. The model generation approach is troublesome because of the potential for abuse, results that lack validity (MacCallum, 1986), and high susceptibility to capitalization on chance (MacCallum et al., 1992). Simulation work by MacCallum (1990) and Homburg and Dobartz (1992) indicates that only half of specification searches (even with correct restrictions and large samples) are successful in recovering the correct underlying model.

In our review, 28.7% (41 of 143) of the applications reported making post hoc changes to respecify the model. We also examined the published articles for inconsistency between the model that was tested versus the model described in the text. In 31 out of 143 cases we found such inconsistency, where we could not match the described model with the tested model. We suspect that in many cases, authors made post hoc changes (perhaps to improve model fit), but those changes were not well described. We found that only 20.3% of the models were tested using alternate models. We recommend that researchers compare alternate a priori models (either nested or unnested) to uncover the model that the observed data support best rather than use specification searches (Browne and Cudeck, 1989). Such practices may have a lower probability of identifying models with great fit, but they increase the alignment of modeling results with our existing knowledge and theories. Leading journals must show a willingness to publish poor fitting models for such advancement of knowledge and theory.
Findings from single studies are subject to limitations due to sample or selection effects and their impact on the conclusions that can be drawn. In our review, such limitations were seldom acknowledged and results were usually interpreted and discussed as if they were expansively generalizable. Sample and selection effects are controlled (but not eliminated) by identifying a specific population and from it selecting a sample that is appropriate to the objectives of the study. Rather than identifying a specific population, the articles we reviewed focused predominantly on describing their samples. However, a structural equation model is a hypothesis about the structure of relationships among MVs and LVs in a specific population, and this population should be explicitly identified.

Another aspect of generalizability involves replicating the results of a study in a different sample from the same population. We found that 15.4% of the reviewed applications used cross-validation and 18.9% used a split sample approach. Given the difficulty in obtaining responses from multiple samples from a given population, the expected cross-validation index (ECVI), an index computed from a single sample, can indicate how well a solution obtained in one sample is likely to fit an independent sample from the same population (Browne and Cudeck, 1989; Cudeck and Browne, 1983).
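A minimal sketch of the single-sample ECVI computation follows; it assumes the commonly cited maximum-likelihood form ECVI = (χ² + 2q)/(n − 1), where q is the number of free parameters (exact definitions vary slightly across SEM programs), and the inputs are hypothetical:

def ecvi(chisq, n_free_params, sample_size):
    # Single-sample expected cross-validation index; smaller values
    # indicate better expected fit in an independent sample from the
    # same population, so compare across competing models.
    return (chisq + 2.0 * n_free_params) / (sample_size - 1.0)

print(f"ECVI = {ecvi(chisq=135.2, n_free_params=31, sample_size=240):.3f}")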
Selecting the most appropriate set of measurement items to represent the domain of underlying LVs is critical when using SEM. However, there are few standardized instruments for LVs, making progress in empirical OM research slow and difficult. Appropriate operationalization of LVs is as critical as their repeated use: repetition helps to establish validity and reliability. (For a detailed discussion and guidelines on the selection effects related to good indicators, see Little et al., 1999; for OM measurement scales, see Roth and Schroeder, in press.) A challenging issue arises when researchers are unable to validate previously used scales. In such situations, we suggest a two-pronged strategy. First, a priori, the researcher must examine the assumptions employed in developing the previous scales and state their impact on replication. Second, upon failure to replicate with validity, the researcher must use an exploratory means to develop modified scales to be validated by future researchers. However, this respecified model should not be given the status of a hypothesized model and would need to be validated in the future with another sample from the same population.

5.3. Confirmation bias

Confirmation bias is defined as a prejudice in favor of the evaluated model (Greenwald et al., 1986). Our review suggests that OM researchers (not unlike researchers in other fields) are highly susceptible to confirmation bias. Researchers evaluate a single model, give an overly positive evaluation of model fit, and are reluctant to consider alternative explanations of data. An associated problem in this context is the existence of equivalent models, alternative models that are indistinct from the original model in terms of goodness of fit to the data but with a distinct substantive meaning in terms of the underlying theory (MacCallum et al., 1993). In a study of 53 published applications in psychology, MacCallum et al. (1993) showed that equivalent models exist routinely in large numbers and are universally ignored by researchers. In order to mitigate problems related to confirmation bias, we recommend that OM researchers generate multiple alternate, equivalent models a priori and, if one or more of these models cannot be eliminated due to theoretical reasons or poor fit, explicitly discuss the alternate explanation(s) underlying the data rather than confirming and presenting results from one definitive model (MacCallum et al., 1993).

6. Discussion and conclusion

SEM has rapidly become an important and widely used research tool in the OM literature. Its attractiveness to OM researchers can be attributed to two factors. From CFA, SEM draws upon the notion of unobserved or latent variables, and from PA, SEM adopts the notion of modeling direct and indirect relationships. These advantages, combined with the availability of ever more user-friendly software, make it likely that SEM will enjoy widespread use in the future. We have provided both a review of the OM literature employing SEM as well as discussion and guidelines for improving its future use. Table 4 contains a summary of some of the most important issues discussed here, their implications, and recommendations for resolving these challenges. Below, we briefly discuss these issues.
Table 4
Implications and recommendations for select SEM issues

Formative (causal) indicators.
Implications (Bollen, 1989; MacCallum and Browne, 1993): without additional constraints, the model is generally unidentified.
Recommendations: model as causal indicators (MacCallum and Browne, 1993); report appropriate conditions and modeling issues.

Poorly developed or weak relationships.
Implications (Hurley et al., 1997): more likely to result in a poor fitting model requiring specification searches and post hoc model respecification.
Recommendations: use alternative methods that demand less rigorous model specification, such as EFA and regression analysis (Hurley et al., 1997).

Violating multivariate normality.
Implications (MacCallum et al., 1992): inflated goodness-of-fit statistics; underestimated standard errors.
Recommendations: use estimation methods that adjust for the violation, such as "ML, Robust" available in EQS; use estimation methods that do not assume multivariate normality, such as GLS and ADF.

Correlation matrix as input data.
Implications (Cudeck, 1989): LISREL is inappropriate without additional constraints; standard errors, confidence intervals and test statistics for parameter estimates are incorrect in all cases; parameter estimates and fit indices are incorrect in some cases.
Recommendations: the type of input matrix and the software must be reported; RAMONA in SYSTAT (Browne and Mels, 1998), EQS (Bentler, 1989) and SEPATH (Steiger, 1999) can be used; LISREL can be used with additional constraints (LISREL 8.50); AMOS cannot be used.

Small sample size.
Implications (MacCallum et al., 1996; Marsh et al., 1988; Hu and Bentler, 1998): associated with lower power, ceteris paribus; parameter estimates have lower reliability; fit indices are overestimated.
Recommendations: conduct and report statistical power; simpler models (fewer parameters estimated, higher degrees of freedom) are associated with higher power (MacCallum et al., 1996); use fit indices that are less biased by small sample size, such as NNFI, and avoid fit indices that are more biased, such as χ², GFI and NFI (Hu and Bentler, 1998).

Few degrees of freedom (d.f.).
Implications (MacCallum et al., 1996): associated with lower power, ceteris paribus; parameter estimates have lower reliability; fit indices are overestimated.
Recommendations: report degrees of freedom; conduct and report statistical power; simpler models (fewer parameters estimated, higher degrees of freedom) are associated with higher power (MacCallum et al., 1996).

Model identification.
Implications: with d.f. = 0, results are not generalizable; with d.f. < 0, the model cannot be estimated unless some parameters are fixed or held constant.
Recommendations: d.f. > 0 is the desirable condition; assess and report model identification; explicitly discuss the implications of unidentified models for the generalizability of results.

Number of MVs per LV.
Implications: sufficient MVs per LV are needed to provide adequate representation of the content domain.
Recommendations: have at least three MVs per LV for CFA/SEM (Rigdon, 1995).

One MV per LV.
Implications: may not provide adequate representation of the content domain; poor reliability and validity because error variance cannot be estimated (Maruyama, 1998); the model is generally unidentified.
Recommendations: model as an MV (not an LV); a single MV can be modeled as an LV only when the MV is a perfect representation of the LV, and specific conditions must be imposed for identification purposes (LISREL 8.50).

Correlated measurement errors.
Implications (Gerbing and Anderson, 1984): alters measurement and structural parameter estimates; almost always improves model fit; changes the substantive meaning of the model.
Recommendations: report correlated errors; justify their theoretical validity a priori; discuss the impact on measurement and structural parameter estimates and on model fit.

Non-recursive models.
Implications: without additional constraints, the model is unidentified.
Recommendations: explicitly report that the model is non-recursive and its cause; add constraints and report their impact (Long, 1983).
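As one illustration of the "conduct and report statistical power" recommendation in Table 4, the sketch below implements the RMSEA-based power analysis of MacCallum et al. (1996) for the test of close fit (null RMSEA of 0.05 against an alternative of 0.08); the degrees of freedom and sample size are hypothetical:

from scipy.stats import ncx2

def rmsea_power(df, n, eps_null=0.05, eps_alt=0.08, alpha=0.05):
    # Noncentrality parameters: lambda = (n - 1) * df * epsilon**2
    lam_null = (n - 1) * df * eps_null ** 2
    lam_alt = (n - 1) * df * eps_alt ** 2
    # Critical value under the null, then power under the alternative
    crit = ncx2.ppf(1.0 - alpha, df, lam_null)
    return ncx2.sf(crit, df, lam_alt)

print(f"power = {rmsea_power(df=60, n=200):.3f}")
# Rerunning with larger df or n shows the pattern noted in Table 4:
# simpler models and larger samples yield higher power.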
As researchers, we should ensure that SEM is the correct method for examining the research question at hand. When theory development is at a nascent stage and patterns of relationships among LVs are relatively weak, SEM should be used with caution so that model confirmation and theory testing do not degenerate into extensive model respecification. Likewise, it is important that we use appropriate measurement methods and understand the distinction between formative and reflective variables.

Determining minimum sample size is, in part, dependent upon the number of parameter estimates in the hypothesized model. But emerging research in this area indicates that the relationship between sample size and number of parameter estimates is complex and dependent upon MV characteristics (MacCallum et al., 2001). Likewise, guidelines on degrees of freedom and model identification are not simple or straightforward. Researchers must be cognizant of these issues, and we recommend that all studies discuss them explicitly. As the powerful capabilities of SEM derive partly from its highly restrictive simplifying assumptions, it is important that assumptions such as normality and skewness are carefully assessed prior to generating an input matrix and conducting analysis.
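As a sketch of this pre-analysis screening (the data array is simulated stand-in survey data; the variable labels are hypothetical), univariate skewness and excess kurtosis can be checked for every MV before the input matrix is generated:

import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(7)
data = rng.normal(size=(240, 6))  # rows = respondents, columns = MVs

for j in range(data.shape[1]):
    s = skew(data[:, j])
    k = kurtosis(data[:, j])  # excess kurtosis; 0 for a normal variable
    flag = "check" if abs(s) > 2 or abs(k) > 7 else "ok"
    print(f"MV{j + 1}: skewness = {s:+.2f}, excess kurtosis = {k:+.2f} ({flag})")
# The |skewness| > 2 and |kurtosis| > 7 cutoffs are rough heuristics in
# the spirit of West et al. (1995), not definitive thresholds.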
With regard to model estimation, researchers should recognize that parameter estimates are not fixed values, but rather depend upon the estimation method. For instance, parameter estimates obtained using maximum likelihood are different from those obtained using ordinary least squares (Browne and Arminger, 1995). Further, in evaluating model fit, the correspondence between the hypothesized model and the observed data should be assessed using a variety of absolute and incremental fit indices for measurement, structural, and overall models. In addition to path coefficients, confidence intervals and standard errors should be assessed.

Rather than hypothesizing a single model, multiple alternate models should be evaluated when possible, and research results should be cross-validated using split or multiple samples. Given the very real possibility of alternate, equivalent models, researchers should be cautious in over-interpreting results. Because no model represents the real world exactly, we must be more forthright about the "imperfection" inherent in any model and acknowledge the literal implausibility of the model more explicitly (MacCallum, 2003).

One of the most poignant observations in conducting this study was the inconsistency in the published reporting of results and, in numerous instances, our inability to reconstruct the tested model based on the description in the text and the reported degrees of freedom. These issues can be resolved by attention to published guidelines for presenting results of SEM (e.g. Hoyle and Panter, 1995). To assist both during the review process and in building a cumulative tradition in the OM field, sufficient information needs to be provided to understand (1) the population from which the data sample was obtained, (2) the distribution of the data, (3) the hypothesized measurement and structural models, and (4) the statistical results that corroborate the subsequent interpretation and conclusions.

We recommend that every published application of SEM provide a clear and complete specification of the model(s) and variables, preferably in the form of a graphical figure, including the measurement model linking LVs to MVs, the structural model connecting LVs, and a specification of which parameters are being estimated and which are fixed. It is helpful to identify specific research hypotheses on the graphical figure, both to clarify the model and to reduce the text needed to describe them. In addition to including a statement about the type of input data matrix, software and estimation method used, we recommend that the input matrix be included in the paper for future replications and meta-analytical research studies, but we recognize this is an editorial decision subject to space constraints. In terms of statistical results, we suggest researchers include multiple measures of fit and criteria for evaluating fit, along with parameter estimates and associated confidence intervals and standard errors. Finally, interpretation of results should be guided by an understanding that models are imperfect and cannot be made to be exactly correct.

We can enrich our knowledge by reviewing the use of SEM in more mature research fields such as psychology and marketing, including methodological advances. Some advances worthy of mention are validation studies using the multi-trait multi-method (MTMM) matrix method (cf. Cudeck, 1988; Widaman, 1985), measurement invariance (Widaman and Reise, 1997), and the use of categorical (Muthen, 1983) or experimental data (Russell et al., 1998).

Our review of published SEM applications in the OM literature suggests that while reporting has improved over time, we need to pay attention to methodological issues in using SEM.
Like any statistical technique or tool, it is important that SEM be used prudently if researchers want to take full advantage of its potential. SEM is a useful tool to represent multidimensional unobservable constructs and simultaneously examine structural relationships that are not well captured by traditional research methods (Gefen et al., 2000, p. 6). In the future, utilizing the guidelines presented here will improve the use of SEM in OM research, and thus, improve our collective understanding of OM theory and practice.

Acknowledgements

We thank Michael Browne and Sriram Thirumalai for helpful comments on this paper. We also thank Carlos Rodriguez for assistance with article screening and data coding.

Appendix A. Mathematical specification of structural equation modeling

A structural equation model can be defined as a hypothesis of a specific pattern of relations among a set of measured variables (MVs) and latent variables (LVs). The three equations presented below are fundamental to SEM. Eq. (1) represents the directional influences of the exogenous LVs (ξ) on their indicators (x). Eq. (2) represents the directional influences of the endogenous LVs (η) on their indicators (y). Thus, Eqs. (1) and (2) link the observed (manifest) variables to unobserved (latent) variables through a factor analytic model and constitute the "measurement" portion of the model. Eq. (3) represents the endogenous LVs (η) as linear functions of other exogenous LVs (ξ) and endogenous LVs plus residual terms (ζ). Thus, Eq. (3) specifies relationships between LVs through a structural equation model and constitutes the "structural" portion of the model:

x = Λx ξ + δ   (1)

y = Λy η + ε   (2)

η = B η + Γ ξ + ζ   (3)

where x is the measures of exogenous manifest variables, Λx the effect of exogenous LVs on their MVs (matrix), δ the error of measurement in exogenous manifest variables, y the measures of endogenous manifest variables, Λy the effect of endogenous LVs on their MVs (matrix), ε the error of measurement in endogenous manifest variables, ξ the latent exogenous constructs, η the latent endogenous constructs, Γ the effect of exogenous constructs on endogenous constructs (matrix), B the effect of endogenous constructs on each of the other endogenous constructs (matrix), and ζ the errors in equations or residuals.

It is also necessary to define the following covariance matrices:

(a) Φ = E(ξξ′) is a covariance matrix for the exogenous LVs.
(b) Θδ = E(δδ′) is a covariance matrix for the measurement errors in the exogenous MVs.
(c) Θε = E(εε′) is a covariance matrix for the measurement errors in the endogenous MVs.
(d) Ψ = E(ζζ′) is a covariance matrix for the errors in equations for the endogenous LVs.

Given this mathematical representation, it can be shown that the population covariance matrix for the MVs is a function of eight parameter matrices: Λx, Λy, Γ, B, Φ, Θδ, Θε and Ψ. Thus, given a hypothesized model in terms of fixed and free parameters of the eight parameter matrices, and given a sample covariance matrix for the MVs, one can solve for estimates of the free parameters of the model.
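For completeness, writing A = (I − B)⁻¹, the covariance structure implied by Eqs. (1)–(3) and the matrices defined above can be written out explicitly (this block-matrix form is the standard algebraic consequence of the model as specified here):

\[
\Sigma =
\begin{pmatrix}
\Lambda_y A \left( \Gamma \Phi \Gamma' + \Psi \right) A' \Lambda_y' + \Theta_\varepsilon &
\Lambda_y A \Gamma \Phi \Lambda_x' \\
\Lambda_x \Phi \Gamma' A' \Lambda_y' &
\Lambda_x \Phi \Lambda_x' + \Theta_\delta
\end{pmatrix},
\qquad A = (I - B)^{-1}.
\]

Estimation then amounts to choosing the free elements of the eight parameter matrices so that Σ reproduces the sample covariance matrix as closely as possible under the chosen discrepancy function.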
The most common approach for fitting the model to data is to obtain maximum likelihood estimates of the parameters, and an accompanying likelihood ratio χ²-test of the null hypothesis that the model holds in the population. The notation above uses SEM as developed by Jöreskog (1974) and represented in LISREL (Jöreskog and Sörbom, 1996).

References

Anderson, J.C., Gerbing, D.W., 1988. Structural equation modeling in practice: a review and recommended two step approach. Psychological Bulletin 103 (3), 411–423.
Anderson, J.C., Gerbing, D.W., 1984. The effects of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika 49, 155–173.
Bagozzi, R.P., Heatherton, T.F., 1994. A general approach to representing multifaceted personality constructs: application to state self-esteem. Structural Equation Modeling 1 (1), 35–67.
Bagozzi, R.P., Yi, Y., 1988. On the evaluation of structural equation models. Journal of the Academy of Marketing Science 16 (1), 74–94.
Barman, S., Hanna, M.D., LaForge, R.L., 2001. Perceived relevance and quality of POM journals: a decade later. Journal of Operations Management 19 (3), 367–385.
Baumgartner, H., Homburg, C., 1996. Applications of structural equation modeling in marketing and consumer research: a review. International Journal of Research in Marketing 13 (2), 139–161.
Bentler, P.M., 1989. EQS: Structural Equations Program Manual. BMDP Statistical Software, Los Angeles, CA.
Bentler, P.M., Chou, C.P., 1987. Practical issues in structural modeling. Sociological Methods and Research 16 (1), 78–117.
Bollen, K.A., 1989. Structural Equations with Latent Variables. Wiley, New York.
Bollen, K.A., Lennox, R., 1991. Conventional wisdom on measurement: a structural equation perspective. Psychological Bulletin 110, 305–314.
Brannick, M.T., 1995. Critical comments on applying covariance structure modeling. Journal of Organizational Behavior 16 (3), 201–213.
Brown, R.L., 1994. Efficacy of the indirect approach for estimating structural equation models with missing data: a comparison of five methods. Structural Equation Modeling 1, 287–316.
Browne, M.W., Arminger, G., 1995. Specification and estimation of mean and covariance structure models. In: Arminger, G., Clogg, C.C., Sobel, M.E. (Eds.), Handbook of Statistical Modeling for the Social and Behavioral Sciences. Plenum, New York, pp. 185–249.
Browne, M.W., Cudeck, R., 1989. Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research 24 (4), 445–455.
Browne, M.W., Cudeck, R., 1993. Alternative ways of assessing model fit. In: Bollen, K.A., Long, J.S. (Eds.), Testing Structural Equation Models. Sage, Newbury Park, CA, pp. 136–161.
Browne, M.W., Mels, G., 1998. Path analysis: RAMONA. In: SYSTAT for Windows: Advanced Applications (Version 8). SYSTAT, Evanston, IL.
Browne, M.W., MacCallum, R.C., Kim, C., Anderson, B.L., Glaser, R., 2002. When fit indices and residuals are incompatible. Psychological Methods 7 (4), 403–421.
Chin, W.W., 1998. Issues and opinion on structural equation modeling. MIS Quarterly 22 (1), vii–xvi.
Chin, W.W., Todd, P.A., 1995. On the use, usefulness, and ease of use of structural equation modeling in MIS research: a note of caution. MIS Quarterly 19 (2), 237–246.
Cohen, P., Cohen, J., Teresi, J., Marchi, M., Velez, C.N., 1990. Problems in the measurement of latent variables in structural equations causal models. Applied Psychological Measurement 14 (2), 183–196.
Cudeck, R., 1988. Multiplicative models and MTMM matrices. Multivariate Behavioral Research 13, 131–147.
Cudeck, R., 1989. Analysis of correlation matrices using covariance structure models. Psychological Bulletin 105, 317–327.
Cudeck, R., Browne, M.W., 1983. Cross-validation of covariance structures. Multivariate Behavioral Research 18 (2), 147–167.
Enders, C.K., Bandalos, D.L., 2001. The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling 8 (3), 430–457.
Fornell, C., 1983. Issues in the application of covariance structure analysis. Journal of Consumer Research 9 (4), 443–448.
Fornell, C., Larcker, D.F., 1981. Evaluating structural equation models with unobservable variables and measurement errors. Journal of Marketing Research 18 (1), 39–50.
Fornell, C., Rhee, B., Yi, Y., 1991. Direct regression, reverse regression, and covariance structural analysis. Marketing Letters 2 (3), 309–320.
Garver, M.S., Mentzer, J.T., 1999. Logistics research methods: employing structural equation modeling to test for construct validity. Journal of Business Logistics 20 (1), 33–57.
Gefen, D., Straub, D.W., Boudreau, M., 2000. Structural equation modeling and regression: guidelines for research practice. Communications of the AIS 1 (7), 1–78.
Gerbing, D.W., Anderson, J.C., 1984. On the meaning of within-factor correlated measurement errors. Journal of Consumer Research 11, 572–580.
Goh, C., Holsapple, C.W., Johnson, L.E., Tanner, J.R., 1997. Evaluating and classifying POM journals. Journal of Operations Management 15 (2), 123–138.
Gollob, H.F., Reichardt, C.S., 1987. Taking account of time lags in causal models. Child Development 58 (1), 80–92.
Gollob, H.F., Reichardt, C.S., 1991. Interpreting and estimating indirect effects assuming time lags really matter. In: Collins, L.M., Horn, J.L. (Eds.), Best Methods for the Analysis of Change. American Psychological Association, Washington, DC, pp. 243–259.
Greenwald, A.G., Pratkanis, A.R., Leippe, M.R., Baumgartner, M.H., 1986. Under what conditions does theory obstruct research progress? Psychological Review 93 (2), 216–229.
Hair Jr., J.F., Anderson, R.E., Tatham, R.L., Black, W.C., 1998. Multivariate Data Analysis. Prentice-Hall, New Jersey.
Hershberger, S.L., 2003. The growth of structural equation modeling: 1994–2001. Structural Equation Modeling 10 (1), 35–46.
Homburg, C., Dobartz, A., 1992. Covariance structure analysis via specification searches. Statistical Papers 33 (1), 119–142.
Hoyle, R.H., Panter, A.T., 1995. Writing about structural equation modeling. In: Hoyle, R.H. (Ed.), Structural Equation Modeling: Concepts, Issues, and Applications. Sage, Thousand Oaks, CA, pp. 158–176.
Hu, L., Bentler, P.M., 1998. Fit indices in covariance structure modeling: sensitivity to under-parameterized model misspecification. Psychological Methods 3 (4), 424–453.
Hurley, A.E., Scandura, T.A., Schriesheim, C.A., Brannick, M.T., Seers, A., Vandenberg, R.J., Williams, L.J., 1997. Exploratory and confirmatory factor analysis: guidelines, issues, and alternatives. Journal of Organizational Behavior 18 (6), 667–683.
Jackson, D.L., 2003. Revisiting the sample size and number of parameter estimates: some support for the N:q hypothesis. Structural Equation Modeling 10 (1), 128–141.
Jöreskog, K.G., 1969. A general approach to confirmatory maximum likelihood factor analysis. Psychometrika 34 (2, Part 1), 183–202.
Jöreskog, K.G., 1974. Analyzing psychological data by structural analysis of covariance matrices. In: Atkinson, R.C., Krantz, D.H., Luce, R.D., Suppes, P. (Eds.), Contemporary Developments in Mathematical Psychology, vol. II. W.H. Freeman, San Francisco, pp. 1–56.
Jöreskog, K.G., 1993. Testing structural equation models. In: Bollen, K.A., Long, J.S. (Eds.), Testing Structural Equation Models. Sage, Newbury Park, CA, pp. 294–316.
Jöreskog, K.G., Sörbom, D., 1996. LISREL 8: User's Reference Guide. Scientific Software International Inc., Chicago, IL.
Leamer, E.E., 1978. Specification Searches: Ad-hoc Inference with Non-experimental Data. Wiley, New York.
Lei, M., Lomax, R.G., 2005. The effect of varying degrees of nonnormality in structural equation modeling. Structural Equation Modeling 12 (1), 1–27.
Little, T.D., Lindenberger, U., Nesselroade, J.R., 1999. On selecting indicators for multivariate measurement and modeling with latent variables: when 'good' indicators are bad and 'bad' indicators are good. Psychological Methods 4 (2), 192–211.
Long, J.S., 1983. Covariance Structure Models: An Introduction to LISREL. Sage, Beverly Hills, CA.
MacCallum, R.C., 2003. Working with imperfect models. Multivariate Behavioral Research 38 (1), 113–139.
MacCallum, R.C., 1990. The need for alternative measures of fit in covariance structure modeling. Multivariate Behavioral Research 25 (2), 157–162.
MacCallum, R.C., 1986. Specification searches in covariance structure modeling. Psychological Bulletin 100 (1), 107–120.
MacCallum, R.C., Austin, J.T., 2000. Applications of structural equation modeling in psychological research. Annual Review of Psychology 51 (1), 201–226.
MacCallum, R.C., Browne, M.W., 1993. The use of causal indicators in covariance structure models: some practical issues. Psychological Bulletin 114 (3), 533–541.
MacCallum, R.C., Browne, M.W., Sugawara, H.M., 1996. Power analysis and determination of sample size for covariance structure modeling. Psychological Methods 1 (1), 130–149.
MacCallum, R.C., Roznowski, M., Necowitz, L.B., 1992. Model modifications in covariance structure analysis: the problem of capitalization on chance. Psychological Bulletin 111 (3), 490–504.
MacCallum, R.C., Wegener, D.T., Uchino, B.N., Fabrigar, L.R., 1993. The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin 114 (1), 185–199.
MacCallum, R.C., Widaman, K.F., Preacher, K.J., Hong, S., 2001. Sample size in factor analysis: the role of model error. Multivariate Behavioral Research 36 (4), 611–637.
Malhotra, M.K., Grover, V., 1998. An assessment of survey research in POM: from constructs to theory. Journal of Operations Management 16 (4), 407–425.
Marsh, H.W., 1998. Pairwise deletion for missing data in structural equation models: nonpositive definite matrices, parameter estimates, goodness of fit, and adjusted sample sizes. Structural Equation Modeling 5, 22–36.
Marsh, H.W., Balla, J.R., McDonald, R.P., 1988. Goodness-of-fit indexes in confirmatory factor analysis: the effect of sample size. Psychological Bulletin 103 (3), 391–410.
Marsh, H.W., Hocevar, D., 1985. Applications of confirmatory factor analysis to the study of self concept: first and higher order factor models and their invariance across groups. Psychological Bulletin 97, 562–582.
Maruyama, G., 1998. Basics of Structural Equation Modeling. Sage, Thousand Oaks, CA.
Medsker, G.J., Williams, L.J., Holahan, P., 1994. A review of current practices for evaluating causal models in organizational behavior and human resources management research. Journal of Management 20 (2), 439–464.
Mulaik, S.S., James, L.R., Van Alstine, J., Bennett, N., Lind, S., Stillwell, C.D., 1989. An evaluation of goodness of fit indices for structural equation models. Psychological Bulletin 105 (3), 430–445.
Muthen, B., 1983. Latent variable structural equation modeling with categorical data. Journal of Econometrics 22 (1/2), 43–66.
Muthen, B., Kaplan, D., Hollis, M., 1987. On structural equation modeling with data that are not missing completely at random. Psychometrika 52, 431–462.
Pearl, J., 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, UK.
Rigdon, E.E., 1995. A necessary and sufficient identification rule for structural models estimated in practice. Multivariate Behavioral Research 30 (3), 359–383.
Roth, A., Schroeder, R., in press. Handbook of Multi-item Scales for Research in Operations Management. Sage.
Russell, D.W., Kahn, J.H., Spoth, R., Altmaier, E.M., 1998. Analyzing data from experimental studies: a latent variable structural equation modeling approach. Journal of Counseling Psychology 45, 18–29.
Satorra, A., 2001. Goodness of fit testing of structural equations models with multiple group data and nonnormality. In: Cudeck, R., du Toit, S., Sörbom, D. (Eds.), Structural Equation Modeling: Present and Future. Scientific Software International, Lincolnwood, IL, pp. 231–256.
Sedlmeier, P., Gigerenzer, G., 1989. Do studies of statistical power have an effect on the power of the studies? Psychological Bulletin 105 (2), 309–316.
Shook, C.L., Ketchen, D.J., Hult, G.T.M., Kacmar, K.M., 2004. An assessment of the use of structural equation modeling in strategic management research. Strategic Management Journal 25 (4), 397–404.
Soteriou, A.C., Hadijinicola, G.C., Patsia, K., 1998. Assessing production and operations management related journals: the European perspective. Journal of Operations Management 17 (2), 225–238.
Steiger, J.H., 1999. Structural equation modeling (SEPATH). In: Statistica for Windows, vol. III. StatSoft, Tulsa, OK.
Steiger, J., 2001. Driving fast in reverse. Journal of the American Statistical Association 96, 331–338.
Tanaka, J.S., 1987. How big is big enough? Sample size and goodness of fit in structural equation models with latent variables. Child Development 58, 134–146.
Tanaka, J.S., 1993. Multifaceted conceptions of fit in structural equation models. In: Bollen, K.A., Long, J.S. (Eds.), Testing Structural Equation Models. Sage, Newbury Park, CA, pp. 10–39.
Teel, J.E., Bearden, W.O., Sharma, S., 1986. Interpreting LISREL estimates of explained variance in non-recursive structural equation models. Journal of Marketing Research 23 (2), 164–168.
Vokurka, R.J., 1996. The relative importance of journals used in operations management research: a citation analysis. Journal of Operations Management 14 (3), 345–355.
West, S.G., Finch, J.F., Curran, P.J., 1995. Structural equation models with nonnormal variables: problems and remedies. In: Hoyle, R.H. (Ed.), Structural Equation Modeling: Issues, Concepts, and Applications. Sage, Newbury Park, CA, pp. 56–75.
Widaman, K.F., 1985. Hierarchically nested covariance structure models for multitrait-multimethod data. Applied Psychological Measurement 9, 1–26.
Widaman, K.F., Reise, S., 1997. Exploring the measurement invariance of psychological instruments: applications in the substance use domain. In: Bryant, K.J., Windle, M., West, S.G. (Eds.), The Science of Prevention: Methodological Advances from Alcohol and Substance Abuse. American Psychological Association, Washington, DC, pp. 281–324.