It has been over 20 years since Straub (1989) made the following observation about the MIS literature:

Instrument validation has been inadequately addressed in MIS research. Only a few researchers have devoted serious attention to measurement issues over the last few decades…and while the desirability of verifying findings through internal validity checks has been argued by Jarvenpaa, et al. (1984), the primary and prior value of instrument validation has yet to be widely recognized (p. 147).

Approximately a dozen years later, in a retrospective on the Straub article, Boudreau et al. (2001) surveyed the MIS literature again to assess whether there had been any improvement in the use of construct validation techniques, and concluded that their “findings suggest that the field has advanced in many areas, but, overall, it appears that a majority of published studies are still not sufficiently validating their instruments” (p. 1). Similar concerns regarding the practices used to validate constructs have also been expressed in the field of management by Scandura and Williams (2000), who compared the methodological practices reported in three top journals in two different time periods (1985–1987 and 1995–1997), and concluded that there had actually been a decrease in the proportion of studies that reported information about construct validity and reports of discriminant, convergent, and predictive validity. Therefore, the observation that Bagozzi and Phillips (1982, p. 468) made over 25 years ago still rings true: “Scientists have found it very difficult to translate this seemingly simple notion [of construct validity] into operational terms.”

The reason for the apparent lack of progress in this area certainly is not due to a shortage of articles written on the technical procedures that should be used to validate scales (e.g., Anderson and Gerbing 1988; Anderson et al. 1987; Bagozzi et al. 1991; Diamantopoulos and Winklhofer 2001; Edwards 2001; Fornell and Larcker 1981; Gerbing and Anderson 1988; Nunnally and Bernstein 1994; Straub et al. 2004). However, one possibility is that the researchers reading these articles absorb only a portion of what is said because many of these articles are complex and require a fairly well-developed technical knowledge of structural equation modeling procedures. The result is that readers may not understand how to implement the recommendations made in these articles. An even more likely possibility is that there is simply so much work on the topic of scale development and evaluation that it is difficult for researchers to prioritize what needs to be done. Indeed, we believe that one reason Churchill’s (1979) seminal article has proven to be so useful to researchers is that he outlined an organized set of activities that set priorities for what needs to be done in the scale development and evaluation process. Therefore, in the spirit of Churchill, the goal of this research is to provide an updated set of recommendations that can be used to give researchers a framework for developing valid scales.

We believe that there are several reasons why an updated set of recommendations would be useful. First, many of the scale development procedures advocated in the literature fail to adequately discuss how to develop appropriate conceptual definitions of a focal construct. Second, many of the recommendations are based on an improper specification of the measurement model that relates the latent variable representing a construct to its measures.
Finally, techniques that provide evidence that the scale actually measures what it purports to measure have been underutilized in the management and MIS literatures. In the sections that follow, we will briefly elaborate on each of the limitations identified above. Following this, we will discuss each of the steps in the scale development process while paying particular attention to the differences that are required when one is attempting to develop scales for constructs with formative indicators as opposed to constructs with reflective indicators. Finally, we discuss several steps that should be taken after the initial development of a scale to examine its generalizability and to enhance its usefulness.
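As a point of reference for the discussion that follows, the reflective and formative specifications of the measurement model can be sketched in standard structural equation modeling notation (a conventional schematic, not drawn verbatim from this article):

\[ \text{Reflective (effect) indicators:} \quad x_i = \lambda_i \eta + \varepsilon_i, \qquad i = 1, \ldots, n \]

\[ \text{Formative (causal) indicators:} \quad \eta = \gamma_1 x_1 + \gamma_2 x_2 + \cdots + \gamma_n x_n + \zeta \]

Under the reflective specification, the latent construct \( \eta \) is modeled as a common cause of its indicators, so the indicators are expected to covary and to be largely interchangeable; under the formative specification, the construct is defined as a weighted composite of its indicators plus a disturbance term \( \zeta \), so the indicators need not be correlated and each may capture a distinct facet of the construct.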
Limitations of Current Scale Development Procedures
Failure to Adequately Define the Construct Domain
Even though virtually every discussion of the construct validation or scale development process assumes that it begins with a clear articulation of the construct domain, the existing
For the purposes of our paper, we use the term measurement model to refer to a model specifying the relationships between a latent construct and its indicators. Note that some (e.g., Borsboom 2005) prefer to use the term measurement model in a more restricted sense to refer only to instances in which a latent construct has a causal impact on its indicators.
For the purposes of our paper, we use the term measure to refer to a standard used to determine or assess the magnitude of an attribute possessed by an entity. This term will sometimes be used interchangeably with the terms item and indicator, depending on the context, because an item is a measure of an attribute and a response to it can be used as an indicator of a latent construct.