QUASI-EXPERIMENTAL DESIGN¹

Donald T. Campbell

Northwestern University

¹ The preparation of this review was supported in part by Project C-998, Contract 3-20-001, with the Media Research Branch, Office of Education, U.S. Department of Health, Education, and Welfare, under provisions of Title VII of the National Defense Education Act. This symposium presentation is essentially the same as the current draft of my article for the International Encyclopedia of the Social Sciences.

This phrase refers to the application of an experimental mode of analysis and interpretation to bodies of data not meeting the full requirements of experimental control because experimental units are not assigned at random to at least two "treatment" conditions. The settings to which it is appropriate are those of experimentation in social settings, including planned interventions such as specific communications, persuasive efforts, changes in conditions and policies, efforts at social remediation, etc. Unplanned conditions and events may also be analyzed in this way where an exogenous variable has such discreteness and abruptness as to make appropriate its consideration as an experimental treatment applied at a specific point in time to a specific population. When properly done, when attention is given to the specific implications of the specific weaknesses of the design in question, quasi-experimental analysis can provide a valuable extension of the experimental method.

While efforts to interpret field data as experiments go back much farther, the first prominent methodology of this kind in the social sciences was Chapin's Ex Post Facto Experiment (Chapin & Queen, 1937; Chapin, 1955; Greenwood, 1945), although it should be noted that due to the failure to control regression artifacts, this mode of analysis is no longer regarded as acceptable. The American Soldier volumes (Stouffer et al., 1949) provide prominent analyses of the effects of specific military experiences, where it is implausible that differences in selection explain the results. Thorndike's efforts to demonstrate the effects of specific course work upon other intellectual achievements provide an excellent early model (e.g., Thorndike & Woodworth, 1901; Thorndike & Ruger, 1923). Extensive analysis and review of this literature are provided elsewhere (Campbell, 1957; 1963; Campbell & Stanley, 1963) and serve as the basis for the present abbreviated presentation.

The core requirement of a "true" experiment lies in the experimenter's ability to apply at least two experimental treatments in complete independence of the prior states of the materials (persons, etc.) under study. This independence makes resulting differences interpretable as effects of the differences in treatment. In the social sciences this independence of prior status is assured by randomization in assignments to treatments. Experiments meeting these requirements, and thus representing "true" experiments, are much more possible in the social sciences than is generally realized. Wherever, for example, the treatments can be applied to individuals or small units (such as precincts or classrooms) without the respondents' being aware of experimentation or that other units are getting different treatments, very elegant experimental control can be achieved. An increased acceptance by administrators of randomization as the democratic method of allocating scarce resources (be these new housing, therapy, or fellowships) will make possible field experimentation in many settings. Where innovations are to be introduced throughout a social system, and where the introduction cannot in any event be simultaneous, a use of randomization in the staging can provide an experimental comparison of the new and the old, using the groups receiving the delayed introduction as controls. Nothing in this article should be interpreted as minimizing the importance of increasing the use of true experimentation. However, where true experimental design with random assignment of persons to treatments is not possible, due to ethical considerations, lack of power, or infeasibility, application of quasi-experimental analysis has much to offer.

The social sciences must do the best they can with the possibilities open to them. Inferences must frequently be made from data lacking complete control. Too often a scientist trained in experimental method rejects out of hand any research in which complete control is lacking. Yet in practice no experiment is perfectly executed, and the practicing scientist overlooks those imperfections which seem to him to offer no plausible rival explanation of the results. In the light of modern philosophies of science, no experiment ever proves a theory; it merely probes it. Seeming proof results from that condition in which there is no available plausible rival hypothesis to explain the data. The general program of quasi-experimental analysis is to specify and examine those plausible rival explanations of the results which are provided by the uncontrolled variables. A failure of control which does not in fact provide a plausible rival interpretation is not regarded as invalidating.

It is well to remember that we do make assured causal inferences in many settings not involving randomization: (The earthquake caused the brick building to crumble; the automobile crashing into it caused the telephone pole to break; the language patterns of the older models
and mentors caused this child to speak English rather than Kwakiutl; etc.) While these are all potentially erroneous inferences, they are of the same type as experimental inferences. We are confident that were we to intrude experimentally, we could confirm the causal laws involved. Yet they have been made assuredly by a nonexperimenting observer. This assurance is due to the effective absence of other plausible causes. Consider the inference as to crashing auto and the telephone pole: we rule out combinations of termites and wind because the other implications of these theories (e.g., termite tunnels and debris in the wood, wind records at nearby weather stations) do not occur. Spontaneous splintering of the pole by happenstance coincident with the auto's onset does not impress us as a rival, nor would it explain the damage to the car, etc. Analogously in quasi-experimental analysis, tentative causal interpretation of data may be made where the interpretation in question squares with the data and where other rival interpretations have been rendered implausible.

For the evaluation of data series as quasi-experiments, a set of twelve frequent threats to validity has been developed. These may be regarded as the important classes of frequently plausible rival hypotheses which good research design seeks to rule out. All will be presented briefly even though not all are employed in the evaluation of the designs used illustratively here.

Fundamental to this listing is a distinction between internal validity and external validity. Internal validity is the basic minimum without which any experiment is uninterpretable: did in fact the experimental treatments make a difference in this specific experimental instance? External validity asks the question of generalizability: to what populations, settings, treatment variables, and measurement variables can this effect be generalized? Both types of criteria are obviously important, even though they are frequently at odds, in that features increasing one may jeopardize the other. While internal validity is the sine qua non, and while the question of external validity, like the question of inductive inference, is never completely answerable, the selection of designs strong in both types of validity is obviously our ideal.

Relevant to internal validity are eight different classes of extraneous variables which, if not controlled in the experimental design, might produce effects mistaken for the effect of the experimental treatment. These are: 1. History: the other specific events occurring between a first and second measurement in addition to the experimental variable. 2. Maturation: processes within the respondents operating as a function of the passage of time per se (not specific to the particular events), including growing older, growing hungrier, growing tireder, and the like. 3. Testing: the effects of taking a test upon the scores of a second testing. 4. Instrumentation: in which changes in the calibration of a measuring instrument or changes in the observers or scorers used may produce changes in the obtained measurements. 5. Statistical regression: operating where groups have been selected on the basis of their extreme scores. 6. Selection: biases resulting in differential recruitment of respondents for the comparison groups. 7. Experimental mortality: the differential loss of respondents from the comparison groups. 8. Selection-maturation interaction: In certain of the multiple-group quasi-experimental designs, such as the nonequivalent control group design, such interaction is confounded with, i.e., might be mistaken for, the effect of the experimental variable.

Factors jeopardizing external validity or representativeness are: 9. The reactive or interaction effect of testing, in which a pretest might increase or decrease the respondent's sensitivity or responsiveness to the experimental variable and thus make the results obtained for a pretested population unrepresentative of the effects of the experimental variable for the unpretested universe from which the experimental respondents were selected. 10. Interaction effects between selection bias and the experimental variable. 11. Reactive effects of experimental arrangements, which would preclude generalization about the effect of the experimental variable for persons being exposed to it in nonexperimental settings. 12. Multiple-treatment interference, a problem wherever multiple treatments are applied to the same respondents, and a particular problem for one-group designs involving equivalent time-samples or equivalent materials samples.

Perhaps the simplest quasi-experimental design is the One-Group Pretest-Posttest Design, O1 X O2 (where O represents measurement or observation, and X represents the experimental treatment). This common design patently leaves uncontrolled the internal validity threats of History, Maturation, Testing, Instrumentation, and, if selected as extreme on O1, Regression. There may be situations in which the analyst could decide that none of these represented plausible rival hypotheses in his setting: A log of other possible change-agents might provide no plausible ones, the measurement in question might be nonreactive (Campbell, 1957), the time span too short for maturation, too spaced for fatigue, etc. However, the sources of invalidity are so numerous that a more powerful quasi-experimental design would be preferred. Several of these can be constructed by adding features to this simple one. The Interrupted Time-Series Experiment utilizes a series of measurements providing multiple pretests and posttests, e.g.: O1 O2 O3 O4 X O5 O6 O7. If in this series, O4-O5 shows a rise greater than found elsewhere, then Maturation, Testing, and Regression are no longer plausible, in that they would predict equal or greater rises for O1-O2, etc. Instrumentation may well be controlled too, although in institutional settings a change of administration policy is often accompanied by a change in record-keeping standards. Observers and participants may be focused on the occurrence of X, and may fail to take into consideration
changes in rating standards, etc. History remains the major threat, although in many settings it would not offer a plausible rival interpretation. If one had available a parallel time series from a group not receiving the experimental treatment, but exposed to the same extraneous sources of influence, and if this control time series failed to show the exceptional jump from O4 to O5, then the plausibility of History as a rival interpretation would be greatly reduced. We may call this the Multiple Time-Series Design.

Another way of improving the One-Group Pretest-Posttest Design is to add a "Nonequivalent Control Group." (Were the control group to be randomly assigned from the same population as the experimental group, we would, of course, have a true, not quasi, experimental design.) Depending on the similarities of setting and attributes, if the nonequivalent control group fails to show a gain manifest in the experimental group, then History, Maturation, Testing, and Instrumentation are controlled. In this popular design, the frequent effort to "correct" for the lack of perfect equivalence by matching on pretest scores is absolutely wrong (e.g., Thorndike, 1942; Hovland et al., 1949; Campbell & Clayton, 1961), as it introduces a regression artifact. Instead, one should live with any initial pretest differences, using analysis of covariance, or graphic presentation. Remaining uncontrolled is the Selection-Maturation Interaction, i.e., the possibility that the experimental group differed from the control group not only in initial level, but also in its autonomous maturation rate. In experiments on psychotherapy and on the effects of specific coursework this is a very serious rival. Note that it can be rendered implausible by use of a time series of pretests for both groups, thus moving again to the Multiple Time-Series Design.

There is not space here to present adequately even these four quasi-experimental designs, but perhaps the strategy of adding specific observations and analyses to check on specific threats to validity has been illustrated. This is carried to an extreme in the Recurrent Institutional Cycle Design (Campbell & McCormack, 1957; Campbell & Stanley, 1963), in which longitudinal and cross-sectional measurements are combined with still other analyses to assess the impact of indoctrination procedures, etc., through exploiting the fact that essentially similar treatments are being given to new entrants year after year or cycle after cycle. Other quasi-experimental designs covered in Campbell & Stanley (1963) include two more single-group designs (the Equivalent Time-Samples Design and the Equivalent Materials Design), Counterbalanced or Rotational Designs, Separate Sample Pretest-Posttest Designs, Regression-Discontinuity Analysis, the Panel Impact Design (see also Campbell & Clayton, 1961), and the Cross-Lagged Panel Correlation, which is related to Lazarsfeld's Sixteen-Fold Table (see especially Campbell, 1963).

Related to the program of quasi-experimental analysis are those efforts to achieve causal inference from correlational data. Note that while correlation does not prove causation, most causal hypotheses imply specific correlations, and thus examination of these probes, tests, or edits the causal hypothesis. Further, as Simon and Blalock have emphasized (e.g., Blalock, 1961), certain causal models specify uneven patterns of correlation. Thus the A → B → C model implies that r_AC be smaller than r_AB or r_BC. However, the use of partial correlations or the use of Wright's (1920) path analysis are rejected by the present writer as tests of the model because of the requirement that the "cause" be totally represented in the "effect." In the social sciences it will never be plausible that the "cause" has been measured without unique error and that it also totally lacks unique systematic variance not shared with the "effect." More appropriate would be Lawley's (1940) test of the hypothesis of single-factoredness. Only if single-factoredness can be rejected would the causal model as represented by its predicted uneven correlations pattern be the preferred interpretation.

A word needs to be said about tests of significance for quasi-experimental designs. There has come from several competent social scientists the argument that since randomization has not been used, tests of significance assuming randomization are not relevant. The attitude of the present writer is on the whole in disagreement. However, some aspects of the protest are endorsed: Good experimental design is needed for any comparison inferring change, whether or not tests of significance are used, even if only photographs, graphs, or essays are being compared. In this sense, experimental design is independent of tests of significance. More importantly, tests of significance have come to be taken as thoroughgoing proof. In vulgar social science usage, finding a "significant difference" is apt to be taken as proving the author's basis for predicting the difference, forgetting the many other plausible rival hypotheses explaining a significant difference which quasi-experimental designs leave uncontrolled. Certainly the valuation of tests of significance in some quarters needs demoting. Further, the use of tests of significance designed for the evaluation of a single comparison becomes much too lenient when dozens, hundreds, or thousands of comparisons have been sifted, and this is still common usage. And in a similar manner, the author's decision as to which of his studies is publishable, and the editor's decision as to which of the manuscripts is acceptable, further biases the sampling basis. In all of these ways, reform is needed.

However, when a quasi-experimenter has compared the results from two intact classrooms employed in a sampling of convenience, sample size, small-sample instability, a chance difference, is certainly one of the many plausible rival hypotheses which must be considered, even if only one. If each class had but five students we would interpret the fact that 20% more in the experimental class showed increases in favorableness with much less interest than if each class had 500 students. In this case there is available an elaborate formal theory for the plausible rival hypothesis of chance fluctuation. This theory involves assumptions of randomness, which are quite appropriately present when we reject the null model of random association in favor of a hypothesis of systematic difference between the two classes. If we find a "significant difference," the test of significance will not, of course, tell us whether the two classes differed because one saw the experimental movie, or for some selection reason associated with class topic, time of day, etc., which might have interacted with rate of autonomous change, pretest-instigated changes, reactions to commonly experienced events, etc. But such a test of significance will help us rule out this 13th plausible rival hypothesis, that there is no difference here at all that a model of purely chance assignment could not account for as a vagary of sampling. Note that our statement of probability level is in this light a statement of the plausibility of this rival hypothesis, which always has some plausibility, however faint. In this orientation, a practice of stating the probability in descriptive detail seems preferable to using but a single a priori decision criterion.
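
The contrast between classes of five and classes of 500 can be made concrete with a small computation. The sketch below is only an illustration under stated assumptions, not a procedure given in this paper: it reads "20% more" as 60% versus 40% of each class showing an increase, it uses a one-sided exact (hypergeometric) calculation as one possible "model of purely chance assignment," and the function name `chance_plausibility` is hypothetical.

```python
from math import comb


def chance_plausibility(inc_exp, n_exp, inc_ctl, n_ctl):
    """Probability, under purely chance assignment of the observed
    'increasers' to the two classes, that the experimental class
    contains at least `inc_exp` of them (one-sided, exact).

    Illustrative helper; the name and the one-sided choice are
    assumptions of this sketch.
    """
    total_inc = inc_exp + inc_ctl      # increasers in both classes combined
    n_total = n_exp + n_ctl            # all students in both classes
    upper = min(total_inc, n_exp)      # most increasers the class could hold
    # Hypergeometric upper tail: count the assignments that put k increasers
    # in the experimental class, for every k at least as large as observed.
    tail = sum(
        comb(total_inc, k) * comb(n_total - total_inc, n_exp - k)
        for k in range(inc_exp, upper + 1)
    )
    return tail / comb(n_total, n_exp)


# Assumed reading of "20% more": 60% versus 40% showing an increase.
p_small = chance_plausibility(3, 5, 2, 5)          # five students per class
p_large = chance_plausibility(300, 500, 200, 500)  # 500 students per class

print(f"n = 5 per class:   plausibility of chance = {p_small:.3f}")
print(f"n = 500 per class: plausibility of chance = {p_large:.3g}")
```

With five students per class the chance hypothesis keeps a plausibility of one half, while with 500 per class it becomes vanishingly small. In neither case does the figure say anything about selection, history, or the other rival hypotheses discussed above; it addresses only the thirteenth.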

REFERENCES

Blalock, H.M. 1964 Causal inferences in nonexperimental research. Chapel Hill: The University of North Carolina Press, 1964.
Campbell, D.T. 1957 Factors relevant to validity of experiments in social settings. Psychological Bulletin 54:297-312.
Campbell, D.T. 1963 From description to experimentation: Interpreting trends as quasi-experiments. Pages 212-242 in Harris, C.W. (editor), Problems in measuring change. Madison, Wis.: University of Wisconsin Press.
Campbell, D.T.; and Clayton, K.N. 1961 Avoiding regression effects in panel studies of communication impact. Studies in Public Communication No. 3, 99-118.
Campbell, D.T.; and McCormack, Thelma H. 1957 Military experience and attitudes toward authority. American Journal of Sociology 62:482-490.
Campbell, D.T.; and Stanley, J.C. 1963 Experimental and quasi-experimental designs for research on teaching. Pages 171-246 in Gage, N.L. (editor), Handbook of research on teaching. Chicago: Rand McNally.
Chapin, F.S. 1955 Experimental designs in sociological research. New York: Harper. (Rev. ed.)
Chapin, F.S.; and Queen, S.A. 1937 Research memorandum on social work in the depression. New York: Social Science Research Council, Bulletin 39, 1937.
Greenwood, E. 1945 Experimental sociology: A study in method. New York: King's Crown Press.
Hovland, C.I.; Lumsdaine, A.A.; and Sheffield, F.D. 1949 Experiments on mass communication. Princeton, N.J.: Princeton University Press.
Lawley, D.N. 1940 The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh 60:64-82.
Stouffer, S.A. (editor) 1949 The American Soldier. Princeton, N.J.: Princeton University Press. Vols. I and II.
Thorndike, E.L.; and Ruger, G.J. 1923 The effect of first-year Latin upon knowledge of English words of Latin derivation. School and Society 81:260-270, 417-418.
Thorndike, E.L.; and Woodworth, R.S. 1901 The influence of improvement in one mental function upon the efficiency of other functions. Psychological Review 8:247-261, 384-395, 553-564.
Thorndike, R.L. 1942 Regression fallacies in the matched groups experiment. Psychometrika 7:85-102.
Wright, S. 1920 Correlation and causation. Journal of Agricultural Research 20:557-585.
