Field Experiments
Author(s): Glenn W. Harrison and John A. List
Source: Journal of Economic Literature, Vol. 42, No. 4 (Dec., 2004), pp. 1009-1055
Published by: American Economic Association
Stable URL: http://www.jstor.org/stable/3594915
Accessed: 02/12/2009 12:53
literature review is necessarily selective, although List (2004d) offers a more complete bibliography.

In sections 7 and 8 we review two types of experiments that may be contrasted with ideal field experiments. One is called a social experiment, in the sense that it is a deliberate part of social policy by the government. Social experiments involve deliberate, randomized changes in the manner in which some government program is implemented. They have become popular in certain areas, such as employment schemes and the detection of discrimination. Their disadvantages have been well documented, given their political popularity, and there are several important methodological lessons from those debates for the design of field experiments.

The other is called a "natural experiment." The idea is to recognize that some event that naturally occurs in the field happens to have some of the characteristics of a field experiment. These can be attractive sources of data on large-scale economic transactions, but usually at some cost due to the lack of control, forcing the researcher to make certain identification assumptions.

Finally, in section 9 we briefly examine related types of experiments of the mind. In one case these are the "thought experiments" of theorists and statisticians, and in the other they are the "neuro-economics experiments" provided by technology. The objective is simply to identify how they differ from other types of experiments we consider, and where they fit in.

2. Defining Field Experiments

There are several ways to define words. One is to ascertain the formal definition by looking it up in the dictionary. Another is to identify what it is that you want the word-label to differentiate.

The Oxford English Dictionary (Second Edition) defines the word "field" in the following manner: "Used attributively to denote an investigation, study, etc., carried out in the natural environment of a given material, language, animal, etc., and not in the laboratory, study, or office." This orients us to think of the natural environment of the different components of an experiment.6

It is important to identify what factors make up a field experiment so that we can functionally identify what factors drive results in different experiments. To provide a direct example of the type of problem that motivated us, when List (2001) obtains results in a field experiment that differ from the counterpart lab experiments of Ronald Cummings, Glenn Harrison, and Laura Osborne (1995) and Cummings and Laura Taylor (1999), what explains the difference? Is it the use of data from a particular market whose participants have selected into the market instead of student subjects; the use of subjects with experience in related tasks; the use of private sports-cards as the underlying commodity instead of an environmental public good; the use of streamlined instructions, the less-intrusive experimental methods, mundane experimenter effects, or is it some combination of these and similar

6 If we are to examine the role of "controls" in different experimental settings, it is appropriate that this word also be defined carefully. The OED (2nd ed.) defines the verb "control" in the following manner: "To exercise restraint or direction upon the free action of; to hold sway over, exercise power or authority over; to dominate, command." So the word means something more active and interventionist than is suggested by its colloquial clinical usage. Control can include such mundane things as ensuring sterile equipment in a chemistry lab, to restrain the free flow of germs and unwanted particles that might contaminate some test. But when controls are applied to human behavior, we are reminded that someone's behavior is being restrained to be something other than it would otherwise be if the person were free to act. Thus we are immediately on alert to be sensitive, when studying responses from a controlled experiment, to the possibility that behavior is unusual in some respect. The reason is that the very control that defines the experiment may be putting the subject on an artificial margin. Even if behavior on that margin is not different than it would be without the control, there is the possibility that constraints on one margin may induce effects on behavior on unconstrained margins. This point is exactly the same as the one made in the "theory of the second best" in public policy. If there is some immutable constraint on one of the margins defining an optimum, it does not automatically follow that removing a constraint on another margin will move the system closer to the optimum.
average treatment effect is given by $\tau = y^{*}_{1} - y^{*}_{0}$, where $y^{*}_{1}$ and $y^{*}_{0}$ are the treated and nontreated average outcomes after the treatment. We have much more to say about controlled experiments, in particular field experiments, below.

"Natural experiments" consider the treatment itself as an experiment and find a naturally occurring comparison group to mimic the control group: $\tau$ is measured by comparing the difference in outcomes before and after for the treated group with the before and after outcomes for the nontreated group. Estimation of the treatment effect takes the form $Y_{it} = X_{it}\beta + \tau T_{it} + \eta_{it}$, where $i$ indexes the unit of observation, $t$ indexes years, $Y_{it}$ is the outcome in cross-section $i$ at time $t$, $X_{it}$ is a vector of controls, $T_{it}$ is a binary variable, $\eta_{it} = \alpha_i + \lambda_t + \varepsilon_{it}$, and $\tau$ is the difference-in-differences (DID) average treatment effect. If we assume that data exists for two periods, then $\tau = (y^{*T}_{t_1} - y^{*T}_{t_0}) - (y^{*NT}_{t_1} - y^{*NT}_{t_0})$ where, for example, $y^{*T}_{t_1}$ is the mean outcome for the treated group.

A major identifying assumption in DID estimation is that there are no time-varying, unit-specific shocks to the outcome variable that are correlated with treatment status, and that selection into treatment is independent of temporary individual-specific effect: $E(\eta_{it} \mid X_{it}, D_{it}) = E(\alpha_i \mid X_{it}, D_{it}) + \lambda_t$. If $\varepsilon_{it}$ and $T$ are related, DID is inconsistently estimated as $E(\hat{\tau}) = \tau + E(\varepsilon_{it_1} - \varepsilon_{it_0} \mid D=1) - E(\varepsilon_{it_1} - \varepsilon_{it_0} \mid D=0)$.

One alternative method of assessing the impact of the treatment is the method of propensity score matching (PSM) developed in P. Rosenbaum and Donald Rubin (1983). This method has been used extensively in the debate over experimental and nonexperimental evaluation of treatment effects initiated by Lalonde (1986): see Rajeev Dehejia and Sadek Wahba (1999, 2002) and Jeffrey Smith and Petra Todd (2000). The goal of PSM is to make non-experimental data "look like" experimental data. The intuition behind PSM is that if the researcher can select observable factors so that any two individuals with the same value for these factors will display homogenous responses to the treatment, then the treatment effect can be measured without bias. In effect, one can use statistical methods to identify which two individuals are "more homogeneous lab rats" for the purposes of measuring the treatment effect. More formally, the solution advocated is to find a vector of covariates, $Z$, such that $y_1, y_0 \perp T \mid Z$ and $\Pr(T=1 \mid Z) \in (0,1)$, where $\perp$ denotes independence.16

Another alternative to the DID model is the use of instrumental variables (IV), which approaches the structural econometric method in the sense that it relies on exclusion restrictions (Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin 1996; and Joshua D. Angrist and Alan B. Krueger 2001). The IV method, which essentially assumes that some components of the non-experimental data are random, is perhaps the most widely utilized approach to measuring treatment effects (Mark Rosenzweig and Kenneth Wolpin 2000). The crux of the IV approach is to find a variable that is excluded from the outcome equation, but which is related to treatment status and has no direct association with the outcome. The weakness of the IV approach is that such variables do not often exist, or that unpalatable assumptions must be maintained in order for them to be used to identify the treatment effect of interest.

A final alternative to the DID model is structural modeling. Such models often entail a heavy mix of identifying restrictions (e.g.,

16 If one is interested in estimating the average treatment effect, only the weaker condition $E(y_0 \mid T=1, Z) = E(y_0 \mid T=0, Z) = E(y_0 \mid Z)$ is required. This assumption is called the "conditional independence assumption," and intuitively means that given $Z$, the nontreated outcomes are what the treated outcomes would have been had they not been treated. Or, likewise, that selection occurs only on observables. Note that the dimensionality of the problem, as measured by $Z$, may limit the use of matching. A more feasible alternative is to match on a function of $Z$. Rosenbaum and Rubin (1983, 1984) showed that matching on $p(Z)$ instead of $Z$ is valid. This is usually carried out on the "propensity" to get treated $p(Z)$, or the propensity score, which in turn is often implemented by a simple probit or logit model with $T$ as the dependent variable.
separability), impose structure on technology and preferences (e.g., constant returns to scale or unitary income elasticities), and simplifying assumptions about equilibrium outcomes (e.g., zero-profit conditions defining equilibrium industrial structure). Perhaps the best-known class of such structural models is computable general equilibrium models, which have been extensively applied to evaluate trade policies, for example.17 It typically relies on complex estimation strategies, but yields structural parameters that are well-suited for ex ante policy simulation, provided one undertakes systematic sensitivity analysis of those parameters.18 In this sense, structural models have been the cornerstone of non-experimental evaluation of tax and welfare policies (R. Blundell and Thomas MaCurdy 1999; and Blundell and M. Costas Dias 2002).

4. Artefactual Field Experiments

4.1 The Nature of the Subject Pool

A common criticism of the relevance of inferences drawn from laboratory experiments is that one needs to undertake an experiment with "real" people, not students. This criticism is often deflected by experimenters with the following imperative: if you think that the experiment will generate different results with "real" people, then go ahead and run the experiment with real people. A variant of this response is to challenge the critics' assertion that students are not representative. As we will see, this variant is more subtle and constructive than the first response.

The first response, to suggest that the critic run the experiment with real people, is often adequate to get rid of unwanted referees at academic journals. In practice, however, few experimenters ever examine field behavior in a serious and large-sample way. It is relatively easy to say that the experiment could be applied to real people, but to actually do so entails some serious and often unattractive logistical problems.19

A more substantial response to this criticism is to consider what it is about students that is viewed, a priori, as being nonrepresentative of the target population. There are at least two issues here. The first is whether endogenous sample selection or attrition has occurred due to incomplete control over recruitment and retention, so that the observed sample is unreliable in some statistical sense (e.g., generating inconsistent estimates of treatment effects). The second is whether the observed sample can be informative on the behavior of the population, assuming away sample selection issues.

4.2 Sample Selection in the Field

Conventional lab experiments typically use students who are recruited after being told only general statements about the experiment. By and large, recruitment procedures avoid mentioning the nature of the task, or the expected earnings. Most lab experiments are also one-shot, in the sense that they do not involve repeated observations of a sample subject to attrition. Of course, neither of these features is essential. If one wanted to recruit subjects with specific interest in a task, it would be easy to do (e.g., Peter Bohm and Hans Lind 1993). And if one wanted to recruit subjects for several sessions, to generate "super-experienced" subjects20 or to conduct pre-tests of such things as risk aversion, trust, or "other-regarding preferences,"21 that could be built into the design as well.

One concern with lab experiments conducted with convenience samples of students

17 For example, the evaluation of the Uruguay Round of multilateral trade liberalization by Harrison, Thomas Rutherford, and David Tarr (1997).

18 For example, see Harrison and H.D. Vinod (1992).

19 Or one can use "real" nonhuman species: see John Kagel, Don MacDonald, and Raymond Battalio (1990) and Kagel, Battalio, and Leonard Green (1995) for dramatic demonstrations of the power of economic theory to organize data from the animal kingdom.

20 For example, John Kagel and Dan Levin (1986, 1999, 2002).

21 For example, Cox (2004).
for predicting the population response.25 All that is needed is for the behavioral responses of students to be the same as the behavioral responses of nonstudents. This can either be assumed a priori or, better yet, tested by sampling nonstudents as well as students.

Of course, it is always better to be forecasting on the basis of an interpolation rather than an extrapolation, and that is the most important problem one has with student samples. This issue is discussed in some detail by Blackburn, Harrison, and Rutström (1994). They estimated a statistical model of subject response using a sample of college students and also estimated a statistical model of subject response using field subjects drawn from a wide range of churches in the same urban area. Each were convenience samples. The only difference is that the church sample exhibited a much wider variability in their socio-demographic characteristics. In the church sample, ages ranged from 21 to 79; in the student sample, ages ranged from 19 to 27. When predicting behavior of students based on the church-estimated behavioral model, interpolation was used and the predictions were extremely accurate. In the reverse direction, however, when predicting church behavior from the student-estimated behavioral model, the predictions were disastrous in the sense of having extremely wide forecast variances.26

The reason is simple to understand. It is much easier to predict the behavior of a 26-year-old when one has a model that is based on the behavior of people whose ages range from 21 to 79 than it is to estimate the behavior of a 69-year-old based on the behavioral model from a sample whose ages range from 19 to 27.

What is the relevance of these methods for the original criticism of experimental procedures? Think of the experimental subjects as the convenience sample in the HL approach. The lessons that are learned from this student sample could be embodied in a statistical model of their behavior, with implications drawn for a larger target population. Although this approach rests on an assumption that is as yet untested, concerning the representativeness of student behavioral responses conditional on their characteristics, it does provide a simple basis for evaluating the extent to which conclusions about students apply to a broader population.

How could this method ever lead to interesting results? The answer depends on the context. Consider a situation in which the behavioral model showed that age was an important determinant of behavior. Consider further a situation in which the sample used to estimate the model had an average age that was not representative of the population as a whole. In this case, it is perfectly possible that the responses of the student sample could be quite different than the predicted responses of the population. Although no such instances have appeared in the applications of this method thus far, they should not be ruled out.

We conclude, therefore, that many of the concerns raised by this criticism, while valid, are able to be addressed by simple extensions of the methods that experimenters currently use. Moreover, these extensions would increase the general relevance of experimental methods obtained with student convenience samples.

Further problems arise if one allows unobserved individual effects to play a role. In some statistical settings it is possible to allow

25 For example, assume a population of 50 percent men and 50 percent women, but where a sample drawn at random happens to have 60 percent men. If responses differ according to sex, predicting the population is simply a matter of reweighting the survey responses.

26 On the other hand, reporting large variances may be the most accurate reflection of the wide range of valuations held by this sample. We should not always assume that distributions with smaller variances provide more accurate reflections of the underlying population just because they have little dispersion; for this to be true, many auxiliary assumptions about randomness of the sampling process must be assumed, not to mention issues about the stationarity of the underlying population process. This stationarity is often assumed away in contingent valuation research (e.g., the proposal to use double-bounded dichotomous choice formats without allowing for possible correlation between the two questions).
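
The reweighting described in footnote 25 can be made concrete with a small sketch. The population shares (50/50) and sample shares (60/40) come from the footnote; the group mean responses of 10 and 20 are invented purely for illustration, as is the function name `poststratified_mean`.

```python
def poststratified_mean(sample_means, population_shares):
    """Weight each group's sample mean by that group's population share."""
    return sum(population_shares[g] * sample_means[g] for g in sample_means)

# Shares from footnote 25: the population is 50/50, the sample happens to be 60/40.
population_shares = {"men": 0.5, "women": 0.5}
sample_shares = {"men": 0.6, "women": 0.4}

# Hypothetical mean responses by sex (invented for illustration only).
sample_means = {"men": 10.0, "women": 20.0}

# The unweighted sample mean over-represents men.
raw_mean = sum(sample_shares[g] * sample_means[g] for g in sample_means)
reweighted = poststratified_mean(sample_means, population_shares)

print(f"Unweighted sample mean: {raw_mean:.1f}")    # 0.6*10 + 0.4*20 = 14.0
print(f"Reweighted estimate:    {reweighted:.1f}")  # 0.5*10 + 0.5*20 = 15.0
```

The correction is only as good as the assumption that, within each group, sampled respondents behave like the part of the population they stand in for, which is the condition the main text goes on to discuss for student samples.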
for those effects by means of "fixed effect" or "random effects" analyses. But these standard devices, now quite common in the toolkit of experimental economists, do not address a deeper problem. The internal validity of a randomized design is maximized when one knows that the samples in each treatment are identical. This happy extreme leads many to infer that matching subjects on a finite set of characteristics must be better in terms of internal validity than not matching them on any characteristics.

But partial matching can be worse than no matching. The most important example of this is due to James Heckman and Peter Siegelman (1993) and Heckman (1998), who critique paired-audit tests of discrimination. In these experiments, two applicants for a job are matched in terms of certain observables, such as age, sex, and education, and differ in only one protected characteristic, such as race. However, unless some extremely strong assumptions about how characteristics map into wages are made, there will be a predetermined bias in outcomes. The direction of the bias "depends," and one cannot say much more. A metaphor from Heckman (1998, p. 110) illustrates: Boys and girls of the same age are in a high-jump competition, and jump the same height on average. But boys have a higher variance in their jumping technique, for any number of reasons. If the bar is set very low relative to the mean, then the girls will look like better jumpers; if the bar is set very high then the boys will look like better jumpers. The implications for numerous (lab and field) experimental studies of the effect of gender, that do not control for other characteristics, should be apparent.

This metaphor also serves to remind us that what laboratory experimenters think of as a "standard population" need not be a homogeneous population. Although students from different campuses in a given country may have roughly the same age, they can differ dramatically in influential characteristics such as intelligence and beauty. Again, the immediate implication is to collect a standard battery of measures of individual characteristics to allow some statistical comparisons of conditional treatment effects to be drawn.27 But even here we can only easily condition on observable characteristics, and additional identifying assumptions will be needed to allow for correlated differences in unobservables.

4.4 Precursors

Several experimenters have used artefactual field experiments; that is, they have deliberately sought out subjects in the "wild," or brought subjects from the "wild" into labs. It is notable that this effort has occurred from the earliest days of experimental economics, and that it has only recently become common.

Lichtenstein and Slovic (1973) replicated their earlier experiments on "preference reversals" in "... a nonlaboratory real-play setting unique to the experimental literature on decision processes - a casino in downtown Las Vegas" (p. 17). The experimenter was a professional dealer, and the subjects were drawn from the floor of the casino. Although the experimental equipment may have been relatively forbidding (it included a PDP-7 computer, a DEC-339 CRT, and a keyboard), the goal was to identify gamblers in their natural habitat. The subject pool of 44 did include seven known dealers who worked in Las Vegas, and the "... dealer's impression was that the game attracted a higher proportion of professional and educated persons than the usual casino clientele" (p. 18).

Kagel, Battalio, and James Walker (1979) provide a remarkable, early examination of many of the issues we raise. They were concerned with "volunteer artifacts" in lab experiments, ranging from the characteristics that volunteers have to the issue of sample

27 George Lowenstein (1999) offers a similar criticism of the popular practice in experimental economics of not conditioning on any observable characteristics or randomizing to treatment from the same population.
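
Heckman's high-jump metaphor can be checked with a short calculation. The sketch below assumes, purely for illustration, that jump heights are normally distributed with a common mean and that boys have the larger standard deviation; the numbers (mean 100, standard deviations 10 and 20, bars at 80 and 120) are hypothetical and are not taken from Heckman (1998), and `prob_clear` is a name introduced here.

```python
from math import erf, sqrt

def prob_clear(bar, mean, sd):
    """P(jump height > bar) when heights are normal with the given mean and sd."""
    z = (bar - mean) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

MEAN = 100.0       # same average height for both groups (hypothetical units)
SD_GIRLS = 10.0    # lower variance (assumed)
SD_BOYS = 20.0     # higher variance (assumed)

# Compare success rates at a low bar and at a high bar.
for bar in (80.0, 120.0):
    girls = prob_clear(bar, MEAN, SD_GIRLS)
    boys = prob_clear(bar, MEAN, SD_BOYS)
    better = "girls" if girls > boys else "boys"
    print(f"bar={bar:5.1f}: girls clear {girls:.3f}, "
          f"boys clear {boys:.3f} -> {better} look better")
```

Under these assumptions the group with the smaller variance clears a low bar more often and the group with the larger variance clears a high bar more often, even though average ability is identical. A pass/fail outcome at a single threshold therefore cannot, by itself, rank the two groups.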
selection bias.28 They conducted a field experiment in the homes of the volunteer subjects, examining electricity demand in response to changes in prices, weekly feedback on usage, and energy conservation information. They also examined a comparison sample drawn from the same population, to check for any biases in the volunteer sample.

Binswanger (1980, 1981) conducted experiments eliciting measures of risk aversion from farmers in rural India. Apart from the policy interest of studying agents in developing countries, one stated goal of using artefactual field experiments was to assess risk attitudes for choices in which the income from the experimental task was a substantial fraction of the wealth or annual income of the subject. The method he developed has been used recently in conventional laboratory settings with student subjects by Charles Holt and Susan Laury (2002).

Burns (1985) conducted induced-value market experiments with floor traders from wool markets, to compare with the behavior of student subjects in such settings. The goal was to see if the heuristics and decision rules these traders evolved in their natural field setting affected their behavior. She did find that their natural field rivalry had a powerful motivating effect on their behavior.

Vernon Smith, G. L. Suchanek, and Arlington Williams (1988) conducted a large series of experiments with student subjects in an "asset bubble" experiment. In the 22 experiments they report, nine to twelve traders with experience in the double-auction institution29 traded a number of fifteen or thirty period assets with the same common value distribution of dividends. If all subjects are risk neutral and have common price expectations, then there would be no reason for trade in this environment.30 The major empirical result is the large number of observed price bubbles: fourteen of the 22 experiments can be said to have had some price bubble.

In an effort to address the criticism that bubbles were just a manifestation of using student subjects, Smith, Suchanek, and Williams (1988) recruited nonstudent subjects for one experiment. As they put it, one experiment "... is noteworthy because of its use of professional and business people from the Tucson community, as subjects. This market belies any notion that our results are an artifact of student subjects, and that businessmen who 'run the real world' would quickly learn to have rational expectations. This is the only experiment we conducted that closed on a mean price higher than in all previous trading periods" (p. 1130-31). The reference at the end is to the observation that the price bubble did not burst as the finite horizon of the experiment was approaching. Another notable feature of this price bubble is that it was accompanied by heavy volume, unlike the price bubbles observed with experienced subjects.31 Although these subjects were not students, they were inexperienced in the use of the double auction experiments. Moreover, there is no presumption that their field experience was relevant for this type of asset market.

28 They also have a discussion of the role that these possible biases play in social psychology experiments, and how they have been addressed in the literature.

29 And either inexperienced, once experienced, or twice experienced in asset market trading.

30 There are only two reasons players may want to trade in this market. First, if players differ in their risk attitudes then we might see the asset trading below expected dividend value (since more-risk-averse players will pay less-risk-averse players a premium over expected dividend value to take their assets). Second, if subjects have diverse price expectations, we can expect trade to occur because of expected capital gains. This second reason for trading (diverse price expectations) can actually lead to contract prices above expected dividend value, provided some subject believes that there are other subjects who believe the price will go even higher.

31 Harrison (1992b) reviews the detailed experimental evidence on bubbles, and shows that very few significant bubbles occur with subjects who are experienced in asset market experiments in which there is a short-lived asset, such as those under study. A bubble is significant only if there is some nontrivial volume associated with it.
Artefactual field experiments have also made use of children and high school subjects. For example, William Harbaugh and Kate Krause (2000), Harbaugh, Krause, and Timothy Berry (2001), and Harbaugh, Krause, and Lise Vesterlund (2002) explore other-regarding preferences, individual rationality, and risk attitudes among children in school environments.

Joseph Henrich (2000) and Henrich and Richard McElreath (2002), and Henrich et al. (2001, 2004) have even taken artefactual field experiments to the true "wilds" of a number of peasant societies, employing the procedures of cultural anthropology to recruit and instruct subjects and conduct artefactual field experiments. Their focus was on the ultimatum bargaining game and measures of risk aversion.

5. Framed Field Experiments

5.1 The Nature of the Information Subjects Already Have

Auction theory provides a rich set of predictions concerning bidders' behavior. One particularly salient finding in a plethora of laboratory experiments that is not predicted in first-price common-value auction theory is that bidders commonly fall prey to the winner's curse. Only "super-experienced" subjects, who are in fact recruited on the basis of not having lost money in previous experiments, avoid it regularly. This would seem to suggest that experience is a sufficient condition for an individual bidder to avoid the winner's curse. Harrison and List (2003) show that this implication is supported when one considers a natural setting in which it is relatively easy to identify traders that are more or less experienced at the task. In their experiments the experience of subjects is either tied to the commodity, the valuation task, and the use of auctions (in the field experiments with sports cards), or simply to the use of auctions (in the laboratory experiments with induced values). In all tasks, experience is generated in the field and not the lab. These results provide support for the notion that context-specific experience does appear to carry over to comparable settings, at least with respect to these types of auctions.

This experimental design emphasizes the identification of a naturally occurring setting in which one can control for experience in the way that it is accumulated in the field. Experienced traders gain experience over time by observing and surviving a relatively wide range of trading circumstances. In some settings this might be proxied by the manner in which experienced or super-experienced subjects are defined in the lab, but it remains an open question whether standard lab settings can reliably capture the full extent of the field counterpart of experience. This is not a criticism of lab experiments, just their domain of applicability.

The methodological lesson we draw is that one should be careful not to generalize from the evidence of a winner's curse by student subjects that have no experience at all with the field context. These results do not imply that every field context has experienced subjects, such as professional sports-card dealers, that avoid the winner's curse. Instead, they point to a more fundamental need to consider the field context of experiments before drawing general conclusions. It is not the case that abstract, context-free experiments provide more general findings if the context itself is relevant to the performance of subjects. In fact, one would generally expect such context-free experiments to be unusually tough tests of economic theory, since there is no control for the context that subjects might themselves impose on the abstract experimental task.

The main result is that if one wants to draw conclusions about the validity of theory in the field, then one must pay attention to the myriad of ways in which field context can affect behavior. We believe that conventional lab experiments, in which roles are exogenously assigned and defined in an abstract
the initial state. They shockingly violate the constraints and move all the disks to the goal state en masse, and then physically work backwards along the lines of the above thought experiment in backwards induction. The critical point here is that they temporarily violate the constraints of the problem in order to solve it "properly."

Contrast this behavior with the laboratory subjects in McDaniel and Rutström (2001). They were given a computerized version of the game and told to try to solve it. However, the computerized version did not allow them to violate the constraints. Hence the laboratory subjects were unable to use the classroom Montessori method, by which the student learns the idea of backwards induction by exploring it with physical referents. This is not a design flaw of the McDaniel and Rutström (2001) lab experiments, but simply one factor to keep in mind when evaluating the behavior of their subjects. Without the physical analogue of the final goal state being allowed in the experiment, the subject was forced to visualize that state conceptually, and to likewise imagine conceptually the penultimate states. Although that may encourage more fundamental conceptual understanding of the idea of backwards induction, if attained, it is quite possible that it posed an insurmountable cognitive burden for some of the experimental subjects.

It might be tempting to think of this as just two separate tasks, instead of a real commodity and its abstract analogue. But we believe that this example does identify an important characteristic of commodities in ideal field experiments: the fact that they allow subjects to adopt the representation of the commodity and task that best suits their objective. In other words, the representation of the commodity by the subject is an integral part of how the subject solves the task. One simply cannot untangle them, at least not easily and naturally.

This example also illustrates that off-equilibrium states, in which one is not optimizing in terms of the original constrained optimization task, may indeed be critical to the attainment of the equilibrium state.33

33 This is quite distinct from the valid point made by Smith (1982, p. 934, fn. 17), that it is appropriate to design the experimental institution so as to make the task as simple and transparent as possible, providing one holds constant these design features as one compares experimental treatments. Such designs may make the results of less interest for those wanting to make field inferences, but that is a trade-off that every theorist and experimenter faces to varying degrees.
Thus we should be mindful of possible field devices that allow subjects to explore off-equilibrium states, even if those states are ruled out in our null hypotheses.

Field Goods Have Field Substitutes. There are two respects in which "field substitutes" play a role whenever one is conducting an experiment with naturally occurring, or field, goods. We can refer to the former as the natural context of substitutes, and to the latter as an artificial context of substitutes. The former needs to be captured if reliable valuations are to be elicited; the latter needs to be minimized or controlled.

The first way in which substitutes play a role in an experiment is the traditional sense of demand theory: to some individuals, a bottle of scotch may substitute for a bible when seeking peace of mind. The degree of substitutability here is the stuff of individual demand elasticities, and can reasonably be expected to vary from subject to subject. The upshot of this consideration is, yet again, that one should always collect information on observable individual characteristics and control for them.

The second way in which substitutes play a role in an experiment is the more subtle issue of affiliation which arises in lab or field settings that involve preferences over a field good. To see this point, consider the use of repeated Vickrey auctions in which subjects learn about prevailing prices. This results in a loss of control, since we are dealing with the elicitation of homegrown values rather than experimenter-induced private values. To the extent that homegrown values are affiliated across subjects, we can expect an effect on elicited values from using repeated Vickrey auctions rather than a one-shot Vickrey auction.34 There are, in turn, two reasons why homegrown values might be affiliated in such experiments.

The first is that the good being auctioned might have some uncertain attributes, and fellow bidders might have more or less information about those attributes. Depending on how one perceives the knowledge of other bidders, observation of their bidding behavior35 can affect a given bidder's estimate of the true subjective value to the extent that they change the bidder's estimate of the lottery of attributes being auctioned.36 Note that what is being

34 The theoretical and experimental literature makes this point clearly by comparing real-time English auctions with sealed-bid Vickrey auctions: see Paul Milgrom and Robert Weber (1982) and Kagel, Harstad, and Levin (1987). The same logic that applies for a one-shot English auction applies for a repeated Vickrey auction, even if the specific bidding opponents were randomly drawn from the population in each round.

35 The term "bidding behavior" is used to allow for information about bids as well as non-bids. In the repeated Vickrey auction it is the former that is provided (for winners in previous periods). In the one-shot English auction it is the latter (for those who have not yet caved in at the prevailing price). Although the inferential steps in using these two types of information differ, they are each informative in the same sense. Hence any remarks about the dangers of using repeated Vickrey auctions apply equally to the use of English auctions.

36 To see this point, assume that a one-shot Vickrey auction was being used in one experiment and a one-shot English auction in another experiment. Large samples of subjects are randomly assigned to each institution, and the commodity differs. Let the commodity be something whose quality is uncertain; an example used by Cummings, Harrison and Rutström (1995) and Rutström (1998) might be a box of gourmet chocolate truffles. Amongst undergraduate students in South Carolina, these boxes present something of a taste challenge. The box is not large in relation to those found in more common chocolate products, and many of the students have not developed a taste for gourmet chocolates. A subject endowed with a diverse palate is faced with an uncertain lottery. If these are just ordinary chocolates dressed up in a small box, then the true value to the subject is small (say, $2). If they are indeed gourmet chocolates then the true value to the subject is much higher (say, $10). Assuming an equal chance of either state of chocolate, the risk-neutral subject would bid their true expected value (in this example, $6). In the Vickrey auction this subject will have an incentive to write down her reservation price for this lottery as described above. In the English auction, however, this subject is able to see a number of other subjects indicate that they are willing to pay reasonably high sums for the commodity. Some have not dropped out of the auction as the price has gone above $2, and it is closing on $6. What should the subject do? The answer depends critically on how knowledgeable he thinks the other bidders are as to the quality of the chocolates. If those who have dropped out are the more knowledgeable ones, then the correct inference is that the lottery is more heavily weighted towards these being common chocolates. If those remaining in the auction are the more knowledgeable ones, however, then the opposite inference is appropriate. In the former case the real-time observation should lead the subject to bid lower than in the Vickrey auction, and in the latter case the real-time observation should lead the subject to bid higher than in the Vickrey auction.
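
The arithmetic of the chocolate example in footnote 36 can be made concrete with a small sketch. It is illustrative only: the $2 and $10 values and the equal prior come from the footnote, whereas the function name and the two revised probabilities, which stand in for what a bidder might infer from watching an English auction, are assumptions introduced here.

```python
def expected_value(p_gourmet, low=2.0, high=10.0):
    """Risk-neutral value of the box when it is gourmet with probability p_gourmet."""
    return p_gourmet * high + (1.0 - p_gourmet) * low

# One-shot Vickrey auction: nothing is learned about other bidders, so the prior applies.
vickrey_bid = expected_value(0.5)

# English auction: the bidder revises the probability after seeing who stays in.
# Both revised probabilities are hypothetical numbers chosen only for illustration.
bid_if_informed_dropped_out = expected_value(0.25)  # knowledgeable bidders left early
bid_if_informed_stayed_in = expected_value(0.75)    # knowledgeable bidders still bidding

print(vickrey_bid, bid_if_informed_dropped_out, bid_if_informed_stayed_in)
```

Under these assumed beliefs the English-auction bid falls below or rises above the Vickrey bid depending only on who the bidder thinks is informed, which is the loss of control the footnote describes.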
affected here by this knowledge is the subject's best estimate of the subjective value of the good. The auction is still eliciting a truthful revelation of this subjective value; it is just that the subjective value itself can change with information on the bidding behavior of others.

The second reason that bids might be affiliated is that the good might have some extra-experimental market price. Assuming transaction costs of entering the "outside" market to be zero for a moment, information gleaned from the bidding behavior of others can help the bidder infer what that market price might be. To the extent that it is less than the subjective value of the good, this information might result in the bidder deliberately bidding low in the experiment.37 The reason is that the expected utility of bidding below the true value is clearly positive: if lower bidding results in somebody else winning the object at a price below the true value, then the bidder can (costlessly) enter the outside market anyway. If lower bidding results in the bidder winning the object, and market price and bids are not linked, then consumer surplus is greater than if the object had been bought in the outside market. Note that this argument suggests that subjects might have an incentive to strategically misrepresent their true subjective value.38

The upshot of these concerns is that unless one assumes that homegrown values for the good are certain and not affiliated across bidders, or can provide evidence that they are not affiliated in specific settings, one should avoid the use of institutions that can have uncontrolled influences on estimates of true subjective value and/or the incentive to truthfully reveal that value.

5.3 The Nature of the Task

Who Cares If Hamburger Flippers Violate EUT? Who cares if a hamburger flipper violates the independence axiom of expected utility theory in an abstract task? His job description, job evaluation, and job satisfaction do not hinge on it. He may have left some money on the table in the abstract task, but is there any sense in which his failure suggests that he might be poor at flipping hamburgers?

Another way to phrase this point is to actively recruit subjects who have experience in the field with the task being studied.39 Trading houses do not allow neophyte pit-traders to deviate from prescribed limits, in terms of the exposure they are allowed. A survival metric is commonly applied in the field, such that the subjects who engage in certain tasks of interest have specific types of training.

The relevance of field subjects and field environments for tests of the winner's curse is evident from Douglas Dyer and Kagel (1996, p. 1464), who review how executives in the commercial construction industry avoid the winner's curse in the field:

    Two broad conclusions are reached. One is that the executives have learned a set of situation-specific rules of thumb which help them to avoid the winner's curse in the field, but which could not be applied in the laboratory markets. The second is that the bidding environment created in the laboratory and the theory underlying it are not fully representative of the field environment. Rather, the latter has developed escape mechanisms for avoiding the winner's curse that are mutually beneficial to both buyers and sellers and which have not been incorporated into the standard one-shot auction theory literature.
and List (2003), mentioned earlier. They study the behavior of insiders in their field context, while controlling the "rules of the game" to make their bidding behavior fall into the domain of existing auction theory. In this instance, the term "field context" means the commodity with which the insiders are familiar, as well as the type of bidders they normally encounter.

This design allows one to tease apart the two hypotheses implicit in the conclusions of Dyer and Kagel (1996). If these insiders fall prey to the winner's curse in the field experiment, then it must be40 that they avoid it by using market mechanisms other than those under study. The evidence is consistent with the notion that dealers in the field do not fall prey to the winner's curse in the field experiment, providing tentative support for the hypothesis that naturally occurring markets are efficient because certain traders use heuristics to avoid the inferential error that underlies the winner's curse.

This support is only tentative, however, because it could be that these dealers have developed heuristics that protect them from the winner's curse only in their specialized corner of the economy. That would still be valuable to know, but it would mean that the type of heuristics they learn in their corner are not general and do not transfer to other settings. Hence, the complete design also included laboratory experiments in the field, using induced valuations as in the laboratory experiments of Kagel and Levin (1999), to see if the heuristic of insiders transfers. We find that it does when they are acting in familiar roles, adding further support to the claim that these insiders have indeed developed a heuristic that "travels" from problem domain to problem domain. Yet when dealers are exogenously provided with less information than their bidding counterparts, a role that is rarely played by dealers, they frequently fall prey to the winner's curse. We conclude that the theory predicts field behavior well when one is able to identify naturally occurring field counterparts to the key theoretical conditions.

At a more general level, consider the argument that subjects who behave irrationally could be subjected to a "money-pump" by some arbitrager from hell. When we explain transitivity of preferences to undergraduates, the common pedagogy includes stories of intransitive subjects mindlessly cycling forever in a series of low-cost trades. If these cycles continue, the subject is pumped of money until bankrupt. In fact, the absence of such phenomena is often taken as evidence that contracts or markets must be efficient.

There are several reasons why this may not be true. First, it is only when certain consistency conditions are imposed that successful money-pumps provide a general indicator of irrationality, defeating their use as a sole indicator (Robin Cubitt and Robert Sugden 2001).

Second, and germane to our concern with the field, subjects might have developed simple heuristics to avoid such money-pumps: for example, never retrade the same objects with the same person.41 As John Conlisk (1996, p. 684) notes, "Rules of thumb are typically exploitable by 'tricksters,' who can in principle 'money pump' a person using such rules. ... Although tricksters abound--at the door, on the phone, and elsewhere--people can easily protect themselves, with their pumpable rules intact, by such simple devices as slamming the door and hanging up the phone. The issue is again a matter of circumstance and degree." The last point is important for our argument: only when the circumstance is natural might

40 This inference follows if one assumes that a dealer's survival in the industry provides sufficient evidence that he does not make persistent losses.

41 Slightly more complex heuristics work against arbitragers from meta-hell who understand that this simple heuristic might be employed.
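
The money-pump story and the "never retrade" heuristic can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a model taken from the literature cited: the three objects, the cyclic preference ordering, the fee, the function name `run_money_pump`, and the particular reading of the heuristic (refuse any object already exchanged with this arbitrager) are all hypothetical.

```python
# Cyclic (intransitive) preferences: B is preferred to A, C to B, and A to C.
PREFERRED_OVER = {"A": "B", "B": "C", "C": "A"}

def run_money_pump(wealth, fee, rounds, use_heuristic):
    """Return the agent's wealth after one arbitrager offers repeated paid swaps."""
    holding = "A"
    traded_before = set()  # objects already exchanged with this arbitrager
    for _ in range(rounds):
        offer = PREFERRED_OVER[holding]
        if wealth < fee:
            break  # bankrupt: cannot pay for another swap
        if use_heuristic and offer in traded_before:
            break  # heuristic: never retrade the same objects with the same person
        traded_before.update({holding, offer})
        wealth -= fee
        holding = offer
    return wealth

pumped = run_money_pump(wealth=10, fee=1, rounds=100, use_heuristic=False)
protected = run_money_pump(wealth=10, fee=1, rounds=100, use_heuristic=True)
print(pumped, protected)
```

In this sketch the agent without the heuristic cycles until the fee can no longer be paid, while the agent with the heuristic stops once the cycle would return an object it has already traded, keeping its intransitive preferences intact.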
one reasonably expect the subject to be able to call upon survival heuristics that protect against such irrationality. To be sure, some heuristics might "travel," and that was precisely the research question examined by Harrison and List (2003) with respect to the dreaded winner's curse. But they might not; hence we might have sightings of odd behavior in the lab that would simply not arise in the wild.

Third, subjects might behave in a non-separable manner with respect to sequential decisions over time, and hence avoid the pitfalls of sequential money pumps (Mark Machina 1989; and Edward McClennan 1990). Again, the use of such sophisticated characterizations of choices over time might be conditional on the individual having familiarity with the task and the consequences of simpler characterizations, such as those employing intertemporal additivity. It is an open question if the richer characterization that may have evolved for familiar field settings travels to other settings in which the individual has less experience.

Our point is that one should not assume that heuristics or sophisticated characterizations that have evolved for familiar field settings do travel to the unfamiliar lab. If they do exist in the field, and do not travel, then evidence from the lab might be misleading.

"Context" Is Not a Dirty Word. One tradition in experimental economics is to use scripts that abstract from any field counterpart of the task. The reasoning seems to be that this might contaminate behavior, and that any observed behavior could not then be used to test general theories. There is logic to this argument, but context should not be jettisoned without careful consideration of the unintended consequences.

Field referents can often help subjects overcome confusion about the task. Confusion may be present even in settings that experimenters think are logically or strategically transparent. If the subject does not understand what the task is about, in the sense of knowing what actions are feasible and what the consequences of different actions might be, then control has been lost at a basic level. In cases where the subject understands all the relevant aspects of the abstract game, problems may arise due to the triggering of different methods for solving the decision problem. The use of field referents could trigger the use of specific heuristics from the field to solve the specific problem in the lab, which otherwise may have been solved less efficiently from first principles (e.g., Gerd Gigerenzer et al. 2000). For either of these reasons--a lack of understanding of the task or a failure to apply a relevant field heuristic--behavior may differ between the lab and the field. The implication for experimental design is to just "do it both ways," as argued by Chris Starmer (1999) and Harrison and Rutström (2001). Experimental economists should be willing to consider the effect in their experiments of scripts that are less abstract, but in controlled comparisons with scripts that are abstract in the traditional sense. Nevertheless, it must also be recognized that inappropriate choice of field referents may trigger uncontrolled psychological motivations. Ultimately, the choice between an abstract script and one with field referents must be guided by the research question.

This simple point can be made more forcefully by arguing that the passion for abstract scripts may in fact result in less control than context-ridden scripts. It is not the case that abstract, context-free experiments provide more general findings if the context itself is relevant to the performance of subjects. In fact, one would generally expect such context-free experiments to be unusually tough tests of economic theory, since there is no control for the context that subjects might themselves impose on the abstract experimental task. This is just one part of a general plea for experimental economists to take the psychological process of "task representation" seriously.
This general point has already emerged in several areas of research in experimental economics. Noticing large differences between contributions to another person and a charity in between-subjects experiments that were otherwise identical in structure and design, Catherine Eckel and Philip Grossman (1996, p. 188ff.) drew the following conclusion:

    It is received wisdom in experimental economics that abstraction is important. Experimental procedures should be as context-free as possible, and the interaction among subjects should be carefully limited by the rules of the experiment to ensure that they are playing the game we intend them to play. For tests of economic theory, these procedural restrictions are critical. As experimenters, we aspire to instructions that most closely mimic the environments implicit in the theory, which is inevitably a mathematic abstraction of an economic situation. We are careful not to contaminate our tests by unnecessary context. But it is also possible to use experimental methodology to explore the importance and consequence of context. Economists are becoming increasingly aware that social and psychological factors can only be introduced by abandoning, at least to some extent, abstraction. This may be particularly true for the investigation of other-regarding behavior in the economic arena.

Our point is simply that this should be a more general concern.

Indeed, research in memory reminds us that subjects will impose a natural context on a task even if it literally involves "nonsense." Long traditions in psychology, no doubt painful to the subjects, involved detecting how many "nonsense syllables" a subject could recall. The logic behind the use of nonsense was that the researchers were not interested in the role of specific semantic or syntactic context as an aid to memory, and in fact saw those as nuisance variables to be controlled by the use of random syllables. Such experiments generated a backlash of sorts in memory research, with many studies focusing instead on memory within a natural context, in which cues and frames could be integrated with the specific information in the foreground of the task (e.g., Ulric Neisser and Ira Hyman 2000).42

At a more homely level, the "simple" choice of parameters can add significant field context to lab experiments. The idea, pioneered by Grether, Isaac, and Plott (1981, 1989), Grether and Plott (1984), and Hong and Plott (1982), is to estimate parameters that are relevant to field applications and take these into the lab.

5.4 The Nature of the Stakes

One often hears the criticism that lab experiments involve trivial stakes, and that they do not provide information about agents' behavior in the field if they faced serious stakes, or that subjects in the lab experiments are only playing with "house money."43 The immediate response to this

42 A healthy counter-lashing was offered by Mahzarin Banaji and Robert Crowder (1989), who concede that needlessly artefactual designs are not informative. But they conclude that "we students of memory are just as interested as anybody else in why we forget where we left the car in the morning or in who was sitting across the table at yesterday's meeting. Precisely for this reason we are driven to laboratory experimentation and away from naturalistic observation. If the former method has been disappointing to some after about 100 years, so should the latter approach be disappointing after about 2,000. Above all, the superficial glitter of everyday methods should not be allowed to replace the quest for generalizable principles." (p. 1193).

43 This problem is often confused with another issue: the validity and relevance of hypothetical responses in the lab. Some argue that hypothetical responses are the only way that one can mimic the stakes found in the field. Conlisk (1989) runs an experiment to test the Allais Paradox with small, real stakes and finds that virtually no subjects violated the predictions of expected utility theory. Subjects drawn from the same population did violate the "original recipe" version of the Allais Paradox with large, hypothetical stakes. Conlisk (1989; p. 401ff.) argues that inferences from this evidence confound hypothetical rewards with the reward scale, which is true. Of course, one could run an experiment with small, hypothetical stakes and see which factor is driving this result. Chinn-Ping Fan (2002) did this, using Conlisk's design, and found that subjects given low, hypothetical stakes tended to avoid the Allais Paradox, just as his subjects with low, real stakes avoided it. Many of the experiments that find violations of the Allais Paradox in small, real stake settings embed these choices in a large number of tasks, which could affect outcomes.
point is perhaps obvious: increase the stakes in the lab and see if it makes a difference (e.g., Elizabeth Hoffman, Kevin McCabe, and Vernon Smith 1996), or have subjects earn their stakes in the lab (e.g., Rutström and Williams 2000; and List 2004a), or seek out lab subjects in developing countries for whom a given budget is a more substantial fraction of their income (e.g., Steven Kachelmeier and Mohamed Shehata 1992; Lisa Cameron 1999; and Robert Slonim and Alvin Roth 1998).

Colin Camerer and Robin Hogarth (1999) review the issues here, identifying many instances in which increased stakes are associated with improved performance or less variation in performance. But they also alert us to important instances in which increased stakes do not improve performance, so that one does not casually assume that there will be such an improvement.

Taking the Stakes to Subjects Who Are Relatively Poor. One of the reasons for running field experiments in poor countries is that it is easier to find subjects who are relatively poor. Such subjects are presumably more motivated by financial stakes of a given level than subjects in richer countries.

Slonim and Roth (1998) conducted bargaining experiments in the Slovak Republic to test for the effect of "high stakes" on behavior.44 The bargaining game they studied entails one person making an offer to the other person, who then decides whether to accept it. Bargaining was over a pie worth 60 Slovak Crowns (Sk) in one session, a pie worth 300 Sk in another session, and a pie worth 1500 Sk in a third session.45 At exchange rates to the U.S. dollar prevailing at the time, these stakes were $1.90, $9.70, and $48.40, respectively. In terms of average local monthly wages, they were equivalent to approximately 2.5 hours, 12.5 hours, and 62.5 hours of work, respectively.

They conclude that there was no effect on initial offer behavior in the first round, but that the higher stakes did have an effect on offers as the subjects gained experience with subsequent rounds. They also conclude that acceptances were greater in all rounds with higher payoffs, but that they did not change over time. Their experiment is particularly significant because they varied the stakes by a factor of 25 and used procedures that have been widely employed in comparable experiments.46 On the other hand, one might question if there was any need to go to the field for this treatment. Fifty subjects dividing roughly $50 per game is only $1,250, and this is quite modest in terms of most experimental budgets. But fifty subjects dividing the monetary equivalent of 62.5 hours is another matter. If we assume $10 per hour in the United States for lower-skilled blue-collar workers or students, that is $15,625, which is substantial but feasible.47

Similarly, consider the "high payoff" experiments from China reported by Kachelmeier and Shehata (1992) (KS). These involved subjects facing lotteries with prizes equal to 0.5 yuan, 1 yuan, 5 yuan, or 10 yuan, and being asked to state certainty-equivalent selling prices using the "BDM" mechanism due to Gordon Becker, Morris DeGroot, and Jacob Marschak (1964). Although 10 yuan only converted to about $2.50 at the time of the experiments, this represented a considerable amount of purchasing power in that

44 Their subjects were students from universities, so one could question how "nonstandard" this population is. But the design goal was to conduct the experiment in a country in which the wage rates were low relative to the United States (p. 569), rather than simply conduct the same experiment with students from different countries as in Roth et al. (1991).

45 Actually, the subjects bargained over points which were simply converted to currency at different exchange rates. This procedure seems transparent enough, and served to avoid possible focal points defined over differing cardinal ranges of currency.

46 Harrison (2005a) reconsiders their conclusions.

47 For July 2002 the Bureau of Labor Statistics estimated average private sector hourly wages in the United States at $16.40, with white-collar workers earning roughly $4 more and blue-collar workers roughly $2 less than that.
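
The back-of-the-envelope budget comparison for the Slonim and Roth (1998) design can be reproduced as follows. This is only a sketch of the arithmetic: the figures are those quoted in the text, and the assumption that fifty subjects play as twenty-five two-person games is an inference about how the quoted totals are reached; the variable names are illustrative.

```python
subjects = 50
games = subjects // 2  # assumed: one bargaining game per pair of subjects

# Budget when the largest pie is valued at the exchange rate (roughly $50 per game).
budget_at_exchange_rate = games * 50

# Budget when the same pie is valued at 62.5 hours of work at an assumed
# U.S. wage of $10 per hour.
hours_equivalent = 62.5
us_hourly_wage = 10
budget_at_wage_equivalent = games * hours_equivalent * us_hourly_wage

# Ratio of the largest to the smallest pie in local currency (1500 Sk versus 60 Sk).
stake_ratio = 1500 / 60

print(budget_at_exchange_rate, budget_at_wage_equivalent, stake_ratio)
```

The two budgets differ by a factor of 12.5, which is the sense in which matching the stakes in wage-equivalent terms is far more demanding than matching them at the exchange rate.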
generate free-riding hypotheses for these procedures.50 The major result from Bohm's study was that bids were virtually identical for all institutions, averaging between SEK7.29 and SEK10.33.

Bohm (1984a) uses two procedures that elicit a real economic commitment, albeit under different (asserted) incentives for free-riding. He implemented this experiment in the field with local government bureaucrats bidding on the provision of a new statistical service from the Central Bureau of Statistics.51 The two procedures are used to extract a lower and an upper bound, respectively, to the true average WTP for an actual good. Each agent in group 1 was to state his individual WTP, and his actual cost would be a percentage of that stated WTP such that costs for producing the good would be covered exactly. This percentage could not exceed 100 percent. Subjects in group 2 were asked to state their WTP. If the interval estimated for total stated WTP equaled or exceeded the (known) total cost, the good was to be provided and subjects in group 2 would pay only SEK500. Subjects bidding zero in group 1 or below SEK500 in group 2 would be excluded from enjoying the good.

In group 1 a subject has an incentive to understate only if he conjectures that the sum of the contributions of others in his group is greater than or equal to total cost minus his true valuation. Total cost was known to be SEK200,000, but the contributions of (many) others must be conjectured. It is not possible to say what the extent of free-riding is in this case without further information as to expectations that were not observed. In group 2 only those subjects who actually stated a WTP greater than or equal to SEK500 might have had an incentive to free-ride. Forty-nine subjects reported exactly SEK500 in group 2, whereas 93 reported a WTP of SEK500 or higher. Thus the extent of free-riding in group 2 could be anywhere from 0 percent (if those reporting SEK500 indeed had a true WTP of exactly that amount) to 53 percent (49 free-riders out of 93 possible free-riders).

The main result reported by Bohm (1984a) is that the average WTP interval between the two groups was quite small. Group 1 had an average WTP of SEK827 and group 2 an average WTP of SEK889, for an interval that is only 7.5 percent of the smaller average WTP of group 1. Thus the conclusion in this case must be that if free-riding incentives were present in this experiment, they did not make much of a difference to the outcome.

One can question, however, the extent to which these results generalize. The subjects were representatives of local governments, and it was announced that all reported WTP values would be published. This is not a feature of most surveys used to study public programs, which often go to great lengths to ensure subject confidentiality. On the other hand, the methodological point is clear: some subjects may simply care more about undertaking certain tasks, and in many field settings this is not difficult to identify. For example, Juan Cardenas (2003) collects experimental data on common pool extraction from participants that have direct, field experience extracting from a common pool

50 Procedure I is deemed the most likely to generate strategic under-bidding (p. 113), and procedure V the most likely to generate strategic over-bidding. The other procedures, with the exception of VI, are thought to lie somewhere in between these two extremes. Explicit admonitions against strategic bidding were given to subjects in procedures I, II, IV, and V (see p. 119, 127-29). Although no theory is provided for VI:2, it can be recognized as a multiple-unit auction in which subjects have independent and private values. It is well-known that optimal bids for risk-neutral agents can be well below the true valuation of the agent in a Nash Equilibrium, and will never exceed the true valuation (e.g., bidders truthfully reveal demand for the first unit, but understate demand for subsequent units to influence the price). Unfortunately there is insufficient information to be able to say how far below true valuations these optimal bids will be, since we do not know the conjectured range of valuations for subjects. List and Lucking-Reiley (2000) use a framed field experiment to test for demand reduction in the field and find significant demand reduction.

51 In addition, he conducted some comparable experiments in a more traditional laboratory setting, albeit for a non-hypothetical good (the viewing of a pilot of a TV show).
Parker Ballinger, Michael Palumbo, and Nathaniel Wilcox (2003).

6.2 Three Examples of Minimally Invasive Experiments

Committees in the Field. Michael Levine and Charles Plott (1977) report on a field experiment they conducted on members of a flying club in which Levine was a member.55 The club was to decide on a particular configuration of planes for the members, and Levine wanted help designing a fair agenda to deal with this problem. Plott suggested to Levine that there were many fair agendas, each of which would lead to a different outcome, and suggested choosing the one that got the outcome Levine desired. Levine agreed, and the agenda was designed using principles that Plott understood from committee experiments (but not agenda experiments, which had never been attempted at that stage). The parameters assumed about the field were from Levine's impressions and his chatting among members. The selected agenda was implemented and Levine got what he wanted: the group even complimented him on his work.

55 We are grateful to Charles Plott for the following account of the events "behind the scenes."

A controversy at the flying club followed during the process of implementing the group decision. The club president, who did not like the choice, reported to certain decision-makers that the decision was something other than the actual vote. This resulted in another polling of the group, using a questionnaire that Plott was allowed to design. He designed it to get the most complete and accurate picture possible of member preferences. Computation and laboratory experiments, using induced values with the reported preferences, demonstrated that in the lab the outcomes were essentially as predicted.

Levine and Plott (1977) counts as a "minimally invasive" field experiment, at least in the ex ante sense, since there is evidence that the members did not know that the specific agenda was designed to generate the preferred outcome to Levine.

Plott and Levine (1978) took this field result back into the lab, as well as to the theory chalkboard. This process illustrates the complementarity we urge in all areas of research with lab and field experiments.

Betting in the Field. Camerer (1998) is a wonderful example of a field experiment that allowed the controls necessary for an experiment, but otherwise studied naturally occurring behavior. He recognized that computerized betting systems allowed bets to be placed and cancelled before the race was run. Thus he could try to manipulate the market by placing bets in certain ways to move the market odds, and then cancelling them. The cancellation keeps his net budget at zero, and in fact is one of the main treatments: to see if such a temporary bet affects prices appreciably. He found that it did not, but the methodological cleanliness of the test is remarkable. It is also of interest to see that the possibility of manipulating betting markets in this way was motivated in part by observations of such efforts in laboratory counterparts (p. 461).

The only issue is how general such opportunities are. This is not a criticism of their use: serendipity has always been a handmaiden of science. One cannot expect that all problems of interest can be addressed in a natural setting in such a minimally invasive manner.

Begging in the Field. List and Lucking-Reiley (2002) designed charitable solicitations to experimentally compare outcomes between different seed-money amounts and different refund rules by using three different seed proportion levels: 10 percent, 33 percent, or 67 percent of the $3,000 required to purchase a computer. These proportions were chosen to be as realistic as possible for an actual fundraising campaign while also satisfying the budget constraints they were given for this particular fundraiser. They also experimented with the use of a refund, which guarantees the individual her
money back if the goal is not reached by the group. Thus, potential donors were assigned to one of six treatments, each funding a different computer. They refer to their six treatments as 10, 10R, 33, 33R, 67, and 67R, with the numbers denoting the seed-money proportion, and R denoting the presence of a refund policy.

In carrying out their field experiments, they wished to solicit donors in a way that matched, as closely as possible, the current state of the art in fundraising. With advice from fundraising companies Donnelley Marketing in Englewood, Colorado, and Caldwell in Atlanta, Georgia, they followed generally accepted rules believed to maximize overall contributions. First, they purchased the names and addresses of 3,000 households in the Central Florida area that met two important criteria: 1) annual household income above $70,000, and 2) household was known to have previously given to a charity (some had in fact previously given to the University of Central Florida). They then assigned 500 of these names to each of the six treatments. Second, they designed an attractive brochure describing the new center and its purpose. Third, they wrote a letter of solicitation with three main goals in mind: making the letter engaging and easy to read, promoting the benefits of a proposed Center for Environmental Policy Analysis (CEPA), and clearly stating the key points of the experimental protocol. In the personalized letter, they noted CEPA's role within the Central Florida community, the total funds required to purchase the computer, the amount of seed money available, the number of solicitations sent out, and the refund rule (if any). They also explained that contributions in excess of the amount required for the computer would be used for other purposes at CEPA, noted the tax deductibility of the contribution, and closed the letter with contact information in case the donors had questions.

The text of the solicitation letter was completely identical across treatments, except for the variables that changed from one treatment to another. In treatment 10NR, for example, the first of two crucial sentences read as follows: "We have already obtained funds to cover 10 percent of the cost for this computer, so we are soliciting donations to cover the remaining $2,700." In treatments where the seed proportion differed from 10 percent, the 10 percent and $2,700 numbers were changed appropriately. The second crucial sentence stated: "If we fail to raise the $2,700 from this group of 500 individuals, we will not be able to purchase the computer, but we will use the received funds to cover other operating expenditures of CEPA." The $2,700 number varied with the seed proportion, and in refund treatments this sentence was replaced with: "If we fail to raise the $2,700 from this group of 500 individuals, we will not be able to purchase the computer, so we will refund your donation to you." All other sentences were identical across the six treatments.

In this experiment the responses from agents were from their typical environments, and the subjects were not aware that they were participating in an experiment.

7. Social Experiments

7.1 What Constitutes a Social Experiment in Economics?

Robert Ferber and Warner Hirsch (1982, p. 7) define social experiments in economics as "... a publicly funded study that incorporates a rigorous statistical design and whose experimental aspects are applied over a period of time to one or more segments of a human population, with the aim of evaluating the aggregate economic and social effects of the experimental treatments." In many respects this definition includes field experiments and even lab experiments. The point of departure for social experiments seems to be that they are part of a government agency's attempt to evaluate programs by deliberate variations in agency policies. Thus they typically involve variations in the way that the agency does its normal business, rather than
de novo programs. This characterization fits well with the tradition of large-scale social experiments in the 1960s and 1970s, dealing with negative income taxes, employment programs, health insurance, electricity pricing, and housing allowances.56

In recent years the lines have become blurred. Government agencies have been using experiments to examine issues or policies that have no close counterpart, so that their use cannot be viewed as variations on a bureaucratic theme. Perhaps the most notable social experiments in recent years have been paired-audit experiments to identify and measure discrimination. These involve the use of "matched pairs" of individuals, who are made to look as much alike as possible apart from the protected characteristics (e.g., race). These pairs then confront the target subjects, who are employers, landlords, mortgage loan officers, or car salesmen. The majority of audit studies conducted to date have been in the fields of employment discrimination and housing discrimination (see P. A. Riach and J. Rich 2002 for a review).57

The lines have also been blurred by open lobbying efforts by private companies to influence social-policy change by means of experiments. Exxon funded a series of experiments and surveys, collected by Jerry Hausman (1993), to ridicule the use of the contingent valuation method in environmental damage assessment. This effort was in response to the role that such surveys potentially played in the criminal action brought by government trustees after the Exxon Valdez oil spill. Similarly, ExxonMobil funded a series of experiments and focus groups, collected in Cass Sunstein et al. (2002), to ridicule the way in which juries determine punitive damages. This effort was in response to the role that juries played in determining punitive damages in the civil lawsuits generated by the Exxon Valdez oil spill. It is also playing a major role in ongoing efforts by some corporations to affect "tort reform" with respect to limiting appeal bonds for punitive awards and even caps on punitive awards.

7.2 Methodological Lessons

The literature on social experiments has been the subject of sustained methodological criticism. Unfortunately, this criticism has created a false tension between the use of experiments and the use of econometrics applied to field data. We believe that virtually all of the criticisms of social experiments potentially apply in some form to field experiments unless they are run in an ideal manner, so we briefly review the important ones. Indeed, many of them also apply to conventional lab experiments.

Recruitment and the Evaluation Problem. Heckman and Smith (1995, p. 87) go to the heart of the role of experiments in a social-policy setting, when they note that "the strongest argument in favor of experiments is that under certain conditions they solve the fundamental evaluation problem that arises from the impossibility of observing what would happen to a given person in both the state where he or she receives a treatment (or participates in a program) and the state where he or she does not. If a person could be observed in both states, the impact of the treatment on that person could be calculated by comparing his or her outcomes in the two states, and the evaluation problem would be solved." Randomization to treatment is the means by which social experiments solve this problem if one assumes that the act of randomizing subjects to treatment does not lead to a classic sample selection effect, which is to say that it does not "alter the pool of participants or their behavior" (p. 88).

Unfortunately, randomization could plausibly lead to either of these outcomes, which are not fatal but do necessitate the use of

56 See Ferber and Hirsch (1978, 1982) and Jerry Hausman and David Wise (1985) for wonderful reviews.

57 Some discrimination studies have been undertaken by academics with no social-policy evaluation (e.g., Chaim Fershtman and Uri Gneezy 2001 and List 2004b).
changes in the extent of the injury, or differences in the valuation placed on the injury.60 Unfortunately, such surveys suffer from the fact that they do not ask subjects to make a direct economic commitment, and that this will likely generate an inflated valuation report.61 However, many field surveys are designed to avoid the problem of hypothetical bias, by presenting the referenda as "advisory." Great care is often taken in the selection of motivational words in cover letters, opening survey questions, and key valuation questions, to encourage the subject to take the survey seriously in the sense that their response will "count."62 To the extent that they achieve success in this, these surveys should be considered social experiments.

Consider the generic cover letter advocated by Don Dillman (1978, pp. 165ff.) for use in mail surveys. The first paragraph is intended to convey something about the social usefulness of the study: that there is some policy issue that the study is attempting to inform. The second paragraph is intended to convince the recipient of their importance to the study. The idea here is to explain that their name has been selected as one of a small sample, and that for the sample to be representative they need to respond. The goal is clearly to put some polite pressure on the subject to make sure that their socio-economic characteristic set is represented.

The third paragraph ensures confidentiality, so that the subject can ignore any possible repercussion from responding one way or the other in a "politically incorrect" manner. Although seemingly mundane, this assurance can be important when the researcher interprets the subject as responding to the question at hand rather than uncontrolled perceptions of repercussions. It also serves to mimic the anonymity of the ballot box.

The fourth paragraph builds on the preceding three to drive home the usefulness of the survey response itself, and the possibility that it will influence behavior:
generate these comparisons. The main attraction of natural experiments is that they reflect the choices of individuals in a natural setting, facing natural consequences that are typically substantial. The main disadvantage of natural experiments derives from their very nature: the experimenter does not get to pick and choose the specifics of the treatments, and the experimenter does not get to pick where and when the treatments will be imposed. The first problem may result in low power to detect any responses of interest, as we illustrate with a case study in section 8.2 below. While there is a lack of control, we should obviously not look a random gift horse in the mouth when it comes to making inferences. There are some circumstances, briefly reviewed in section 8.3, when nature provides useful controls to augment those from theory or "manmade" experimentation.

8.2 Inferring Discount Rates by Heroic Extrapolation

elected to take the annuity, one could infer that his discount rate was less than the threshold.67

This design is essentially the same as one used in a long series of laboratory experiments studying the behavior of college students.68 Comparable designs have been taken into the field, such as the study of the Danish population by Harrison, Lau, and Williams (2002). The only difference is that the field experiment evaluated by WP offered each individual only one discount rate: Harrison, Lau, and Williams offered each subject twenty different discount rates, ranging between 2.5 percent and 50 percent.

Five features of this natural experiment make it particularly compelling for the purpose of estimating individual discount rates. First, the stakes were real. Second, the stakes were substantial and dwarf anything that has been used in laboratory experiments with salient payoffs in the United States. The average lump-sum amounts were around $50,000 and $25,000 for officers and enlisted personnel, respec-
[Figure 2: histograms of the offered and estimated discount rates. Horizontal axis: Discount Rate in Percent (0 to 100). The two series use separate vertical axes.]

Figure 2. Offered and Estimated Discount Rates in Warner and Pleeter Natural Experiments
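
The threshold logic used in this natural experiment (a person who takes the lump sum reveals a discount rate above the break-even rate, and a person who takes the annuity reveals a rate below it) can be illustrated numerically. The following is a minimal sketch, not the procedure used by WP: the function names, the 22-year horizon, the end-of-year payment timing, and the dollar amounts are all assumptions chosen for illustration, and taxes are ignored.

```python
def annuity_present_value(payment, rate, years):
    """Present value of `payment` received at the end of each of `years` years."""
    if rate == 0:
        return payment * years
    return payment * (1 - (1 + rate) ** -years) / rate


def break_even_rate(lump_sum, payment, years, lo=1e-9, hi=10.0, tol=1e-10):
    """Discount rate at which the annuity's present value equals the lump sum.

    Solved by bisection; present value is strictly decreasing in the rate.
    """
    if annuity_present_value(payment, lo, years) < lump_sum:
        raise ValueError("annuity is worth less than the lump sum even at a zero rate")
    if annuity_present_value(payment, hi, years) > lump_sum:
        raise ValueError("break-even rate exceeds the search bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if annuity_present_value(payment, mid, years) > lump_sum:
            lo = mid  # annuity still worth more, so the break-even rate is higher
        else:
            hi = mid
    return (lo + hi) / 2


def revealed_bound(choice, threshold):
    """Translate an observed choice into a one-sided bound on the discount rate."""
    if choice == "lump_sum":
        return ("above", threshold)
    if choice == "annuity":
        return ("below", threshold)
    raise ValueError("choice must be 'lump_sum' or 'annuity'")


# Hypothetical offer: $25,000 now versus 22 annual payments.
years = 22
# Payment sized (by assumption) so that the break-even rate is 18 percent.
payment = 25_000 / annuity_present_value(1.0, 0.18, years)
threshold = break_even_rate(25_000, payment, years)
print(round(threshold, 4), revealed_bound("lump_sum", threshold))
```

Each individual contributes only a one-sided bound at a single offered rate, which is why an average discount rate has to be recovered from a fitted statistical model and cannot be observed directly.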
model used by WP.72 These results pool the data for all separating personnel. The grey histogram shows the after-tax discount rates that were offered, and the black histogram shows the discount rates inferred from the estimated "log-linear" model that constrains discount rates to be positive. Given the different shapes of the histograms, they use different vertical axes to allow simple visual comparisons.

72 Virtually identical results are obtained with the model that corrects for possible sample-selection effects.

The main result is that the distribution of estimated discount rates is much wider than the distribution of offered rates. Harrison (2005a) presents separate results for the samples of officers and enlisted personnel, and for the alternative specifications considered by WP. For enlisted personnel the distribution of estimated rates is almost entirely out-of-sample in comparison to the offered rates above it. The distribution for officers is roughly centered on the distribution of offered rates, but much more dispersed. There is nothing "wrong" with these differences between the offered and estimated discount rates, although they will be critical when we calculate standard errors on these estimated discount rates. Again, the estimated rates in figure 2 are based on the logic described above: no prediction error is assumed from the estimated statistical model when it is applied at the level of the individual to predict the threshold rate at which the lump-sum would be accepted.

The main conclusion of WP is contained in their table 6, which lists estimates of the average discount rates for various groups of their subjects. Using the model that imposes the a priori restriction that discount rates be positive, they report that the average discount rate for officers was 18.7 percent, and 53.6 percent for enlisted personnel. What are the standard errors on these means? There is reason to expect that they could be
quite large, due to constraints on the scope of the natural experiment.

Individuals were offered a choice between a lump-sum and an annuity. The before-tax discount rate that just equated the present value of the two instruments ranged between 17.5 percent and 19.8 percent, which is a very narrow range of discount rates. The after-tax equivalent rates ranged from a low of 14.5 percent up to 23.5 percent for those offered the separation option, but over 99 percent of the after-tax rates were between 17.6 percent and 20.4 percent. Thus the above inferences about average discount rates for enlisted personnel are "out of sample," in the sense that they do not reflect direct observation of responses at those rates of 53.6 percent, or indeed at any rates outside the interval (14.5 percent, 23.5 percent). Figure 2 illustrates this point as well, since the right mode is entirely due to the estimates of enlisted personnel. The average for enlisted personnel therefore reflects, and relies on, the predictive power of the parametric functional forms fitted to the observed data. The same general point is true for officers, but the problem is far less severe.

Even if one accepted the parametric functional forms (probit), the standard errors of predictions outside of the sample range of break-even discount rates will be much larger than those within the sample range.73 The standard errors of the predicted response can be calculated directly from the estimated model. Note that this is not the same as the distribution shown in figure 2, which is a distribution over the sample of individuals at each simulated discount rate that assumes that the model provides a perfect prediction for each individual. In other words, the predictions underlying figure 2 just use the average prediction for each individual as the truth, so the sampling error reflected in the distributions only reflects sampling over the individuals. One can generate standard errors that also capture the uncertainty in the probit model coefficients.

Figure 3 displays the results of taking into account the uncertainty about the coefficients of the estimated model used by WP. Since it is an important dimension to consider, we show the time horizon for the elicited discount rates on the horizontal axis.74 The middle line shows a cubic spline through the predicted average discount rate. The top (bottom) line shows a cubic spline through the upper (lower) bound of the 95 percent confidence interval, allowing for uncertainty in the individual predictions due to reliance on an estimated statistical model to infer discount rates.75 Thus, in figure 3 we see that there is considerable uncertainty about the discount rates for enlisted personnel, and that it is asymmetric. On balance, the model implies a considerable skewness in the distribution of rates for enlisted personnel, with some individuals having extremely high implied discount rates. Turning to the results for officers, we find much less of an effect from model uncertainty. In this case the rates are relatively precisely inferred, particularly around the range of rates spanning the effective rates offered, as one would expect.76

We conclude that the results for enlisted personnel are too imprecisely estimated for

73 Relaxing the functional form also allows some additional uncertainty into the estimation of individual discount rates.

74 The time horizon of the annuity offered to individuals in the field varied directly with the years of military service completed. For each year of service the horizon on the annuity was two years longer. As a result, the annuities being considered by individuals were between fourteen and thirty years in length. With roughly 10 percent of the sample at each horizon, the average annuity horizon was around 22 years.

75 In fact, we calculate rates only up to 100 percent, so the upper confidence interval for the model is constrained to equal 100 percent for that reason. It would be a simple matter to allow the calculation to consider higher rates, but there would be little inferential value in doing so.

76 It is a standard result from elementary econometrics that the forecast interval widens as one uses the regression model to predict for values of the exogenous variables that are further and further away from their average (e.g., William Greene 1993, p. 164-66).
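
The textbook result cited in footnote 76 can be seen in a small numerical sketch. Note the substitution: this uses a simple linear regression with one regressor in place of the probit model used by WP, purely to illustrate how the forecast standard error grows away from the sample mean of the regressor. The data are simulated and hypothetical; only the narrow band of 17.6 to 20.4 and the out-of-sample point 53.6 are borrowed from the text, as convenient values for the regressor.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, s, xbar, sxx, n)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    return a, b, s, xbar, sxx, n


def forecast_se(x0, s, xbar, sxx, n):
    """Standard error of a forecast for a new observation at x0."""
    return s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)


rng = random.Random(0)  # fixed seed so the illustration is reproducible
# Hypothetical data: the regressor is confined to a narrow band.
xs = [rng.uniform(17.6, 20.4) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

a, b, s, xbar, sxx, n = fit_line(xs, ys)
se_inside = forecast_se(19.0, s, xbar, sxx, n)   # within the observed range
se_outside = forecast_se(53.6, s, xbar, sxx, n)  # far outside it
print(f"forecast s.e. at 19.0: {se_inside:.3f}; at 53.6: {se_outside:.3f}")
```

The term (x0 - xbar)^2 / sxx is what drives the widening: when the regressor varies over a narrow band, sxx is small, so predictions far outside that band carry a large standard error even with many observations.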
[Figure: two panels, each plotted against time horizon of annuity in years (15 to 30).]
factors.77 This means that only one instrument is required, which is fortunate since nature is a stingy provider of such instruments. Apart from twins, natural events that have been exploited in this literature include birth dates, gender, and even weather events, and these are not likely to grow dramatically over time.

Both of these concerns point the way to a complementary use of different methods of experimentation, much as econometricians use a priori identifying assumptions as a substitute for data in limited information environments.

9. Thought Experiments

Thought experiments are extremely common in economics, and would seem to be fundamentally different from lab and field experiments. We argue that they are not, drawing on recent literature examining the role of statistical specifications of experimental tests of deterministic theories. Although it may surprise some, the comparison between lab experiments and field experiments that we propose has analogues to the way thought experiments have been debated in analytic philosophy and the view that thought experiments are just "attenuated experiments." Finally, we consider the place of measures of the natural functioning of the brain during artefactual experimental conditions.

9.1 Where Are the Econometric Instructions to Test Theory?

To avoid product liability litigation, it is standard practice to sell commodities with clear warnings about dangerous use and operating instructions designed to help one get the most out of the product. Unfortunately, the same is not true of economic theories. When theorists undertake thought experiments about individual or market behavior, they are positing "what if" scenarios which need not be tethered to reality. Sometimes theorists constrain their propositions by the requirement that they be "operationally meaningful," which only requires that they be capable of being refuted, and not that anyone has the technology or budget to actually do so.

Tests of expected utility theory have provided a dramatic illustration of the importance of thought experiments being explicitly linked to stochastic assumptions involved in their use. Several studies offer a rich array of different error specifications leading to very different inferences about the validity of expected utility theory, and particularly about what part of it appears to be broken: Ballinger and Wilcox (1997); Enrica Carbonne (1997); David Harless and Camerer (1994); Hey (1995); John Hey and Chris Orme (1994); Graham Loomes, Peter Moffatt, and Sugden (2002); and Loomes and Sugden (1995, 1998). The methodological problem is that debates over the characterization of the residual have come to dominate the substantive issues, as crisply drawn by Ballinger and Wilcox (1997, p. 1102)78:

We know subjects are heterogeneous. The representative decision maker ... restriction fails miserably both in this study and new ones .... Purely structural theories permit heterogeneity by allowing several preference patterns, but are mute when it comes to mean error rate variability between or within patterns (restrictions like CE) and within-pattern heterogeneity of choice probabilities (restrictions like CH and ZWC). We believe Occam's Razor and the 'Facts don't kill theories, theories do' cliches do not apply: CE, CH and ZWC are an atheoretical supporting cast in dramas about theoretical stars, and poor showings by this cast should be excused neither because they are simple nor because there are no replacements. It is time to audition a new cast.

In this instance, a lot has been learned about the hidden implications of alternative

77 Rosenzweig and Wolpin (2000, p. 829, fn.4, and p. 873).

78 The notation in this quote does not need to be defined for the present point to be made.
subjects appear to exhibit substantial aversion to inequality in experiments of this kind, do we need to actually run the experiment in which the same subject participates in a dictator game and an investment game to realize that "trust" is weakly overestimated by the executed trust experiments? One might object that we would not be able to make this inference without having run some prior experiments in which subjects transfer money under dictator, so this design proposal of Cox (2004) does not count as a thought experiment. But imagine counterfactually82 that Cox (2004) left it at that, and did not actually run an experiment. We would still be able to draw the new inference from his design that trust is weakly over-estimated in previous experiments if one accounts for the potential confound of inequality aversion.83 Thus, in what sense should we view the thought experiment of the proposed design of Cox (2004) as anything other than an attenuated version of the ordinary experiment that he actually designed and executed?

82 A thought experiment at work.

83 As it happens, there are two further confounds at work in the trust design, each of which can be addressed. One is risk attitudes, at least as far as the interpretation of the behavior of the first player is concerned. Sending money to the other player is risky. If the first player keeps all of his endowment, there is no risk. So a risk-loving player would invest, just for the thrill. A risk-averse player would not invest for this reason. But if there are other motives for investing, then risk attitudes will exacerbate or temper them, and need to be taken into account when identifying the residual as trust. Risk attitudes play no role for the second player's decision. The other confound, in the proposed design of Cox (2004), is that the "price of giving" in his proposed dictator game is $1 for $1 transferred, whereas it is $1 for $3 transferred in the investment game. Thus one would weakly understate the extent of other-regarding preferences in his design, and hence weakly overstate the residual "trust." The general point is even clearer: after these potential confounds are taken into account, what faith does one have that a reliable measure of trust has been identified statistically in the original studies?

One trepidation with treating a thought experiment as just a slimmed-down experiment is that it is untethered by the reality of "proof by data" at the end. But this has more to do with the aims and rhetorical goals of doing experiments. As Sorenson (1991, p. 205) notes:

The aim of any experiment is to answer or raise its question rationally. As stressed (earlier ...), the motives of an experiment are multifarious. One can experiment in order to teach a new technique, to test new laboratory equipment, or to work out a grudge against white rats. (The principal architect of modern quantum electrodynamics, Richard Feynman, once demonstrated that the bladder does not require gravity by standing on his head and urinating.) The distinction between aim and motive applies to thought experiments as well. When I say that an experiment 'purports' to achieve its aim without execution, I mean that the experimental design is presented in a certain way to the audience. The audience is being invited to believe that contemplation of the design justifies an answer to the question or (more rarely) justifiably raises its question.

In effect, then, it is caveat emptor with thought experiments, but the same homily surely applies to any experiment, even if executed.

9.3 That's Not a Thought Experiment ... This Is!

We earlier defined the word "field" in the following manner: "used attributively to denote an investigation, study, etc., carried out in the natural environment of a given material, language, animal, etc., and not in the laboratory, study, or office." Thus, in an important sense, experiments that employ methods to measure neuronal activity during controlled tasks would be included, since the functioning of the brain can be presumed to be a natural reaction to the controlled stimulus. Neuroeconomics is the study of how different parts of the brain light up when certain tasks are presented, such as exposure to randomly generated monetary gain or loss in Hans Breiter et al. (2001), the risk elicitation tasks of Kip Smith et al. (2002) and Dickhaut et al. (2003), the trust games of McCabe et al. (2001), and the ultimatum bargaining games of Alan Sanfey et al.
(2003). In many ways these methods are The main methodological conclusion we
extensions of the use of verbal protocols draw is that experimenters should be wary of
(speaking out loud as the task is performed) the conventional wisdom that abstract,
used by K. Anders Ericsson and Herbert imposed treatments allow general inferences.
Simon (1993) to study the algorithmic In an attempt to ensure generality and con-
processes that subjects were going through trol by gutting all instructions and procedures
as they solved problems, and the use of of field referents, the traditional lab experi-
mouse-tracking technology by Eric Johnson menter has arguably lost control to the extent
et al. (2002) to track sequential information that subjects seek to provide their own field
search in bargaining tasks. The idea is to referents. The obvious solution is to conduct
monitor some natural mental process as the experiments both ways: with and without nat-
experimental treatment is administered, urally occurring field referents and context.
even if the treatment is artefactual. If there is a difference, then it should be
studied. If there is no difference, one can
10. Conclusion conditionally conclude that the field behavior
in that context travels to the lab environment.
We have avoided drawing a single, bright
line between field experiments and lab REFERENCES
experiments. One reason is that there are Angrist, Joshua D.; Guido W. Imbens and Donald B.
Rubin. 1996. "Identification of Casual Effects Using
several dimensions to that line, and inevitably Instrumental Variables," J. Amer. Statist. Assoc.
there will be some trade-offs between those. 91:434, pp. 444-45.
The extent of those trade-offs will depend on Angrist, Joshua D. and Alan B. Krueger. 2001.
"Instrumental Variables and the Search for
where researchers fall in terms of their agree- Identification: From Supply and Demand to Natural
ment with the argument and issues we raise. Experiments," J. Econ. Persp. 15:4, pp. 69-85.
Another reason is that we disagree where Angrist, Joshua D. and Victor Lavy. 1999. "Using
Maimonides' Rule to Estimate the Effect of Class
the line would be drawn. One of us Size on Scholastic Achievement," Quart. J. Econ.
(Harrison), bred in the barren test-tube set- 114:2, pp. 553-75.
Ballinger, T. Parker; Michael G. Palumbo and
ting of classroom labs sans ferns, sees virtu- Nathaniel T. Wilcox. 2003. "Precautionary Saving
ally any effort to get out of the classroom as and Social Learning Across Generations: An
constituting a field experiment to some use- Experiment," Econ. J. 113:490, pp. 920-47.
ful degree. The other (List), raised in the Ballinger, T. Parker and Nathaniel T. Wilcox. 1997.
"Decisions, Error and Heterogeneity," Econ. J.
wilds amidst naturally occurring sports-card 107:443, pp. 1090-105.
geeks, would include only those experiments Banaji, Mahzarin R. and Robert G. Crowder. 1989.
that used free-range subjects. Despite this "The Bankruptcy of Everyday Memory," Amer.
Pyschol. 44, pp. 1185-93.
disagreement on the boundaries between Bateman, Ian; Alistair Munro, Bruce Rhodes, Chris
one category of experiments and another Starmer and Robert Sugden. 1997. "Does Part-
Whole Bias Exist? An Experimental Investigation,"
category, however, we agree on the charac- Econ. J. 107:441, pp. 322-32.
teristics that make a field experiment differ Becker, Gordon M.; Morris H. DeGroot and Jacob
from a lab experiment. Marschak. 1964. "Measuring Utility by a Single-
Using these characteristics as a guide, we Response Sequential Method," Behav. Sci. 9:July,
pp. 226-32.
propose a taxonomy of field experiments Beetsma, R. M. W J. and P. C. Schotman. 2001.
that helps one see their connection to lab "Measuring Risk Attitudes in a Natural Experiment:
Data from the Television Game Show Lingo," Econ.
experiments, social experiments, and natural
J. 111:474, pp. 821-48.
experiments. Many of the differences are Behrman, Jere R.; Mark R. Rosenzweig and Paul
illusory, such that the same issues of control Taubman. 1994. "Endowments and the Allocation of
apply. But many of the differences matter Schooling in the Family and in the Marriage Market:
The Twins Experiment," J. Polit. Econ. 102:6, pp.
for behavior and inference, and justify the 1131-74.
focus on the field. Benson, P. G. 2000. "The Hawthorne Effect," in The
Corsini Encyclopedia of Psychology and Behavioral Science. Vol. 2, 3rd ed. W. E. Craighead and C. B. Nemeroff, eds. NY: Wiley.
Berg, Joyce E.; John Dickhaut and Kevin McCabe. 1995. "Trust, Reciprocity, and Social History," Games Econ. Behav. 10, pp. 122-42.
Berk, J. B.; E. Hughson and K. Vandezande. 1996. "The Price Is Right, but Are the Bids? An Investigation of Rational Decision Theory," Amer. Econ. Rev. 86:4, pp. 954-70.
Berlin, Brent and Paul Kay. 1969. Basic Color Terms: Their Universality and Evolution. Berkeley: UC Press.
Binswanger, Hans P. 1980. "Attitudes Toward Risk: Experimental Measurement in Rural India," Amer. J. Ag. Econ. 62:3, pp. 395-407.
Binswanger, Hans P. 1981. "Attitudes Toward Risk: Theoretical Implications of an Experiment in Rural India," Econ. J. 91:364, pp. 867-90.
Blackburn, McKinley; Glenn W. Harrison and E. Elisabet Rutström. 1994. "Statistical Bias Functions and Informative Hypothetical Surveys," Amer. J. Ag. Econ. 76:5, pp. 1084-88.
Blundell, R. and M. Costa-Dias. 2002. "Alternative Approaches to Evaluation in Empirical Microeconomics," Portuguese Econ. J. 1, pp. 91-115.
Blundell, R. and Thomas MaCurdy. 1999. "Labor Supply: A Review of Alternative Approaches," in Handbook of Labor Economics Vol. 3C. O. Ashenfelter and D. Card, eds. Amsterdam: Elsevier Science BV.
Bohm, Peter. 1972. "Estimating the Demand for Public Goods: An Experiment," Europ. Econ. Rev. 3:2, pp. 111-30.
Bohm, Peter. 1979. "Estimating Willingness to Pay: Why and How?" Scand. J. Econ. 81:2, pp. 142-53.
Bohm, Peter. 1984a. "Revealing Demand for an Actual Public Good," J. Public Econ. 24, pp. 135-51.
Bohm, Peter. 1984b. "Are There Practicable Demand-Revealing Mechanisms?" in Public Finance and the Quest for Efficiency. H. Hanusch, ed. Detroit: Wayne State U. Press.
Bohm, Peter. 1994. "Behavior under Uncertainty without Preference Reversal: A Field Experiment," in Experimental Economics. J. Hey, ed. Heidelberg: Physica-Verlag.
Bohm, Peter and Hans Lind. 1993. "Preference Reversal, Real-World Lotteries, and Lottery-Interested Subjects," J. Econ. Behav. Org. 22:3, pp. 327-48.
Bornstein, Gary and Ilan Yaniv. 1998. "Individual and Group Behavior in the Ultimatum Game: Are Groups More Rational Players?" Exper. Econ. 1:1, pp. 101-108.
Breiter, Hans C.; Itzhak Aharon, Daniel Kahneman, Anders Dale, and Peter Shizgal. 2001. "Functional Imaging of Neural Responses to Expectancy and Experience of Monetary Gains and Losses," Neuron 30:2, pp. 619-39.
Bronars, Stephen G. and Jeff Grogger. 1994. "The Economic Consequences of Unwed Motherhood: Using Twin Births as a Natural Experiment," Amer. Econ. Rev. 84:5, pp. 1141-56.
Burns, Penny. 1985. "Experience and Decision Making: A Comparison of Students and Businessmen in a Simulated Progressive Auction," in Research in Experimental Economics, Vol. 3. V. L. Smith, ed. Greenwich, CT: JAI Press.
Camerer, Colin F. 1998. "Can Asset Markets Be Manipulated? A Field Experiment with Racetrack Betting," J. Polit. Econ. 106:3, pp. 457-82.
Camerer, Colin and Robin Hogarth. 1999. "The Effects of Financial Incentives in Experiments: A Review and Capital-Labor Framework," J. Risk Uncertainty 19, pp. 7-42.
Cameron, Lisa A. 1999. "Raising the Stakes in the Ultimatum Game: Experimental Evidence from Indonesia," Econ. Inquiry 37:1, pp. 47-59.
Carbone, Enrica. 1997. "Investigation of Stochastic Preference Theory Using Experimental Data," Econ. Letters 57:3, pp. 305-11.
Cardenas, Juan C. 2003. "Real Wealth and Experimental Cooperation: Evidence from Field Experiments," J. Devel. Econ. 70:2, pp. 263-89.
Carpenter, Jeffrey; Amrita Daniere and Lois Takahashi. 2004. "Cooperation, Trust, and Social Capital in Southeast Asian Urban Slums," J. Econ. Behav. Org. 55:4, pp. 533-51.
Carson, Richard T. 1997. "Contingent Valuation Surveys and Tests of Insensitivity to Scope," in Determining the Value of Non-Marketed Goods: Economic, Psychological, and Policy Relevant Aspects of Contingent Valuation Methods. R. J. Kopp, W. Pommerhene and N. Schwartz, eds. Boston: Kluwer.
Carson, Richard T.; Robert C. Mitchell, W. Michael Hanemann, Raymond J. Kopp, Stanley Presser and Paul A. Ruud. 1992. A Contingent Valuation Study of Lost Passive Use Values Resulting From the Exxon Valdez Oil Spill. Anchorage: Attorney Gen. Alaska.
Chamberlin, Edward H. 1948. "An Experimental Imperfect Market," J. Polit. Econ. 56:2, pp. 95-108.
Coller, Maribeth and Melonie B. Williams. 1999. "Eliciting Individual Discount Rates," Exper. Econ. 2, pp. 107-27.
Conlisk, John. 1989. "Three Variants on the Allais Example," Amer. Econ. Rev. 79:3, pp. 392-407.
Conlisk, John. 1996. "Why Bounded Rationality?" J. Econ. Lit. 34:2, pp. 669-700.
Cox, James C. 2004. "How To Identify Trust and Reciprocity," Games Econ. Behav. 46:2, pp. 260-81.
Cox, James C. and Stephen C. Hayne. 2002. "Barking Up the Right Tree: Are Small Groups Rational Agents?" work. pap. Dept. Econ. U. Arizona.
Cubitt, Robin P. and Robert Sugden. 2001. "On Money Pumps," Games Econ. Behav. 37:1, pp. 121-60.
Cummings, Ronald G.; Steven Elliott, Glenn W. Harrison and James Murphy. 1997. "Are Hypothetical Referenda Incentive Compatible?" J. Polit. Econ. 105:3, pp. 609-21.
Cummings, Ronald G. and Glenn W. Harrison. 1994. "Was the Ohio Court Well Informed in Their Assessment of the Accuracy of the Contingent Valuation Method?" Natural Res. J. 34:1, pp. 1-36.
Cummings, Ronald G.; Glenn W. Harrison and Laura L. Osborne. 1995. "Can the Bias of Contingent Valuation Be Reduced? Evidence from the
Laboratory," econ. work. pap. B-95-03, College Business Admin., U. South Carolina.
Cummings, Ronald G.; Glenn W. Harrison and E. Elisabet Rutström. 1995. "Homegrown Values and Hypothetical Surveys: Is the Dichotomous Choice Approach Incentive Compatible?" Amer. Econ. Rev. 85:1, pp. 260-66.
Cummings, Ronald G. and Laura O. Taylor. 1999. "Unbiased Value Estimates for Environmental Goods: A Cheap Talk Design for the Contingent Valuation Method," Amer. Econ. Rev. 89:3, pp. 649-65.
Deacon, Robert T. and Jon Sonstelie. 1985. "Rationing by Waiting and the Value of Time: Results from a Natural Experiment," J. Polit. Econ. 93:4, pp. 627-47.
Dehejia, Rajeev H. and Sadek Wahba. 1999. "Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs," J. Amer. Statist. Assoc. 94:448, pp. 1053-62.
Dehejia, Rajeev H. and Sadek Wahba. 2002. "Propensity Score Matching for Nonexperimental Causal Studies," Rev. Econ. Statist. 84, pp. 151-61.
Dickhaut, John; Kevin McCabe, Jennifer C. Nagode, Aldo Rustichini and José V. Pardo. 2003. "The Impact of the Certainty Context on the Process of Choice," Proceed. Nat. Academy Sci. 100:March, pp. 3536-41.
Dillman, Don. 1978. Mail and Telephone Surveys: The Total Design Method. NY: Wiley.
Duddy, Edward A. 1924. "Report on an Experiment in Teaching Method," J. Polit. Econ. 32:5, pp. 582-603.
Dufwenberg, Martin and Uri Gneezy. 2000. "Measuring Beliefs in an Experimental Lost Wallet Game," Games Econ. Behav. 30:2, pp. 163-82.
Dyer, Douglas and John H. Kagel. 1996. "Bidding in Common Value Auctions: How the Commercial Construction Industry Corrects for the Winner's Curse," Manage. Sci. 42:10, pp. 1463-75.
Eckel, Catherine C. and Philip J. Grossman. 1996. "Altruism in Anonymous Dictator Games," Games Econ. Behav. 16, pp. 181-91.
Ericsson, K. Anders, and Herbert A. Simon. 1993. Protocol Analysis: Verbal Reports as Data, rev. ed. Cambridge, MA: MIT Press.
Fan, Chinn-Ping. 2002. "Allais Paradox in the Small," J. Econ. Behav. Org. 49:3, pp. 411-21.
Ferber, Robert and Werner Z. Hirsch. 1978. "Social Experimentation and Economic Policy: A Survey," J. Econ. Lit. 16:4, pp. 1379-414.
Ferber, Robert and Werner Z. Hirsch. 1982. Social Experimentation and Economic Policy. NY: Cambridge U. Press.
Fershtman, Chaim and Uri Gneezy. 2001. "Discrimination in a Segmented Society: An Experimental Approach," Quart. J. Econ. 116, pp. 351-77.
Fix, Michael and Raymond J. Struyk, eds. 1993. Clear and Convincing Evidence: Measurement of Discrimination in America. Washington, DC: Urban Institute Press.
Frech, H. E. 1976. "The Property Rights Theory of the Firm: Empirical Results from a Natural Experiment," J. Polit. Econ. 84:1, pp. 143-52.
Frederick, Shane; George Loewenstein and Ted O'Donoghue. 2002. "Time Discounting and Time Preference: A Critical Review," J. Econ. Lit. 40:2, pp. 351-401.
Gertner, R. 1993. "Game Shows and Economic Behavior: Risk-Taking on Card Sharks," Quart. J. Econ. 108:2, pp. 507-21.
Gigerenzer, Gerd; Peter M. Todd and the ABC Research Group. 2000. Simple Heuristics That Make Us Smart. NY: Oxford U. Press.
Gimotty, Phyllis A. 2002. "Delivery of Preventive Health Services for Breast Cancer Control: A Longitudinal View of a Randomized Controlled Trial," Health Services Res. 37:1, pp. 65-85.
Greene, William H. 1993. Econometric Analysis, 2nd ed. NY: Macmillan.
Grether, David M.; R. Mark Isaac and Charles R. Plott. 1981. "The Allocation of Landing Rights by Unanimity among Competitors," Amer. Econ. Rev. Pap. Proceed. 71:May, pp. 166-71.
Grether, David M.; R. Mark Isaac and Charles R. Plott. 1989. The Allocation of Scarce Resources: Experimental Economics and the Problem of Allocating Airport Slots. Boulder: Westview Press.
Grether, David M. and Charles R. Plott. 1984. "The Effects of Market Practices in Oligopolistic Markets: An Experimental Examination of the Ethyl Case," Econ. Inquiry 22:Oct., pp. 479-507.
Haigh, Michael, and John A. List. 2004. "Do Professional Traders Exhibit Myopic Loss Aversion? An Experimental Analysis," J. Finance 59, forthcoming.
Harbaugh, William T. and Kate Krause. 2000. "Children's Altruism in Public Good and Dictator Experiments," Econ. Inquiry 38:1, pp. 95-109.
Harbaugh, William T.; Kate Krause and Timothy R. Berry. 2001. "GARP for Kids: On the Development of Rational Choice Behavior," Amer. Econ. Rev. 91:5, pp. 1539-45.
Harbaugh, William T.; Kate Krause and Lise Vesterlund. 2002. "Risk Attitudes of Children and Adults: Choices Over Small and Large Probability Gains and Losses," Exper. Econ. 5, pp. 53-84.
Harless, David W. and Colin F. Camerer. 1994. "The Predictive Utility of Generalized Expected Utility Theories," Econometrica 62:6, pp. 1251-89.
Harrison, Glenn W. 1988. "Predatory Pricing in a Multiple Market Experiment," J. Econ. Behav. Org. 9, pp. 405-17.
Harrison, Glenn W. 1992a. "Theory and Misbehavior of First-Price Auctions: Reply," Amer. Econ. Rev. 82:5, pp. 1426-43.
Harrison, Glenn W. 1992b. "Market Dynamics, Programmed Traders, and Futures Markets: Beginning the Laboratory Search for a Smoking Gun," Econ. Record 68, Special Issue Futures Markets, pp. 46-62.
Harrison, Glenn W. 2005a. "Field Experiments and Control," in Field Experiments in Economics. J. Carpenter, G. W. Harrison and J. A. List, eds. Research in Exper. Econ. Vol. 10. Greenwich, CT: JAI Press.
Harrison, Glenn W. 2005b. "Experimental Evidence on Alternative Environmental Valuation Methods," Environ. Res. Econ. 23, forthcoming.
Harrison, Glenn W.; Ronald M. Harstad and E. Elisabet Rutström. 2004. "Experimental Methods and Elicitation of Values," Exper. Econ. 7:June, pp. 123-40.
Harrison, Glenn W. and Bengt Kriström. 1996. "On the Interpretation of Responses to Contingent Valuation Surveys," in Current Issues in Environmental Economics. P. O. Johansson, B. Kriström and K. G. Mäler, eds. Manchester: Manchester U. Press.
Harrison, Glenn W.; Morten Igel Lau and Melonie B. Williams. 2002. "Estimating Individual Discount Rates for Denmark: A Field Experiment," Amer. Econ. Rev. 92:5, pp. 1606-17.
Harrison, Glenn W. and James C. Lesley. 1996. "Must Contingent Valuation Surveys Cost So Much?" J. Environ. Econ. Manage. 31:1, pp. 79-95.
Harrison, Glenn W. and John A. List. 2003. "Naturally Occurring Markets and Exogenous Laboratory Experiments: A Case Study of the Winner's Curse," work. pap. 3-14, Dept. Econ., College Bus. Admin., U. Central Florida.
Harrison, Glenn W.; Thomas F. Rutherford and David G. Tarr. 1997. "Quantifying the Uruguay Round," Econ. J. 107:444, pp. 1405-30.
Harrison, Glenn W. and Elisabet Rutström. 2001. "Doing It Both Ways - Experimental Practice and Heuristic Context," Behav. Brain Sci. 24:3, pp. 413-14.
Harrison, Glenn W. and H. D. Vinod. 1992. "The Sensitivity Analysis of Applied General Equilibrium Models: Completely Randomized Factorial Sampling Designs," Rev. Econ. Statist. 74:2, pp. 357-62.
Hausman, Jerry A. 1993. Contingent Valuation. NY: North-Holland.
Hausman, Jerry A. and David A. Wise. 1985. Social Experimentation. Chicago: U. Chicago Press.
Hayes, J. R. and H. A. Simon. 1974. "Understanding Written Problem Instructions," in Knowledge and Cognition. L. W. Gregg, ed. Hillsdale, NJ: Erlbaum.
Heckman, James J. 1998. "Detecting Discrimination," J. Econ. Perspect. 12:2, pp. 101-16.
Heckman, James J. and Richard Robb. 1985. "Alternative Methods for Evaluating the Impact of Interventions," in Longitudinal Analysis of Labor Market Data. J. Heckman and B. Singer, eds. NY: Cambridge U. Press.
Heckman, James J. and Peter Siegelman. 1993. "The Urban Institute Audit Studies: Their Methods and Findings," in Clear and Convincing Evidence: Measurement of Discrimination in America. M. Fix and R. J. Struyk, eds. Washington, DC: Urban Institute Press.
Heckman, James J. and Jeffrey A. Smith. 1995. "Assessing the Case for Social Experiments," J. Econ. Perspect. 9:2, pp. 85-110.
Henrich, Joseph. 2000. "Does Culture Matter in Economic Behavior? Ultimatum Game Bargaining Among the Machiguenga," Amer. Econ. Rev. 90:4, pp. 973-79.
Henrich, Joseph; Robert Boyd, Samuel Bowles, Colin Camerer, Herbert Gintis, Richard McElreath and Ernst Fehr. 2001. "In Search of Homo Economicus: Experiments in 15 Small-Scale Societies," Amer. Econ. Rev. 91:2, pp. 73-79.
Henrich, Joseph; Robert Boyd, Samuel Bowles, Colin Camerer, Ernst Fehr and Herbert Gintis, eds. 2004. Foundations of Human Sociality. NY: Oxford U. Press.
Henrich, Joseph and Richard McElreath. 2002. "Are Peasants Risk-Averse Decision Makers?" Current Anthropology 43:1, pp. 172-81.
Hey, John D. 1995. "Experimental Investigations of Errors in Decision Making Under Risk," Europ. Econ. Rev. 39, pp. 633-40.
Hey, John D. and Chris Orme. 1994. "Investigating Generalizations of Expected Utility Theory Using Experimental Data," Econometrica 62:6, pp. 1291-326.
Hoffman, Elizabeth; Kevin A. McCabe and Vernon L. Smith. 1996. "On Expectations and the Monetary Stakes in Ultimatum Games," Int. J. Game Theory 25:3, pp. 289-301.
Holt, Charles A. and Susan K. Laury. 2002. "Risk Aversion and Incentive Effects," Amer. Econ. Rev. 92:5, pp. 1644-55.
Hong, James T. and Charles R. Plott. 1982. "Rate Filing Policies for Inland Water Transportation: An Experimental Approach," Bell J. Econ. 13:1, pp. 1-19.
Hotz, V. Joseph. 1992. "Designing an Evaluation of JTPA," in Evaluating Welfare and Training Programs. C. Manski and I. Garfinkel, eds. Cambridge: Harvard U. Press.
Hoxby, Caroline M. 2000. "The Effects of Class Size on Student Achievement: New Evidence From Population Variation," Quart. J. Econ. 115:4, pp. 1239-85.
Imber, David; Gay Stevenson and Leanne Wilks. 1991. A Contingent Valuation Survey of the Kakadu Conservation Zone. Canberra: Austral. Govt. Pub., Resource Assess. Con.
Isaac, R. Mark and Vernon L. Smith. 1985. "In Search of Predatory Pricing," J. Polit. Econ. 93:2, pp. 320-45.
Johnson, Eric J.; Colin F. Camerer, Sen Sankar and Talia Tymon. 2002. "Detecting Failures of Backward Induction: Monitoring Information Search in Sequential Bargaining," J. Econ. Theory 104:1, pp. 16-47.
Kachelmeier, Steven J. and Mohamed Shehata. 1992. "Examining Risk Preferences Under High Monetary Incentives: Experimental Evidence from the People's Republic of China," Amer. Econ. Rev. 82:5, pp. 1120-41.
Kagel, John H.; Raymond C. Battalio and Leonard Green. 1995. Economic Choice Theory: An Experimental Analysis of Animal Behavior. NY: Cambridge U. Press.
Kagel, John H.; Raymond C. Battalio and James M. Walker. 1979. "Volunteer Artifacts in Experiments in Economics: Specification of the Problem and Some Initial Data from a Small-Scale Field Experiment," in Research in Experimental Economics. Vol. 1. V. L. Smith, ed. Greenwich, CT: JAI Press.
Kagel, John H.; Ronald M. Harstad and Dan Levin. 1987. "Information Impact and Allocation Rules in Auctions with Affiliated Private Values: A Laboratory Study," Econometrica 55:6, pp. 1275-304.
Kagel, John H. and Dan Levin. 1986. "The Winner's Curse and Public Information in Common Value Auctions," Amer. Econ. Rev. 76:5, pp. 894-920.
Kagel, John H. and Dan Levin. 1999. "Common Value Auctions with Insider Information," Econometrica 67:5, pp. 1219-38.
Kagel, John H. and Dan Levin. 2002. Common Value Auctions and the Winner's Curse. Princeton: Princeton U. Press.
Kagel, John H.; Don N. MacDonald and Raymond C. Battalio. 1990. "Tests of 'Fanning Out' of Indifference Curves: Results from Animal and Human Experiments," Amer. Econ. Rev. 80:4, pp. 912-21.
Kramer, Michael and Stanley Shapiro. 1984. "Scientific Challenges in the Application of Randomized Trials," J. Amer. Medical Assoc. 252:19, pp. 2739-45.
Krueger, Alan B. 1999. "Experimental Estimates of Production Functions," Quart. J. Econ. 114:2, pp. 497-532.
Kunce, Mitch; Shelby Gerking and William Morgan. 2002. "Effects of Environmental and Land Use Regulation in the Oil and Gas Industry Using the Wyoming Checkerboard as an Experimental Design," Amer. Econ. Rev. 92:5, pp. 1588-93.
Lalonde, Robert J. 1986. "Evaluating the Econometric Evaluations of Training Programs with Experimental Data," Amer. Econ. Rev. 76:4, pp. 604-20.
Levine, Michael E. and Charles R. Plott. 1977. "Agenda Influence and Its Implications," Virginia Law Rev. 63:May, pp. 561-604.
Levitt, Steven D. 2003. "Testing Theories of Discrimination: Evidence from the Weakest Link," NBER work. pap. 9449.
Lichtenstein, Sarah and Paul Slovic. 1973. "Response-Induced Reversals of Gambling: An Extended Replication in Las Vegas," J. Exper. Psych. 101, pp. 16-20.
List, John A. 2001. "Do Explicit Warnings Eliminate the Hypothetical Bias in Elicitation Procedures? Evidence from Field Auctions for Sportscards," Amer. Econ. Rev. 91:5, pp. 1498-507.
List, John A. 2003. "Friend or Foe: A Natural Experiment of the Prisoner's Dilemma," unpub. manuscript, U. Maryland Dept. Ag. Res. Econ.
List, John A. 2004a. "Young, Selfish and Male: Field Evidence of Social Preferences," Econ. J. 114:492, pp. 121-49.
List, John A. 2004b. "The Nature and Extent of Discrimination in the Marketplace: Evidence from the Field," Quart. J. Econ. 119:1, pp. 49-89.
List, John A. 2004c. "Neoclassical Theory Versus Prospect Theory: Evidence from the Marketplace," Econometrica 72:2, pp. 615-25.
List, John A. 2004d. "Field Experiments: An Introduction and Survey," work. pap., U. Maryland. Dept. Ag. Res. Econ. and Dept. Econ.
List, John A. 2004e. "Testing Neoclassical Competitive Theory in Multi-Lateral Decentralized Markets," J. Polit. Econ. 112:5, pp. 1131-56.
List, John A. and David Lucking-Reiley. 2000. "Demand Reduction in a Multi-Unit Auction: Evidence from a Sportscard Experiment," Amer. Econ. Rev. 90:4, pp. 961-72.
List, John A. and David Lucking-Reiley. 2002. "The Effects of Seed Money and Refunds on Charitable Giving: Experimental Evidence from a University Capital Campaign," J. Polit. Econ. 110:1, pp. 215-33.
Loewenstein, George. 1999. "Experimental Economics from the Vantage-Point of Behavioral Economics," Econ. J. 109:453, pp. F25-F34.
Loomes, Graham; Peter G. Moffatt and Robert Sugden. 2002. "A Microeconometric Test of Alternative Stochastic Theories of Risky Choice," J. Risk Uncertainty 24:2, pp. 103-30.
Loomes, Graham and Robert Sugden. 1995. "Incorporating a Stochastic Element Into Decision Theories," Europ. Econ. Rev. 39, pp. 641-48.
Loomes, Graham and Robert Sugden. 1998. "Testing Different Stochastic Specifications of Risky Choice," Economica 65, pp. 581-98.
Lucking-Reiley, David. 1999. "Using Field Experiments to Test Equivalence Between Auction Formats: Magic on the Internet," Amer. Econ. Rev. 89:5, pp. 1063-80.
Machina, Mark J. 1989. "Dynamic Consistency and Non-Expected Utility Models of Choice Under Uncertainty," J. Econ. Lit. 27:4, pp. 1622-68.
McCabe, Kevin; Daniel Houser, Lee Ryan, Vernon Smith and Theodore Trouard. 2001. "A Functional Imaging Study of Cooperation in Two-Person Reciprocal Exchange," Proceed. Nat. Academy Sci. 98:20, pp. 11832-35.
McClennan, Edward F. 1990. Rationality and Dynamic Choice. NY: Cambridge U. Press.
McDaniel, Tanga M. and E. Elisabet Rutström. 2001. "Decision Making Costs and Problem Solving Performance," Exper. Econ. 4:2, pp. 145-61.
Metrick, Andrew. 1995. "A Natural Experiment in 'Jeopardy!'" Amer. Econ. Rev. 85:1, pp. 240-53.
Meyer, Bruce D.; W. Kip Viscusi and David L. Durbin. 1995. "Workers' Compensation and Injury Duration: Evidence from a Natural Experiment," Amer. Econ. Rev. 85:3, pp. 322-40.
Milgrom, Paul R. and Robert J. Weber. 1982. "A Theory of Auctions and Competitive Bidding," Econometrica 50:5, pp. 1089-122.
Neisser, Ulric, and Ira E. Hyman, Jr. eds. 2000. Memory Observed: Remembering in Natural Contexts. 2nd ed. NY: Worth Publishers.
Pearl, Judea. 1984. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Reading, MA: Addison-Wesley.
Philipson, Tomas and Larry V. Hedges. 1998. "Subject Evaluation in Social Experiments," Econometrica 66:2, pp. 381-408.
Plott, Charles R. and Michael E. Levine. 1978. "A Model of Agenda Influence on Committee Decisions," Amer. Econ. Rev. 68:1, pp. 146-60.
Riach, P. A. and J. Rich. 2002. "Field Experiments of Discrimination in the Market Place," Econ. J. 112:483, pp. F480-F518.
Rosenbaum, P. and Donald Rubin. 1983. "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika 70, pp. 41-55.
Rosenbaum, P. and Donald Rubin. 1984. "Reducing Bias in Observational Studies Using Multivariate Matched Sampling Methods that Incorporate the Propensity Score," J. Amer. Statist. Assoc. 79, pp. 39-68.
Rosenthal, R. and L. Jacobson. 1968. Pygmalion in the Classroom. NY: Holt, Rinehart & Winston.
Rosenzweig, Mark R. and Kenneth I. Wolpin. 2000. "Natural 'Natural Experiments' in Economics," J. Econ. Lit. 38:4, pp. 827-74.
Roth, Alvin E. 1991. "A Natural Experiment in the Organization of Entry-Level Labor Markets: Regional Markets for New Physicians and Surgeons in the United Kingdom," Amer. Econ. Rev. 81:3, pp. 415-40.
Roth, Alvin E. and Michael W. K. Malouf. 1979. "Game-Theoretic Models and the Role of Information in Bargaining," Psych. Rev. 86, pp. 574-94.
Roth, Alvin E.; Vesna Prasnikar, Masahiro Okuno-Fujiwara and Shmuel Zamir. 1991. "Bargaining and Market Behavior in Jerusalem, Ljubljana, Pittsburgh, and Tokyo: An Experimental Study," Amer. Econ. Rev. 81:5, pp. 1068-95.
Rowe, R. D.; W. Schulze, W. D. Shaw, L. D. Chestnut and D. Schenk. 1991. "Contingent Valuation of Natural Resource Damage Due to the Nestucca Oil Spill," report to British Columbia Ministry of Environment.
Rutström, E. Elisabet. 1998. "Home-Grown Values and the Design of Incentive Compatible Auctions," Int. J. Game Theory 27:3, pp. 427-41.
Rutström, E. Elisabet and Melonie B. Williams. 2000. "Entitlements and Fairness: An Experimental Study of Distributive Preferences," J. Econ. Behav. Org. 43, pp. 75-80.
Sanfey, Alan G.; James K. Rilling, Jessica A. Aronson, Leigh E. Nystrom and Jonathan D. Cohen. 2003. "The Neural Basis of Economic Decision-Making in the Ultimatum Game," Science 300:5626, pp. 1755-58.
Slonim, Robert and Alvin E. Roth. 1998. "Learning in High Stakes Ultimatum Games: An Experiment in the Slovak Republic," Econometrica 66:3, pp. 569-96.
Smith, Jeffrey and Petra Todd. 2000. "Does Matching Address LaLonde's Critique of Nonexperimental Estimates?" unpub. man., Dept. Econ. U. Western Ontario.
Smith, Kip; John Dickhaut, Kevin McCabe and José V. Pardo. 2002. "Neuronal Substrates for Choice under Ambiguity, Risk, Gains, and Losses," Manage. Sci. 48:6, pp. 711-18.
Smith, V. Kerry and Laura Osborne. 1996. "Do Contingent Valuation Estimates Pass a Scope Test? A Meta Analysis," J. Environ. Econ. Manage. 31, pp. 287-301.
Smith, Vernon L. 1962. "An Experimental Study of Competitive Market Behavior," J. Polit. Econ. 70, pp. 111-37.
Smith, Vernon L. 1982. "Microeconomic Systems as an Experimental Science," Amer. Econ. Rev. 72:5, pp. 923-55.
Smith, Vernon L. 2003. "Constructivist and Ecological Rationality in Economics," Amer. Econ. Rev. 93:3, pp. 465-508.
Smith, Vernon L.; G. L. Suchanek and Arlington W. Williams. 1988. "Bubbles, Crashes, and Endogenous Expectations in Experimental Spot Asset Markets," Econometrica 56, pp. 1119-52.
Sorenson, Roy A. 1992. Thought Experiments. NY: Oxford U. Press.
Starmer, Chris. 1999. "Experiments in Economics: Should We Trust the Dismal Scientists in White Coats?" J. Exper. Method. 6, pp. 1-30.
Sunstein, Cass R.; Reid Hastie, John W. Payne, David A. Schkade and W. Kip Viscusi. 2002. Punitive Damages: How Juries Decide. Chicago: U. Chicago Press.
Tenorio, Rafael and Timothy Cason. 2002. "To Spin or Not To Spin? Natural and Laboratory Experiments from The Price is Right," Econ. J. 112, pp. 170-95.
Warner, John T. and Saul Pleeter. 2001. "The Personal Discount Rate: Evidence from Military Downsizing Programs," Amer. Econ. Rev. 91:1, pp. 33-53.
Wierzbicka, Anna. 1996. Semantics: Primes and Universals. NY: Oxford U. Press.
Winkler, Robert L. and Allan H. Murphy. 1973. "Experiments in the Laboratory and the Real World," Org. Behav. Human Perform. 10, pp. 252-70.