(Harrison List, JEL 2004) Field Experiments

American Economic Association
Field Experiments
Author(s): Glenn W. Harrison and John A. List
Source: Journal of Economic Literature, Vol. 42, No. 4 (Dec., 2004), pp. 1009-1055
Published by: American Economic Association
Stable URL: http://www.jstor.org/stable/3594915
GLENN W. HARRISON and JOHN A. LIST1
1. Introduction

In some sense every empirical researcher is reporting the results of an experiment. Every researcher who behaves as if an exogenous variable varies independently of an error term effectively views their data as coming from an experiment. In some cases this belief is a matter of a priori judgement; in some cases it is based on auxiliary evidence and inference; and in some cases it is built into the design of the data collection process. But the distinction is not always as bright and clear. Testing that assumption is a recurring difficulty for applied econometricians, and the search always continues for variables that might better qualify as truly exogenous to the process under study. Similarly, the growing popularity of explicit experimental methods arises in large part from the potential for constructing the proper counterfactual.

Field experiments provide a meeting ground between these two broad approaches to empirical economic science. By examining the nature of field experiments, we seek to make it a common ground between researchers.

We approach field experiments from the perspective of the sterility of the laboratory experimental environment. We do not see the notion of a "sterile environment" as a negative, provided one recognizes its role in the research discovery process. In one sense, that sterility allows us to see in crisp relief the effects of exogenous treatments on behavior. However, lab experiments in isolation are necessarily limited in relevance for predicting field behavior, unless one wants to insist a priori that those aspects of economic behavior under study are perfectly general in a sense that we will explain. Rather, we see the beauty of lab experiments within a broader context: when they are combined with field data, they permit sharper and more convincing inference.2

In search of greater relevance, experimental economists are recruiting subjects in the field rather than in the classroom, using field goods rather than induced valuations, and using field context rather than abstract

1 Harrison: Department of Economics, College of Business Administration, University of Central Florida; List: Department of Agricultural and Resource Economics and Department of Economics, University of Maryland, and NBER. We are grateful to Stephen Burks, Colin Camerer, Jeffrey Carpenter, Shelby Gerking, R. Mark Isaac, Alan Krueger, John McMillan, Andreas Ortmann, Charles Plott, David Reiley, E. Elisabet Rutström, Nathaniel Wilcox, and the referees for generous comments.

2 When we talk about combining lab and field data, we do not just mean a summation of conclusions. Instead, we have in mind the two complementing each other in some functional way, much as one might conduct several lab experiments in order to tease apart potential confounds. For example, James Cox (2004) demonstrates nicely how "trust" and "reciprocity" are often confounded with "other regarding preferences," and can be better identified separately if one undertakes several types of experiments with the same population. Similarly, Alvin Roth and Michael Malouf (1979) demonstrate how the use of dollar payoffs can confound tests of cooperative game theory with less information of one kind (knowledge of the utility function of the other player), and more information of another kind (the ability to make interpersonal comparisons of monetary gain), than is usually assumed in the leading theoretical prediction.
terminology in instructions.3 We argue that there is something methodologically fundamental behind this trend. Field experiments differ from laboratory experiments in many ways. Although it is tempting to view field experiments as simply less controlled variants of laboratory experiments, we argue that to do so would be to seriously mischaracterize them. What passes for "control" in laboratory experiments might in fact be precisely the opposite if it is artificial to the subject or context of the task. In the end, we see field experiments as being methodologically complementary to traditional laboratory experiments.4

Our primary point is that dissecting the characteristics of field experiments helps define what might be better called an ideal experiment, in the sense that one is able to observe a subject in a controlled setting but where the subject does not perceive any of the controls as being unnatural and there is no deception being practiced. At first blush, the idea that one can observe subjects in a natural setting and yet have controls might seem contradictory, but we will argue that it is not.5

Our second point is that many of the characteristics of field experiments can be found in varying, correlated degrees in lab experiments. Thus, many of the characteristics that people identify with field experiments are not only found in field experiments, and should not be used to differentiate them from lab experiments.

Our third point, following from the first two, is that there is much to learn from field experiments when returning to the lab. The unexpected behaviors that occur when one loosens control in the field are often indicators of key features of the economic transaction that have been neglected in the lab. Thus, field experiments can help one design better lab experiments, and have a methodological role quite apart from their complementarity at a substantive level.

In section 2 we offer a typology of field experiments in the literature, identifying the key characteristics defining the species. We suggest some terminology to better identify different types of field experiments, or more accurately to identify different characteristics of field experiments. We do not propose a bright line to define some experiments as field experiments and others as something else, but a set of criteria that one would expect to see in varying degrees in a field experiment. We propose six factors that can be used to determine the field context of an experiment: the nature of the subject pool, the nature of the information that the subjects bring to the task, the nature of the commodity, the nature of the task or trading rules applied, the nature of the stakes, and the environment in which the subjects operate. Having identified what defines a field experiment, in section 3 we put experiments in general into methodological perspective, as one of the ways that economists can identify treatment effects. This serves to remind us why we want control and internal validity in all such analyses, whether or not they constitute field experiments. In sections 4 through 6 we describe strengths and weaknesses of the broad types of field experiments. Our

3 We explain this jargon from experimental economics below.

4 This view is hardly novel: for example, in decision research, Robert Winkler and Allan Murphy (1973) provide an excellent account of the difficulties of reconciling suboptimal probability assessments in artefactual laboratory settings with field counterparts, as well as the limitations of applying inferences from laboratory data to the field.

5 Imagine a classroom setting in which the class breaks up into smaller tutorial groups. In some groups a video covering certain material is presented, in another group a free discussion is allowed, and in another group there is a more traditional lecture. Then the scores of the students in each group are examined after they have taken a common exam. Assuming that all of the other features of the experiment are controlled, such as which student gets assigned to which group, this experiment would not seem unnatural to the subjects. They are all students doing what comes naturally to students, and these three teaching alternatives are each standardly employed. Along similar lines in economics, albeit with simpler technology and less control than one might like, see Edward Duddy (1924). For recent novel examples in the economics literature, see Colin Camerer (1998) and David Lucking-Reiley (1999). Camerer (1998) places bets at a race track to examine if asset markets can be manipulated, while Lucking-Reiley (1999) uses internet-based auctions in a pre-existing market with an unknown number of participating bidders to test the theory of revenue equivalence between four major single-unit auction formats.
literature review is necessarily selective, although List (2004d) offers a more complete bibliography.

In sections 7 and 8 we review two types of experiments that may be contrasted with ideal field experiments. One is called a social experiment, in the sense that it is a deliberate part of social policy by the government. Social experiments involve deliberate, randomized changes in the manner in which some government program is implemented. They have become popular in certain areas, such as employment schemes and the detection of discrimination. Their disadvantages have been well documented, given their political popularity, and there are several important methodological lessons from those debates for the design of field experiments.

The other is called a "natural experiment." The idea is to recognize that some event that naturally occurs in the field happens to have some of the characteristics of a field experiment. These can be attractive sources of data on large-scale economic transactions, but usually at some cost due to the lack of control, forcing the researcher to make certain identification assumptions.

Finally, in section 9 we briefly examine related types of experiments of the mind. In one case these are the "thought experiments" of theorists and statisticians, and in the other they are the "neuro-economics experiments" provided by technology. The objective is simply to identify how they differ from other types of experiments we consider, and where they fit in.

2. Defining Field Experiments

There are several ways to define words. One is to ascertain the formal definition by looking it up in the dictionary. Another is to identify what it is that you want the word-label to differentiate.

The Oxford English Dictionary (Second Edition) defines the word "field" in the following manner: "Used attributively to denote an investigation, study, etc., carried out in the natural environment of a given material, language, animal, etc., and not in the laboratory, study, or office." This orients us to think of the natural environment of the different components of an experiment.6

It is important to identify what factors make up a field experiment so that we can functionally identify what factors drive results in different experiments. To provide a direct example of the type of problem that motivated us, when List (2001) obtains results in a field experiment that differ from the counterpart lab experiments of Ronald Cummings, Glenn Harrison, and Laura Osborne (1995) and Cummings and Laura Taylor (1999), what explains the difference? Is it the use of data from a particular market whose participants have selected into the market instead of student subjects; the use of subjects with experience in related tasks; the use of private sports-cards as the underlying commodity instead of an environmental public good; the use of streamlined instructions, the less-intrusive experimental methods, mundane experimenter effects, or is it some combination of these and similar

6 If we are to examine the role of "controls" in different experimental settings, it is appropriate that this word also be defined carefully. The OED (2nd ed.) defines the verb "control" in the following manner: "To exercise restraint or direction upon the free action of; to hold sway over, exercise power or authority over; to dominate, command." So the word means something more active and interventionist than is suggested by its colloquial clinical usage. Control can include such mundane things as ensuring sterile equipment in a chemistry lab, to restrain the free flow of germs and unwanted particles that might contaminate some test. But when controls are applied to human behavior, we are reminded that someone's behavior is being restrained to be something other than it would otherwise be if the person were free to act. Thus we are immediately on alert to be sensitive, when studying responses from a controlled experiment, to the possibility that behavior is unusual in some respect. The reason is that the very control that defines the experiment may be putting the subject on an artificial margin. Even if behavior on that margin is not different than it would be without the control, there is the possibility that constraints on one margin may induce effects on behavior on unconstrained margins. This point is exactly the same as the one made in the "theory of the second best" in public policy. If there is some immutable constraint on one of the margins defining an optimum, it does not automatically follow that removing a constraint on another margin will move the system closer to the optimum.
differences? We believe field experiments have matured to the point that some framework for addressing such differences in a systematic manner is necessary.

2.1 Criteria that Define Field Experiments

Running the risk of oversimplifying what is inherently a multidimensional issue, we propose six factors that can be used to determine the field context of an experiment:
* the nature of the subject pool,
* the nature of the information that the subjects bring to the task,
* the nature of the commodity,
* the nature of the task or trading rules applied,
* the nature of the stakes, and
* the nature of the environment that the subject operates in.

We recognize at the outset that these characteristics will often be correlated to varying degrees. Nonetheless, they can be used to propose a taxonomy of field experiments that will, we believe, be valuable as comparisons between lab and field experimental results become more common.

Student subjects can be viewed as the standard subject pool used by experimenters, simply because they are a convenience sample for academics. Thus when one goes "outdoors" and uses field subjects, they should be viewed as nonstandard in this sense. But we argue that the use of nonstandard subjects should not automatically qualify the experiment as a field experiment. The experiments of Cummings, Harrison, and E. Elisabet Rutström (1995), for example, used individuals recruited from churches in order to obtain a wider range of demographic characteristics than one would obtain in the standard college setting. The importance of a nonstandard subject pool varies from experiment to experiment: in this case it simply provided a less concentrated set of socio-demographic characteristics with respect to age and education level, which turned out to be important when developing statistical models to adjust for hypothetical bias (McKinley Blackburn, Harrison, and Rutström 1994). Alternatively, the subject pool can be designed to represent a target population of the economy (e.g., traders at the Chicago Board of Trade in Michael Haigh and John List 2004) or the general population (e.g., the Danish population in Harrison, Morten Igel Lau, and Melonie Williams 2002).

In addition, nonstandard subject pools might bring experience with the commodity or the task to the experiment, quite apart from their wider array of demographic characteristics. In the field, subjects bring certain information to their trading activities in addition to their knowledge of the trading institution. In abstract settings the importance of this information is diminished, by design, and that can lead to behavioral changes. For example, absent such information, risk aversion can lead to subjects requiring a risk premium when bidding for objects with uncertain characteristics.

The commodity itself can be an important part of the field. Recent years have seen a growth of experiments concerned with eliciting valuations over actual goods, rather than using induced valuations over virtual goods. The distinction here is between physical goods or actual services and abstractly defined goods. The latter have been the staple of experimental economics since Edward Chamberlin (1948) and Vernon Smith (1962), but impose an artificiality that could be a factor influencing behavior.7 Such influences are actually of great interest, or should be. If the nature of the commodity itself affects behavior in a way that is not accounted for by the theory being applied, then the theory has at best a limited domain of applicability that we should be aware of, and at worst is simply false. In either case, one can better

7 It is worth noting that neither Chamberlin (1948) nor Smith (1962) used real payoffs to motivate subjects in their market experiments, although Smith (1962) does explain how that could be done and reports one experiment (fn 9., p. 121) in which monetary payoffs were employed.
understand the limitations of the generality of theory only via empirical testing.8

Again, however, just having one field characteristic, in this case a physical good, does not constitute a field experiment in any fundamental sense. Rutström (1998) sold lots and lots of chocolate truffles in a laboratory study of different auction institutions designed to elicit values truthfully, but hers was very much a lab experiment despite the tastiness of the commodity. Similarly, Ian Bateman et al. (1997) elicited valuations over pizza and dessert vouchers for a local restaurant. While these commodities were not actual pizza or dessert themselves, but vouchers entitling the subject to obtain them, they are not abstract. There are many other examples in the experimental literature of designs involving physical commodities.9

The nature of the task that the subject is being asked to undertake is an important component of a field experiment, since one would expect that field experience could play a major role in helping individuals develop heuristics for specific tasks. The lab experiments of John Kagel and Dan Levin (1999) illustrate this point, with "super-experienced" subjects behaving differently than inexperienced subjects in terms of their propensity to fall prey to the winners' curse. An important question is whether the successful heuristics that evolve in certain field settings "travel" to the other field and lab settings (Harrison and List 2003). Another aspect of the task is the specific parameterization that is adopted in the experiment. One can conduct a lab experiment with parameter values estimated from the field data, so as to study lab behavior in a "field-relevant" domain. Since theory is often domain-specific, and behavior can always be, this is an important component of the interplay between the lab and field. Early illustrations of the value of this approach include David Grether, R. Mark Isaac, and Charles Plott [1981, 1989], Grether and Plott [1984], and James Hong and Plott [1982].

The nature of the stakes can also affect field responses. Stakes in the laboratory might be very different than those encountered in the field, and hence have an effect on behavior. If valuations are taken seriously when they are in the tens of dollars, or in the hundreds, but are made indifferently when the price is less than one dollar, laboratory or field experiments with stakes below one dollar could easily engender imprecise bids. Of course, people buy inexpensive goods in the field as well, but the valuation process they use might be keyed to different stake levels. Alternatively, field experiments in relatively poor countries offer the opportunity to evaluate the effects of substantial stakes within a given budget.

The environment of the experiment can also influence behavior. The environment can provide context to suggest strategies and heuristics that a lab setting might not. Lab experimenters have always wondered whether the use of classrooms might engender role-playing behavior, and indeed this is one of the reasons experimental economists are generally suspicious of experiments without salient monetary rewards. Even with salient rewards, however, environmental effects could remain. Rather than view them as uncontrolled effects, we see them as worthy of controlled study.

2.2 A Proposed Taxonomy

Any taxonomy of field experiments runs the risk of missing important combinations of the factors that differentiate field experiments from conventional lab experiments. There is some value, however, in having broad terms to differentiate what we see as the key differences. We propose the following terminology:
* a conventional lab experiment is one

8 To use the example of Chamberlin (1948) again, List (2004e) takes the natural next step by exploring the predictive power of neoclassical theory in decentralized, naturally occurring field markets.

9 We would exclude experiments in which the commodity was a gamble, since very few of those gambles take the form of naturally occurring lotteries.
that employs a standard subject pool of students, an abstract framing, and an imposed10 set of rules;
* an artefactual field experiment is the same as a conventional lab experiment but with a nonstandard subject pool;11
* a framed field experiment is the same as an artefactual field experiment but with field context in either the commodity, task, or information set that the subjects can use;12
* a natural field experiment is the same as a framed field experiment but where the environment is one where the subjects naturally undertake these tasks and where the subjects do not know that they are in an experiment.13

We recognize that any such taxonomy leaves gaps, and that certain studies may not fall neatly into our classification scheme.

Moreover, it is often appropriate to conduct several types of experiments in order to identify the issue of interest. For example, Harrison and List (2003) conducted artefactual field experiments and framed field experiments with the same subject pool, precisely to identify how well the heuristics that might apply naturally in the latter setting "travel" to less context-ridden environments found in the former setting. And List (2004b) conducted artefactual, framed, and natural experiments to investigate the nature and extent of discrimination in the sports-card marketplace.

3. Methodological Importance of Field Experiments

Field experiments are methodologically important because they mechanically force the rest of us to pay attention to issues that great researchers seem to intuitively address. These issues cannot be comfortably forgotten in the field, but they are of more general importance.

The goal of any evaluation method for "treatment effects" is to construct the proper counterfactual, and economists have spent years examining approaches to this problem. Consider five alternative methods of constructing the counterfactual: controlled experiments, natural experiments, propensity score matching (PSM), instrumental variables (IV) estimation, and structural approaches. Define $y_1$ as the outcome with treatment, $y_0$ as the outcome without treatment, and let $T=1$ when treated and $T=0$ when not treated.14 The treatment effect for unit $i$ can then be measured as $\tau_i = y_{i1} - y_{i0}$. The major problem, however, is one of a missing counterfactual: $\tau_i$ is unknown. If we could observe the outcome for an untreated observation had it been treated, then there is no evaluation problem.

"Controlled" experiments, which include laboratory experiments and field experiments, represent the most convincing method of creating the counterfactual, since they directly construct a control group via randomization.15 In this case, the population

10 The fact that the rules are imposed does not imply that the subjects would reject them, individually or socially, if allowed.

11 To offer an early and a recent example, consider the risk-aversion experiments conducted by Hans Binswanger (1980, 1981) in India, and Harrison, Lau, and Williams (2002), who took the lab experimental design of Maribeth Coller and Melonie Williams (1999) into the field with a representative sample of the Danish population.

12 For example, the experiments of Peter Bohm (1984b) to elicit valuations for public goods that occurred naturally in the environment of subjects, albeit with unconventional valuation methods; or the Vickrey auctions and "cheap talk" scripts that List (2001) conducted with sport-card collectors, using sports cards as the commodity and at a show where they trade such commodities.

13 For example, the manipulation of betting markets by Camerer (1998) or the solicitation of charitable contributions by List and Lucking-Reiley (2002).

14 We simplify by considering a binary treatment, but the logic generalizes easily to multiple treatment levels and continuous treatments. Obvious examples from outside economics include dosage levels or stress levels. In economics, one might have some measure of risk aversion or "other regarding preferences" as a continuous treatment.

15 Experiments are often run in which the control is provided by theory, and the objective is to assess how well theory matches behavior. This would seem to rule out a role for randomization, until one recognizes that some implicit or explicit error structure is required in order to test theories meaningfully. We return to this issue in section 8.
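The missing-counterfactual problem defined in section 3 can be made concrete with a short simulation. The following is only a minimal sketch under stated assumptions, not anything taken from the article: the function names (`simulate`, `diff_in_means`), the normal distributions, and all parameter values are hypothetical choices made for illustration. Because the simulation generates both potential outcomes for every unit, the true average of the unit-level effects is known, and one can compare it with what an analyst who observes only one outcome per unit would estimate under randomized assignment and under self-selection.

```python
import random
import statistics


def diff_in_means(y0, y1, treated):
    """Observed mean outcome of treated units minus that of untreated units.

    Only y1 is observed for treated units and only y0 for untreated units,
    which is the missing-counterfactual problem.
    """
    obs_treated = [b for b, t in zip(y1, treated) if t]
    obs_control = [a for a, t in zip(y0, treated) if not t]
    return statistics.fmean(obs_treated) - statistics.fmean(obs_control)


def simulate(n=20000, seed=0):
    """Hypothetical data-generating process (all numbers are assumptions)."""
    rng = random.Random(seed)
    y0 = [rng.gauss(10.0, 2.0) for _ in range(n)]   # outcome without treatment
    tau = [rng.gauss(2.0, 1.0) for _ in range(n)]   # unit-level effects tau_i
    y1 = [a + b for a, b in zip(y0, tau)]           # outcome with treatment
    true_ate = statistics.fmean(tau)                # knowable only in a simulation

    # Randomized assignment: T is independent of (y0, y1).
    t_random = [rng.random() < 0.5 for _ in range(n)]
    est_random = diff_in_means(y0, y1, t_random)

    # Self-selection: units with a high untreated outcome opt in.
    t_selected = [a > 10.0 for a in y0]
    est_selected = diff_in_means(y0, y1, t_selected)

    return true_ate, est_random, est_selected


if __name__ == "__main__":
    true_ate, est_random, est_selected = simulate()
    print(f"true average effect:      {true_ate:.3f}")
    print(f"randomized assignment:    {est_random:.3f}")
    print(f"self-selected assignment: {est_selected:.3f}")
```

In this sketch the randomized comparison lands close to the true average effect, while the self-selected comparison is inflated because the treated group would have had higher outcomes even without treatment. The size of that gap depends entirely on the assumed selection rule.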
average treatment effect is given by $\tau = y^*_1 - y^*_0$, where $y^*_1$ and $y^*_0$ are the treated and nontreated average outcomes after the treatment. We have much more to say about controlled experiments, in particular field experiments, below.

"Natural experiments" consider the treatment itself as an experiment and find a naturally occurring comparison group to mimic the control group: $\tau$ is measured by comparing the difference in outcomes before and after for the treated group with the before and after outcomes for the nontreated group. Estimation of the treatment effect takes the form $Y_{it} = X_{it}\beta + \tau T_{it} + \eta_{it}$, where $i$ indexes the unit of observation, $t$ indexes years, $Y_{it}$ is the outcome in cross-section $i$ at time $t$, $X_{it}$ is a vector of controls, $T_{it}$ is a binary variable, $\eta_{it} = \alpha_i + \lambda_t + \varepsilon_{it}$, and $\tau$ is the difference-in-differences (DID) average treatment effect. If we assume that data exists for two periods, then $\tau = (y^{t*}_{t_1} - y^{t*}_{t_0}) - (y^{nt*}_{t_1} - y^{nt*}_{t_0})$ where, for example, $y^{t*}_{t_1}$ is the mean outcome for the treated group.

A major identifying assumption in DID estimation is that there are no time-varying, unit-specific shocks to the outcome variable that are correlated with treatment status, and that selection into treatment is independent of temporary individual-specific effect: $E(\eta_{it} \mid X_{it}, D_{it}) = E(\alpha_i \mid X_{it}, D_{it}) + \lambda_t$. If $\varepsilon_{it}$ and $T$ are related, DID is inconsistently estimated as $E(\hat{\tau}) = \tau + E(\varepsilon_{it_1} - \varepsilon_{it_0} \mid D=1) - E(\varepsilon_{it_1} - \varepsilon_{it_0} \mid D=0)$.

One alternative method of assessing the impact of the treatment is the method of propensity score matching (PSM) developed in P. Rosenbaum and Donald Rubin (1983). This method has been used extensively in the debate over experimental and nonexperimental evaluation of treatment effects initiated by Lalonde (1986): see Rajeev Dehejia and Sadek Wahba (1999, 2002) and Jeffrey Smith and Petra Todd (2000). The goal of PSM is to make non-experimental data "look like" experimental data. The intuition behind PSM is that if the researcher can select observable factors so that any two individuals with the same value for these factors will display homogeneous responses to the treatment, then the treatment effect can be measured without bias. In effect, one can use statistical methods to identify which two individuals are "more homogeneous lab rats" for the purposes of measuring the treatment effect. More formally, the solution advocated is to find a vector of covariates, $Z$, such that $y_1, y_0 \perp T \mid Z$ and $\mathrm{pr}(T=1 \mid Z) \in (0,1)$, where $\perp$ denotes independence.16

Another alternative to the DID model is the use of instrumental variables (IV), which approaches the structural econometric method in the sense that it relies on exclusion restrictions (Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin 1996; and Joshua D. Angrist and Alan B. Krueger 2001). The IV method, which essentially assumes that some components of the non-experimental data are random, is perhaps the most widely utilized approach to measuring treatment effects (Mark Rosenzweig and Kenneth Wolpin 2000). The crux of the IV approach is to find a variable that is excluded from the outcome equation, but which is related to treatment status and has no direct association with the outcome. The weakness of the IV approach is that such variables do not often exist, or that unpalatable assumptions must be maintained in order for them to be used to identify the treatment effect of interest.

A final alternative to the DID model is structural modeling. Such models often entail a heavy mix of identifying restrictions (e.g.,

16 If one is interested in estimating the average treatment effect, only the weaker condition $E(y_0 \mid T=1, Z) = E(y_0 \mid T=0, Z) = E(y_0 \mid Z)$ is required. This assumption is called the "conditional independence assumption," and intuitively means that given $Z$, the nontreated outcomes are what the treated outcomes would have been had they not been treated. Or, likewise, that selection occurs only on observables. Note that the dimensionality of the problem, as measured by $Z$, may limit the use of matching. A more feasible alternative is to match on a function of $Z$. Rosenbaum and Rubin (1983, 1984) showed that matching on $p(Z)$ instead of $Z$ is valid. This is usually carried out on the "propensity" to get treated $p(Z)$, or the propensity score, which in turn is often implemented by a simple probit or logit model with $T$ as the dependent variable.
separability), impose structure on technology and preferences (e.g., constant returns
to scale or unitary income elasticities), and simplifying assumptions about
equilibrium outcomes (e.g., zero-profit conditions defining equilibrium industrial
structure). Perhaps the best-known class of such structural models is computable
general equilibrium models, which have been extensively applied to evaluate trade
policies, for example.17 It typically relies on complex estimation strategies, but
yields structural parameters that are well-suited for ex ante policy simulation,
provided one undertakes systematic sensitivity analysis of those parameters.18 In this
sense, structural models have been the cornerstone of non-experimental evaluation of
tax and welfare policies (R. Blundell and Thomas MaCurdy 1999; and Blundell and
M. Costas Dias 2002).

4. Artefactual Field Experiments

4.1 The Nature of the Subject Pool

A common criticism of the relevance of inferences drawn from laboratory experiments is
that one needs to undertake an experiment with "real" people, not students. This
criticism is often deflected by experimenters with the following imperative: if you
think that the experiment will generate different results with "real" people, then go
ahead and run the experiment with real people. A variant of this response is to
challenge the critics' assertion that students are not representative. As we will see,
this variant is more subtle and constructive than the first response.

The first response, to suggest that the critic run the experiment with real people, is
often adequate to get rid of unwanted referees at academic journals. In practice,
however, few experimenters ever examine field behavior in a serious and large-sample
way. It is relatively easy to say that the experiment could be applied to real people,
but to actually do so entails some serious and often unattractive logistical
problems.19

A more substantial response to this criticism is to consider what it is about students
that is viewed, a priori, as being nonrepresentative of the target population. There
are at least two issues here. The first is whether endogenous sample selection or
attrition has occurred due to incomplete control over recruitment and retention, so
that the observed sample is unreliable in some statistical sense (e.g., generating
inconsistent estimates of treatment effects). The second is whether the observed
sample can be informative on the behavior of the population, assuming away sample
selection issues.

4.2 Sample Selection in the Field

Conventional lab experiments typically use students who are recruited after being told
only general statements about the experiment. By and large, recruitment procedures
avoid mentioning the nature of the task, or the expected earnings. Most lab
experiments are also one-shot, in the sense that they do not involve repeated
observations of a sample subject to attrition. Of course, neither of these features is
essential. If one wanted to recruit subjects with specific interest in a task, it
would be easy to do (e.g., Peter Bohm and Hans Lind 1993). And if one wanted to
recruit subjects for several sessions, to generate "super-experienced" subjects20 or
to conduct pre-tests of such things as risk aversion, trust, or "other-regarding
preferences,"21 that could be built into the design as well.

One concern with lab experiments conducted with convenience samples of students

17 For example, the evaluation of the Uruguay Round of multilateral trade
liberalization by Harrison, Thomas Rutherford, and David Tarr (1997).
18 For example, see Harrison and H.D. Vinod (1992).
19 Or one can use "real" nonhuman species: see John Kagel, Don MacDonald, and Raymond
Battalio (1990) and Kagel, Battalio, and Leonard Green (1995) for dramatic
demonstrations of the power of economic theory to organize data from the animal
kingdom.
20 For example, John Kagel and Dan Levin (1986, 1999, 2002).
21 For example, Cox (2004).
is that students might be self-selected in some way, so that they are a sample that
excludes certain individuals with characteristics that are important determinants of
underlying population behavior. Although this problem is a severe one, its potential
importance in practice should not be overemphasized. It is always possible to simply
inspect the sample to see if certain strata of the population are not represented, at
least under the tentative assumption that it is only observables that matter. In this
case it would behoove the researcher to augment the initial convenience sample with a
quota sample, in which the missing strata were surveyed. Thus one tends not to see
many convicted mass murderers or brain surgeons in student samples, but we certainly
know where to go if we feel the need to include them in our sample.

Another consideration, of increasing importance for experimenters, is the possibility
of recruitment biases in our procedures. One aspect of this issue is studied by
Rutström (1998). She examines the role of recruitment fees in biasing the samples of
subjects that are obtained. The context for her experiment is particularly relevant
here since it entails the elicitation of values for a private commodity. She finds
that there are some significant biases in the strata of the population recruited as
one varies the recruitment fee from zero dollars to two dollars, and then up to ten
dollars. An important finding, however, is that most of those biases can be corrected
simply by incorporating the relevant characteristics in a statistical model of the
behavior of subjects and thereby controlling for them. In other words, it does not
matter if one group of subjects in one treatment has 60 percent females and the other
sample of subjects in another treatment has only 40 percent females, provided one
controls for the difference in gender when pooling the data and examining the key
treatment effect. This is a situation in which gender might influence the response or
the effect of the treatment, but controlling for gender allows one to remove this
recruitment bias from the resulting inference.

Some field experiments face a more serious problem of sample selection that depends on
the nature of the task. Once the experiment has begun, it is not as easy as it is in
the lab to control information flow about the nature of the task. This is obviously a
matter of degree, but can lead to endogenous subject attrition from the experiment.
Such attrition is actually informative about subject preferences, since the subject's
exit from the experiment indicates that the subject had made a negative evaluation of
it (Tomas Philipson and Larry Hedges 1998).

The classic problem of sample selection refers to possible recruitment biases, such
that the observed sample is generated by a process that depends on the nature of the
experiment. This problem can be serious for any experiment, since a hallmark of
virtually every experiment is the use of some randomization, typically to
treatment.22 If the population from which volunteers are being recruited has diverse
risk attitudes and plausibly expects the experiment to have some element of
randomization, then the observed sample will tend to look less risk-averse than the
population. It is easy to imagine how this could then affect behavior differentially
in some treatments. James Heckman and Jeffrey Smith (1995) discuss this issue in the
context of social experiments, but the concern applies equally to field and lab
experiments.

4.3 Are Students Different?

This question has been addressed in several studies, including early artefactual field
experiments by Sarah Lichtenstein and Paul Slovic (1973), and Penny Burns (1985).
Glenn Harrison and James Lesley (1996) (HL) approach this question with a simple
statistical framework. Indeed, they do not consider the issue in terms of the relevance

22 If not to treatment, then randomization often occurs over choices to determine
payoff.
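
The 60 percent versus 40 percent example on this page can be worked through numerically. The following is a minimal sketch under loud assumptions: the outcome rule (10 + 5 for females + 3 for treatment), the group sizes, and the function names are all hypothetical, and stratifying by gender stands in for whatever statistical model one would actually estimate.

```python
def mean(values):
    return sum(values) / len(values)

def outcome(female, treated):
    # Hypothetical response rule: gender shifts the response, treatment adds 3.
    return 10 + 5 * female + 3 * treated

# (female, treated) records: the control group is 60 percent female,
# the treated group only 40 percent female.
control = [(1, 0)] * 60 + [(0, 0)] * 40
treated = [(1, 1)] * 40 + [(0, 1)] * 60
rows = [(f, t, outcome(f, t)) for f, t in control + treated]

def naive_effect(data):
    """Difference in mean outcomes, ignoring gender."""
    return (mean([y for f, t, y in data if t == 1])
            - mean([y for f, t, y in data if t == 0]))

def gender_adjusted_effect(data):
    """Effect within each gender, weighted by that gender's pooled share."""
    total = 0.0
    for g in (0, 1):
        stratum = [r for r in data if r[0] == g]
        total += naive_effect(stratum) * len(stratum) / len(data)
    return total

naive = naive_effect(rows)
adjusted = gender_adjusted_effect(rows)
print("naive:", naive, "adjusted:", adjusted)
```

With these made-up numbers the unadjusted comparison understates the treatment effect because the treated group contains fewer females, while the gender-adjusted comparison recovers the effect built into the data.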
of experimental methods, but rather in terms of the relevance of convenience samples
for the contingent valuation method.23 However, it is easy to see that their methods
apply much more generally.

The HL approach may be explained in terms of their attempt to mimic the results of a
large-scale national survey conducted for the Exxon Valdez oil-spill litigation. A
major national survey was undertaken in this case by Richard Carson et al. (1992) for
the attorney general of the state of Alaska. This survey used then-state-of-the-art
survey methods but, more importantly for present purposes, used a full probability
sample of the nation. HL asked if one can obtain essentially the same results using a
convenience sample of students from the University of South Carolina. Using students
as a convenience sample is largely a matter of methodological bravado. One could
readily obtain convenience samples in other ways, but using students provides a tough
test of their approach.

They proceeded by developing a simpler survey instrument than the one used in the
original study. The purpose of this is purely to facilitate completion of the survey
and is not essential to the use of the method. This survey was then administered to a
relatively large sample of students. An important part of the survey, as in any field
survey that aims to control for subject attributes, is the collection of a range of
standard socioeconomic characteristics of the individual (e.g., sex, age, income,
parental income, household size, and marital status). Once these data are collated, a
statistical model is developed in order to explain the key responses in the survey. In
this case the key response is a simple "yes" or "no" to a single dichotomous choice
valuation question. In other words, the subject was asked whether he or she would be
willing to pay $X towards a public good, where $X was randomly selected to be $10,
$30, $60, or $120. A subject would respond to this question with a "yes," a "no," or a
"not sure." A simple statistical model is developed to explain behavior as a function
of the observable socioeconomic characteristics.24

Assuming that a statistical model has been developed, HL then proceeded to the key
stage of their method. This is to assume that the coefficient estimates from the
statistical model based on the student sample apply to the population at large. If
this is the case, or if this assumption is simply maintained, then the statistical
model may be used to predict the behavior of the target population if one can obtain
information about the socioeconomic characteristics of the target population.

The essential idea of the HL method is simple and more generally applicable than this
example suggests. If students are representative in the sense of allowing the
researcher to develop a "good" statistical model of the behavior under study, then one
can often use publicly available information on the characteristics of the target
population to predict the behavior of that population. Their fundamental point is that
the "problem with students" is the lack of variability in their socio-demographic
characteristics, not necessarily the unrepresentativeness of their behavioral
responses conditional on their socio-demographic characteristics.

To the extent that student samples exhibit limited variability in some key
characteristics, such as age, then one might be wary of the veracity of the maintained
assumption involved here. However, the sample does not have to look like the
population in order for the statistical model to be an adequate one

23 The contingent valuation method refers to the use of hypothetical field surveys to
value the environment, by posing a scenario that asks the subject to place a value on
an environmental change contingent on a market for it existing. See Cummings and
Harrison (1994) for a critical review of the role of experimental economics in this
field.
24 The exact form of that statistical model is not important for illustrative
purposes, although the development of an adequate statistical model is important to
the reliability of this method.
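
The two stages of the HL method described on this page (fit a model of responses on the student sample, then apply it to the characteristics of the target population) can be sketched as follows. This is an illustration only: the cells, counts, and population shares are invented, and the "model" is simply the share of "yes" answers within each cell, since footnote 24 leaves the exact form of the statistical model open.

```python
# Hypothetical student responses: cell -> (number of "yes" answers, respondents).
student_cells = {
    ("female", "low income"): (30, 50),
    ("female", "high income"): (8, 10),
    ("male", "low income"): (12, 30),
    ("male", "high income"): (5, 10),
}

# Hypothetical shares of the same cells in the target population,
# as might be taken from publicly available information.
population_shares = {cell: 0.25 for cell in student_cells}

def fitted_yes_rates(cells):
    """Stage 1: estimate the response conditional on characteristics."""
    return {cell: yes / n for cell, (yes, n) in cells.items()}

def predict_population(rates, shares):
    """Stage 2: apply the student-based estimates to population shares."""
    return sum(shares[cell] * rates[cell] for cell in shares)

rates = fitted_yes_rates(student_cells)
raw_student_rate = (sum(yes for yes, _ in student_cells.values())
                    / sum(n for _, n in student_cells.values()))
predicted = predict_population(rates, population_shares)

print("raw student yes rate:", raw_student_rate)
print("predicted population yes rate:", predicted)
```

The raw student rate and the predicted population rate differ here only because the mix of characteristics differs; the conditional responses are assumed to be the same, which is exactly the maintained assumption discussed in the text.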
for predicting the population response.25 All that is needed is for the behavioral
responses of students to be the same as the behavioral responses of nonstudents. This
can either be assumed a priori or, better yet, tested by sampling nonstudents as well
as students.

Of course, it is always better to be forecasting on the basis of an interpolation
rather than an extrapolation, and that is the most important problem one has with
student samples. This issue is discussed in some detail by Blackburn, Harrison, and
Rutström (1994). They estimated a statistical model of subject response using a sample
of college students and also estimated a statistical model of subject response using
field subjects drawn from a wide range of churches in the same urban area. Both were
convenience samples. The only difference is that the church sample exhibited a much
wider variability in their socio-demographic characteristics. In the church sample,
ages ranged from 21 to 79; in the student sample, ages ranged from 19 to 27. When
predicting behavior of students based on the church-estimated behavioral model,
interpolation was used and the predictions were extremely accurate. In the reverse
direction, however, when predicting church behavior from the student-estimated
behavioral model, the predictions were disastrous in the sense of having extremely
wide forecast variances.26 The reason is simple to understand. It is much easier to
predict the behavior of a 26-year-old when one has a model that is based on the
behavior of people whose ages range from 21 to 79 than it is to estimate the behavior
of a 69-year-old based on the behavioral model from a sample whose ages range from 19
to 27.

What is the relevance of these methods for the original criticism of experimental
procedures? Think of the experimental subjects as the convenience sample in the HL
approach. The lessons that are learned from this student sample could be embodied in a
statistical model of their behavior, with implications drawn for a larger target
population. Although this approach rests on an assumption that is as yet untested,
concerning the representativeness of student behavioral responses conditional on their
characteristics, it does provide a simple basis for evaluating the extent to which
conclusions about students apply to a broader population.

How could this method ever lead to interesting results? The answer depends on the
context. Consider a situation in which the behavioral model showed that age was an
important determinant of behavior. Consider further a situation in which the sample
used to estimate the model had an average age that was not representative of the
population as a whole. In this case, it is perfectly possible that the responses of
the student sample could be quite different than the predicted responses of the
population. Although no such instances have appeared in the applications of this
method thus far, they should not be ruled out.

We conclude, therefore, that many of the concerns raised by this criticism, while
valid, are able to be addressed by simple extensions of the methods that experimenters
currently use. Moreover, these extensions would increase the general relevance of
experimental methods obtained with student convenience samples.

Further problems arise if one allows unobserved individual effects to play a role. In
some statistical settings it is possible to allow

25 For example, assume a population of 50 percent men and 50 percent women, but where
a sample drawn at random happens to have 60 percent men. If responses differ according
to sex, predicting the population is simply a matter of reweighting the survey
responses.
26 On the other hand, reporting large variances may be the most accurate reflection of
the wide range of valuations held by this sample. We should not always assume that
distributions with smaller variances provide more accurate reflections of the
underlying population just because they have little dispersion; for this to be true,
many auxiliary assumptions about randomness of the sampling process must be assumed,
not to mention issues about the stationarity of the underlying population process.
This stationarity is often assumed away in contingent valuation research (e.g., the
proposal to use double-bounded dichotomous choice formats without allowing for
possible correlation between the two questions).
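
The contrast between interpolating and extrapolating on age can be made concrete with a textbook formula. The sketch below assumes, purely for illustration, that behavior is modeled as a straight line in age fitted by least squares; the statistical model actually used by Blackburn, Harrison, and Rutström is not described here. Under that assumption the variance of the fitted mean at age x0 is the error variance times 1/n + (x0 - xbar)^2 / Sxx. Only the two age ranges come from the text; the spacing of ages within each sample and the function name are hypothetical.

```python
def forecast_variance_factor(ages, target_age):
    """Multiplier on the error variance for the fitted mean at target_age,
    for a straight-line least-squares fit: 1/n + (x0 - xbar)^2 / Sxx."""
    n = len(ages)
    xbar = sum(ages) / n
    sxx = sum((age - xbar) ** 2 for age in ages)
    return 1.0 / n + (target_age - xbar) ** 2 / sxx

# Hypothetical samples spanning the age ranges reported in the text.
student_ages = list(range(19, 28))       # 19, 20, ..., 27
church_ages = list(range(21, 80, 2))     # 21, 23, ..., 79

# Predicting a 26-year-old from the church sample is an interpolation.
interpolate = forecast_variance_factor(church_ages, 26)
# Predicting a 69-year-old from the student sample is an extrapolation.
extrapolate = forecast_variance_factor(student_ages, 69)

print("interpolation factor:", interpolate)
print("extrapolation factor:", extrapolate)
```

For any given error variance the extrapolation factor is larger by orders of magnitude, because the target age lies far outside a sample with very little spread in age.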
for those effects by means of "fixed effect" or "random effects" analyses. But these
standard devices, now quite common in the toolkit of experimental economists, do not
address a deeper problem. The internal validity of a randomized design is maximized
when one knows that the samples in each treatment are identical. This happy extreme
leads many to infer that matching subjects on a finite set of characteristics must be
better in terms of internal validity than not matching them on any characteristics.

But partial matching can be worse than no matching. The most important example of this
is due to James Heckman and Peter Siegelman (1993) and Heckman (1998), who critique
paired-audit tests of discrimination. In these experiments, two applicants for a job
are matched in terms of certain observables, such as age, sex, and education, and
differ in only one protected characteristic, such as race. However, unless some
extremely strong assumptions about how characteristics map into wages are made, there
will be a predetermined bias in outcomes. The direction of the bias "depends," and one
cannot say much more. A metaphor from Heckman (1998, p. 110) illustrates: Boys and
girls of the same age are in a high-jump competition, and jump the same height on
average. But boys have a higher variance in their jumping technique, for any number of
reasons. If the bar is set very low relative to the mean, then the girls will look
like better jumpers; if the bar is set very high then the boys will look like better
jumpers. The implications for numerous (lab and field) experimental studies of the
effect of gender, that do not control for other characteristics, should be apparent.

This metaphor also serves to remind us that what laboratory experimenters think of as
a "standard population" need not be a homogeneous population. Although students from
different campuses in a given country may have roughly the same age, they can differ
dramatically in influential characteristics such as intelligence and beauty. Again,
the immediate implication is to collect a standard battery of measures of individual
characteristics to allow some statistical comparisons of conditional treatment effects
to be drawn.27 But even here we can only easily condition on observable
characteristics, and additional identifying assumptions will be needed to allow for
correlated differences in unobservables.

4.4 Precursors

Several experimenters have used artefactual field experiments; that is, they have
deliberately sought out subjects in the "wild," or brought subjects from the "wild"
into labs. It is notable that this effort has occurred from the earliest days of
experimental economics, and that it has only recently become common.

Lichtenstein and Slovic (1973) replicated their earlier experiments on "preference
reversals" in "... a nonlaboratory real-play setting unique to the experimental
literature on decision processes--a casino in downtown Las Vegas" (p. 17). The
experimenter was a professional dealer, and the subjects were drawn from the floor of
the casino. Although the experimental equipment may have been relatively forbidding
(it included a PDP-7 computer, a DEC-339 CRT, and a keyboard), the goal was to
identify gamblers in their natural habitat. The subject pool of 44 did include seven
known dealers who worked in Las Vegas, and the "... dealer's impression was that the
game attracted a higher proportion of professional and educated persons than the usual
casino clientele" (p. 18).

Kagel, Battalio, and James Walker (1979) provide a remarkable, early examination of
many of the issues we raise. They were concerned with "volunteer artifacts" in lab
experiments, ranging from the characteristics that volunteers have to the issue of
sample

27 George Lowenstein (1999) offers a similar criticism of the popular practice in
experimental economics of not conditioning on any observable characteristics or
randomizing to treatment from the same population.
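
The high-jump metaphor on this page can be checked with a few lines of arithmetic. The sketch below adds an assumption that is not in the metaphor itself, namely that jump heights are normally distributed; the mean of 100, the standard deviations of 5 and 10, the bar heights, and the function name are hypothetical.

```python
import math

def prob_clears(bar, mean, sd):
    """P(jump height > bar) when jump height is normal with given mean and sd."""
    z = (bar - mean) / sd
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

MEAN = 100.0               # same average height for both groups (hypothetical)
SD_GIRLS, SD_BOYS = 5.0, 10.0   # boys have the higher variance (hypothetical)

girls_low = prob_clears(90.0, MEAN, SD_GIRLS)
boys_low = prob_clears(90.0, MEAN, SD_BOYS)
girls_high = prob_clears(110.0, MEAN, SD_GIRLS)
boys_high = prob_clears(110.0, MEAN, SD_BOYS)

print("low bar:  girls", round(girls_low, 3), "boys", round(boys_low, 3))
print("high bar: girls", round(girls_high, 3), "boys", round(boys_high, 3))
```

With identical means, the group that appears to jump better depends only on where the bar is set relative to the mean, which is the sense in which the direction of the bias in a paired-audit comparison "depends."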
selection bias.28 They conducted a field experiment in the homes of the volunteer
subjects, examining electricity demand in response to changes in prices, weekly
feedback on usage, and energy conservation information. They also examined a
comparison sample drawn from the same population, to check for any biases in the
volunteer sample.

Binswanger (1980, 1981) conducted experiments eliciting measures of risk aversion from
farmers in rural India. Apart from the policy interest of studying agents in
developing countries, one stated goal of using artefactual field experiments was to
assess risk attitudes for choices in which the income from the experimental task was a
substantial fraction of the wealth or annual income of the subject. The method he
developed has been used recently in conventional laboratory settings with student
subjects by Charles Holt and Susan Laury (2002).

Burns (1985) conducted induced-value market experiments with floor traders from wool
markets, to compare with the behavior of student subjects in such settings. The goal
was to see if the heuristics and decision rules these traders evolved in their natural
field setting affected their behavior. She did find that their natural field rivalry
had a powerful motivating effect on their behavior.

Vernon Smith, G. L. Suchanek, and Arlington Williams (1988) conducted a large series
of experiments with student subjects in an "asset bubble" experiment. In the 22
experiments they report, nine to twelve traders with experience in the double-auction
institution29 traded a number of fifteen or thirty period assets with the same common
value distribution of dividends. If all subjects are risk neutral and have common
price expectations, then there would be no reason for trade in this environment.30 The
major empirical result is the large number of observed price bubbles: fourteen of the
22 experiments can be said to have had some price bubble.

In an effort to address the criticism that bubbles were just a manifestation of using
student subjects, Smith, Suchanek, and Williams (1988) recruited nonstudent subjects
for one experiment. As they put it, one experiment "... is noteworthy because of its
use of professional and business people from the Tucson community, as subjects. This
market belies any notion that our results are an artifact of student subjects, and
that businessmen who 'run the real world' would quickly learn to have rational
expectations. This is the only experiment we conducted that closed on a mean price
higher than in all previous trading periods" (p. 1130-31). The reference at the end is
to the observation that the price bubble did not burst as the finite horizon of the
experiment was approaching. Another notable feature of this price bubble is that it
was accompanied by heavy volume, unlike the price bubbles observed with experienced
subjects.31 Although these subjects were not students, they were inexperienced in the
use of the double auction experiments. Moreover, there is no presumption that their
field experience was relevant for this type of asset market.

28 They also have a discussion of the role that these possible biases play in social
psychology experiments, and how they have been addressed in the literature.
29 And either inexperienced, once experienced, or twice experienced in asset market
trading.
30 There are only two reasons players may want to trade in this market. First, if
players differ in their risk attitudes then we might see the asset trading below
expected dividend value (since more-risk-averse players will pay less-risk-averse
players a premium over expected dividend value to take their assets). Second, if
subjects have diverse price expectations, we can expect trade to occur because of
expected capital gains. This second reason for trading (diverse price expectations)
can actually lead to contract prices above expected dividend value, provided some
subject believes that there are other subjects who believe the price will go even
higher.
31 Harrison (1992b) reviews the detailed experimental evidence on bubbles, and shows
that very few significant bubbles occur with subjects who are experienced in asset
market experiments in which there is a short-lived asset, such as those under study. A
bubble is significant only if there is some nontrivial volume associated with it.
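
The "expected dividend value" that footnote 30 uses as a benchmark can be sketched for a finite-lived asset. The dividend draws below are illustrative numbers, not taken from the text, and the function name is hypothetical; the only features carried over from the description above are the fifteen-period horizon and a dividend distribution that is common to all traders.

```python
def expected_dividend_value(dividends, periods_remaining):
    """Risk-neutral holding value of the asset: expected per-period dividend
    times the number of dividend payments still to come."""
    expected_dividend = sum(dividends) / len(dividends)
    return expected_dividend * periods_remaining

dividends = [0, 8, 28, 60]   # hypothetical, equally likely draws (cents)
horizon = 15                 # a fifteen-period asset

# Value at the start of each trading period, with all remaining dividends unpaid.
values = [expected_dividend_value(dividends, horizon - period + 1)
          for period in range(1, horizon + 1)]

for period, value in enumerate(values, start=1):
    print(period, value)
```

Under risk neutrality and common expectations this declining schedule is the price at which no one gains from trading, so sustained prices above it, accompanied by nontrivial volume, are what the text refers to as a bubble.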
Artefactual field experiments have also made use of children and high school subjects.
For example, William Harbaugh and Kate Krause (2000), Harbaugh, Krause, and Timothy
Berry (2001), and Harbaugh, Krause, and Lise Vesterlund (2002) explore other-regarding
preferences, individual rationality, and risk attitudes among children in school
environments.

Joseph Henrich (2000) and Henrich and Richard McElreath (2002), and Henrich et al.
(2001, 2004) have even taken artefactual field experiments to the true "wilds" of a
number of peasant societies, employing the procedures of cultural anthropology to
recruit and instruct subjects and conduct artefactual field experiments. Their focus
was on the ultimatum bargaining game and measures of risk aversion.

5. Framed Field Experiments

5.1 The Nature of the Information Subjects Already Have

Auction theory provides a rich set of predictions concerning bidders' behavior. One
particularly salient finding in a plethora of laboratory experiments that is not
predicted in first-price common-value auction theory is that bidders commonly fall
prey to the winner's curse. Only "super-experienced" subjects, who are in fact
recruited on the basis of not having lost money in previous experiments, avoid it
regularly. This would seem to suggest that experience is a sufficient condition for an
individual bidder to avoid the winner's curse. Harrison and List (2003) show that this
implication is supported when one considers a natural setting in which it is
relatively easy to identify traders that are more or less experienced at the task. In
their experiments the experience of subjects is either tied to the commodity, the
valuation task, and the use of auctions (in the field experiments with sports cards),
or simply to the use of auctions (in the laboratory experiments with induced values).
In all tasks, experience is generated in the field and not the lab. These results
provide support for the notion that context-specific experience does appear to carry
over to comparable settings, at least with respect to these types of auctions.

This experimental design emphasizes the identification of a naturally occurring
setting in which one can control for experience in the way that it is accumulated in
the field. Experienced traders gain experience over time by observing and surviving a
relatively wide range of trading circumstances. In some settings this might be proxied
by the manner in which experienced or super-experienced subjects are defined in the
lab, but it remains an open question whether standard lab settings can reliably
capture the full extent of the field counterpart of experience. This is not a
criticism of lab experiments, just their domain of applicability.

The methodological lesson we draw is that one should be careful not to generalize from
the evidence of a winner's curse by student subjects that have no experience at all
with the field context. These results do not imply that every field context has
experienced subjects, such as professional sports-card dealers, that avoid the
winner's curse. Instead, they point to a more fundamental need to consider the field
context of experiments before drawing general conclusions. It is not the case that
abstract, context-free experiments provide more general findings if the context itself
is relevant to the performance of subjects. In fact, one would generally expect such
context-free experiments to be unusually tough tests of economic theory, since there
is no control for the context that subjects might themselves impose on the abstract
experimental task.

The main result is that if one wants to draw conclusions about the validity of theory
in the field, then one must pay attention to the myriad of ways in which field context
can affect behavior. We believe that conventional lab experiments, in which roles are
exogenously assigned and defined in an abstract
manner, cannot ubiquitously provide reliable insights into field behavior. One might
be able to modify the lab experimental design to mimic those field contexts more
reliably, and that would make for a more robust application of the experimental method
in general.

Consider, as an example, the effect of "insiders" on the market phenomenon known as
the "winner's curse." For now we define an insider as anyone who has better
information than other market participants. If insiders are present in a market, then
one might expect that the prevailing prices in the market will reflect their better
information. This leads to two general questions about market performance. First, do
insiders fall prey to the winner's curse? Second, does the presence of insiders
mitigate the winner's curse for the market as a whole?

The approach adopted by Harrison and List (2003) is to undertake experiments in
naturally occurring settings in which the factors that are at the heart of the theory
are identifiable and arise endogenously, and then to impose the remaining controls
needed to implement a clean experiment. In other words, rather than impose all
controls exogenously on a convenience sample of college students, they find a
population in the field in which one of the factors of interest arises naturally,
where it can be identified easily, and then add the necessary controls. To test their
methodological hypotheses, they also implement a fully controlled laboratory
experiment with subjects drawn from the same field population. We discuss some of
their findings below.

5.2 The Nature of the Commodity

Many field experiments involve real, physical commodities and the values that subjects
place on them in their daily lives. This is distinct from the traditional focus in
experimental economics on experimenter-induced valuations on an abstract commodity,
often referred to as "tickets" just to emphasize the lack of any field referent that
might suggest a valuation. The use of real commodities, rather than abstract
commodities, is not unique to the field, nor does one have to eschew
experimenter-induced valuations in the field. But the use of real goods does have
consequences that apply to both lab and field experiments.32

Abstraction Requires Abstracting. One simple example is the Tower of Hanoi game, which
has been extensively studied by cognitive psychologists (e.g., J. R. Hayes and
H. A. Simon 1974) and more recently by economists (Tanga McDaniel and Rutström 2001)
in some fascinating experiments. The physical form of the game, as found in all
serious Montessori classrooms and in Judea Pearl (1984, p. 28), is shown in figure 1.

The top picture shows the initial state, in which n disks are on peg 1. The goal is to
move all of the disks to peg 3, as shown in the goal state in the bottom picture. The
constraints are that only one disk may be moved at a time, and no disk may ever lie
under a bigger disk. The objective is to reach the goal state in the least number of
moves. The "trick" to solving the Tower of Hanoi is to use backwards induction:
visualize the final, goal state and use the constraints to figure out what the
penultimate state must have looked like (viz., the tiny disk on the top of peg 3 in
the goal state would have to be on peg 1 or peg 2 by itself). Then work back from that
penultimate state, again respecting the constraints (viz., the second smallest disk on
peg 3 in the goal state would have to be on whichever of peg 1 or peg 2 the smallest
disk is not on). One more step in reverse and the essential logic should be clear
(viz., in order for the third largest disk on peg 3 to be off peg 3, one of peg 1 or
peg 2 will have to be cleared, so the smallest disk should be on top of the
second-smallest disk).

Observation of students in Montessori classrooms makes it clear how they (eventually)
solve the puzzle, when confronted with

32 See Harrison, Ronald Harstad, and Rutström (2004) for a general treatment.
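
The rules of the Tower of Hanoi stated above (one disk at a time, never a bigger disk on top of a smaller one, fewest moves) can be written down compactly. The sketch below uses the standard recursive solution, which is one way of expressing the backwards-induction logic; it is offered as an illustration and is not the computerized task used by McDaniel and Rutström. The function names and the peg and disk numbering (disk 1 is the smallest) are our own.

```python
def solve_hanoi(n, source=1, target=3, spare=2):
    """Return the moves (disk, from_peg, to_peg) that take n disks from
    source to target while respecting the constraints."""
    if n == 0:
        return []
    # Working back from the goal: before disk n can land on the target,
    # the n-1 smaller disks must all be stacked on the spare peg.
    return (solve_hanoi(n - 1, source, spare, target)
            + [(n, source, target)]
            + solve_hanoi(n - 1, spare, target, source))

def replay(n, moves):
    """Apply the moves to the initial state, enforcing both constraints."""
    pegs = {1: list(range(n, 0, -1)), 2: [], 3: []}   # last item = top disk
    for disk, src, dst in moves:
        assert pegs[src] and pegs[src][-1] == disk, "only a top disk may move"
        assert not pegs[dst] or pegs[dst][-1] > disk, "bigger disk on smaller"
        pegs[dst].append(pegs[src].pop())
    return pegs

moves = solve_hanoi(3)
final_state = replay(3, moves)
print(len(moves), "moves:", moves)
print("final state:", final_state)
```

The recursive solution uses 2^n - 1 moves for n disks, which is the minimum; the replay function plays the role of the computerized version of the game in that it refuses any move that violates a constraint.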
Figure 1. The Tower of Hanoi Game
the initial state. They shockingly violate the constraints and move all the disks to the goal state en masse, and then physically work backwards along the lines of the above thought experiment in backwards induction. The critical point here is that they temporarily violate the constraints of the problem in order to solve it "properly."

Contrast this behavior with the laboratory subjects in McDaniel and Rutström (2001). They were given a computerized version of the game and told to try to solve it. However, the computerized version did not allow them to violate the constraints. Hence the laboratory subjects were unable to use the classroom Montessori method, by which the student learns the idea of backwards induction by exploring it with physical referents. This is not a design flaw of the McDaniel and Rutström (2001) lab experiments, but simply one factor to keep in mind when evaluating the behavior of their subjects. Without the physical analogue of the final goal state being allowed in the experiment, the subject was forced to visualize that state conceptually, and to likewise imagine conceptually the penultimate states. Although that may encourage more fundamental conceptual understanding of the idea of backwards induction, if attained, it is quite possible that it posed an insurmountable cognitive burden for some of the experimental subjects.

It might be tempting to think of this as just two separate tasks, instead of a real commodity and its abstract analogue. But we believe that this example does identify an important characteristic of commodities in ideal field experiments: the fact that they allow subjects to adopt the representation of the commodity and task that best suits their objective. In other words, the representation of the commodity by the subject is an integral part of how the subject solves the task. One simply cannot untangle them, at least not easily and naturally.

This example also illustrates that off-equilibrium states, in which one is not optimizing in terms of the original constrained optimization task, may indeed be critical to the attainment of the equilibrium state.33

33 This is quite distinct from the valid point made by Smith (1982, p. 934, fn. 17), that it is appropriate to design the experimental institution so as to make the task as simple and transparent as possible, providing one holds constant these design features as one compares experimental treatments. Such designs may make the results of less interest for those wanting to make field inferences, but that is a trade-off that every theorist and experimenter faces to varying degrees.
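The backwards-induction logic of the Tower of Hanoi discussed above can be illustrated with a minimal sketch. It assumes a three-disk game with pegs numbered 1 to 3 and disks numbered from 1 (smallest) upward; the function names `solve_hanoi` and `replay` are our own illustrative choices and are not taken from McDaniel and Rutström (2001) or any other cited study.

```python
def solve_hanoi(n_disks, source=1, target=3, spare=2):
    """Return the legal moves (disk, from_peg, to_peg) that take n_disks
    from the source peg to the target peg.

    Backwards-induction reading: for the largest disk to reach the target,
    the state just before that move must have every smaller disk stacked
    on the spare peg. Applying the same reasoning to that earlier state
    yields the recursion.
    """
    if n_disks == 0:
        return []
    # Clear the smaller disks onto the spare peg.
    moves = solve_hanoi(n_disks - 1, source, spare, target)
    # Move the largest remaining disk to its goal position.
    moves.append((n_disks, source, target))
    # Rebuild the smaller disks on top of it.
    moves.extend(solve_hanoi(n_disks - 1, spare, target, source))
    return moves


def replay(n_disks, moves):
    """Replay the moves while enforcing the constraints of the game:
    only a top disk may move, and never onto a smaller disk."""
    # Each peg is a list ordered from bottom to top.
    pegs = {1: list(range(n_disks, 0, -1)), 2: [], 3: []}
    for disk, src, dst in moves:
        if not pegs[src] or pegs[src][-1] != disk:
            raise ValueError("disk is not on top of the source peg")
        if pegs[dst] and pegs[dst][-1] < disk:
            raise ValueError("cannot place a disk on a smaller one")
        pegs[dst].append(pegs[src].pop())
    return pegs


moves = solve_hanoi(3)          # illustrative three-disk game
final_state = replay(3, moves)  # all disks should end on peg 3
```

In this sketch the state just before the largest disk moves has the smallest disk sitting on the second-smallest disk on the spare peg, which is the penultimate configuration described in the thought experiment. The `replay` function never permits a constraint to be violated, so it cannot represent the Montessori approach of temporarily moving the disks en masse.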
Thus we should be mindful of possible field devices that allow subjects to explore off-equilibrium states, even if those states are ruled out in our null hypotheses.

Field Goods Have Field Substitutes. There are two respects in which "field substitutes" play a role whenever one is conducting an experiment with naturally occurring, or field, goods. We can refer to the former as the natural context of substitutes, and to the latter as an artificial context of substitutes. The former needs to be captured if reliable valuations are to be elicited; the latter needs to be minimized or controlled.

The first way in which substitutes play a role in an experiment is the traditional sense of demand theory: to some individuals, a bottle of scotch may substitute for a bible when seeking peace of mind. The degree of substitutability here is the stuff of individual demand elasticities, and can reasonably be expected to vary from subject to subject. The upshot of this consideration is, yet again, that one should always collect information on observable individual characteristics and control for them.

The second way in which substitutes play a role in an experiment is the more subtle issue of affiliation which arises in lab or field settings that involve preferences over a field good. To see this point, consider the use of repeated Vickrey auctions in which subjects learn about prevailing prices. This results in a loss of control, since we are dealing with the elicitation of homegrown values rather than experimenter-induced private values. To the extent that homegrown values are affiliated across subjects, we can expect an effect on elicited values from using repeated Vickrey auctions rather than a one-shot Vickrey auction.34 There are, in turn, two reasons why homegrown values might be affiliated in such experiments.

The first is that the good being auctioned might have some uncertain attributes, and fellow bidders might have more or less information about those attributes. Depending on how one perceives the knowledge of other bidders, observation of their bidding behavior35 can affect a given bidder's estimate of the true subjective value to the extent that they change the bidder's estimate of the lottery of attributes being auctioned.36 Note that what is being

34 The theoretical and experimental literature makes this point clearly by comparing real-time English auctions with sealed-bid Vickrey auctions: see Paul Milgrom and Robert Weber (1982) and Kagel, Harstad, and Levin (1987). The same logic that applies for a one-shot English auction applies for a repeated Vickrey auction, even if the specific bidding opponents were randomly drawn from the population in each round.

35 The term "bidding behavior" is used to allow for information about bids as well as non-bids. In the repeated Vickrey auction it is the former that is provided (for winners in previous periods). In the one-shot English auction it is the latter (for those who have not yet caved in at the prevailing price). Although the inferential steps in using these two types of information differ, they are each informative in the same sense. Hence any remarks about the dangers of using repeated Vickrey auctions apply equally to the use of English auctions.

36 To see this point, assume that a one-shot Vickrey auction was being used in one experiment and a one-shot English auction in another experiment. Large samples of subjects are randomly assigned to each institution, and the commodity differs. Let the commodity be something whose quality is uncertain; an example used by Cummings, Harrison and Rutström (1995) and Rutström (1998) might be a box of gourmet chocolate truffles. Amongst undergraduate students in South Carolina, these boxes present something of a taste challenge. The box is not large in relation to those found in more common chocolate products, and many of the students have not developed a taste for gourmet chocolates. A subject endowed with a diverse palate is faced with an uncertain lottery. If these are just ordinary chocolates dressed up in a small box, then the true value to the subject is small (say, $2). If they are indeed gourmet chocolates then the true value to the subject is much higher (say, $10). Assuming an equal chance of either state of chocolate, the risk-neutral subject would bid their true expected value (in this example, $6). In the Vickrey auction this subject will have an incentive to write down her reservation price for this lottery as described above. In the English auction, however, this subject is able to see a number of other subjects indicate that they are willing to pay reasonably high sums for the commodity. Some have not dropped out of the auction as the price has gone above $2, and it is closing on $6. What should the subject do? The answer depends critically on how knowledgeable he thinks the other bidders are as to the quality of the chocolates. If those who have dropped out are the more knowledgeable ones, then the correct inference is that the lottery is more heavily weighted towards these being common chocolates. If those remaining in the auction are the more knowledgeable ones, however, then the opposite inference is appropriate. In the former case the real-time observation should lead the subject to bid lower than in the Vickrey auction, and in the latter case the real-time observation should lead the subject to bid higher than in the Vickrey auction.
affected here by this knowledge is the subject's best estimate of the subjective value of the good. The auction is still eliciting a truthful revelation of this subjective value; it is just that the subjective value itself can change with information on the bidding behavior of others.

The second reason that bids might be affiliated is that the good might have some extra-experimental market price. Assuming transaction costs of entering the "outside" market to be zero for a moment, information gleaned from the bidding behavior of others can help the bidder infer what that market price might be. To the extent that it is less than the subjective value of the good, this information might result in the bidder deliberately bidding low in the experiment.37 The reason is that the expected utility of bidding below the true value is clearly positive: if lower bidding results in somebody else winning the object at a price below the true value, then the bidder can (costlessly) enter the outside market anyway. If lower bidding results in the bidder winning the object, and market price and bids are not linked, then consumer surplus is greater than if the object had been bought in the outside market. Note that this argument suggests that subjects might have an incentive to strategically misrepresent their true subjective value.38

The upshot of these concerns is that unless one assumes that homegrown values for the good are certain and not affiliated across bidders, or can provide evidence that they are not affiliated in specific settings, one should avoid the use of institutions that can have uncontrolled influences on estimates of true subjective value and/or the incentive to truthfully reveal that value.

5.3 The Nature of the Task

Who Cares If Hamburger Flippers Violate EUT? Who cares if a hamburger flipper violates the independence axiom of expected utility theory in an abstract task? His job description, job evaluation, and job satisfaction do not hinge on it. He may have left some money on the table in the abstract task, but is there any sense in which his failure suggests that he might be poor at flipping hamburgers?

Another way to phrase this point is to actively recruit subjects who have experience in the field with the task being studied.39 Trading houses do not allow neophyte pit-traders to deviate from prescribed limits, in terms of the exposure they are allowed. A survival metric is commonly applied in the field, such that the subjects who engage in certain tasks of interest have specific types of training.

The relevance of field subjects and field environments for tests of the winner's curse is evident from Douglas Dyer and Kagel (1996, p. 1464), who review how executives in the commercial construction industry avoid the winner's curse in the field:

"Two broad conclusions are reached. One is that the executives have learned a set of situation-specific rules of thumb which help them to avoid the winner's curse in the field, but which could not be applied in the laboratory markets. The second is that the bidding environment created in the laboratory and the theory underlying it are not fully representative of the field environment. Rather, the latter has developed escape mechanisms for avoiding the winner's curse that are mutually beneficial to both buyers and sellers and which have not been incorporated into the standard one-shot auction theory literature."

These general insights motivated the design of the field experiments of Harrison

37 Harrison (1992a) makes this point in relation to some previous experimental studies attempting to elicit homegrown values for goods with readily accessible outside markets.

38 It is also possible that information about likely outside market prices could affect the individual's estimate of true subjective value. Informal personal experience, albeit over a panel data set, is that higher-priced gifts seem to elicit warmer glows from spouses and spousal-equivalents.

39 The subjects may also have experience with the good being traded, but that is a separate matter worthy of study. For example, List (2004c) had sports-card enthusiasts trade coffee mugs and chocolates in tests of loss aversion, even though they had no experience in openly trading those goods.
and List (2003), mentioned earlier. They study the behavior of insiders in their field context, while controlling the "rules of the game" to make their bidding behavior fall into the domain of existing auction theory. In this instance, the term "field context" means the commodity with which the insiders are familiar, as well as the type of bidders they normally encounter.

This design allows one to tease apart the two hypotheses implicit in the conclusions of Dyer and Kagel (1996). If these insiders fall prey to the winner's curse in the field experiment, then it must be40 that they avoid it by using market mechanisms other than those under study. The evidence is consistent with the notion that dealers in the field do not fall prey to the winner's curse in the field experiment, providing tentative support for the hypothesis that naturally occurring markets are efficient because certain traders use heuristics to avoid the inferential error that underlies the winner's curse.

This support is only tentative, however, because it could be that these dealers have developed heuristics that protect them from the winner's curse only in their specialized corner of the economy. That would still be valuable to know, but it would mean that the type of heuristics they learn in their corner are not general and do not transfer to other settings. Hence, the complete design also included laboratory experiments in the field, using induced valuations as in the laboratory experiments of Kagel and Levin (1999), to see if the heuristic of insiders transfers. We find that it does when they are acting in familiar roles, adding further support to the claim that these insiders have indeed developed a heuristic that "travels" from problem domain to problem domain. Yet when dealers are exogenously provided with less information than their bidding counterparts, a role that is rarely played by dealers, they frequently fall prey to the winner's curse. We conclude that the theory predicts field behavior well when one is able to identify naturally occurring field counterparts to the key theoretical conditions.

At a more general level, consider the argument that subjects who behave irrationally could be subjected to a "money-pump" by some arbitrager from hell. When we explain transitivity of preferences to undergraduates, the common pedagogy includes stories of intransitive subjects mindlessly cycling forever in a series of low-cost trades. If these cycles continue, the subject is pumped of money until bankrupt. In fact, the absence of such phenomena is often taken as evidence that contracts or markets must be efficient.

There are several reasons why this may not be true. First, it is only when certain consistency conditions are imposed that successful money-pumps provide a general indicator of irrationality, defeating their use as a sole indicator (Robin Cubitt and Robert Sugden 2001).

Second, and germane to our concern with the field, subjects might have developed simple heuristics to avoid such money-pumps: for example, never retrade the same objects with the same person.41 As John Conlisk (1996, p. 684) notes, "Rules of thumb are typically exploitable by 'tricksters,' who can in principle 'money pump' a person using such rules. ... Although tricksters abound - at the door, on the phone, and elsewhere - people can easily protect themselves, with their pumpable rules intact, by such simple devices as slamming the door and hanging up the phone. The issue is again a matter of circumstance and degree." The last point is important for our argument: only when the circumstance is natural might

40 This inference follows if one assumes that a dealer's survival in the industry provides sufficient evidence that he does not make persistent losses.

41 Slightly more complex heuristics work against arbitragers from meta-hell who understand that this simple heuristic might be employed.
one reasonably expect the subject to be able to call upon survival heuristics that protect against such irrationality. To be sure, some heuristics might "travel," and that was precisely the research question examined by Harrison and List (2003) with respect to the dreaded winner's curse. But they might not; hence we might have sightings of odd behavior in the lab that would simply not arise in the wild.

Third, subjects might behave in a non-separable manner with respect to sequential decisions over time, and hence avoid the pitfalls of sequential money pumps (Mark Machina 1989; and Edward McClennan 1990). Again, the use of such sophisticated characterizations of choices over time might be conditional on the individual having familiarity with the task and the consequences of simpler characterizations, such as those employing intertemporal additivity. It is an open question if the richer characterization that may have evolved for familiar field settings travels to other settings in which the individual has less experience.

Our point is that one should not assume that heuristics or sophisticated characterizations that have evolved for familiar field settings do travel to the unfamiliar lab. If they do exist in the field, and do not travel, then evidence from the lab might be misleading.

"Context" Is Not a Dirty Word. One tradition in experimental economics is to use scripts that abstract from any field counterpart of the task. The reasoning seems to be that this might contaminate behavior, and that any observed behavior could not then be used to test general theories. There is logic to this argument, but context should not be jettisoned without careful consideration of the unintended consequences. Field referents can often help subjects overcome confusion about the task. Confusion may be present even in settings that experimenters think are logically or strategically transparent. If the subject does not understand what the task is about, in the sense of knowing what actions are feasible and what the consequences of different actions might be, then control has been lost at a basic level. In cases where the subject understands all the relevant aspects of the abstract game, problems may arise due to the triggering of different methods for solving the decision problem. The use of field referents could trigger the use of specific heuristics from the field to solve the specific problem in the lab, which otherwise may have been solved less efficiently from first principles (e.g., Gerd Gigerenzer et al. 2000). For either of these reasons (a lack of understanding of the task or a failure to apply a relevant field heuristic), behavior may differ between the lab and the field. The implication for experimental design is to just "do it both ways," as argued by Chris Starmer (1999) and Harrison and Rutström (2001). Experimental economists should be willing to consider the effect in their experiments of scripts that are less abstract, but in controlled comparisons with scripts that are abstract in the traditional sense. Nevertheless, it must also be recognized that inappropriate choice of field referents may trigger uncontrolled psychological motivations. Ultimately, the choice between an abstract script and one with field referents must be guided by the research question.

This simple point can be made more forcefully by arguing that the passion for abstract scripts may in fact result in less control than context-ridden scripts. It is not the case that abstract, context-free experiments provide more general findings if the context itself is relevant to the performance of subjects. In fact, one would generally expect such context-free experiments to be unusually tough tests of economic theory, since there is no control for the context that subjects might themselves impose on the abstract experimental task. This is just one part of a general plea for experimental economists to take the psychological process of "task representation" seriously.
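One way to read the advice to "do it both ways" is as a between-subjects design in which subjects are randomly assigned to an abstract script or to a script with field referents. The sketch below shows only the assignment step, under our own assumptions: the treatment labels, the function name `assign_scripts`, the pool of forty subject identifiers, and the fixed seed are all hypothetical and are not taken from any of the studies cited.

```python
import random


def assign_scripts(subject_ids, seed=0):
    """Randomly split subjects into two treatment groups of equal size
    (or differing by one subject if the pool size is odd)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "abstract": sorted(shuffled[:half]),        # traditional abstract script
        "field_referent": sorted(shuffled[half:]),  # script with field context
    }


# Hypothetical pool of forty subjects, identified by the integers 1 to 40.
groups = assign_scripts(range(1, 41))
```

Because assignment is random, any systematic difference in behavior between the two groups can be attributed to the script rather than to who chose to face it, which is the sense of a controlled comparison.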
This general point has already emerged in several areas of research in experimental economics. Noticing large differences between contributions to another person and a charity in between-subjects experiments that were otherwise identical in structure and design, Catherine Eckel and Philip Grossman (1996, p. 188ff.) drew the following conclusion:

"It is received wisdom in experimental economics that abstraction is important. Experimental procedures should be as context-free as possible, and the interaction among subjects should be carefully limited by the rules of the experiment to ensure that they are playing the game we intend them to play. For tests of economic theory, these procedural restrictions are critical. As experimenters, we aspire to instructions that most closely mimic the environments implicit in the theory, which is inevitably a mathematic abstraction of an economic situation. We are careful not to contaminate our tests by unnecessary context. But it is also possible to use experimental methodology to explore the importance and consequence of context. Economists are becoming increasingly aware that social and psychological factors can only be introduced by abandoning, at least to some extent, abstraction. This may be particularly true for the investigation of other-regarding behavior in the economic arena."

Our point is simply that this should be a more general concern.

Indeed, research in memory reminds us that subjects will impose a natural context on a task even if it literally involves "nonsense." Long traditions in psychology, no doubt painful to the subjects, involved detecting how many "nonsense syllables" a subject could recall. The logic behind the use of nonsense was that the researchers were not interested in the role of specific semantic or syntactic context as an aid to memory, and in fact saw those as nuisance variables to be controlled by the use of random syllables. Such experiments generated a backlash of sorts in memory research, with many studies focusing instead on memory within a natural context, in which cues and frames could be integrated with the specific information in the foreground of the task (e.g., Ulric Neisser and Ira Hyman 2000).42

At a more homely level, the "simple" choice of parameters can add significant field context to lab experiments. The idea, pioneered by Grether, Isaac, and Plott (1981, 1989), Grether and Plott (1984), and Hong and Plott (1982), is to estimate parameters that are relevant to field applications and take these into the lab.

5.4 The Nature of the Stakes

One often hears the criticism that lab experiments involve trivial stakes, and that they do not provide information about agents' behavior in the field if they faced serious stakes, or that subjects in the lab experiments are only playing with "house money."43 The immediate response to this

42 A healthy counter-lashing was offered by Mahzarin Banaji and Robert Crowder (1989), who concede that needlessly artefactual designs are not informative. But they conclude that "we students of memory are just as interested as anybody else in why we forget where we left the car in the morning or in who was sitting across the table at yesterday's meeting. Precisely for this reason we are driven to laboratory experimentation and away from naturalistic observation. If the former method has been disappointing to some after about 100 years, so should the latter approach be disappointing after about 2,000. Above all, the superficial glitter of everyday methods should not be allowed to replace the quest for generalizable principles." (p. 1193).

43 This problem is often confused with another issue: the validity and relevance of hypothetical responses in the lab. Some argue that hypothetical responses are the only way that one can mimic the stakes found in the field. Conlisk (1989) runs an experiment to test the Allais Paradox with small, real stakes and finds that virtually no subjects violated the predictions of expected utility theory. Subjects drawn from the same population did violate the "original recipe" version of the Allais Paradox with large, hypothetical stakes. Conlisk (1989; p. 401ff.) argues that inferences from this evidence confound hypothetical rewards with the reward scale, which is true. Of course, one could run an experiment with small, hypothetical stakes and see which factor is driving this result. Chinn-Ping Fan (2002) did this, using Conlisk's design, and found that subjects given low, hypothetical stakes tended to avoid the Allais Paradox, just as his subjects with low, real stakes avoided it. Many of the experiments that find violations of the Allais Paradox in small, real stake settings embed these choices in a large number of tasks, which could affect outcomes.
point is perhaps obvious: increase the stakes in the lab and see if it makes a difference (e.g., Elizabeth Hoffman, Kevin McCabe, and Vernon Smith 1996), or have subjects earn their stakes in the lab (e.g., Rutström and Williams 2000; and List 2004a), or seek out lab subjects in developing countries for whom a given budget is a more substantial fraction of their income (e.g., Steven Kachelmeier and Mohamed Shehata 1992; Lisa Cameron 1999; and Robert Slonim and Alvin Roth 1998).

Colin Camerer and Robin Hogarth (1999) review the issues here, identifying many instances in which increased stakes are associated with improved performance or less variation in performance. But they also alert us to important instances in which increased stakes do not improve performance, so that one does not casually assume that there will be such an improvement.

Taking the Stakes to Subjects Who Are Relatively Poor. One of the reasons for running field experiments in poor countries is that it is easier to find subjects who are relatively poor. Such subjects are presumably more motivated by financial stakes of a given level than subjects in richer countries.

Slonim and Roth (1998) conducted bargaining experiments in the Slovak Republic to test for the effect of "high stakes" on behavior.44 The bargaining game they studied entails one person making an offer to the other person, who then decides whether to accept it. Bargaining was over a pie worth 60 Slovak Crowns (Sk) in one session, a pie worth 300 Sk in another session, and a pie worth 1500 Sk in a third session.45 At the exchange rates to the U.S. dollar prevailing at the time, these stakes were $1.90, $9.70, and $48.40, respectively. In terms of average local monthly wages, they were equivalent to approximately 2.5 hours, 12.5 hours, and 62.5 hours of work, respectively.

They conclude that there was no effect on initial offer behavior in the first round, but that the higher stakes did have an effect on offers as the subjects gained experience with subsequent rounds. They also conclude that acceptances were greater in all rounds with higher payoffs, but that they did not change over time. Their experiment is particularly significant because they varied the stakes by a factor of 25 and used procedures that have been widely employed in comparable experiments.46 On the other hand, one might question if there was any need to go to the field for this treatment. Fifty subjects dividing roughly $50 per game is only $1,250, and this is quite modest in terms of most experimental budgets. But fifty subjects dividing the monetary equivalent of 62.5 hours is another matter. If we assume $10 per hour in the United States for lower-skilled blue-collar workers or students, that is $15,625, which is substantial but feasible.47

Similarly, consider the "high payoff" experiments from China reported by Kachelmeier and Shehata (1992) (KS). These involved subjects facing lotteries with prizes equal to 0.5 yuan, 1 yuan, 5 yuan, or 10 yuan, and being asked to state certainty-equivalent selling prices using the "BDM" mechanism due to Gordon Becker, Morris DeGroot, and Jacob Marschak (1964). Although 10 yuan only converted to about $2.50 at the time of the experiments, this represented a considerable amount of purchasing power in that

44 Their subjects were students from universities, so one could question how "nonstandard" this population is. But the design goal was to conduct the experiment in a country in which the wage rates were low relative to the United States (p. 569), rather than simply conduct the same experiment with students from different countries as in Roth et al. (1991).

45 Actually, the subjects bargained over points which were simply converted to currency at different exchange rates. This procedure seems transparent enough, and served to avoid possible focal points defined over differing cardinal ranges of currency.

46 Harrison (2005a) reconsiders their conclusions.

47 For July 2002 the Bureau of Labor Statistics estimated average private sector hourly wages in the United States at $16.40, with white-collar workers earning roughly $4 more and blue-collar workers roughly $2 less than that.
region of China, as discussed by KS (p. has repeatedly stressed the importance of

1123). Their results support several conclu- recruiting subjects who have some field
sions. First, the coefficients for lotteries experience with the task or who have an
involving low win probabilities imply interest in the particular task. His experi-
extreme risk loving. This is perfectly plausi- ments have generally involved imposing
ble given the paltry stakes involved in such institutions on the subjects who are not
lotteries using the BDM elicitation proce- familiar with the institution, since the
dure. Second, "bad joss," as measured by objective of the early experiments was to
the fraction of random buying prices below study new ways of overcoming free-rider
the expected buying price of 50 percent of bias. But his choice of commodity has usu-
the prize, is associated with a large increase allybeen drivenby a desire to confrontsub-
in risk-loving behavior.48 Third, experience jects with stakes and consequences that are
with the general task increases risk aver- naturalto them. In other words, his experi-
sion. Fourth, increasing the prize from 5 ments illustrate how one can seek out sub-
yuan to 10 yuan increases risk aversion significantly. Of course, this last result is
consistent with non-constant RRA, and should not be necessarily viewed as a problem
unless one insisted on applying the same CRRA coefficient over these two reward
domains.

Again, however, the question is whether one needed to go to nonstandard populations
in order to scale up the stakes to draw these conclusions. Using an elicitation
procedure different than the BDM procedure, Holt and Laury (2002) undertake
conventional laboratory experiments in which they scale up stakes and draw the same
conclusions about experience and stake size. Their scaling factors are generally
twenty compared to a baseline level, although they also conducted a handful of
experiments with factors as high as fifty and ninety. The overall cost of these scale
treatments was $17,000, although $10,000 was sufficient for their primary results
with a scaling of twenty. These are not cheap experiments, but budgets of this kind
are now standard for many experimenters.

Taking the Task to the Subjects Who Care About It. Bohm (1972; 1979; 1984a,b; 1994)

ject pools for whom certain stakes are meaningful.

Bohm (1972) is a landmark study that had a great impact on many researchers in the
areas of field public-good valuation and experimentation on the extent of
free-riding. The commodity was a closed-circuit broadcast of a new Swedish TV
program. Six elicitation procedures were used. In each case except one, the good was
produced, and the group was able to see the program, if aggregate WTP (willingness
to pay) equaled or exceeded a known total cost. Every subject received SEK50 upon
arrival at the experiment, broken down into standard denominations. Bohm employed
five basic procedures for valuing his commodity.49 No formal theory is provided to

48 Although purely anecdotal, our own experience is that many subjects faced with
the BDM task believe that the buying price depends in some way on their selling
price. To mitigate such possible perceptions, we have tended to use physical
randomizing devices that are less prone to being questioned.

49 In Procedure I the subject pays according to his stated WTP. In Procedure II the
subject pays some fraction of stated WTP, with the fraction determined equally for
all in the group such that total costs are just covered (and the fraction is not
greater than one). In Procedure III the payment scheme is unknown to subjects at the
time of their bid. In Procedure IV each subject pays a fixed amount. In Procedure V
the subject pays nothing. For comparison, a quite different Procedure VI was
introduced in two stages. The first stage, denoted VI:1, approximates a CVM, since
nothing is said to the subject as to what considerations would lead to the good
being produced or what it would cost him if it was produced. The second stage, VI:2,
involves subjects bidding against what they think is a group of 100 for the right to
see the program. This auction is conducted as a discriminative auction, with the ten
highest bidders actually paying their bid and being able to see the program.
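The payment rules listed in footnote 49 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `payments`, the `fixed_fee` and `winners` parameters, and the sample bids are hypothetical; Procedure III and stage VI:1 are left out because the footnote gives no payment rule that could be coded; and the sketch assumes that nobody pays when aggregate stated WTP falls short of the known total cost.

```python
def payments(bids, total_cost, procedure, fixed_fee=0.0, winners=10):
    """Payment owed by each subject under a stylized version of one procedure.

    bids       -- stated WTP of each subject
    total_cost -- known cost of producing the good
    fixed_fee  -- hypothetical flat charge used for Procedure IV
    winners    -- number of paying bidders in the VI:2 auction
    """
    n = len(bids)
    if procedure == "VI:2":
        # Discriminative auction: the highest bidders each pay their own bid.
        ranked = sorted(range(n), key=lambda i: bids[i], reverse=True)
        paying = set(ranked[:winners])
        return [bids[i] if i in paying else 0.0 for i in range(n)]

    total = sum(bids)
    if procedure == "V" or total < total_cost:
        # Procedure V charges nothing. In the other procedures this sketch
        # assumes that no one pays when the good is not produced.
        return [0.0] * n
    if procedure == "I":
        # Each subject pays his stated WTP.
        return list(bids)
    if procedure == "II":
        # Common fraction of stated WTP that just covers cost, capped at one.
        fraction = min(1.0, total_cost / total) if total > 0 else 0.0
        return [fraction * b for b in bids]
    if procedure == "IV":
        # Each subject pays the same fixed amount.
        return [fixed_fee] * n
    raise ValueError(f"procedure {procedure!r} is not modeled in this sketch")


if __name__ == "__main__":
    sample_bids = [10.0, 20.0, 30.0]  # hypothetical stated WTP in SEK
    for proc in ("I", "II", "IV", "V"):
        print(proc, payments(sample_bids, 30.0, proc, fixed_fee=10.0))
    print("VI:2", payments(sample_bids, 30.0, "VI:2", winners=2))
```

Under Procedure II the fraction falls as stated bids rise, which is one way to see why the procedures were thought to differ in their incentives for strategic bidding.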
generate free-riding hypotheses for these procedures.50 The major result from
Bohm's study was that bids were virtually identical for all institutions, averaging
between SEK7.29 and SEK10.33.

Bohm (1984a) uses two procedures that elicit a real economic commitment, albeit
under different (asserted) incentives for free-riding. He implemented this
experiment in the field with local government bureaucrats bidding on the provision
of a new statistical service from the Central Bureau of Statistics.51 The two
procedures are used to extract a lower and an upper bound, respectively, to the true
average WTP for an actual good. Each agent in group 1 was to state his individual
WTP, and his actual cost would be a percentage of that stated WTP such that costs
for producing the good would be covered exactly. This percentage could not exceed
100 percent. Subjects in group 2 were asked to state their WTP. If the interval
estimated for total stated WTP equaled or exceeded the (known) total cost, the good
was to be provided and subjects in group 2 would pay only SEK500. Subjects bidding
zero in group 1 or below SEK500 in group 2 would be excluded from enjoying the good.

In group 1 a subject has an incentive to understate only if he conjectures that the
sum of the contributions of others in his group is greater than or equal to total
cost minus his true valuation. Total cost was known to be SEK200,000, but the
contributions of (many) others must be conjectured. It is not possible to say what
the extent of free-riding is in this case without further information as to
expectations that were not observed. In group 2 only those subjects who actually
stated a WTP greater than or equal to SEK500 might have had an incentive to
free-ride. Forty-nine subjects reported exactly SEK500 in group 2, whereas 93
reported a WTP of SEK500 or higher. Thus the extent of free-riding in group 2 could
be anywhere from 0 percent (if those reporting SEK500 indeed had a true WTP of
exactly that amount) to 53 percent (49 free-riders out of 93 possible free-riders).

The main result reported by Bohm (1984a) is that the average WTP interval between
the two groups was quite small. Group 1 had an average WTP of SEK827 and group 2 an
average WTP of SEK889, for an interval that is only 7.5 percent of the smaller
average WTP of group 1. Thus the conclusion in this case must be that if free-riding
incentives were present in this experiment, they did not make much of a difference
to the outcome.

One can question, however, the extent to which these results generalize. The
subjects were representatives of local governments, and it was announced that all
reported WTP values would be published. This is not a feature of most surveys used
to study public programs, which often go to great lengths to ensure subject
confidentiality. On the other hand, the methodological point is clear: some subjects
may simply care more about undertaking certain tasks, and in many field settings
this is not difficult to identify. For example, Juan Cardenas (2003) collects
experimental data on common pool extraction from participants that have direct,
field experience extracting from a common pool

50 Procedure I is deemed the most likely to generate strategic under-bidding
(p. 113), and procedure V the most likely to generate strategic over-bidding. The
other procedures, with the exception of VI, are thought to lie somewhere in between
these two extremes. Explicit admonitions against strategic bidding were given to
subjects in procedures I, II, IV, and V (see p. 119, 127-29). Although no theory is
provided for VI:2, it can be recognized as a multiple-unit auction in which subjects
have independent and private values. It is well-known that optimal bids for
risk-neutral agents can be well below the true valuation of the agent in a Nash
Equilibrium, and will never exceed the true valuation (e.g., bidders truthfully
reveal demand for the first unit, but understate demand for subsequent units to
influence the price). Unfortunately there is insufficient information to be able to
say how far below true valuations these optimal bids will be, since we do not know
the conjectured range of valuations for subjects. List and Lucking-Reiley (2000) use
a framed field experiment to test for demand reduction in the field and find
significant demand reduction.

51 In addition, he conducted some comparable experiments in a more traditional
laboratory setting, albeit for a non-hypothetical good (the viewing of a pilot of a
TV show).
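The bounds and the interval reported for Bohm (1984a) follow from simple arithmetic on the figures given in the text, and a short Python sketch makes the steps explicit. The function names are hypothetical, and the true valuation and conjectured contributions in the last call are made-up numbers used only to illustrate the group 1 condition; only the counts (49 and 93), the group means (SEK827 and SEK889), and the total cost (SEK200,000) come from the text.

```python
def free_riding_bounds(n_at_threshold, n_at_or_above):
    """Bounds on the share of free-riders among those who could free-ride."""
    lower = 0.0                               # everyone at SEK500 reported truthfully
    upper = n_at_threshold / n_at_or_above    # everyone at SEK500 understated
    return lower, upper


def relative_interval(mean_group1, mean_group2):
    """Gap between the group means as a share of the group 1 mean."""
    return (mean_group2 - mean_group1) / mean_group1


def may_gain_by_understating(true_value, conjectured_others, total_cost):
    """Group 1 condition from the text: others cover total cost minus own value."""
    return conjectured_others >= total_cost - true_value


if __name__ == "__main__":
    low, high = free_riding_bounds(49, 93)
    print(f"free-riding share between {low:.0%} and {high:.0%}")
    print(f"interval relative to group 1 mean: {relative_interval(827, 889):.1%}")
    # Hypothetical subject: true value SEK1,000, conjectures SEK199,500 from others.
    print(may_gain_by_understating(1_000, 199_500, 200_000))
```

The upper bound is 49/93 and the interval is 62/827, which round to the 53 percent and 7.5 percent quoted in the text.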
resource. Similarly, Jeffrey Carpenter, Amrita Daniere, and Lois Takahashi (2003)
conduct social dilemma experiments with urban slum dwellers who face daily
coordination and collective action problems, such as access to clean water and solid
waste disposal.

6. Natural Field Experiments

6.1 The Nature of the Environment

Most of the stimuli a subject encounters in a lab experiment are controlled. The
laboratory, in essence, is a pristine environment where the only thing varied is the
stressor in which one is interested.52 Indeed, some laboratory researchers have
attempted to expunge all familiar contextual cues as a matter of control. This
approach is similar to mid-twentieth-century psychologists who attempted to conduct
experiments in "context-free" environments: egg-shaped enclosures where temperatures
and sound were properly regulated (Lowenstein 1999, p. F30). This approach omits the
context in which the stressor is normally considered by the subject. In the "real
world" the individual is paying attention not only to the stressor, but also to the
environment around him and various other influences. In this sense, individuals have
natural tools to help cope with several influences, whereas these natural tools are
not available to individuals in the lab, and thus the full effect of the stressor is
not being observed.

An ideal field experiment not only increases external validity, but does so in a
manner in which little internal validity is foregone.53 We consider here two
potentially important parts of the experimental environment: the physical place of
the actual experiment, and whether subjects are informed that they are taking part
in an experiment.

Experimental Site. The relationship between behavior and the environmental context
in which it occurs refers to one's physical surroundings (viz., noise level, extreme
temperatures, and architectural design) as well as the nature of the human
intervention (viz., interaction with the experimental monitor). For simplicity and
concreteness, we view the environment as a whole rather than as a bundle of stimuli.
For example, a researcher interested in the labels attached to colors may expose
subjects to color stimuli under sterile laboratory conditions (e.g., Brent Berlin
and Paul Kay 1969). A field experimenter, and any artist, would argue that responses
to color stimuli could very well be different from those in the real world, where
colors occur in their natural context (e.g., Anna Wierzbicka 1996, ch. 10). We argue
that, to fully examine such a situation, the laboratory should not be abandoned but
supplemented with field research. Since it is often difficult to maintain proper
experimental procedures in the field, laboratory work is often needed to eliminate
alternatives and to refine concepts.

Of course, the emphasis on the interrelatedness of environment and behavior should
not be oversold: the environment clearly constrains behavior, providing varying
options in some instances, and influences behavior more subtly at other times.
However, people also cope by changing their environments. A particular arrangement
of space, or the number of windows in an office, may affect employee social
interaction. One means of changing interaction is to change the furniture
arrangement or window cardinality, which of course changes the environment's effect
on the employees. Environment-behavior relationships are more or less in flux
continuously.

52 Of course, the stressor could be an interaction of two treatments.

53 We do not like the expression "external validity." What is valid in an experiment
depends on the theoretical framework that is being used to draw inferences from the
observed behavior in the experiment. If we have a theory that (implicitly) says that
hair color does not affect behavior, then any experiment that ignores hair color is
valid from the perspective of that theory. But one cannot identify what factors make
an experiment valid without some priors from a theoretical framework, which is
crossing into the turf of "internal validity." Note also that the "theory" we have
in mind here should include the assumptions required to undertake statistical
inference with the experimental data.
Experimental Proclamation. Whether subjects are informed that they are taking part
in an experiment may be an important factor. In physics, the Heisenberg Uncertainty
Principle reminds us that the act of measurement and observation alters that which
is being measured and observed. In the study of human subjects, a related, though
distinct, concept is the Hawthorne Effect. It suggests "... that any workplace
change, such as a research study, makes people feel important and thereby improves
their performance."54

The notion that agents may alter their behavior when observed by others, especially
when they know what the observer is looking for, is not novel to the Hawthorne
Effect. Other terminology includes "interpersonal self-fulfilling prophecies" and
the "Pygmalion Effect."

Studies that claim to demonstrate the existence of the Hawthorne Effect include
Phyllis Gimotty (2002), who used a treatment that reminded physicians to refer women
for free mammograms. In this treatment she observed declining referral rates from
the beginning of the twelve-month study to the end. This result led her to argue
that the results were "consistent with the Hawthorne Effect where a temporary
increase in referrals is observed in response to the initiation of the breast cancer
control program." Many other studies, ranging from asthma incidence to education to
criminal justice, have attributed empirical evidence to support the concept of the
Hawthorne Effect. For example, in an experiment in education research in the 1960s
where some children were labeled as high performers and others low performers, when
they had actually performed identically on achievement tests (R. Rosenthal and L.
Jacobsen 1968), teachers' expectations based on the labeling led to differences in
student performance. Krueger (1999) offers a dissenting view, arguing that Hawthorne
Effects are unlikely.

Project Star studied class sizes in Tennessee schools. Teachers in the schools with
smaller classes were informed that if their students performed well, class sizes
would be reduced statewide. If not, they would return to their earlier levels. In
other words, Project Star's teachers had a powerful incentive to improve student
performance that would not exist under ordinary circumstances. Recent empirical
results have shown that students performed better in smaller classrooms. Caroline
Hoxby (2000) reported on a natural experiment using data from a large sample of
Connecticut schools which was free from the bias of the experiment participants
knowing about the study's goal. She found no effect of smaller class sizes. Using
data from the same natural experiment, Krueger (1999) did find a positive effect
from small class sizes. Similarly, Angrist and Lavy (1999) find a positive effect in
Israel, exploiting data from a natural experiment "designed" by ancient rabbinic
dogma.

Who Makes the Decisions? Many decisions in life are not made by individuals. In some
cases "households" arrive at a decision, which can be variously characterized as the
outcome of some cooperative or noncooperative process. In some cases, groups, such
as committees, make decisions. To the extent that experimenters focus on individual
decision-making when group decision-making is more natural, there is a risk that the
results will be misleading. Similarly, even if the decision is made by an
individual, there is a possibility of social learning or "cheap talk" advice to aid
the decision. Laboratory experimenters have begun to study this characteristic of
field decision-making, in effect taking one of the characteristics of naturally
occurring field environments back into the lab: for example, see Gary Bornstein and
Ilan Yaniv (1998), James Cox and Stephen Hayne (2002), and T.

54 From P. G. Benson (2000, p. 688). The Hawthorne Effect was first demonstrated in
an industrial/organizational psychological study by Professor Elton Mayo of the
Harvard Business School at the Hawthorne Plant of the Western Electric Company in
Cicero, Illinois, from 1927 to 1932. Researchers were confounded by the fact that
productivity increased each time a change was made to the lighting no matter if it
was an increase or a decrease. What brought the Hawthorne Effect to prominence in
behavioral research was the publication of a major book in 1939 describing Mayo's
research by his associates F. J. Roethlisberger and William J. Dickson.
Parker Ballinger, Michael Palumbo, and Nathaniel Wilcox (2003).

6.2 Three Examples of Minimally Invasive Experiments

Committees in the Field. Michael Levine and Charles Plott (1977) report on a field
experiment they conducted on members of a flying club in which Levine was a
member.55 The club was to decide on a particular configuration of planes for the
members, and Levine wanted help designing a fair agenda to deal with this problem.
Plott suggested to Levine that there were many fair agendas, each of which would
lead to a different outcome, and suggested choosing the one that got the outcome
Levine desired. Levine agreed, and the agenda was designed using principles that
Plott understood from committee experiments (but not agenda experiments, which had
never been attempted at that stage). The parameters assumed about the field were
from Levine's impressions and his chatting among members. The selected agenda was
implemented and Levine got what he wanted: the group even complimented him on his
work.

A controversy at the flying club followed during the process of implementing the
group decision. The club president, who did not like the choice, reported to certain
decision-makers that the decision was something other than the actual vote. This
resulted in another polling of the group, using a questionnaire that Plott was
allowed to design. He designed it to get the most complete and accurate picture
possible of member preferences. Computation and laboratory experiments, using
induced values with the reported preferences, demonstrated that in the lab the
outcomes were essentially as predicted.

Levine and Plott (1977) counts as a "minimally invasive" field experiment, at least
in the ex ante sense, since there is evidence that the members did not know that the
specific agenda was designed to generate the preferred outcome to Levine.

Plott and Levine (1978) took this field result back into the lab, as well as to the
theory chalkboard. This process illustrates the complementarity we urge in all areas
of research with lab and field experiments.

Betting in the Field. Camerer (1998) is a wonderful example of a field experiment
that allowed the controls necessary for an experiment, but otherwise studied
naturally occurring behavior. He recognized that computerized betting systems
allowed bets to be placed and cancelled before the race was run. Thus he could try
to manipulate the market by placing bets in certain ways to move the market odds,
and then cancelling them. The cancellation keeps his net budget at zero, and in fact
is one of the main treatments, to see if such a temporary bet affects prices
appreciably. He found that it did not, but the methodological cleanliness of the
test is remarkable. It is also of interest to see that the possibility of
manipulating betting markets in this way was motivated in part by observations of
such efforts in laboratory counterparts (p. 461).

The only issue is how general such opportunities are. This is not a criticism of
their use: serendipity has always been a handmaiden of science. One cannot expect
that all problems of interest can be addressed in a natural setting in such a
minimally invasive manner.

Begging in the Field. List and Lucking-Reiley (2002) designed charitable
solicitations to experimentally compare outcomes between different seed-money
amounts and different refund rules by using three different seed proportion levels:
10 percent, 33 percent, or 67 percent of the $3,000 required to purchase a computer.
These proportions were chosen to be as realistic as possible for an actual
fundraising campaign while also satisfying the budget constraints they were given
for this particular fundraiser. They also experimented with the use of a refund,
which guarantees the individual her

55 We are grateful to Charles Plott for the following account of the events "behind
the scenes."
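The List and Lucking-Reiley (2002) design reported in the text (three seed proportions of a $3,000 computer, with and without a refund, and 500 of 3,000 purchased names assigned to each of six treatments) can be laid out in a short Python sketch. Several things here are assumptions rather than facts from the source: the function names, the use of a seeded random shuffle to assign households (the text says only that 500 names were assigned to each treatment), and the dollar amounts for the 33 and 67 percent cells, which are plain arithmetic on the stated percentages and may differ from any rounded figures the authors used.

```python
import random

COMPUTER_COST = 3000          # dollars, from the text
SEED_PERCENTS = (10, 33, 67)  # seed-money proportions, from the text
GROUP_SIZE = 500              # households per treatment, from the text


def build_treatments():
    """Map each treatment label (10, 10R, ...) to its seed, remainder, and refund flag."""
    treatments = {}
    for pct in SEED_PERCENTS:
        seed_money = COMPUTER_COST * pct // 100
        for refund in (False, True):
            label = f"{pct}{'R' if refund else ''}"
            treatments[label] = {
                "seed": seed_money,
                "remaining": COMPUTER_COST - seed_money,
                "refund": refund,
            }
    return treatments


def assign(households, labels, group_size=GROUP_SIZE, rng_seed=0):
    """Split households into equal groups, one per label (random order is assumed)."""
    shuffled = list(households)
    if len(shuffled) != group_size * len(labels):
        raise ValueError("need exactly group_size households per treatment")
    random.Random(rng_seed).shuffle(shuffled)
    return {
        label: shuffled[i * group_size:(i + 1) * group_size]
        for i, label in enumerate(labels)
    }


if __name__ == "__main__":
    design = build_treatments()
    groups = assign(range(3000), list(design))  # household ids are placeholders
    for label, cell in design.items():
        print(label, cell, "households:", len(groups[label]))
```

For the 10 percent cells the remaining amount is $2,700, which matches the figure quoted from the solicitation letter.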
money back if the goal is not reached by the group. Thus, potential donors were
assigned to one of six treatments, each funding a different computer. They refer to
their six treatments as 10, 10R, 33, 33R, 67, and 67R, with the numbers denoting the
seed-money proportion, and R denoting the presence of a refund policy.

In carrying out their field experiments, they wished to solicit donors in a way that
matched, as closely as possible, the current state of the art in fundraising. With
advice from fundraising companies Donnelley Marketing in Englewood, Colorado, and
Caldwell in Atlanta, Georgia, they followed generally accepted rules believed to
maximize overall contributions. First, they purchased the names and addresses of
3,000 households in the Central Florida area that met two important criteria: 1)
annual household income above $70,000, and 2) household was known to have previously
given to a charity (some had in fact previously given to the University of Central
Florida). They then assigned 500 of these names to each of the six treatments.
Second, they designed an attractive brochure describing the new center and its
purpose. Third, they wrote a letter of solicitation with three main goals in mind:
making the letter engaging and easy to read, promoting the benefits of a proposed
Center for Environmental Policy Analysis (CEPA), and clearly stating the key points
of the experimental protocol. In the personalized letter, they noted CEPA's role
within the Central Florida community, the total funds required to purchase the
computer, the amount of seed money available, the number of solicitations sent out,
and the refund rule (if any). They also explained that contributions in excess of
the amount required for the computer would be used for other purposes at CEPA, noted
the tax deductibility of the contribution, and closed the letter with contact
information in case the donors had questions.

The text of the solicitation letter was completely identical across treatments,
except for the variables that changed from one treatment to another. In treatment
10NR, for example, the first of two crucial sentences read as follows: "We have
already obtained funds to cover 10 percent of the cost for this computer, so we are
soliciting donations to cover the remaining $2,700." In treatments where the seed
proportion differed from 10 percent, the 10 percent and $2,700 numbers were changed
appropriately. The second crucial sentence stated: "If we fail to raise the $2,700
from this group of 500 individuals, we will not be able to purchase the computer,
but we will use the received funds to cover other operating expenditures of CEPA."
The $2,700 number varied with the seed proportion, and in refund treatments this
sentence was replaced with: "If we fail to raise the $2,700 from this group of 500
individuals, we will not be able to purchase the computer, so we will refund your
donation to you." All other sentences were identical across the six treatments.

In this experiment the responses from agents were from their typical environments,
and the subjects were not aware that they were participating in an experiment.

7. Social Experiments

7.1 What Constitutes a Social Experiment in Economics?

Robert Ferber and Warner Hirsch (1982, p. 7) define social experiments in economics
as "... a publicly funded study that incorporates a rigorous statistical design and
whose experimental aspects are applied over a period of time to one or more segments
of a human population, with the aim of evaluating the aggregate economic and social
effects of the experimental treatments." In many respects this definition includes
field experiments and even lab experiments. The point of departure for social
experiments seems to be that they are part of a government agency's attempt to
evaluate programs by deliberate variations in agency policies. Thus they typically
involve variations in the way that the agency does its normal business, rather than
de novo programs. This characterization fits well with the tradition of large-scale
social experiments in the 1960s and 1970s, dealing with negative income taxes,
employment programs, health insurance, electricity pricing, and housing
allowances.56

In recent years the lines have become blurred. Government agencies have been using
experiments to examine issues or policies that have no close counterpart, so that
their use cannot be viewed as variations on a bureaucratic theme. Perhaps the most
notable social experiments in recent years have been paired-audit experiments to
identify and measure discrimination. These involve the use of "matched pairs" of
individuals, who are made to look as much alike as possible apart from the protected
characteristics (e.g., race). These pairs then confront the target subjects, who are
employers, landlords, mortgage loan officers, or car salesmen. The majority of audit
studies conducted to date have been in the fields of employment discrimination and
housing discrimination (see P. A. Riach and J. Rich 2002 for a review).57

The lines have also been blurred by open lobbying efforts by private companies to
influence social-policy change by means of experiments. Exxon funded a series of
experiments and surveys, collected by Jerry Hausman (1993), to ridicule the use of
the contingent valuation method in environmental damage assessment. This effort was
in response to the role that such surveys potentially played in the criminal action
brought by government trustees after the Exxon Valdez oil spill. Similarly,
ExxonMobil funded a series of experiments and focus groups, collected in Cass
Sunstein et al. (2002), to ridicule the way in which juries determine punitive
damages. This effort was in response to the role that juries played in determining
punitive damages in the civil lawsuits generated by the Exxon Valdez oil spill. It
is also playing a major role in ongoing efforts by some corporations to affect "tort
reform" with respect to limiting appeal bonds for punitive awards and even caps on
punitive awards.

7.2 Methodological Lessons

The literature on social experiments has been the subject of sustained
methodological criticism. Unfortunately, this criticism has created a false tension
between the use of experiments and the use of econometrics applied to field data. We
believe that virtually all of the criticisms of social experiments potentially apply
in some form to field experiments unless they are run in an ideal manner, so we
briefly review the important ones. Indeed, many of them also apply to conventional
lab experiments.

Recruitment and the Evaluation Problem. Heckman and Smith (1995, p. 87) go to the
heart of the role of experiments in a social-policy setting, when they note that
"the strongest argument in favor of experiments is that under certain conditions
they solve the fundamental evaluation problem that arises from the impossibility of
observing what would happen to a given person in both the state where he or she
receives a treatment (or participates in a program) and the state where he or she
does not. If a person could be observed in both states, the impact of the treatment
on that person could be calculated by comparing his or her outcomes in the two
states, and the evaluation problem would be solved." Randomization to treatment is
the means by which social experiments solve this problem if one assumes that the act
of randomizing subjects to treatment does not lead to a classic sample selection
effect, which is to say that it does not "alter the pool of participants or their
behavior" (p. 88).

Unfortunately, randomization could plausibly lead to either of these outcomes, which
are not fatal but do necessitate the use of "econometric(k)s." We have discussed
already the possibility that the use of randomization could attract to experiments
subjects who are less risk-averse than the population, if the subjects rationally
anticipate the use of randomization. It is well-known in the field of clinical drug
trials that persuading patients to participate in randomized studies is much harder
than persuading them to participate in nonrandomized studies (e.g., Michael Kramer
and Stanley Shapiro 1984). The same problem applies to social experiments, as
evidenced by the difficulties that can be encountered when recruiting decentralized
bureaucracies to administer the random treatment (e.g., V. Joseph Hotz 1992). James
Heckman and Richard Robb (1985) note that the refusal rate in one randomized
job-training program was over 90 percent, with many of the refusals citing ethical
concerns with administering a random treatment.

What relevance does this have for field or lab experiments? The answer is simple: we
do not know, since it has not been systematically studied. On the other hand, field
experiments have one major advantage if they involve the use of subjects in their
natural environment, undertaking tasks that they are familiar with, since no sample
selection is involved at the first level of the experiment. In conventional lab
experiments there is sample selection at two stages: the decision to attend college,
and then the decision to participate in the experiment. In artefactual field
experiments, as we defined the term in section 1, the subject first selects to be in
the naturally occurring environment and then makes the decision to be in the
experiment. So the artefactual field experiment shares this two-stage selection
process with conventional lab experiments. However, the natural field experiment has
only one source of possible selection bias: the decision to be in the naturally
occurring market. Hence the bookies that accepted the contrived bets of Camerer
(1998) had no idea that he was conducting an experiment, and did not select "for"
the transaction with the experimenter. Of course, they were bookies, and hence
selected for that occupation.

A variant on the recruitment problem occurs in settings where subjects are observed
over a period of time, and attrition is a possibility. Statistical methods can be
developed to use differential attrition rates as valuable information on how
subjects value outcomes (e.g., see Philipson and Hedges 1998).

Substitution and the Evaluation Problem. The second assumption underlying the
validity of social experiments is that "close substitutes for the experimental
treatment are not readily available" (Heckman and Smith 1995, p. 88). If they are,
then subjects who are placed in the control group could opt for the substitutes
available outside the experimental setting. The result is that outcomes in the
control no longer show the effect of "no treatment," but instead the effect of
"possible access to an uncontrolled treatment." Again, this is not a fatal problem,
but one that has to be addressed explicitly. In fact, it has arisen already in the
elicitation of preferences over field commodities, as discussed in section 4.

Experimenter Effects. In social experiments, given the open nature of the political
process, it is almost impossible to hide the experimental objective from the person
implementing the experiment or the subject. The paired-audit experiments are perhaps
the most obvious targets of this, since the "treatments" themselves have any number
of ways to bring about the conclusion that is favored by the research team
conducting the experiment. In this instance, the Urban Institute makes no bones
about its view that discrimination is a widespread problem and that paired-audit
experiments are a critical way to address it (e.g., a casual perusal of Michael Fix
and Raymond Struyk 1993). There is nothing wrong with this, apart from the fact that
it is hard to imagine how volunteer auditors would not see things similarly. Indeed,
Heckman (1998, p. 104) notes that "auditors are sometimes instructed on the 'problem
of discrimination in American society' prior to sampling firms, so they may have
been coached to find what audit agencies wanted them to find." The opportunities for
unobservables to influence the outcome are potentially rampant in this case.

Of course, simple controls could be designed to address this issue. One could have
different test-pairs visit multiple locations to help identify the effect of a given
pair on the overall measure of discrimination. The variability of measured
discrimination across audit pairs is marked, and raises statistical issues, as well
as issues of interpretation (e.g., see Heckman and Siegelman 1993). Another control
could be to have an artificial location for the audit pair to visit, where their
"unobservables" could be "observed" and controlled in later statistical analyses.
This procedure is used in a standard manner in private business concerned with
measuring the quality of customer relations in the field.

One stunning example of experimenter effects from Bohm (1984b) illustrates what can
happen when the subjects see a meta-game beyond the experiment itself. In 1980 he
undertook a framed field experiment for a local government in Stockholm that was
considering expanding a bus route to a major hospital and a factory. The experiment
was to elicit valuations from people who were naturally affected by this route, and
to test whether their aggregate contributions would make it worthwhile to provide
the service. A key feature of the experiment was that the subjects would have to be
willing to pay for the public good if it was to be provided for a trial period of
six months. Everyone who was likely to contribute was given information on the
experiment, but when it came time for the experiment virtually nobody turned up! The
reason was that the local trade unions had decided to boycott the experiment, since
it represented a threat to the current way in which such serv-

expressed their concerns, summarized by Bohm (1984b, p. 136) as follows:

  They reported that they had held meetings of their own and had decided (1) that
  they did not accept the local government's decision not to provide them with
  regular bus service on regular terms; (2) that they did not accept the idea of
  having to pay in a way that differs from the way that "everybody else" pays (bus
  service is subsidized in the area), the implication being that they would rather
  go without this bus service, even if their members felt it would be worth the
  costs; (3) that they would not like to help in realizing an arrangement that might
  reduce the level of public services provided free or at low costs. It was argued
  that such an arrangement, if accepted here, could spread to other parts of the
  public sector; and (4) on these grounds, they advised their union members to
  abstain from participating in the project.

This fascinating outcome is actually more relevant for experimental economics in
general than it might seem.58 When certain institutions are imposed on subjects, and
certain outcomes tabulated, it does not necessarily follow that the outcomes of
interest for the experimenter are the ones that are of interest to the subject.59
For example, Isaac and Smith (1985) observe virtually no instances of predatory
pricing in a partial equilibrium market in which the prey had no alternative market
to escape to at the first taste of blood. In a comparable multi-market setting in
which subjects could choose to exit markets for other markets, Harrison (1988)
observed many instances of predatory pricing.

7.3 Surveys that Whisper in the Ears of Princes

Field surveys are often undertaken to evaluate environmental injury. Many involve
controlled treatments such as "scope tests" of

56 See Ferber and Hirsch (1978, 1982) and Jerry Hausman and David Wise (1985) for
wonderful reviews.

57 Some discrimination studies have been undertaken by academics with no
social-policy evaluation (e.g., Chaim Fershtman and Uri Gneezy 2001 and List 2004b).

58 It is a pity that Bohm (1984b) himself firmly categorized this experiment as a
failure, although one can understand that perspective.

59 See Philipson and Hedges (1998) for a general statis-
ices were provided. The union leaders tical perspective on this problem.
changes in the extent of the injury, or differences in the valuation placed on the injury.60 Unfortunately, such surveys suffer from the fact that they do not ask subjects to make a direct economic commitment, and that this will likely generate an inflated valuation report.61 However, many field surveys are designed to avoid the problem of hypothetical bias, by presenting the referenda as "advisory." Great care is often taken in the selection of motivational words in cover letters, opening survey questions, and key valuation questions, to encourage the subject to take the survey seriously in the sense that their response will "count."62 To the extent that they achieve success in this, these surveys should be considered social experiments.

60 Scope treatments might be employed if there is some scientific uncertainty about the extent of the injury to the environment at the time of the valuation, as in the two scenarios used in the survey of the Kakadu Conservation Zone in Australia reported in David Imber, Gay Stevenson, and Leanne Wilks (1991). Or they may be used to ascertain some measure of the internal validity of the elicited valuations, as discussed by Carson (1997) and V. Kerry Smith and Laura Osborne (1996). Variations in the valuation are the basis for inferring the demand curve for the environmental curve, as discussed by Glenn Harrison and Bengt Kriström (1996).

61 See Cummings and Harrison (1994), Cummings, Harrison, and Rutström (1995), and Cummings et al. (1997).

62 There are some instances in which the agency undertaking the study is deliberately kept secret to the respondent. For example, this strategy was adopted by Carson et al. (1992) in their survey of the Exxon Valdez oil spill undertaken for the attorney-general of the state of Alaska. They in fact asked subjects near the end of the survey who they thought had sponsored the study, and only 11 percent responded correctly (p. 91). However, 29 percent thought that Exxon had sponsored the study. Although no explicit connection was made to suggest who would be using the results, it is therefore reasonable to presume that at least 40 percent of the subjects expected the responses to go directly to one or another of the litigants in this well-known case. Of course, that does not ensure that the responses will have a direct impact, since there may have been some (rational) expectation that the case would settle without the survey results being entered as evidence.

Consider the generic cover letter advocated by Don Dillman (1978, pp. 165ff.) for use in mail surveys. The first paragraph is intended to convey something about the social usefulness of the study: that there is some policy issue that the study is attempting to inform. The second paragraph is intended to convince the recipient of their importance to the study. The idea here is to explain that their name has been selected as one of a small sample, and that for the sample to be representative they need to respond. The goal is clearly to put some polite pressure on the subject to make sure that their socio-economic characteristic set is represented.

The third paragraph ensures confidentiality, so that the subject can ignore any possible repercussion from responding one way or the other in a "politically incorrect" manner. Although seemingly mundane, this assurance can be important when the researcher interprets the subject as responding to the question at hand rather than uncontrolled perceptions of repercussions. It also serves to mimic the anonymity of the ballot box.

The fourth paragraph builds on the preceding three to drive home the usefulness of the survey response itself, and the possibility that it will influence behavior:

The fourth paragraph of our cover letter reemphasizes the basic justification for the study - its social usefulness. A somewhat different approach is taken here, however, in that the intent of the researcher to carry through on any promises that are made, often the weakest link in making study results useful, is emphasized. In [an example cover letter in the text] the promise (later carried out) was made to provide results to government officials, consistent with the lead paragraph, which included a reference to bills being considered in the State Legislature and Congress. Our basic concern here is to make the promise of action consistent with the original social utility appeal. In surveys of particular communities, a promise is often made to provide results to the local media and city officials. (Dillman 1978, p. 171)

From our perspective, the clear intent and effect of these admonitions is to attempt to convince the subject that their response will have some probabilistic bearing on actual outcomes.

This generic approach has been used, for example, in the CVM study of the Nestucca oil spill by Rowe et al. (1991). Their cover
letter contained the following sentences in the opening and penultimate paragraphs:

Government and industry officials throughout the Pacific Northwest are evaluating programs to prevent oil spills in this area. Before making decisions that may cost you money, these officials want your input.... The results of this study will be made available to representatives of state, provincial and federal governments, and industry in the Pacific Northwest. (emphasis added)

In the key valuation question, subjects are motivated by the following words:

Your answers to the next questions are very important. We do not yet know how much it will cost to prevent oil spills. However, to make decisions about new oil spill prevention programs that could cost you money, government and industry representatives want to learn how much it is worth to people like you to avoid more spills.

These words reinforce the basic message of the cover letter: there is some probability, however small, that the response of the subject will have an actual impact.

More direct connections to policy impact occur when the survey is openly undertaken for a public agency charged with making the policy decision. For example, the Resource Assessment Commission of Australia was charged with making a decision on an application to mine in public lands, and used a survey to help it evaluate the issue. The cover letter, signed by the chairperson of the commission under the letterhead of the commission, spelled out the policy setting clearly:

The Resource Assessment Commission has been asked by the Prime Minister to conduct an inquiry into the use of the resources of the Kakadu Conservation Zone in the Northern Territory and to report to him on this issue by the end of April 1991.63 ... You have been selected randomly to participate in a national survey related to this inquiry. The survey will be asking the views of 2500 people across Australia. It is important that your views are recorded so that all groups of Australians are included in the survey. (Imber, Stevenson, and Wilks 1991, p. 102)

63 The cover letter was dated August 28, 1990.

Although no promise of a direct policy impact is made, the survey responses are obviously valued in this instance by the agency charged with directly and publicly advising the relevant politicians on the matter.

It remains an open question if these "advisory referenda" actually motivate subjects to respond truthfully, although that is obviously something that could be studied systematically as part of the exercise or using controlled laboratory and field experiments.64

64 Harrison (2005b) reviews the literature.

8. Natural Experiments

8.1 What Constitutes a Natural Experiment in Economics?

Natural experiments arise when the experimenter simply observes naturally occurring, controlled comparisons of one or more treatments with a baseline.65 The common feature of these experiments is serendipity: policy makers, nature, or television game-show producers66 conspire to generate these comparisons. The main attraction of natural experiments is that they reflect the choices of individuals in a natural setting, facing natural consequences that are typically substantial. The main disadvantage of natural experiments derives from their very nature: the experimenter does not get to pick and choose the specifics of the treatments, and the experimenter does not get to pick where and when the treatments will be imposed. The first problem may result in low power to detect any responses of interest, as we illustrate with a case study in section 8.2 below. While there is a lack of control, we should obviously not look a random gift horse in the mouth when it comes to making inferences. There are some circumstances, briefly reviewed in section 8.3, when nature provides useful controls to augment those from theory or "manmade" experimentation.

65 Good examples in economics include H. E. Frech (1976); Roth (1991); Jere Behrman, Mark Rosenzweig, and Paul Taubman (1994); Stephen Bronars and Jeff Grogger (1994); Robert Deacon and Jon Sonstelie (1985); Andrew Metrick (1995); Bruce Meyer, W. Kip Viscusi, and David Durbin (1995); John Warner and Saul Pleeter (2001); and Mitch Kunce, Shelby Gerking, and William Morgan (2002).

66 Smith (1982; p. 929) compared the advantages of laboratory experiments to econometric practice, noting that "Over twenty-five years ago, Guy Orcutt characterized the econometrician as being in the same predicament as the electrical engineer who has been charged with the task of deducing the laws of electricity by listening to a radio play. To a limited extent, econometric ingenuity has provided some techniques for conditional solutions to inference problems of this type." Arguably, watching the television can be an improvement on listening to the radio, since TV game shows provide a natural avenue to observe real decisions in an environment with high stakes. J. B. Berk, E. Hughson, and K. Vandezande (1996) and Rafael Tenorio and Timothy Cason (2002) study contestants' behavior on The Price Is Right to investigate rational decision theory and whether subjects play the unique subgame perfect Nash equilibrium. R. Gertner (1993) and R. M. W. J. Beetsma and P. C. Schotman (2001) make use of data from Card Sharks and Lingo to examine individual risk preferences. Steven Levitt (2003) and List (2003) use data from The Weakest Link and Friend or Foe to examine the nature and extent of disparate treatment among game-show contestants. And Metrick (1995) uses data from Jeopardy! to analyze behavior under uncertainty and players' ability to choose strategic best-responses.

8.2 Inferring Discount Rates by Heroic Extrapolation

In 1992, the United States Department of Defense started offering substantial early retirement options to nearly 300,000 individuals in the military. This voluntary separation policy was instituted as part of a general policy of reducing the size of the military as part of the "Cold War dividend." John Warner and Saul Pleeter (2001) (WP) recognize how the options offered to military personnel could be viewed as a natural experiment with which one could estimate individual discount rates. In general terms, one option was a lump-sum amount, and the other option was an annuity. The individual was told what the cut-off discount rate was for the two to be actuarially equal, and this concept was explained in various ways. If an individual is observed to take the lump-sum, one could infer that his discount rate was greater than the threshold rate. Similarly, for those individuals that elected to take the annuity, one could infer that his discount rate was less than the threshold.67

67 Warner and Pleeter (2001) recognize that one problem of interpretation might arise if the very existence of the scheme signaled to individuals that they would be forced to retire anyway. As it happens, the military also significantly tightened up the rules governing "progression through the ranks," so that the probability of being involuntarily separated from the military increased at the same time as the options for voluntary separation were offered. This background factor could be significant, since it could have led to many individuals thinking that they were going to be separated from the military anyway and hence deciding to participate in the voluntary scheme even if they would not have done so otherwise. Of course, this background feature could work in any direction, to increase or decrease the propensity of a given individual to take one or the other option. In any event, WP allow for the possibility that the decision to join the voluntary separation process itself might lead to sample selection issues. They estimate a bivariate probit model, in which one decision is to join the separation process and the other decision is to take the annuity rather than the lump-sum.

This design is essentially the same as one used in a long series of laboratory experiments studying the behavior of college students.68 Comparable designs have been taken into the field, such as the study of the Danish population by Harrison, Lau, and Williams (2002). The only difference is that the field experiment evaluated by WP offered each individual only one discount rate: Harrison, Lau, and Williams offered each subject twenty different discount rates, ranging between 2.5 percent and 50 percent.

68 See Coller and Williams (1999), and Shane Frederick, George Loewenstein, and Ted O'Donoghue (2002), for recent reviews of those experiments.

Five features of this natural experiment make it particularly compelling for the purpose of estimating individual discount rates. First, the stakes were real. Second, the stakes were substantial and dwarf anything that has been used in laboratory experiments with salient payoffs in the United States. The average lump-sum amounts were around $50,000 and $25,000 for officers and enlisted personnel, respectively.69 Third, the military went to some lengths to explain to everyone the financial implications of choosing one option over the other, making the comparison of personal and threshold discount rates relatively transparent. Fourth, the options were offered to a wide range of officers and enlisted personnel, such that there are substantial variations in key demographic variables such as income, age, race, and education. Fifth, the time horizon for the annuity differed in direct proportion to the years of military service of the individual, so that there are annuities between fourteen and thirty years in length. This facilitates evaluation of the hypothesis that discount rates are stationary over different time horizons.

69 92 percent of the enlisted personnel accepted the lump-sum, and 51 percent of the officers. However, acceptance rates varied with the interest rates offered, particularly for enlisted personnel.

WP conclude that the average individual discount rates implied by the observed separation choices were high relative to a priori expectations for enlisted personnel. In one model in which the after-tax interest rate offered to the individual appears in linear form, they predict average rates of 10.4 percent and 35.4 percent for officers and enlisted personnel, respectively. However, this model implicitly allows estimated discount rates to be negative, and indeed allows them to be arbitrarily negative. In an alternative model in which the interest rate term appears in logarithmic form, and one implicitly imposes the a priori constraint that an elicited individual discount rate be positive, they estimate average rates of 18.7 percent and 53.6 percent, respectively. We prefer the estimates that impose this prior belief, although nothing below depends on using them.70

70 Harrison (2005a) documents the detailed calculations involved, and examines the differences that arise with alternative specifications and samples.

We show that many of the conclusions about discount rates from this natural experiment are simply not robust to the sampling and predictive uncertainty of having to use an estimated model to infer discount rates.

We use the same method as WP (2001, table 6, p. 48) to calculate estimated discount rates.71 In their table 3, WP calculate the mean predicted discount rate from a single-equation probit model, using only the discount rate as an explanatory variable, employing a shortcut formula that correctly evaluates the mean discount rate. After each probit equation is estimated, it is used to predict the probability that each individual would accept the lump-sum alternative at discount rates varying between 0 percent and 100 percent in increments of 1 percentage point. For example, consider a 5 percent discount rate offered to officers, and the results of the single-equation probit model. Of the 11,212 individuals in this case, 72 percent are predicted to have a probability of accepting the lump-sum of 0.5 or greater. The lowest predicted probability of acceptance for any individual at this rate is 0.207, and the highest is 0.983.

71 John Warner kindly provided the data.

Similar calculations are undertaken for each possible discount rate between 0 percent and 100 percent, and the results tabulated. Once the predicted probabilities of acceptance are tabulated for each of the individuals offered the buy-out, and each possible discount rate between 0 percent and 100 percent, we loop over each individual and identify the smallest discount rate at which the lump-sum would be accepted. This smallest discount rate is precisely where the probit model predicts that this individual would be indifferent between the lump-sum and the annuity. This provides a distribution of estimated minimum discount rates, one for each individual in the sample.

In figure 2 we report the results of this calculation, showing the distribution of personal discount rates initially offered to the subjects and then the distributions implied by the single-equation probit
[Figure 2: histograms of the offered and estimated discount rates, plotted against separate vertical axes. Horizontal axis: Discount Rate in Percent (0 to 100). Legend: Estimated, Offered.]
Figure 2. Offered and Estimated Discount Rates in Warner and Pleeter Natural Experiments
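The tabulate-and-search procedure described in the text, which produces the estimated series in figure 2, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the calculations reported here: the coefficient values, the function names, and the single `individual_index` term standing in for each person's covariates are all hypothetical. The grid starts at 1 percent because the log-linear form is undefined at 0, and the sketch assumes that the predicted probability of taking the lump-sum falls as the offered rate rises, so that indifference is located where that probability crosses 0.5.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_lump_sum(rate_pct, individual_index, intercept, slope_log_rate):
    """Predicted probability of taking the lump-sum at an offered rate (in percent)."""
    index = intercept + slope_log_rate * math.log(rate_pct) + individual_index
    return normal_cdf(index)


def implied_discount_rate(individual_index, intercept, slope_log_rate, max_rate=100):
    """Search the 1..max_rate grid for the rate where predicted acceptance crosses 0.5."""
    for rate_pct in range(1, max_rate + 1):
        if prob_lump_sum(rate_pct, individual_index, intercept, slope_log_rate) < 0.5:
            return rate_pct
    return max_rate  # no crossing on the grid: cap at the top rate


# Hypothetical coefficients and covariate indices, chosen only for illustration.
INTERCEPT, SLOPE = 3.0, -1.0
implied_rates = [implied_discount_rate(z, INTERCEPT, SLOPE) for z in (-1.0, 0.0, 1.0, 2.0)]
```

With these made-up inputs the last individual never crosses 0.5 on the grid and is assigned the 100 percent cap. Note that the search treats each fitted probability as exact, which is the "no prediction error" logic behind the estimated series in figure 2.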
model used by WP.72 These results pool the data for all separating personnel. The grey histogram shows the after-tax discount rates that were offered, and the black histogram shows the discount rates inferred from the estimated "log-linear" model that constrains discount rates to be positive. Given the different shapes of the histograms, they use different vertical axes to allow simple visual comparisons.

72 Virtually identical results are obtained with the model that corrects for possible sample-selection effects.

The main result is that the distribution of estimated discount rates is much wider than the distribution of offered rates. Harrison (2005a) presents separate results for the samples of officers and enlisted personnel, and for the alternative specifications considered by WP. For enlisted personnel the distribution of estimated rates is almost entirely out-of-sample in comparison to the offered rates above it. The distribution for officers is roughly centered on the distribution of offered rates, but much more dispersed. There is nothing "wrong" with these differences between the offered and estimated discount rates, although they will be critical when we calculate standard errors on these estimated discount rates. Again, the estimated rates in figure 2 are based on the logic described above: no prediction error is assumed from the estimated statistical model when it is applied at the level of the individual to predict the threshold rate at which the lump-sum would be accepted.

The main conclusion of WP is contained in their table 6, which lists estimates of the average discount rates for various groups of their subjects. Using the model that imposes the a priori restriction that discount rates be positive, they report that the average discount rate for officers was 18.7 percent, and 53.6 percent for enlisted personnel. What are the standard errors on these means? There is reason to expect that they could be
quite large, due to constraints on the scope of the natural experiment.

Individuals were offered a choice between a lump-sum and an annuity. The before-tax discount rate that just equated the present value of the two instruments ranged between 17.5 percent and 19.8 percent, which is a very narrow range of discount rates. The after-tax equivalent rates ranged from a low of 14.5 percent up to 23.5 percent for those offered the separation option, but over 99 percent of the after-tax rates were between 17.6 percent and 20.4 percent. Thus the above inferences about average discount rates for enlisted personnel are "out of sample," in the sense that they do not reflect direct observation of responses at those rates of 53.6 percent, or indeed at any rates outside the interval (14.5 percent, 23.5 percent). Figure 2 illustrates this point as well, since the right mode is entirely due to the estimates of enlisted personnel. The average for enlisted personnel therefore reflects, and relies on, the predictive power of the parametric functional forms fitted to the observed data. The same general point is true for officers, but the problem is far less severe.

Even if one accepted the parametric functional forms (probit), the standard errors of predictions outside of the sample range of break-even discount rates will be much larger than those within the sample range.73 The standard errors of the predicted response can be calculated directly from the estimated model. Note that this is not the same as the distribution shown in figure 2, which is a distribution over the sample of individuals at each simulated discount rate that assumes that the model provides a perfect prediction for each individual. In other words, the predictions underlying figure 2 just use the average prediction for each individual as the truth, so the sampling error reflected in the distributions only reflects sampling over the individuals. One can generate standard errors that also capture the uncertainty in the probit model coefficients.

73 Relaxing the functional form also allows some additional uncertainty into the estimation of individual discount rates.

Figure 3 displays the results of taking into account the uncertainty about the coefficients of the estimated model used by WP. Since it is an important dimension to consider, we show the time horizon for the elicited discount rates on the horizontal axis.74 The middle line shows a cubic spline through the predicted average discount rate. The top (bottom) line shows a cubic spline through the upper (lower) bound of the 95 percent confidence interval, allowing for uncertainty in the individual predictions due to reliance on an estimated statistical model to infer discount rates.75 Thus, in figure 3 we see that there is considerable uncertainty about the discount rates for enlisted personnel, and that it is asymmetric. On balance, the model implies a considerable skewness in the distribution of rates for enlisted personnel, with some individuals having extremely high implied discount rates. Turning to the results for officers, we find much less of an effect from model uncertainty. In this case the rates are relatively precisely inferred, particularly around the range of rates spanning the effective rates offered, as one would expect.76

74 The time horizon of the annuity offered to individuals in the field varied directly with the years of military service completed. For each year of service the horizon on the annuity was two years longer. As a result, the annuities being considered by individuals were between fourteen and thirty years in length. With roughly 10 percent of the sample at each horizon, the average annuity horizon was around 22 years.

75 In fact, we calculate rates only up to 100 percent, so the upper confidence interval for the model is constrained to equal 100 percent for that reason. It would be a simple matter to allow the calculation to consider higher rates, but there would be little inferential value in doing so.

76 It is a standard result from elementary econometrics that the forecast interval widens as one uses the regression model to predict for values of the exogenous variables that are further and further away from their average (e.g., William Greene 1993, p. 164-66).

We conclude that the results for enlisted personnel are too imprecisely estimated for
[Figure 3: two panels, Officers and Enlisted Personnel. Horizontal axis in each panel: time horizon of annuity in years (15 to 30). Vertical axis runs to 100.]
Figure 3. Implied Discount Rates Incorporating Model Uncertainty
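The wide bands in figure 3 rest on the textbook result that forecast intervals widen as predictions move away from the sample mean of the regressor. A minimal sketch follows, assuming a one-regressor least-squares model rather than the probit used by WP. The data are invented purely for illustration, and the function name `forecast_se` is hypothetical.

```python
import math


def forecast_se(xs, ys, x0):
    """Forecast standard error at x0 from a one-regressor least-squares fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)  # residual variance estimate
    # The (x0 - x_bar)^2 term is what makes the interval widen away from the mean.
    return math.sqrt(s2 * (1.0 + 1.0 / n + (x0 - x_bar) ** 2 / sxx))


# Invented data: offered rates in a narrow band, with made-up responses.
offered = [17.5, 18.0, 18.5, 19.0, 19.5, 20.0, 20.5]
response = [0.62, 0.55, 0.58, 0.49, 0.51, 0.44, 0.46]

se_in_sample = forecast_se(offered, response, 19.0)      # at the sample mean
se_out_of_sample = forecast_se(offered, response, 53.6)  # far outside the band
```

Because the regressor values span only a few percentage points, the sum of squared deviations is small. The standard error at a rate such as 53.6 percent is therefore many times the standard error at the center of the offered band.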
them to be used to draw reliable inferences about the discount rates. However, the results for officers are relatively tightly estimated, and can be used to draw more reliable inferences. The reason for the lack of precision in the estimates for enlisted personnel is transparent from the design, which was obviously not chosen by the experimenters: the estimates rely on out-of-sample predictions, and the standard errors embodied in figure 3 properly reflect the uncertainty of such an inference.

8.3 Natural Instruments

Some variable or event is said to be a good instrument for unobserved factors if it is orthogonal to those factors. Many of the difficulties of "manmade" random treatments have been discussed in the context of social experiments. However, in recent years many economists have turned to "nature-made" random treatments instead, employing an approach to the evaluation of treatments that has come to be called the "natural natural experimental approach" by Rosenzweig and Wolpin (2000).

For example, monozygotic twins are effectively natural clones of each other at birth. Thus one can, in principle, compare outcomes for such twins to see the effect of differences in their history, knowing that one has a control for abilities that were innate at birth. Of course, a lot of uncontrolled and unobserved things can occur after birth and before humans get to make choices that are of any policy interest. So the use of such instruments obviously requires additional assumptions, beyond the a priori plausible one that the natural biological event that led to these individuals being twins was independent of the efficacy of their later educational and labor-market experiences. Thus the lure of "measurement without theory" is clearly illusory.

Another concern with the "natural instruments" approach is that it often relies on the assumption that only one of the explanatory variables is correlated with the unobserved
factors.77 This means that only one instrument is required, which is fortunate since nature is a stingy provider of such instruments. Apart from twins, natural events that have been exploited in this literature include birth dates, gender, and even weather events, and these are not likely to grow dramatically over time.

77 Rosenzweig and Wolpin (2000, p. 829, fn.4, and p. 873).

Both of these concerns point the way to a complementary use of different methods of experimentation, much as econometricians use a priori identifying assumptions as a substitute for data in limited information environments.

9. Thought Experiments

Thought experiments are extremely common in economics, and would seem to be fundamentally different from lab and field experiments. We argue that they are not, drawing on recent literature examining the role of statistical specifications of experimental tests of deterministic theories. Although it may surprise some, the comparison between lab experiments and field experiments that we propose has analogues to the way thought experiments have been debated in analytic philosophy and the view that thought experiments are just "attenuated experiments." Finally, we consider the place of measures of the natural functioning of the brain during artefactual experimental conditions.

9.1 Where Are the Econometric Instructions to Test Theory?

To avoid product liability litigation, it is standard practice to sell commodities with clear warnings about dangerous use and operating instructions designed to help one get the most out of the product. Unfortunately, the same is not true of economic theories. When theorists undertake thought experiments about individual or market behavior, they are positing "what if" scenarios which need not be tethered to reality. Sometimes theorists constrain their propositions by the requirement that they be "operationally meaningful," which only requires that they be capable of being refuted, and not that anyone has the technology or budget to actually do so.

Tests of expected utility theory have provided a dramatic illustration of the importance of thought experiments being explicitly linked to stochastic assumptions involved in their use. Several studies offer a rich array of different error specifications leading to very different inferences about the validity of expected utility theory, and particularly about what part of it appears to be broken: Ballinger and Wilcox (1997); Enrica Carbone (1997); David Harless and Camerer (1994); Hey (1995); John Hey and Chris Orme (1994); Graham Loomes, Peter Moffatt, and Sugden (2002); and Loomes and Sugden (1995, 1998). The methodological problem is that debates over the characterization of the residual have come to dominate the substantive issues, as crisply drawn by Ballinger and Wilcox (1997, p. 1102)78:

We know subjects are heterogeneous. The representative decision maker ... restriction fails miserably both in this study and new ones .... Purely structural theories permit heterogeneity by allowing several preference patterns, but are mute when it comes to mean error rate variability between or within patterns (restrictions like CE) and within-pattern heterogeneity of choice probabilities (restrictions like CH and ZWC). We believe Occam's Razor and the 'Facts don't kill theories, theories do' cliches do not apply: CE, CH and ZWC are an atheoretical supporting cast in dramas about theoretical stars, and poor showings by this cast should be excused neither because they are simple nor because there are no replacements. It is time to audition a new cast.

78 The notation in this quote does not need to be defined for the present point to be made.

In this instance, a lot has been learned about the hidden implications of alternative
stochastic specifications for experimental tests of theory. But the point is that all of this could have been avoided if the thought experiments underlying the structural models had accounted for errors and allowed for individual heterogeneity in preferences from the outset. That relaxation does not rescue expected utility theory, nor is that the intent, but it does serve to make the experimental tests informative for their intended purpose of identifying when and where that theory fails.

9.2 Are Thought Experiments Just Slimmed-Down Experiments?

Roy Sorenson (1992) presents an elaborate defense of the notion that a thought experiment is really just an experiment "that purports to achieve its aim without the benefit of execution" (p. 205). This lack of execution leads to some practical differences, such as the absence of any need to worry about luck affecting outcomes. Another difference is that thought experiments actually require more discipline if they are to be valid. In his Nobel Prize lecture, Smith (2003, p. 465) notes that:

    Doing experimental economics has changed the way I think about economics. There are many reasons for this, but one of the most prominent is that designing and conducting experiments forces you to think through the process rules and procedures of an institution. Few, like Einstein, can perform detailed and imaginative mental experiments. Most of us need the challenge of real experiments to discipline our thinking.

There are, of course, other differences between the way that thought experiments and actual experiments are conducted and presented. But these likely have more to do with the culture of particular scholarly groups than anything intrinsic to each type of experiment.

The manner in which thought experiments can be viewed as "slimmed-down experiments - ones that are all talk and no action" (Sorenson 1992, p. 190), is best illustrated by example. We choose an example in which there have been actual (lab and field) experiments, but where the actual experiments could have been preceded by a thought experiment. Specifically, consider the identification of "trust" as a characteristic of an individual's utility function. In some studies this concept is defined as the sole motive that leads a subject to transfer money to another subject in an investment game.79 For example, Joyce Berg, John Dickhaut, and McCabe (1995) use the game to measure "trust" by the actions of the first player and hence "trustworthiness" from the responses of the second player.80

But "trust" measured in this way obviously suffers from at least one confound: aversion to inequality, or "other-regarding preferences." The idea is that someone may be averse to seeing different payoffs for the two players, since roles and hence endowments in the basic version are assigned at random. This is one reason that almost all versions of the experiments have given each player the same initial endowment to start, so that the first player does not invest money with the second player just to equalize their payoffs. But it is possible that the first player would like the other player to have more, even if it means having more than the first player.

79 Player 1 transfers some percentage of an endowment to player 2, that transfer is tripled, and then player 2 decides how much of the expanded pie to return.
80 This game has been embedded in many other settings before and after Berg, Dickhaut, and McCabe (1995). We do not question the use of this game in the investigation of broader assessments of the nature of "social preferences," which is an expression that subsumes many possible motives for the observed behavior, including the ones discussed below.
81 The first player transfers money to the second player, who is unable to return it or respond in any way. Martin Dufwenberg and Gneezy (2000) also compare the trust and dictator games directly.

Cox (2004) proposes that one pair the investment game with a dictator game81 to identify how much of the observed transfer from the first player is due to "trust" and how much is due to "other-regarding preferences." Since there is strong evidence that
subjects appear to exhibit substantial aversion to inequality in experiments of this kind, do we need to actually run the experiment in which the same subject participates in a dictator game and an investment game to realize that "trust" is weakly overestimated by the executed trust experiments? One might object that we would not be able to make this inference without having run some prior experiments in which subjects transfer money under dictator, so this design proposal of Cox (2004) does not count as a thought experiment. But imagine counterfactually82 that Cox (2004) left it at that, and did not actually run an experiment. We would still be able to draw the new inference from his design that trust is weakly over-estimated in previous experiments if one accounts for the potential confound of inequality aversion.83 Thus, in what sense should we view the thought experiment of the proposed design of Cox (2004) as anything other than an attenuated version of the ordinary experiment that he actually designed and executed?

82 A thought experiment at work.
83 As it happens, there are two further confounds at work in the trust design, each of which can be addressed. One is risk attitudes, at least as far as the interpretation of the behavior of the first player is concerned. Sending money to the other player is risky. If the first player keeps all of his endowment, there is no risk. So a risk-loving player would invest, just for the thrill. A risk-averse player would not invest for this reason. But if there are other motives for investing, then risk attitudes will exacerbate or temper them, and need to be taken into account when identifying the residual as trust. Risk attitudes play no role for the second player's decision. The other confound, in the proposed design of Cox (2004), is that the "price of giving" in his proposed dictator game is $1 for $1 transferred, whereas it is $1 for $3 transferred in the investment game. Thus one would weakly understate the extent of other-regarding preferences in his design, and hence weakly overstate the residual "trust." The general point is even clearer: after these potential confounds are taken into account, what faith does one have that a reliable measure of trust has been identified statistically in the original studies?

One trepidation with treating a thought experiment as just a slimmed-down experiment is that it is untethered by the reality of "proof by data" at the end. But this has more to do with the aims and rhetorical goals of doing experiments. As Sorenson (1991, p. 205) notes:

    The aim of any experiment is to answer or raise its question rationally. As stressed (earlier ...), the motives of an experiment are multifarious. One can experiment in order to teach a new technique, to test new laboratory equipment, or to work out a grudge against white rats. (The principal architect of modern quantum electrodynamics, Richard Feynman, once demonstrated that the bladder does not require gravity by standing on his head and urinating.) The distinction between aim and motive applies to thought experiments as well. When I say that an experiment 'purports' to achieve its aim without execution, I mean that the experimental design is presented in a certain way to the audience. The audience is being invited to believe that contemplation of the design justifies an answer to the question or (more rarely) justifiably raises its question.

In effect, then, it is caveat emptor with thought experiments - but the same homily surely applies to any experiment, even if executed.

9.3 That's Not a Thought Experiment ... This Is!

We earlier defined the word "field" in the following manner: "used attributively to denote an investigation, study, etc., carried out in the natural environment of a given material, language, animal, etc., and not in the laboratory, study, or office." Thus, in an important sense, experiments that employ methods to measure neuronal activity during controlled tasks would be included, since the functioning of the brain can be presumed to be a natural reaction to the controlled stimulus. Neuroeconomics is the study of how different parts of the brain light up when certain tasks are presented, such as exposure to randomly generated monetary gain or loss in Hans Breiter et al. (2001), the risk elicitation tasks of Kip Smith et al. (2002) and Dickhaut et al. (2003), the trust games of McCabe et al. (2001), and the ultimatum bargaining games of Alan Sanfey et al.
(2003). In many ways these methods are extensions of the use of verbal protocols (speaking out loud as the task is performed) used by K. Anders Ericsson and Herbert Simon (1993) to study the algorithmic processes that subjects were going through as they solved problems, and the use of mouse-tracking technology by Eric Johnson et al. (2002) to track sequential information search in bargaining tasks. The idea is to monitor some natural mental process as the experimental treatment is administered, even if the treatment is artefactual.

10. Conclusion

We have avoided drawing a single, bright line between field experiments and lab experiments. One reason is that there are several dimensions to that line, and inevitably there will be some trade-offs between those. The extent of those trade-offs will depend on where researchers fall in terms of their agreement with the argument and issues we raise.

Another reason is that we disagree where the line would be drawn. One of us (Harrison), bred in the barren test-tube setting of classroom labs sans ferns, sees virtually any effort to get out of the classroom as constituting a field experiment to some useful degree. The other (List), raised in the wilds amidst naturally occurring sports-card geeks, would include only those experiments that used free-range subjects. Despite this disagreement on the boundaries between one category of experiments and another category, however, we agree on the characteristics that make a field experiment differ from a lab experiment.

Using these characteristics as a guide, we propose a taxonomy of field experiments that helps one see their connection to lab experiments, social experiments, and natural experiments. Many of the differences are illusory, such that the same issues of control apply. But many of the differences matter for behavior and inference, and justify the focus on the field.

The main methodological conclusion we draw is that experimenters should be wary of the conventional wisdom that abstract, imposed treatments allow general inferences. In an attempt to ensure generality and control by gutting all instructions and procedures of field referents, the traditional lab experimenter has arguably lost control to the extent that subjects seek to provide their own field referents. The obvious solution is to conduct experiments both ways: with and without naturally occurring field referents and context. If there is a difference, then it should be studied. If there is no difference, one can conditionally conclude that the field behavior in that context travels to the lab environment.

REFERENCES

Angrist, Joshua D.; Guido W. Imbens and Donald B. Rubin. 1996. "Identification of Causal Effects Using Instrumental Variables," J. Amer. Statist. Assoc. 91:434, pp. 444-45.
Angrist, Joshua D. and Alan B. Krueger. 2001. "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments," J. Econ. Persp. 15:4, pp. 69-85.
Angrist, Joshua D. and Victor Lavy. 1999. "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement," Quart. J. Econ. 114:2, pp. 553-75.
Ballinger, T. Parker; Michael G. Palumbo and Nathaniel T. Wilcox. 2003. "Precautionary Saving and Social Learning Across Generations: An Experiment," Econ. J. 113:490, pp. 920-47.
Ballinger, T. Parker and Nathaniel T. Wilcox. 1997. "Decisions, Error and Heterogeneity," Econ. J. 107:443, pp. 1090-105.
Banaji, Mahzarin R. and Robert G. Crowder. 1989. "The Bankruptcy of Everyday Memory," Amer. Psychol. 44, pp. 1185-93.
Bateman, Ian; Alistair Munro, Bruce Rhodes, Chris Starmer and Robert Sugden. 1997. "Does Part-Whole Bias Exist? An Experimental Investigation," Econ. J. 107:441, pp. 322-32.
Becker, Gordon M.; Morris H. DeGroot and Jacob Marschak. 1964. "Measuring Utility by a Single-Response Sequential Method," Behav. Sci. 9:July, pp. 226-32.
Beetsma, R. M. W. J. and P. C. Schotman. 2001. "Measuring Risk Attitudes in a Natural Experiment: Data from the Television Game Show Lingo," Econ. J. 111:474, pp. 821-48.
Behrman, Jere R.; Mark R. Rosenzweig and Paul Taubman. 1994. "Endowments and the Allocation of Schooling in the Family and in the Marriage Market: The Twins Experiment," J. Polit. Econ. 102:6, pp. 1131-74.
Benson, P. G. 2000. "The Hawthorne Effect," in The
Corsini Encyclopedia of Psychology and Behavioral Science. Vol. 2, 3rd ed. W. E. Craighead and C. B. Nemeroff, eds. NY: Wiley.
Berg, Joyce E.; John Dickhaut and Kevin McCabe. 1995. "Trust, Reciprocity, and Social History," Games Econ. Behav. 10, pp. 122-42.
Berk, J. B.; E. Hughson and K. Vandezande. 1996. "The Price Is Right, but Are the Bids? An Investigation of Rational Decision Theory," Amer. Econ. Rev. 86:4, pp. 954-70.
Berlin, Brent and Paul Kay. 1969. Basic Color Terms: Their Universality and Evolution. Berkeley: UC Press.
Binswanger, Hans P. 1980. "Attitudes Toward Risk: Experimental Measurement in Rural India," Amer. J. Ag. Econ. 62:3, pp. 395-407.
Binswanger, Hans P. 1981. "Attitudes Toward Risk: Theoretical Implications of an Experiment in Rural India," Econ. J. 91:364, pp. 867-90.
Blackburn, McKinley; Glenn W. Harrison and E. Elisabet Rutström. 1994. "Statistical Bias Functions and Informative Hypothetical Surveys," Amer. J. Ag. Econ. 76:5, pp. 1084-88.
Blundell, R. and M. Costa-Dias. 2002. "Alternative Approaches to Evaluation in Empirical Microeconomics," Portuguese Econ. J. 1, pp. 91-115.
Blundell, R. and Thomas MaCurdy. 1999. "Labor Supply: A Review of Alternative Approaches," in Handbook of Labor Economics Vol. 3C. O. Ashenfelter and D. Card, eds. Amsterdam: Elsevier Science BV.
Bohm, Peter. 1972. "Estimating the Demand for Public Goods: An Experiment," Europ. Econ. Rev. 3:2, pp. 111-30.
Bohm, Peter. 1979. "Estimating Willingness to Pay: Why and How?" Scand. J. Econ. 81:2, pp. 142-53.
Bohm, Peter. 1984a. "Revealing Demand for an Actual Public Good," J. Public Econ. 24, pp. 135-51.
Bohm, Peter. 1984b. "Are There Practicable Demand-Revealing Mechanisms?" in Public Finance and the Quest for Efficiency. H. Hanusch, ed. Detroit: Wayne State U. Press.
Bohm, Peter. 1994. "Behavior under Uncertainty without Preference Reversal: A Field Experiment," in Experimental Economics. J. Hey, ed. Heidelberg: Physica-Verlag.
Bohm, Peter and Hans Lind. 1993. "Preference Reversal, Real-World Lotteries, and Lottery-Interested Subjects," J. Econ. Behav. Org. 22:3, pp. 327-48.
Bornstein, Gary and Ilan Yaniv. 1998. "Individual and Group Behavior in the Ultimatum Game: Are Groups More Rational Players?" Exper. Econ. 1:1, pp. 101-108.
Breiter, Hans C.; Itzhak Aharon, Daniel Kahneman, Anders Dale, and Peter Shizgal. 2001. "Functional Imaging of Neural Responses to Expectancy and Experience of Monetary Gains and Losses," Neuron 30:2, pp. 619-39.
Bronars, Stephen G. and Jeff Grogger. 1994. "The Economic Consequences of Unwed Motherhood: Using Twin Births as a Natural Experiment," Amer. Econ. Rev. 84:5, pp. 1141-56.
Burns, Penny. 1985. "Experience and Decision Making: A Comparison of Students and Businessmen in a Simulated Progressive Auction," in Research in Experimental Economics, Vol. 3. V. L. Smith, ed. Greenwich, CT: JAI Press.
Camerer, Colin F. 1998. "Can Asset Markets Be Manipulated? A Field Experiment with Racetrack Betting," J. Polit. Econ. 106:3, pp. 457-82.
Camerer, Colin and Robin Hogarth. 1999. "The Effects of Financial Incentives in Experiments: A Review and Capital-Labor Framework," J. Risk Uncertainty 19, pp. 7-42.
Cameron, Lisa A. 1999. "Raising the Stakes in the Ultimatum Game: Experimental Evidence from Indonesia," Econ. Inquiry 37:1, pp. 47-59.
Carbone, Enrica. 1997. "Investigation of Stochastic Preference Theory Using Experimental Data," Econ. Letters 57:3, pp. 305-11.
Cardenas, Juan C. 2003. "Real Wealth and Experimental Cooperation: Evidence from Field Experiments," J. Devel. Econ. 70:2, pp. 263-89.
Carpenter, Jeffrey; Amrita Daniere and Lois Takahashi. 2004. "Cooperation, Trust, and Social Capital in Southeast Asian Urban Slums," J. Econ. Behav. Org. 55:4, pp. 533-51.
Carson, Richard T. 1997. "Contingent Valuation Surveys and Tests of Insensitivity to Scope," in Determining the Value of Non-Marketed Goods: Economic, Psychological, and Policy Relevant Aspects of Contingent Valuation Methods. R. J. Kopp, W. Pommerhene and N. Schwartz, eds. Boston: Kluwer.
Carson, Richard T.; Robert C. Mitchell, W. Michael Hanemann, Raymond J. Kopp, Stanley Presser and Paul A. Ruud. 1992. A Contingent Valuation Study of Lost Passive Use Values Resulting From the Exxon Valdez Oil Spill. Anchorage: Attorney Gen. Alaska.
Chamberlin, Edward H. 1948. "An Experimental Imperfect Market," J. Polit. Econ. 56:2, pp. 95-108.
Coller, Maribeth and Melonie B. Williams. 1999. "Eliciting Individual Discount Rates," Exper. Econ. 2, pp. 107-27.
Conlisk, John. 1989. "Three Variants on the Allais Example," Amer. Econ. Rev. 79:3, pp. 392-407.
Conlisk, John. 1996. "Why Bounded Rationality?" J. Econ. Lit. 34:2, pp. 669-700.
Cox, James C. 2004. "How To Identify Trust and Reciprocity," Games Econ. Behav. 46:2, pp. 260-81.
Cox, James C. and Stephen C. Hayne. 2002. "Barking Up the Right Tree: Are Small Groups Rational Agents?" work. pap. Dept. Econ. U. Arizona.
Cubitt, Robin P. and Robert Sugden. 2001. "On Money Pumps," Games Econ. Behav. 37:1, pp. 121-60.
Cummings, Ronald G.; Steven Elliott, Glenn W. Harrison and James Murphy. 1997. "Are Hypothetical Referenda Incentive Compatible?" J. Polit. Econ. 105:3, pp. 609-21.
Cummings, Ronald G. and Glenn W. Harrison. 1994. "Was the Ohio Court Well Informed in Their Assessment of the Accuracy of the Contingent Valuation Method?" Natural Res. J. 34:1, pp. 1-36.
Cummings, Ronald G.; Glenn W. Harrison and Laura L. Osborne. 1995. "Can the Bias of Contingent Valuation Be Reduced? Evidence from the
Laboratory," econ. work. pap. B-95-03, College Business Admin., U. South Carolina.
Cummings, Ronald G.; Glenn W. Harrison and E. Elisabet Rutström. 1995. "Homegrown Values and Hypothetical Surveys: Is the Dichotomous Choice Approach Incentive Compatible?" Amer. Econ. Rev. 85:1, pp. 260-66.
Cummings, Ronald G. and Laura O. Taylor. 1999. "Unbiased Value Estimates for Environmental Goods: A Cheap Talk Design for the Contingent Valuation Method," Amer. Econ. Rev. 89:3, pp. 649-65.
Deacon, Robert T. and Jon Sonstelie. 1985. "Rationing by Waiting and the Value of Time: Results from a Natural Experiment," J. Polit. Econ. 93:4, pp. 627-47.
Dehejia, Rajeev H. and Sadek Wahba. 1999. "Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs," J. Amer. Statist. Assoc. 94:448, pp. 1053-62.
Dehejia, Rajeev H. and Sadek Wahba. 2002. "Propensity Score Matching for Nonexperimental Causal Studies," Rev. Econ. Statist. 84, pp. 151-61.
Dickhaut, John; Kevin McCabe, Jennifer C. Nagode, Aldo Rustichini and José V. Pardo. 2003. "The Impact of the Certainty Context on the Process of Choice," Proceed. Nat. Academy Sci. 100:March, pp. 3536-41.
Dillman, Don. 1978. Mail and Telephone Surveys: The Total Design Method. NY: Wiley.
Duddy, Edward A. 1924. "Report on an Experiment in Teaching Method," J. Polit. Econ. 32:5, pp. 582-603.
Dufwenberg, Martin and Uri Gneezy. 2000. "Measuring Beliefs in an Experimental Lost Wallet Game," Games Econ. Behav. 30:2, pp. 163-82.
Dyer, Douglas and John H. Kagel. 1996. "Bidding in Common Value Auctions: How the Commercial Construction Industry Corrects for the Winner's Curse," Manage. Sci. 42:10, pp. 1463-75.
Eckel, Catherine C. and Philip J. Grossman. 1996. "Altruism in Anonymous Dictator Games," Games Econ. Behav. 16, pp. 181-91.
Ericsson, K. Anders, and Herbert A. Simon. 1993. Protocol Analysis: Verbal Reports as Data, rev. ed. Cambridge, MA: MIT Press.
Fan, Chinn-Ping. 2002. "Allais Paradox in the Small," J. Econ. Behav. Org. 49:3, pp. 411-21.
Ferber, Robert and Werner Z. Hirsch. 1978. "Social Experimentation and Economic Policy: A Survey," J. Econ. Lit. 16:4, pp. 1379-414.
Ferber, Robert and Werner Z. Hirsch. 1982. Social Experimentation and Economic Policy. NY: Cambridge U. Press.
Fershtman, Chaim and Uri Gneezy. 2001. "Discrimination in a Segmented Society: An Experimental Approach," Quart. J. Econ. 116, pp. 351-77.
Fix, Michael and Raymond J. Struyk, eds. 1993. Clear and Convincing Evidence: Measurement of Discrimination in America. Washington, DC: Urban Institute Press.
Frech, H. E. 1976. "The Property Rights Theory of the Firm: Empirical Results from a Natural Experiment," J. Polit. Econ. 84:1, pp. 143-52.
Frederick, Shane; George Loewenstein and Ted O'Donoghue. 2002. "Time Discounting and Time Preference: A Critical Review," J. Econ. Lit. 40:2, pp. 351-401.
Gertner, R. 1993. "Game Shows and Economic Behavior: Risk-Taking on Card Sharks," Quart. J. Econ. 108:2, pp. 507-21.
Gigerenzer, Gerd; Peter M. Todd and the ABC Research Group. 2000. Simple Heuristics That Make Us Smart. NY: Oxford U. Press.
Gimotty, Phyllis A. 2002. "Delivery of Preventive Health Services for Breast Cancer Control: A Longitudinal View of a Randomized Controlled Trial," Health Services Res. 37:1, pp. 65-85.
Greene, William H. 1993. Econometric Analysis, 2nd ed. NY: Macmillan.
Grether, David M.; R. Mark Isaac and Charles R. Plott. 1981. "The Allocation of Landing Rights by Unanimity among Competitors," Amer. Econ. Rev. Pap. Proceed. 71:May, pp. 166-71.
Grether, David M.; R. Mark Isaac and Charles R. Plott. 1989. The Allocation of Scarce Resources: Experimental Economics and the Problem of Allocating Airport Slots. Boulder: Westview Press.
Grether, David M. and Charles R. Plott. 1984. "The Effects of Market Practices in Oligopolistic Markets: An Experimental Examination of the Ethyl Case," Econ. Inquiry 22:Oct., pp. 479-507.
Haigh, Michael, and John A. List. 2004. "Do Professional Traders Exhibit Myopic Loss Aversion? An Experimental Analysis," J. Finance 59, forthcoming.
Harbaugh, William T. and Kate Krause. 2000. "Children's Altruism in Public Good and Dictator Experiments," Econ. Inquiry 38:1, pp. 95-109.
Harbaugh, William T.; Kate Krause and Timothy R. Berry. 2001. "GARP for Kids: On the Development of Rational Choice Behavior," Amer. Econ. Rev. 91:5, pp. 1539-45.
Harbaugh, William T.; Kate Krause and Lise Vesterlund. 2002. "Risk Attitudes of Children and Adults: Choices Over Small and Large Probability Gains and Losses," Exper. Econ. 5, pp. 53-84.
Harless, David W. and Colin F. Camerer. 1994. "The Predictive Utility of Generalized Expected Utility Theories," Econometrica 62:6, pp. 1251-89.
Harrison, Glenn W. 1988. "Predatory Pricing in A Multiple Market Experiment," J. Econ. Behav. Org. 9, pp. 405-17.
Harrison, Glenn W. 1992a. "Theory and Misbehavior of First-Price Auctions: Reply," Amer. Econ. Rev. 82:5, pp. 1426-43.
Harrison, Glenn W. 1992b. "Market Dynamics, Programmed Traders, and Futures Markets: Beginning the Laboratory Search for a Smoking Gun," Econ. Record 68, Special Issue Futures Markets, pp. 46-62.
Harrison, Glenn W. 2005a. "Field Experiments and Control," in Field Experiments in Economics. J. Carpenter, G. W. Harrison and J. A. List, eds. Research in Exper. Econ. Vol. 10. Greenwich, CT: JAI Press.
Harrison, Glenn W. 2005b. "Experimental Evidence on Alternative Environmental Valuation Methods," Environ. Res. Econ. 23, forthcoming.
Harrison, Glenn W.; Ronald M. Harstad and E. Elisabet Rutström. 2004. "Experimental Methods and Elicitation of Values," Exper. Econ. 7:June, pp. 123-40.
Harrison, Glenn W. and Bengt Kriström. 1996. "On the Interpretation of Responses to Contingent Valuation Surveys," in Current Issues in Environmental Economics. P. O. Johansson, B. Kriström and K. G. Mäler, eds. Manchester: Manchester U. Press.
Harrison, Glenn W.; Morten Igel Lau and Melonie B. Williams. 2002. "Estimating Individual Discount Rates for Denmark: A Field Experiment," Amer. Econ. Rev. 92:5, pp. 1606-17.
Harrison, Glenn W. and James C. Lesley. 1996. "Must Contingent Valuation Surveys Cost So Much?" J. Environ. Econ. Manage. 31:1, pp. 79-95.
Harrison, Glenn W. and John A. List. 2003. "Naturally Occurring Markets and Exogenous Laboratory Experiments: A Case Study of the Winner's Curse," work. pap. 3-14, Dept. Econ., College Bus. Admin., U. Central Florida.
Harrison, Glenn W.; Thomas F. Rutherford and David G. Tarr. 1997. "Quantifying the Uruguay Round," Econ. J. 107:444, pp. 1405-30.
Harrison, Glenn W. and Elisabet Rutström. 2001. "Doing It Both Ways - Experimental Practice and Heuristic Context," Behav. Brain Sci. 24:3, pp. 413-14.
Harrison, Glenn W. and H. D. Vinod. 1992. "The Sensitivity Analysis of Applied General Equilibrium Models: Completely Randomized Factorial Sampling Designs," Rev. Econ. Statist. 74:2, pp. 357-62.
Hausman, Jerry A. 1993. Contingent Valuation. NY: North-Holland.
Hausman, Jerry A. and David A. Wise. 1985. Social Experimentation. Chicago: U. Chicago Press.
Hayes, J. R. and H. A. Simon. 1974. "Understanding Written Problem Instructions," in Knowledge and Cognition. L. W. Gregg, ed. Hillsdale, NJ: Erlbaum.
Heckman, James J. 1998. "Detecting Discrimination," J. Econ. Perspect. 12:2, pp. 101-16.
Heckman, James J. and Richard Robb. 1985. "Alternative Methods for Evaluating the Impact of Interventions," in Longitudinal Analysis of Labor Market Data. J. Heckman and B. Singer, eds. NY: Cambridge U. Press.
Heckman, James J. and Peter Siegelman. 1993. "The Urban Institute Audit Studies: Their Methods and Findings," in Clear and Convincing Evidence: Measurement of Discrimination in America. M. Fix and R. J. Struyk, eds. Washington, DC: Urban Institute Press.
Heckman, James J. and Jeffrey A. Smith. 1995. "Assessing the Case for Social Experiments," J. Econ. Perspect. 9:2, pp. 85-110.
Henrich, Joseph. 2000. "Does Culture Matter in Economic Behavior? Ultimatum Game Bargaining Among the Machiguenga," Amer. Econ. Rev. 90:4, pp. 973-79.
Henrich, Joseph; Robert Boyd, Samuel Bowles, Colin Camerer, Herbert Gintis, Richard McElreath and Ernst Fehr. 2001. "In Search of Homo Economicus: Experiments in 15 Small-Scale Societies," Amer. Econ. Rev. 91:2, pp. 73-79.
Henrich, Joseph; Robert Boyd, Samuel Bowles, Colin Camerer, Ernst Fehr and Herbert Gintis, eds. 2004. Foundations of Human Sociality. NY: Oxford U. Press.
Henrich, Joseph and Richard McElreath. 2002. "Are Peasants Risk-Averse Decision Makers?" Current Anthropology 43:1, pp. 172-81.
Hey, John D. 1995. "Experimental Investigations of Errors in Decision Making Under Risk," Europ. Econ. Rev. 39, pp. 633-40.
Hey, John D. and Chris Orme. 1994. "Investigating Generalizations of Expected Utility Theory Using Experimental Data," Econometrica 62:6, pp. 1291-326.
Hoffman, Elizabeth; Kevin A. McCabe and Vernon L. Smith. 1996. "On Expectations and the Monetary Stakes in Ultimatum Games," Int. J. Game Theory 25:3, pp. 289-301.
Holt, Charles A. and Susan K. Laury. 2002. "Risk Aversion and Incentive Effects," Amer. Econ. Rev. 92:5, pp. 1644-55.
Hong, James T. and Charles R. Plott. 1982. "Rate Filing Policies for Inland Water Transportation: An Experimental Approach," Bell J. Econ. 13:1, pp. 1-19.
Hotz, V. Joseph. 1992. "Designing an Evaluation of JTPA," in Evaluating Welfare and Training Programs. C. Manski and I. Garfinkel, eds. Cambridge: Harvard U. Press.
Hoxby, Caroline M. 2000. "The Effects of Class Size on Student Achievement: New Evidence From Population Variation," Quart. J. Econ. 115:4, pp. 1239-85.
Imber, David; Gay Stevenson and Leanne Wilks. 1991. A Contingent Valuation Survey of the Kakadu Conservation Zone. Canberra: Austral. Govt. Pub., Resource Assess. Con.
Isaac, R. Mark and Vernon L. Smith. 1985. "In Search of Predatory Pricing," J. Polit. Econ. 93:2, pp. 320-45.
Johnson, Eric J.; Colin F. Camerer, Sen Sankar and Talia Tymon. 2002. "Detecting Failures of Backward Induction: Monitoring Information Search in Sequential Bargaining," J. Econ. Theory 104:1, pp. 16-47.
Kachelmeier, Steven J. and Mohamed Shehata. 1992. "Examining Risk Preferences Under High Monetary Incentives: Experimental Evidence from the People's Republic of China," Amer. Econ. Rev. 82:5, pp. 1120-41.
Kagel, John H.; Raymond C. Battalio and Leonard Green. 1995. Economic Choice Theory: An Experimental Analysis of Animal Behavior. NY: Cambridge U. Press.
Kagel, John H.; Raymond C. Battalio and James M. Walker. 1979. "Volunteer Artifacts in Experiments in Economics: Specification of the Problem and Some Initial Data from a Small-Scale Field Experiment," in Research in Experimental Economics. Vol. 1. V. L. Smith, ed. Greenwich, CT: JAI Press.
Kagel, John H.; Ronald M. Harstad and Dan Levin. 1987. "Information Impact and Allocation Rules in Auctions with Affiliated Private Values: A Laboratory Study," Econometrica 55:6, pp. 1275-304.
Kagel, John H. and Dan Levin. 1986. "The Winner's Curse and Public Information in Common Value Auctions," Amer. Econ. Rev. 76:5, pp. 894-920.
1999. "Common Value Auctions with Insider from the Vantage-Point of Behavioral Economics,"
Information," Econometrica 67:5, pp. 1219-38. Econ. J. 109:453, pp. F25-F34.
2002. Common Value Auctions and the Loomes, Graham; Peter G. Moffatt and Robert
Winner's Curse. Princeton: Princeton U. Press. Sugden. 2002, "A Microeconometric Test of
Alternative Stochastic Theories of Risky Choice," J. Risk Uncertainty 24:2, pp. 103-30.
Kagel, John H.; Don N. MacDonald and Raymond C. Battalio. 1990. "Tests of 'Fanning Out' of Indifference Curves: Results from Animal and Human Experiments," Amer. Econ. Rev. 80:4, pp. 912-21.
Kramer, Michael and Stanley Shapiro. 1984. "Scientific Challenges in the Application of Randomized Trials," J. Amer. Medical Assoc. 252:19, pp. 2739-45.
Krueger, Alan B. 1999. "Experimental Estimates of Production Functions," Quart. J. Econ. 114:2, pp. 497-532.
Kunce, Mitch; Shelby Gerking and William Morgan. 2002. "Effects of Environmental and Land Use Regulation in the Oil and Gas Industry Using the Wyoming Checkerboard as an Experimental Design," Amer. Econ. Rev. 92:5, pp. 1588-93.
LaLonde, Robert J. 1986. "Evaluating the Econometric Evaluations of Training Programs with Experimental Data," Amer. Econ. Rev. 76:4, pp. 604-20.
Levine, Michael E. and Charles R. Plott. 1977. "Agenda Influence and Its Implications," Virginia Law Rev. 63:May, pp. 561-604.
Levitt, Steven D. 2003. "Testing Theories of Discrimination: Evidence from the Weakest Link," NBER work. pap. 9449.
Lichtenstein, Sarah and Paul Slovic. 1973. "Response-Induced Reversals of Gambling: An Extended Replication in Las Vegas," J. Exper. Psych. 101, pp. 16-20.
List, John A. 2001. "Do Explicit Warnings Eliminate the Hypothetical Bias in Elicitation Procedures? Evidence from Field Auctions for Sportscards," Amer. Econ. Rev. 91:5, pp. 1498-507.
List, John A. 2003. "Friend or Foe: A Natural Experiment of the Prisoner's Dilemma," unpub. manuscript, U. Maryland Dept. Ag. Res. Econ.
List, John A. 2004a. "Young, Selfish and Male: Field Evidence of Social Preferences," Econ. J. 114:492, pp. 121-49.
List, John A. 2004b. "The Nature and Extent of Discrimination in the Marketplace: Evidence from the Field," Quart. J. Econ. 119:1, pp. 49-89.
List, John A. 2004c. "Neoclassical Theory Versus Prospect Theory: Evidence from the Marketplace," Econometrica 72:2, pp. 615-25.
List, John A. 2004d. "Field Experiments: An Introduction and Survey," work. pap., U. Maryland, Dept. Ag. Res. Econ. and Dept. Econ.
List, John A. 2004e. "Testing Neoclassical Competitive Theory in Multi-Lateral Decentralized Markets," J. Polit. Econ. 112:5, pp. 1131-56.
List, John A. and David Lucking-Reiley. 2000. "Demand Reduction in a Multi-Unit Auction: Evidence from a Sportscard Experiment," Amer. Econ. Rev. 90:4, pp. 961-72.
List, John A. and David Lucking-Reiley. 2002. "The Effects of Seed Money and Refunds on Charitable Giving: Experimental Evidence from a University Capital Campaign," J. Polit. Econ. 110:1, pp. 215-33.
Loewenstein, George. 1999. "Experimental Economics
Loomes, Graham and Robert Sugden. 1995. "Incorporating a Stochastic Element Into Decision Theories," Europ. Econ. Rev. 39, pp. 641-48.
Loomes, Graham and Robert Sugden. 1998. "Testing Different Stochastic Specifications of Risky Choice," Economica 65, pp. 581-98.
Lucking-Reiley, David. 1999. "Using Field Experiments to Test Equivalence Between Auction Formats: Magic on the Internet," Amer. Econ. Rev. 89:5, pp. 1063-80.
Machina, Mark J. 1989. "Dynamic Consistency and Non-Expected Utility Models of Choice Under Uncertainty," J. Econ. Lit. 27:4, pp. 1622-68.
McCabe, Kevin; Daniel Houser, Lee Ryan, Vernon Smith and Theodore Trouard. 2001. "A Functional Imaging Study of Cooperation in Two-Person Reciprocal Exchange," Proceed. Nat. Academy Sci. 98:20, pp. 11832-35.
McClennan, Edward F. 1990. Rationality and Dynamic Choice. NY: Cambridge U. Press.
McDaniel, Tanga M. and E. Elisabet Rutström. 2001. "Decision Making Costs and Problem Solving Performance," Exper. Econ. 4:2, pp. 145-61.
Metrick, Andrew. 1995. "A Natural Experiment in 'Jeopardy!'," Amer. Econ. Rev. 85:1, pp. 240-53.
Meyer, Bruce D.; W. Kip Viscusi and David L. Durbin. 1995. "Workers' Compensation and Injury Duration: Evidence from a Natural Experiment," Amer. Econ. Rev. 85:3, pp. 322-40.
Milgrom, Paul R. and Robert J. Weber. 1982. "A Theory of Auctions and Competitive Bidding," Econometrica 50:5, pp. 1089-122.
Neisser, Ulric and Ira E. Hyman, Jr., eds. 2000. Memory Observed: Remembering in Natural Contexts. 2nd ed. NY: Worth Publishers.
Pearl, Judea. 1984. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Reading, MA: Addison-Wesley.
Philipson, Tomas and Larry V. Hedges. 1998. "Subject Evaluation in Social Experiments," Econometrica 66:2, pp. 381-408.
Plott, Charles R. and Michael E. Levine. 1978. "A Model of Agenda Influence on Committee Decisions," Amer. Econ. Rev. 68:1, pp. 146-60.
Riach, P. A. and J. Rich. 2002. "Field Experiments of Discrimination in the Market Place," Econ. J. 112:483, pp. F480-F518.
Rosenbaum, P. and Donald Rubin. 1983. "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika 70, pp. 41-55.
Rosenbaum, P. and Donald Rubin. 1984. "Reducing Bias in Observational Studies Using Multivariate Matched Sampling Methods that Incorporate the Propensity Score," J. Amer. Statist. Assoc. 79, pp. 39-68.
Rosenthal, R. and L. Jacobson. 1968. Pygmalion in the Classroom. NY: Holt, Rinehart & Winston.
Rosenzweig, Mark R. and Kenneth I. Wolpin. 2000. "Natural 'Natural Experiments' in Economics," J. Econ. Lit. 38:4, pp. 827-74.
Roth, Alvin E. 1991. "A Natural Experiment in the Organization of Entry-Level Labor Markets: Regional Markets for New Physicians and Surgeons in the United Kingdom," Amer. Econ. Rev. 81:3, pp. 415-40.
Roth, Alvin E. and Michael W. K. Malouf. 1979. "Game-Theoretic Models and the Role of Information in Bargaining," Psych. Rev. 86, pp. 574-94.
Roth, Alvin E.; Vesna Prasnikar, Masahiro Okuno-Fujiwara and Shmuel Zamir. 1991. "Bargaining and Market Behavior in Jerusalem, Ljubljana, Pittsburgh, and Tokyo: An Experimental Study," Amer. Econ. Rev. 81:5, pp. 1068-95.
Rowe, R. D.; W. Schulze, W. D. Shaw, L. D. Chestnut and D. Schenk. 1991. "Contingent Valuation of Natural Resource Damage Due to the Nestucca Oil Spill," report to British Columbia Ministry of Environment.
Rutström, E. Elisabet. 1998. "Home-Grown Values and the Design of Incentive Compatible Auctions," Int. J. Game Theory 27:3, pp. 427-41.
Rutström, E. Elisabet and Melonie B. Williams. 2000. "Entitlements and Fairness: An Experimental Study of Distributive Preferences," J. Econ. Behav. Org. 43, pp. 75-80.
Sanfey, Alan G.; James K. Rilling, Jessica A. Aronson, Leigh E. Nystrom and Jonathan D. Cohen. 2003. "The Neural Basis of Economic Decision-Making in the Ultimatum Game," Science 300:5626, pp. 1755-58.
Slonim, Robert and Alvin E. Roth. 1998. "Learning in High Stakes Ultimatum Games: An Experiment in the Slovak Republic," Econometrica 66:3, pp. 569-96.
Smith, Jeffrey and Petra Todd. 2000. "Does Matching Address LaLonde's Critique of Nonexperimental Estimates?" unpub. man., Dept. Econ., U. Western Ontario.
Smith, Kip; John Dickhaut, Kevin McCabe and José V. Pardo. 2002. "Neuronal Substrates for Choice under Ambiguity, Risk, Gains, and Losses," Manage. Sci. 48:6, pp. 711-18.
Smith, V. Kerry and Laura Osborne. 1996. "Do Contingent Valuation Estimates Pass a Scope Test? A Meta Analysis," J. Environ. Econ. Manage. 31, pp. 287-301.
Smith, Vernon L. 1962. "An Experimental Study of Competitive Market Behavior," J. Polit. Econ. 70, pp. 111-37.
Smith, Vernon L. 1982. "Microeconomic Systems as an Experimental Science," Amer. Econ. Rev. 72:5, pp. 923-55.
Smith, Vernon L. 2003. "Constructivist and Ecological Rationality in Economics," Amer. Econ. Rev. 93:3, pp. 465-508.
Smith, Vernon L.; G. L. Suchanek and Arlington W. Williams. 1988. "Bubbles, Crashes, and Endogenous Expectations in Experimental Spot Asset Markets," Econometrica 56, pp. 1119-52.
Sorenson, Roy A. 1992. Thought Experiments. NY: Oxford U. Press.
Starmer, Chris. 1999. "Experiments in Economics: Should We Trust the Dismal Scientists in White Coats?" J. Exper. Method. 6, pp. 1-30.
Sunstein, Cass R.; Reid Hastie, John W. Payne, David A. Schkade and W. Kip Viscusi. 2002. Punitive Damages: How Juries Decide. Chicago: U. Chicago Press.
Tenorio, Rafael and Timothy Cason. 2002. "To Spin or Not To Spin? Natural and Laboratory Experiments from The Price is Right," Econ. J. 112, pp. 170-95.
Warner, John T. and Saul Pleeter. 2001. "The Personal Discount Rate: Evidence from Military Downsizing Programs," Amer. Econ. Rev. 91:1, pp. 33-53.
Wierzbicka, Anna. 1996. Semantics: Primes and Universals. NY: Oxford U. Press.
Winkler, Robert L. and Allan H. Murphy. 1973. "Experiments in the Laboratory and the Real World," Org. Behav. Human Perform. 10, pp. 252-70.
