
Journal of Economic Literature


Vol. XLII (December 2004) pp. 1009–1055

Field Experiments
GLENN W. HARRISON and JOHN A. LIST1

1. Introduction

In some sense every empirical researcher is reporting the results of an experiment. Every researcher who behaves as if an exogenous variable varies independently of an error term effectively views their data as coming from an experiment. In some cases this belief is a matter of a priori judgement; in some cases it is based on auxiliary evidence and inference; and in some cases it is built into the design of the data collection process. But the distinction is not always as bright and clear. Testing that assumption is a recurring difficulty for applied econometricians, and the search always continues for variables that might better qualify as truly exogenous to the process under study. Similarly, the growing popularity of explicit experimental methods arises in large part from the potential for constructing the proper counterfactual.

Field experiments provide a meeting ground between these two broad approaches to empirical economic science. By examining the nature of field experiments, we seek to make it a common ground between researchers.

We approach field experiments from the perspective of the sterility of the laboratory experimental environment. We do not see the notion of a “sterile environment” as a negative, provided one recognizes its role in the research discovery process. In one sense, that sterility allows us to see in crisp relief the effects of exogenous treatments on behavior. However, lab experiments in isolation are necessarily limited in relevance for predicting field behavior, unless one wants to insist a priori that those aspects of economic behavior under study are perfectly general in a sense that we will explain. Rather, we see the beauty of lab experiments within a broader context: when they are combined with field data, they permit sharper and more convincing inference.2

In search of greater relevance, experimental economists are recruiting subjects in the field rather than in the classroom, using field goods rather than induced valuations, and using field context rather than abstract

1 Harrison: Department of Economics, College of Business Administration, University of Central Florida; List: Department of Agricultural and Resource Economics and Department of Economics, University of Maryland, and NBER. We are grateful to Stephen Burks, Colin Camerer, Jeffrey Carpenter, Shelby Gerking, R. Mark Isaac, Alan Krueger, John McMillan, Andreas Ortmann, Charles Plott, David Reiley, E. Elisabet Rutström, Nathaniel Wilcox, and the referees for generous comments.

2 When we talk about combining lab and field data, we do not just mean a summation of conclusions. Instead, we have in mind the two complementing each other in some functional way, much as one might conduct several lab experiments in order to tease apart potential confounds. For example, James Cox (2004) demonstrates nicely how “trust” and “reciprocity” are often confounded with “other regarding preferences,” and can be better identified separately if one undertakes several types of experiments with the same population. Similarly, Alvin Roth and Michael Malouf (1979) demonstrate how the use of dollar payoffs can confound tests of cooperative game theory with less information of one kind (knowledge of the utility function of the other player), and more information of another kind (the ability to make interpersonal comparisons of monetary gain), than is usually assumed in the leading theoretical prediction.


terminology in instructions.3 We argue that there is something methodologically fundamental behind this trend. Field experiments differ from laboratory experiments in many ways. Although it is tempting to view field experiments as simply less controlled variants of laboratory experiments, we argue that to do so would be to seriously mischaracterize them. What passes for “control” in laboratory experiments might in fact be precisely the opposite if it is artificial to the subject or context of the task. In the end, we see field experiments as being methodologically complementary to traditional laboratory experiments.4

Our primary point is that dissecting the characteristics of field experiments helps define what might be better called an ideal experiment, in the sense that one is able to observe a subject in a controlled setting but where the subject does not perceive any of the controls as being unnatural and there is no deception being practiced. At first blush, the idea that one can observe subjects in a natural setting and yet have controls might seem contradictory, but we will argue that it is not.5

Our second point is that many of the characteristics of field experiments can be found in varying, correlated degrees in lab experiments. Thus, many of the characteristics that people identify with field experiments are not only found in field experiments, and should not be used to differentiate them from lab experiments.

Our third point, following from the first two, is that there is much to learn from field experiments when returning to the lab. The unexpected behaviors that occur when one loosens control in the field are often indicators of key features of the economic transaction that have been neglected in the lab. Thus, field experiments can help one design better lab experiments, and have a methodological role quite apart from their complementarity at a substantive level.

In section 2 we offer a typology of field experiments in the literature, identifying the key characteristics defining the species. We suggest some terminology to better identify different types of field experiments, or more accurately to identify different characteristics of field experiments. We do not propose a bright line to define some experiments as field experiments and others as something else, but a set of criteria that one would expect to see in varying degrees in a field experiment. We propose six factors that can be used to determine the field context of an experiment: the nature of the subject pool, the nature of the information that the subjects bring to the task, the nature of the commodity, the nature of the task or trading rules applied, the nature of the stakes, and the environment in which the subjects operate. Having identified what defines a field experiment, in section 3 we put experiments in general into methodological perspective, as one of the ways that economists can identify treatment effects. This serves to remind us why we want control and internal validity in all such analyses, whether or not they constitute field experiments. In sections 4 through 6 we describe strengths and weaknesses of the broad types of field experiments. Our

3 We explain this jargon from experimental economics below.

4 This view is hardly novel: for example, in decision research, Robert Winkler and Allan Murphy (1973) provide an excellent account of the difficulties of reconciling suboptimal probability assessments in artefactual laboratory settings with field counterparts, as well as the limitations of applying inferences from laboratory data to the field.

5 Imagine a classroom setting in which the class breaks up into smaller tutorial groups. In some groups a video covering certain material is presented, in another group a free discussion is allowed, and in another group there is a more traditional lecture. Then the scores of the students in each group are examined after they have taken a common exam. Assuming that all of the other features of the experiment are controlled, such as which student gets assigned to which group, this experiment would not seem unnatural to the subjects. They are all students doing what comes naturally to students, and these three teaching alternatives are each standardly employed. Along similar lines in economics, albeit with simpler technology and less control than one might like, see Edward Duddy (1924). For recent novel examples in the economics literature, see Colin Camerer (1998) and David Lucking-Reiley (1999). Camerer (1998) places bets at a race track to examine if asset markets can be manipulated, while Lucking-Reiley (1999) uses internet-based auctions in a pre-existing market with an unknown number of participating bidders to test the theory of revenue equivalence between four major single-unit auction formats.

literature review is necessarily selective, although List (2004d) offers a more complete bibliography.

In sections 7 and 8 we review two types of experiments that may be contrasted with ideal field experiments. One is called a social experiment, in the sense that it is a deliberate part of social policy by the government. Social experiments involve deliberate, randomized changes in the manner in which some government program is implemented. They have become popular in certain areas, such as employment schemes and the detection of discrimination. Their disadvantages have been well documented, given their political popularity, and there are several important methodological lessons from those debates for the design of field experiments.

The other is called a “natural experiment.” The idea is to recognize that some event that naturally occurs in the field happens to have some of the characteristics of a field experiment. These can be attractive sources of data on large-scale economic transactions, but usually at some cost due to the lack of control, forcing the researcher to make certain identification assumptions.

Finally, in section 9 we briefly examine related types of experiments of the mind. In one case these are the “thought experiments” of theorists and statisticians, and in the other they are the “neuro-economics experiments” provided by technology. The objective is simply to identify how they differ from other types of experiments we consider, and where they fit in.

2. Defining Field Experiments

There are several ways to define words. One is to ascertain the formal definition by looking it up in the dictionary. Another is to identify what it is that you want the word-label to differentiate.

The Oxford English Dictionary (Second Edition) defines the word “field” in the following manner: “Used attributively to denote an investigation, study, etc., carried out in the natural environment of a given material, language, animal, etc., and not in the laboratory, study, or office.” This orients us to think of the natural environment of the different components of an experiment.6

It is important to identify what factors make up a field experiment so that we can functionally identify what factors drive results in different experiments. To provide a direct example of the type of problem that motivated us, when List (2001) obtains results in a field experiment that differ from the counterpart lab experiments of Ronald Cummings, Glenn Harrison, and Laura Osborne (1995) and Cummings and Laura Taylor (1999), what explains the difference? Is it the use of data from a particular market whose participants have selected into the market instead of student subjects; the use of subjects with experience in related tasks; the use of private sports-cards as the underlying commodity instead of an environmental public good; the use of streamlined instructions, the less-intrusive experimental methods, mundane experimenter effects, or is it some combination of these and similar

6 If we are to examine the role of “controls” in different experimental settings, it is appropriate that this word also be defined carefully. The OED (2nd ed.) defines the verb “control” in the following manner: “To exercise restraint or direction upon the free action of; to hold sway over, exercise power or authority over; to dominate, command.” So the word means something more active and interventionist than is suggested by its colloquial clinical usage. Control can include such mundane things as ensuring sterile equipment in a chemistry lab, to restrain the free flow of germs and unwanted particles that might contaminate some test. But when controls are applied to human behavior, we are reminded that someone’s behavior is being restrained to be something other than it would otherwise be if the person were free to act. Thus we are immediately on alert to be sensitive, when studying responses from a controlled experiment, to the possibility that behavior is unusual in some respect. The reason is that the very control that defines the experiment may be putting the subject on an artificial margin. Even if behavior on that margin is not different than it would be without the control, there is the possibility that constraints on one margin may induce effects on behavior on unconstrained margins. This point is exactly the same as the one made in the “theory of the second best” in public policy. If there is some immutable constraint on one of the margins defining an optimum, it does not automatically follow that removing a constraint on another margin will move the system closer to the optimum.

differences? We believe field experiments have matured to the point that some framework for addressing such differences in a systematic manner is necessary.

2.1 Criteria that Define Field Experiments

Running the risk of oversimplifying what is inherently a multidimensional issue, we propose six factors that can be used to determine the field context of an experiment:
• the nature of the subject pool,
• the nature of the information that the subjects bring to the task,
• the nature of the commodity,
• the nature of the task or trading rules applied,
• the nature of the stakes, and
• the nature of the environment that the subject operates in.

We recognize at the outset that these characteristics will often be correlated to varying degrees. Nonetheless, they can be used to propose a taxonomy of field experiments that will, we believe, be valuable as comparisons between lab and field experimental results become more common.

Student subjects can be viewed as the standard subject pool used by experimenters, simply because they are a convenience sample for academics. Thus when one goes “outdoors” and uses field subjects, they should be viewed as nonstandard in this sense. But we argue that the use of nonstandard subjects should not automatically qualify the experiment as a field experiment. The experiments of Cummings, Harrison, and E. Elisabet Rutström (1995), for example, used individuals recruited from churches in order to obtain a wider range of demographic characteristics than one would obtain in the standard college setting. The importance of a nonstandard subject pool varies from experiment to experiment: in this case it simply provided a less concentrated set of socio-demographic characteristics with respect to age and education level, which turned out to be important when developing statistical models to adjust for hypothetical bias (McKinley Blackburn, Harrison, and Rutström 1994). Alternatively, the subject pool can be designed to represent a target population of the economy (e.g., traders at the Chicago Board of Trade in Michael Haigh and John List 2004) or the general population (e.g., the Danish population in Harrison, Morten Igel Lau, and Melonie Williams 2002).

In addition, nonstandard subject pools might bring experience with the commodity or the task to the experiment, quite apart from their wider array of demographic characteristics. In the field, subjects bring certain information to their trading activities in addition to their knowledge of the trading institution. In abstract settings the importance of this information is diminished, by design, and that can lead to behavioral changes. For example, absent such information, risk aversion can lead to subjects requiring a risk premium when bidding for objects with uncertain characteristics.

The commodity itself can be an important part of the field. Recent years have seen a growth of experiments concerned with eliciting valuations over actual goods, rather than using induced valuations over virtual goods. The distinction here is between physical goods or actual services and abstractly defined goods. The latter have been the staple of experimental economics since Edward Chamberlin (1948) and Vernon Smith (1962), but impose an artificiality that could be a factor influencing behavior.7 Such influences are actually of great interest, or should be. If the nature of the commodity itself affects behavior in a way that is not accounted for by the theory being applied, then the theory has at best a limited domain of applicability that we should be aware of, and at worst is simply false. In either case, one can better

7 It is worth noting that neither Chamberlin (1948) nor Smith (1962) used real payoffs to motivate subjects in their market experiments, although Smith (1962) does explain how that could be done and reports one experiment (fn. 9, p. 121) in which monetary payoffs were employed.

understand the limitations of the generality of theory only via empirical testing.8

Again, however, just having one field characteristic, in this case a physical good, does not constitute a field experiment in any fundamental sense. Rutström (1998) sold lots and lots of chocolate truffles in a laboratory study of different auction institutions designed to elicit values truthfully, but hers was very much a lab experiment despite the tastiness of the commodity. Similarly, Ian Bateman et al. (1997) elicited valuations over pizza and dessert vouchers for a local restaurant. While these commodities were not the actual pizza or dessert themselves, but vouchers entitling the subject to obtain them, they are not abstract. There are many other examples in the experimental literature of designs involving physical commodities.9

The nature of the task that the subject is being asked to undertake is an important component of a field experiment, since one would expect that field experience could play a major role in helping individuals develop heuristics for specific tasks. The lab experiments of John Kagel and Dan Levin (1999) illustrate this point, with “super-experienced” subjects behaving differently than inexperienced subjects in terms of their propensity to fall prey to the winner’s curse. An important question is whether the successful heuristics that evolve in certain field settings “travel” to other field and lab settings (Harrison and List 2003). Another aspect of the task is the specific parameterization that is adopted in the experiment. One can conduct a lab experiment with parameter values estimated from the field data, so as to study lab behavior in a “field-relevant” domain. Since theory is often domain-specific, and behavior can always be, this is an important component of the interplay between the lab and field. Early illustrations of the value of this approach include David Grether, R. Mark Isaac, and Charles Plott (1981, 1989), Grether and Plott (1984), and James Hong and Plott (1982).

The nature of the stakes can also affect field responses. Stakes in the laboratory might be very different than those encountered in the field, and hence have an effect on behavior. If valuations are taken seriously when they are in the tens of dollars, or in the hundreds, but are made indifferently when the price is less than one dollar, laboratory or field experiments with stakes below one dollar could easily engender imprecise bids. Of course, people buy inexpensive goods in the field as well, but the valuation process they use might be keyed to different stake levels. Alternatively, field experiments in relatively poor countries offer the opportunity to evaluate the effects of substantial stakes within a given budget.

The environment of the experiment can also influence behavior. The environment can provide context to suggest strategies and heuristics that a lab setting might not. Lab experimenters have always wondered whether the use of classrooms might engender role-playing behavior, and indeed this is one of the reasons experimental economists are generally suspicious of experiments without salient monetary rewards. Even with salient rewards, however, environmental effects could remain. Rather than view them as uncontrolled effects, we see them as worthy of controlled study.

2.2 A Proposed Taxonomy

Any taxonomy of field experiments runs the risk of missing important combinations of the factors that differentiate field experiments from conventional lab experiments. There is some value, however, in having broad terms to differentiate what we see as the key differences. We propose the following terminology:
• a conventional lab experiment is one

8 To use the example of Chamberlin (1948) again, List (2004e) takes the natural next step by exploring the predictive power of neoclassical theory in decentralized, naturally occurring field markets.

9 We would exclude experiments in which the commodity was a gamble, since very few of those gambles take the form of naturally occurring lotteries.

that employs a standard subject pool of students, an abstract framing, and an imposed10 set of rules;
• an artefactual field experiment is the same as a conventional lab experiment but with a nonstandard subject pool;11
• a framed field experiment is the same as an artefactual field experiment but with field context in either the commodity, task, or information set that the subjects can use;12
• a natural field experiment is the same as a framed field experiment but where the environment is one where the subjects naturally undertake these tasks and where the subjects do not know that they are in an experiment.13

We recognize that any such taxonomy leaves gaps, and that certain studies may not fall neatly into our classification scheme.

Moreover, it is often appropriate to conduct several types of experiments in order to identify the issue of interest. For example, Harrison and List (2003) conducted artefactual field experiments and framed field experiments with the same subject pool, precisely to identify how well the heuristics that might apply naturally in the latter setting “travel” to less context-ridden environments found in the former setting. And List (2004b) conducted artefactual, framed, and natural experiments to investigate the nature and extent of discrimination in the sports-card marketplace.

3. Methodological Importance of Field Experiments

Field experiments are methodologically important because they mechanically force the rest of us to pay attention to issues that great researchers seem to intuitively address. These issues cannot be comfortably forgotten in the field, but they are of more general importance.

The goal of any evaluation method for “treatment effects” is to construct the proper counterfactual, and economists have spent years examining approaches to this problem. Consider five alternative methods of constructing the counterfactual: controlled experiments, natural experiments, propensity score matching (PSM), instrumental variables (IV) estimation, and structural approaches. Define $y_1$ as the outcome with treatment, $y_0$ as the outcome without treatment, and let $T = 1$ when treated and $T = 0$ when not treated.14 The treatment effect for unit $i$ can then be measured as $\tau_i = y_{i1} - y_{i0}$. The major problem, however, is one of a missing counterfactual: $\tau_i$ is unknown. If we could observe the outcome for an untreated observation had it been treated, then there is no evaluation problem.

“Controlled” experiments, which include laboratory experiments and field experiments, represent the most convincing method of creating the counterfactual, since they directly construct a control group via randomization.15 In this case, the population

10 The fact that the rules are imposed does not imply that the subjects would reject them, individually or socially, if allowed.

11 To offer an early and a recent example, consider the risk-aversion experiments conducted by Hans Binswanger (1980, 1981) in India, and Harrison, Lau, and Williams (2002), who took the lab experimental design of Maribeth Coller and Melonie Williams (1999) into the field with a representative sample of the Danish population.

12 For example, the experiments of Peter Bohm (1984b) to elicit valuations for public goods that occurred naturally in the environment of subjects, albeit with unconventional valuation methods; or the Vickrey auctions and “cheap talk” scripts that List (2001) conducted with sports-card collectors, using sports cards as the commodity and at a show where they trade such commodities.

13 For example, the manipulation of betting markets by Camerer (1998) or the solicitation of charitable contributions by List and Lucking-Reiley (2002).

14 We simplify by considering a binary treatment, but the logic generalizes easily to multiple treatment levels and continuous treatments. Obvious examples from outside economics include dosage levels or stress levels. In economics, one might have some measure of risk aversion or “other regarding preferences” as a continuous treatment.

15 Experiments are often run in which the control is provided by theory, and the objective is to assess how well theory matches behavior. This would seem to rule out a role for randomization, until one recognizes that some implicit or explicit error structure is required in order to test theories meaningfully. We return to this issue in section 8.
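The missing-counterfactual problem and the role of randomization described in section 3 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method taken from the article: the population is hypothetical, both potential outcomes are known only because the simulator generates them, and the numbers (a mean unit-level effect of 2.0, normally distributed outcomes) and function names are illustrative choices. It compares the treated-minus-control difference in means under random assignment with the same comparison when units select into treatment on the basis of their untreated outcome.

```python
import random
from statistics import mean

def simulate_population(n=20000, seed=0):
    """Return a list of (y0, y1) pairs for a hypothetical population."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        y0 = rng.gauss(10.0, 1.0)    # outcome without treatment
        tau_i = rng.gauss(2.0, 0.5)  # unit-level effect y1 - y0 (assumed)
        population.append((y0, y0 + tau_i))
    return population

def difference_in_means(population, assignment):
    """Treated mean of y1 minus control mean of y0.

    Only one potential outcome per unit is used, mirroring the missing
    counterfactual: y1 if T = 1, y0 if T = 0.
    """
    treated = [y1 for (_, y1), t in zip(population, assignment) if t == 1]
    control = [y0 for (y0, _), t in zip(population, assignment) if t == 0]
    return mean(treated) - mean(control)

population = simulate_population()
true_mean_effect = mean(y1 - y0 for y0, y1 in population)

# Random assignment: T is independent of (y0, y1) by construction.
coin = random.Random(1)
randomized = [coin.randint(0, 1) for _ in population]

# Self-selection: units with a high untreated outcome opt into treatment.
self_selected = [1 if y0 > 10.0 else 0 for y0, _ in population]

estimate_randomized = difference_in_means(population, randomized)
estimate_self_selected = difference_in_means(population, self_selected)

print(f"mean of tau_i (unobservable in practice): {true_mean_effect:.2f}")
print(f"difference in means, randomized:          {estimate_randomized:.2f}")
print(f"difference in means, self-selected:       {estimate_self_selected:.2f}")
```

In this simulated setting the randomized comparison should land close to the mean of the unit-level effects, while the self-selected comparison is pushed upward because the treated units would have had higher outcomes even without treatment.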

average treatment effect is given by $\tau = y^*_1 - y^*_0$, where $y^*_1$ and $y^*_0$ are the treated and nontreated average outcomes after the treatment. We have much more to say about controlled experiments, in particular field experiments, below.

“Natural experiments” consider the treatment itself as an experiment and find a naturally occurring comparison group to mimic the control group: $\tau$ is measured by comparing the difference in outcomes before and after for the treated group with the before and after outcomes for the nontreated group. Estimation of the treatment effect takes the form $Y_{it} = X_{it}\beta + \tau T_{it} + \eta_{it}$, where $i$ indexes the unit of observation, $t$ indexes years, $Y_{it}$ is the outcome in cross-section $i$ at time $t$, $X_{it}$ is a vector of controls, $T_{it}$ is a binary variable, $\eta_{it} = \alpha_i + \lambda_t + \varepsilon_{it}$, and $\tau$ is the difference-in-differences (DID) average treatment effect. If we assume that data exist for two periods, then $\tau = (y^{t*}_{t=1} - y^{t*}_{t=0}) - (y^{u*}_{t=1} - y^{u*}_{t=0})$ where, for example, $y^{t*}_{t=1}$ is the mean outcome for the treated group in period $t = 1$.

A major identifying assumption in DID estimation is that there are no time-varying, unit-specific shocks to the outcome variable that are correlated with treatment status, and that selection into treatment is independent of temporary individual-specific effects: $E(\eta_{it} \mid X_{it}, D_{it}) = E(\alpha_i \mid X_{it}, D_{it}) + \lambda_t$. If $\varepsilon_{it}$ and $\tau$ are related, DID is inconsistently estimated as $E(\hat{\tau}) = \tau + E(\varepsilon_{it_1} - \varepsilon_{it_0} \mid D = 1) - E(\varepsilon_{it_1} - \varepsilon_{it_0} \mid D = 0)$.

One alternative method of assessing the impact of the treatment is the method of propensity score matching (PSM) developed in P. Rosenbaum and Donald Rubin (1983). This method has been used extensively in the debate over experimental and nonexperimental evaluation of treatment effects initiated by Lalonde (1986): see Rajeev Dehejia and Sadek Wahba (1999, 2002) and Jeffrey Smith and Petra Todd (2000). The goal of PSM is to make non-experimental data “look like” experimental data. The intuition behind PSM is that if the researcher can select observable factors so that any two individuals with the same value for these factors will display homogeneous responses to the treatment, then the treatment effect can be measured without bias. In effect, one can use statistical methods to identify which two individuals are “more homogeneous lab rats” for the purposes of measuring the treatment effect. More formally, the solution advocated is to find a vector of covariates, $Z$, such that $(y_1, y_0) \perp T \mid Z$ and $\Pr(T = 1 \mid Z) \in (0, 1)$, where $\perp$ denotes independence.16

Another alternative to the DID model is the use of instrumental variables (IV), which approaches the structural econometric method in the sense that it relies on exclusion restrictions (Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin 1996; and Joshua D. Angrist and Alan B. Krueger 2001). The IV method, which essentially assumes that some components of the non-experimental data are random, is perhaps the most widely utilized approach to measuring treatment effects (Mark Rosenzweig and Kenneth Wolpin 2000). The crux of the IV approach is to find a variable that is excluded from the outcome equation, but which is related to treatment status and has no direct association with the outcome. The weakness of the IV approach is that such variables do not often exist, or that unpalatable assumptions must be maintained in order for them to be used to identify the treatment effect of interest.

A final alternative to the DID model is structural modeling. Such models often entail a heavy mix of identifying restrictions (e.g.,

16 If one is interested in estimating the average treatment effect, only the weaker condition $E(y_0 \mid T = 1, Z) = E(y_0 \mid T = 0, Z) = E(y_0 \mid Z)$ is required. This assumption is called the “conditional independence assumption,” and intuitively means that given $Z$, the nontreated outcomes are what the treated outcomes would have been had they not been treated. Or, likewise, that selection occurs only on observables. Note that the dimensionality of the problem, as measured by $Z$, may limit the use of matching. A more feasible alternative is to match on a function of $Z$. Rosenbaum and Rubin (1983, 1984) showed that matching on $p(Z)$ instead of $Z$ is valid. This is usually carried out on the “propensity” to get treated $p(Z)$, or the propensity score, which in turn is often implemented by a simple probit or logit model with $T$ as the dependent variable.

separability), impose structure on technology and preferences (e.g., constant returns to scale or unitary income elasticities), and simplifying assumptions about equilibrium outcomes (e.g., zero-profit conditions defining equilibrium industrial structure). Perhaps the best-known class of such structural models is computable general equilibrium models, which have been extensively applied to evaluate trade policies, for example.17 It typically relies on complex estimation strategies, but yields structural parameters that are well-suited for ex ante policy simulation, provided one undertakes systematic sensitivity analysis of those parameters.18 In this sense, structural models have been the cornerstone of non-experimental evaluation of tax and welfare policies (R. Blundell and Thomas MaCurdy 1999; and Blundell and M. Costas Dias 2002).

4. Artefactual Field Experiments

4.1 The Nature of the Subject Pool

A common criticism of the relevance of inferences drawn from laboratory experiments is that one needs to undertake an experiment with “real” people, not students. This criticism is often deflected by experimenters with the following imperative: if you think that the experiment will generate different results with “real” people, then go ahead and run the experiment with real people. A variant of this response is to challenge the critics’ assertion that students are not representative. As we will see, this variant is more subtle and constructive than the first response.

The first response, to suggest that the critic run the experiment with real people, is often adequate to get rid of unwanted referees at academic journals. In practice, however, few experimenters ever examine field behavior in a serious and large-sample way. It is relatively easy to say that the experiment could be applied to real people, but to actually do so entails some serious and often unattractive logistical problems.19

A more substantial response to this criticism is to consider what it is about students that is viewed, a priori, as being nonrepresentative of the target population. There are at least two issues here. The first is whether endogenous sample selection or attrition has occurred due to incomplete control over recruitment and retention, so that the observed sample is unreliable in some statistical sense (e.g., generating inconsistent estimates of treatment effects). The second is whether the observed sample can be informative on the behavior of the population, assuming away sample selection issues.

4.2 Sample Selection in the Field

Conventional lab experiments typically use students who are recruited after being told only general statements about the experiment. By and large, recruitment procedures avoid mentioning the nature of the task, or the expected earnings. Most lab experiments are also one-shot, in the sense that they do not involve repeated observations of a sample subject to attrition. Of course, neither of these features is essential. If one wanted to recruit subjects with specific interest in a task, it would be easy to do (e.g., Peter Bohm and Hans Lind 1993). And if one wanted to recruit subjects for several sessions, to generate “super-experienced” subjects20 or to conduct pre-tests of such things as risk aversion, trust, or “other-regarding preferences,”21 that could be built into the design as well.

One concern with lab experiments conducted with convenience samples of students

17 For example, the evaluation of the Uruguay Round of multilateral trade liberalization by Harrison, Thomas Rutherford, and David Tarr (1997).
18 For example, see Harrison and H.D. Vinod (1992).
19 Or one can use “real” nonhuman species: see John Kagel, Don MacDonald, and Raymond Battalio (1990) and Kagel, Battalio, and Leonard Green (1995) for dramatic demonstrations of the power of economic theory to organize data from the animal kingdom.
20 For example, John Kagel and Dan Levin (1986, 1999, 2002).
21 For example, Cox (2004).

is that students might be self-selected in some way, so that they are a sample that excludes certain individuals with characteristics that are important determinants of underlying population behavior. Although this problem is a severe one, its potential importance in practice should not be overemphasized. It is always possible to simply inspect the sample to see if certain strata of the population are not represented, at least under the tentative assumption that it is only observables that matter. In this case it would behoove the researcher to augment the initial convenience sample with a quota sample, in which the missing strata were surveyed. Thus one tends not to see many convicted mass murderers or brain surgeons in student samples, but we certainly know where to go if we feel the need to include them in our sample.

Another consideration, of increasing importance for experimenters, is the possibility of recruitment biases in our procedures. One aspect of this issue is studied by Rutström (1998). She examines the role of recruitment fees in biasing the samples of subjects that are obtained. The context for her experiment is particularly relevant here since it entails the elicitation of values for a private commodity. She finds that there are some significant biases in the strata of the population recruited as one varies the recruitment fee from zero dollars to two dollars, and then up to ten dollars. An important finding, however, is that most of those biases can be corrected simply by incorporating the relevant characteristics in a statistical model of the behavior of subjects and thereby controlling for them. In other words, it does not matter if one group of subjects in one treatment has 60 percent females and the other sample of subjects in another treatment has only 40 percent females, provided one controls for the difference in gender when pooling the data and examining the key treatment effect. This is a situation in which gender might influence the response or the effect of the treatment, but controlling for gender allows one to remove this recruitment bias from the resulting inference.

Some field experiments face a more serious problem of sample selection that depends on the nature of the task. Once the experiment has begun, it is not as easy as it is in the lab to control information flow about the nature of the task. This is obviously a matter of degree, but can lead to endogenous subject attrition from the experiment. Such attrition is actually informative about subject preferences, since the subject’s exit from the experiment indicates that the subject had made a negative evaluation of it (Tomas Philipson and Larry Hedges 1998).

The classic problem of sample selection refers to possible recruitment biases, such that the observed sample is generated by a process that depends on the nature of the experiment. This problem can be serious for any experiment, since a hallmark of virtually every experiment is the use of some randomization, typically to treatment.22 If the population from which volunteers are being recruited has diverse risk attitudes and plausibly expects the experiment to have some element of randomization, then the observed sample will tend to look less risk-averse than the population. It is easy to imagine how this could then affect behavior differentially in some treatments. James Heckman and Jeffrey Smith (1995) discuss this issue in the context of social experiments, but the concern applies equally to field and lab experiments.

4.3 Are Students Different?

This question has been addressed in several studies, including early artefactual field experiments by Sarah Lichtenstein and Paul Slovic (1973), and Penny Burns (1985). Glenn Harrison and James Lesley (1996) (HL) approach this question with a simple statistical framework. Indeed, they do not consider the issue in terms of the relevance

22 If not to treatment, then randomization often occurs over choices to determine payoff.
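The point about treatments with 60 percent and 40 percent females can be illustrated with a small simulation. The sketch below is not taken from Rutström (1998); the sample sizes, the treatment effect, the gender effect, and the function names are all hypothetical, and "controlling for gender" is implemented in the simplest possible way, by comparing treated and control subjects within each gender and then averaging.

```python
import random
from statistics import mean

def simulate(n_per_arm=5000, tau=1.0, gender_effect=2.0, seed=0):
    """Hypothetical data: the treated arm is 60% female, the control arm 40%."""
    rng = random.Random(seed)
    rows = []
    for treated, share_female in ((1, 0.6), (0, 0.4)):
        n_female = int(round(share_female * n_per_arm))
        for i in range(n_per_arm):
            female = 1 if i < n_female else 0
            y = tau * treated + gender_effect * female + rng.gauss(0.0, 1.0)
            rows.append((treated, female, y))
    return rows

def naive_effect(rows):
    """Difference in means that ignores the gender imbalance across arms."""
    return (mean(y for t, f, y in rows if t == 1)
            - mean(y for t, f, y in rows if t == 0))

def adjusted_effect(rows):
    """Within-gender differences in means, weighted by pooled gender shares."""
    effect = 0.0
    for g in (0, 1):
        stratum = [(t, y) for t, f, y in rows if f == g]
        diff = (mean(y for t, y in stratum if t == 1)
                - mean(y for t, y in stratum if t == 0))
        effect += diff * len(stratum) / len(rows)
    return effect

rows = simulate()
print("naive:", round(naive_effect(rows), 2))
print("adjusted:", round(adjusted_effect(rows), 2))
```

Under these assumed parameters the naive comparison mixes the treatment effect with the gender imbalance, while the within-gender comparison recovers something close to the assumed effect. This only works for characteristics that are observed, which is the caveat the text itself raises.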

of experimental methods, but rather in terms of the relevance of convenience samples for the contingent valuation method.23 However, it is easy to see that their methods apply much more generally.

The HL approach may be explained in terms of their attempt to mimic the results of a large-scale national survey conducted for the Exxon Valdez oil-spill litigation. A major national survey was undertaken in this case by Richard Carson et al. (1992) for the attorney general of the state of Alaska. This survey used then-state-of-the-art survey methods but, more importantly for present purposes, used a full probability sample of the nation. HL asked if one can obtain essentially the same results using a convenience sample of students from the University of South Carolina. Using students as a convenience sample is largely a matter of methodological bravado. One could readily obtain convenience samples in other ways, but using students provides a tough test of their approach.

They proceeded by developing a simpler survey instrument than the one used in the original study. The purpose of this is purely to facilitate completion of the survey and is not essential to the use of the method. This survey was then administered to a relatively large sample of students. An important part of the survey, as in any field survey that aims to control for subject attributes, is the collection of a range of standard socioeconomic characteristics of the individual (e.g., sex, age, income, parental income, household size, and marital status). Once these data are collated, a statistical model is developed in order to explain the key responses in the survey. In this case the key response is a simple “yes” or “no” to a single dichotomous choice valuation question. In other words, the subject was asked whether he or she would be willing to pay $X towards a public good, where $X was randomly selected to be $10, $30, $60, or $120. A subject would respond to this question with a “yes,” a “no,” or a “not sure.” A simple statistical model is developed to explain behavior as a function of the observable socioeconomic characteristics.24

Assuming that a statistical model has been developed, HL then proceeded to the key stage of their method. This is to assume that the coefficient estimates from the statistical model based on the student sample apply to the population at large. If this is the case, or if this assumption is simply maintained, then the statistical model may be used to predict the behavior of the target population if one can obtain information about the socioeconomic characteristics of the target population.

The essential idea of the HL method is simple and more generally applicable than this example suggests. If students are representative in the sense of allowing the researcher to develop a “good” statistical model of the behavior under study, then one can often use publicly available information on the characteristics of the target population to predict the behavior of that population. Their fundamental point is that the “problem with students” is the lack of variability in their socio-demographic characteristics, not necessarily the unrepresentativeness of their behavioral responses conditional on their socio-demographic characteristics.

To the extent that student samples exhibit limited variability in some key characteristics, such as age, then one might be wary of the veracity of the maintained assumption involved here. However, the sample does not have to look like the population in order for the statistical model to be an adequate one

23 The contingent valuation method refers to the use of hypothetical field surveys to value the environment, by posing a scenario that asks the subject to place a value on an environmental change contingent on a market for it existing. See Cummings and Harrison (1994) for a critical review of the role of experimental economics in this field.
24 The exact form of that statistical model is not important for illustrative purposes, although the development of an adequate statistical model is important to the reliability of this method.
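The two stages of the HL method described above (fit a model of the "yes" response on the convenience sample, then apply the fitted coefficients to the characteristics of the target population) can be sketched as follows. This is only an illustration under stated assumptions: HL's actual specification is not given in the text, so the logit form, the single characteristic (income, in hypothetical units of $10,000), the simulated student data, and the population incomes are all invented for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logit(X, y, iters=25):
    """Maximum-likelihood logit coefficients via Newton-Raphson steps."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def predicted_share(beta, incomes, price):
    """Average predicted probability of "yes" at a given price."""
    return sum(sigmoid(beta[0] + beta[1] * price / 100.0 + beta[2] * inc)
               for inc in incomes) / len(incomes)

# Stage 1: hypothetical student sample with a narrow income range.
rng = random.Random(1)
TRUE_BETA = (0.5, -2.0, 0.4)          # assumed, for simulation only
X, y = [], []
for _ in range(3000):
    price = rng.choice([10, 30, 60, 120])
    income = rng.uniform(0.5, 3.0)
    x = [1.0, price / 100.0, income]
    X.append(x)
    y.append(1 if rng.random() < sigmoid(sum(b * v for b, v in zip(TRUE_BETA, x))) else 0)
beta_hat = fit_logit(X, y)

# Stage 2: apply the student-based coefficients to assumed population incomes.
population_incomes = [2.0, 4.0, 6.0, 8.0]
for price in (10, 30, 60, 120):
    print(price, round(predicted_share(beta_hat, population_incomes, price), 3))
```

Note that the assumed population incomes extend beyond the range observed among the simulated students, so the second stage leans entirely on the maintained assumption that the student-based coefficients carry over.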

for predicting the population response.25 All that is needed is for the behavioral responses of students to be the same as the behavioral responses of nonstudents. This can either be assumed a priori or, better yet, tested by sampling nonstudents as well as students.

Of course, it is always better to be forecasting on the basis of an interpolation rather than an extrapolation, and that is the most important problem one has with student samples. This issue is discussed in some detail by Blackburn, Harrison, and Rutström (1994). They estimated a statistical model of subject response using a sample of college students and also estimated a statistical model of subject response using field subjects drawn from a wide range of churches in the same urban area. Both were convenience samples. The only difference is that the church sample exhibited a much wider variability in their socio-demographic characteristics. In the church sample, ages ranged from 21 to 79; in the student sample, ages ranged from 19 to 27. When predicting behavior of students based on the church-estimated behavioral model, interpolation was used and the predictions were extremely accurate. In the reverse direction, however, when predicting church behavior from the student-estimated behavioral model, the predictions were disastrous in the sense of having extremely wide forecast variances.26

The reason is simple to understand. It is much easier to predict the behavior of a 26-year-old when one has a model that is based on the behavior of people whose ages range from 21 to 79 than it is to estimate the behavior of a 69-year-old based on the behavioral model from a sample whose ages range from 19 to 27.

What is the relevance of these methods for the original criticism of experimental procedures? Think of the experimental subjects as the convenience sample in the HL approach. The lessons that are learned from this student sample could be embodied in a statistical model of their behavior, with implications drawn for a larger target population. Although this approach rests on an assumption that is as yet untested, concerning the representativeness of student behavioral responses conditional on their characteristics, it does provide a simple basis for evaluating the extent to which conclusions about students apply to a broader population.

How could this method ever lead to interesting results? The answer depends on the context. Consider a situation in which the behavioral model showed that age was an important determinant of behavior. Consider further a situation in which the sample used to estimate the model had an average age that was not representative of the population as a whole. In this case, it is perfectly possible that the responses of the student sample could be quite different from the predicted responses of the population. Although no such instances have appeared in the applications of this method thus far, they should not be ruled out.

We conclude, therefore, that many of the concerns raised by this criticism, while valid, can be addressed by simple extensions of the methods that experimenters currently use. Moreover, these extensions would increase the general relevance of experimental methods obtained with student convenience samples.

Further problems arise if one allows unobserved individual effects to play a role. In some statistical settings it is possible to allow

25 For example, assume a population of 50 percent men and 50 percent women, but where a sample drawn at random happens to have 60 percent men. If responses differ according to sex, predicting the population is simply a matter of reweighting the survey responses.
26 On the other hand, reporting large variances may be the most accurate reflection of the wide range of valuations held by this sample. We should not always assume that distributions with smaller variances provide more accurate reflections of the underlying population just because they have little dispersion; for this to be true, many auxiliary assumptions about randomness of the sampling process must be made, not to mention issues about the stationarity of the underlying population process. This stationarity is often assumed away in contingent valuation research (e.g., the proposal to use double-bounded dichotomous choice formats without allowing for possible correlation between the two questions).
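The contrast between interpolation and extrapolation can be made concrete with the textbook forecast-variance formula for a simple linear regression on age, in which the variance of a forecast at age x0 is proportional to 1 + 1/n + (x0 - mean)^2 / Sxx. The sketch below is an illustration only: Blackburn, Harrison, and Rutström (1994) did not necessarily use this model, and the sample size and the even spacing of ages within each range are assumptions. Only the age ranges (21 to 79 and 19 to 27) and the target ages (26 and 69) come from the text.

```python
def forecast_variance_factor(sample_ages, target_age):
    """Multiplier on the error variance for a forecast at target_age
    in a simple linear regression on age: 1 + 1/n + (x0 - mean)^2 / Sxx."""
    n = len(sample_ages)
    mean_age = sum(sample_ages) / n
    sxx = sum((a - mean_age) ** 2 for a in sample_ages)
    return 1.0 + 1.0 / n + (target_age - mean_age) ** 2 / sxx

def evenly_spaced(lo, hi, n):
    """Hypothetical sample: n ages spread evenly between lo and hi."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

church = evenly_spaced(21, 79, 100)     # wide age range
students = evenly_spaced(19, 27, 100)   # narrow age range

interpolation = forecast_variance_factor(church, 26)    # 26-year-old from church model
extrapolation = forecast_variance_factor(students, 69)  # 69-year-old from student model
print(round(interpolation, 2), round(extrapolation, 2))
```

Under these assumptions the factor for the interpolated forecast stays close to one, while the extrapolated forecast is several times larger, which is the sense in which predictions from the student-estimated model have wide forecast variances.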

for those effects by means of “fixed effect” or “random effects” analyses. But these standard devices, now quite common in the toolkit of experimental economists, do not address a deeper problem. The internal validity of a randomized design is maximized when one knows that the samples in each treatment are identical. This happy extreme leads many to infer that matching subjects on a finite set of characteristics must be better in terms of internal validity than not matching them on any characteristics.

But partial matching can be worse than no matching. The most important example of this is due to James Heckman and Peter Siegelman (1993) and Heckman (1998), who critique paired-audit tests of discrimination. In these experiments, two applicants for a job are matched in terms of certain observables, such as age, sex, and education, and differ in only one protected characteristic, such as race. However, unless some extremely strong assumptions about how characteristics map into wages are made, there will be a predetermined bias in outcomes. The direction of the bias “depends,” and one cannot say much more. A metaphor from Heckman (1998, p. 110) illustrates: Boys and girls of the same age are in a high-jump competition, and jump the same height on average. But boys have a higher variance in their jumping technique, for any number of reasons. If the bar is set very low relative to the mean, then the girls will look like better jumpers; if the bar is set very high then the boys will look like better jumpers. The implications for numerous (lab and field) experimental studies of the effect of gender, that do not control for other characteristics, should be apparent.

This metaphor also serves to remind us that what laboratory experimenters think of as a “standard population” need not be a homogeneous population. Although students from different campuses in a given country may have roughly the same age, they can differ dramatically in influential characteristics such as intelligence and beauty. Again, the immediate implication is to collect a standard battery of measures of individual characteristics to allow some statistical comparisons of conditional treatment effects to be drawn.27 But even here we can only easily condition on observable characteristics, and additional identifying assumptions will be needed to allow for correlated differences in unobservables.

4.4 Precursors

Several experimenters have used artefactual field experiments; that is, they have deliberately sought out subjects in the “wild,” or brought subjects from the “wild” into labs. It is notable that this effort has occurred from the earliest days of experimental economics, and that it has only recently become common.

Lichtenstein and Slovic (1973) replicated their earlier experiments on “preference reversals” in “… a nonlaboratory real-play setting unique to the experimental literature on decision processes—a casino in downtown Las Vegas” (p. 17). The experimenter was a professional dealer, and the subjects were drawn from the floor of the casino. Although the experimental equipment may have been relatively forbidding (it included a PDP-7 computer, a DEC-339 CRT, and a keyboard), the goal was to identify gamblers in their natural habitat. The subject pool of 44 did include seven known dealers who worked in Las Vegas, and the “… dealer’s impression was that the game attracted a higher proportion of professional and educated persons than the usual casino clientele” (p. 18).

Kagel, Battalio, and James Walker (1979) provide a remarkable, early examination of many of the issues we raise. They were concerned with “volunteer artifacts” in lab experiments, ranging from the characteristics that volunteers have to the issue of sample

27 George Loewenstein (1999) offers a similar criticism of the popular practice in experimental economics of not conditioning on any observable characteristics or randomizing to treatment from the same population.
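The high-jump metaphor from Heckman (1998) can be worked through numerically. The sketch below assumes, purely for illustration, that jump heights are normally distributed with a common mean of 100 cm and standard deviations of 5 cm for girls and 10 cm for boys; none of these numbers comes from the source.

```python
import math

def clear_probability(bar, mean, sd):
    """Probability that a normally distributed jump height exceeds the bar."""
    z = (bar - mean) / sd
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

MEAN = 100.0        # hypothetical common mean height (cm)
SD_GIRLS = 5.0      # hypothetical: lower variance
SD_BOYS = 10.0      # hypothetical: higher variance

for bar in (90.0, 100.0, 110.0):
    girls = clear_probability(bar, MEAN, SD_GIRLS)
    boys = clear_probability(bar, MEAN, SD_BOYS)
    print(bar, round(girls, 3), round(boys, 3))
```

With a low bar the lower-variance group clears it more often, with a high bar the higher-variance group does, and at the common mean both clear it half the time. A threshold-based outcome can therefore suggest a group difference in either direction even though average ability is identical.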

selection bias.28 They conducted a field experiment in the homes of the volunteer subjects, examining electricity demand in response to changes in prices, weekly feedback on usage, and energy conservation information. They also examined a comparison sample drawn from the same population, to check for any biases in the volunteer sample.

Binswanger (1980, 1981) conducted experiments eliciting measures of risk aversion from farmers in rural India. Apart from the policy interest of studying agents in developing countries, one stated goal of using artefactual field experiments was to assess risk attitudes for choices in which the income from the experimental task was a substantial fraction of the wealth or annual income of the subject. The method he developed has been used recently in conventional laboratory settings with student subjects by Charles Holt and Susan Laury (2002).

Burns (1985) conducted induced-value market experiments with floor traders from wool markets, to compare with the behavior of student subjects in such settings. The goal was to see if the heuristics and decision rules these traders evolved in their natural field setting affected their behavior. She did find that their natural field rivalry had a powerful motivating effect on their behavior.

Vernon Smith, G. L. Suchanek, and Arlington Williams (1988) conducted a large series of experiments with student subjects in an “asset bubble” experiment. In the 22 experiments they report, nine to twelve traders with experience in the double-auction institution29 traded a number of fifteen or thirty period assets with the same common value distribution of dividends. If all subjects are risk neutral and have common price expectations, then there would be no reason for trade in this environment.30 The major empirical result is the large number of observed price bubbles: fourteen of the 22 experiments can be said to have had some price bubble.

In an effort to address the criticism that bubbles were just a manifestation of using student subjects, Smith, Suchanek, and Williams (1988) recruited nonstudent subjects for one experiment. As they put it, one experiment “… is noteworthy because of its use of professional and business people from the Tucson community, as subjects. This market belies any notion that our results are an artifact of student subjects, and that businessmen who ‘run the real world’ would quickly learn to have rational expectations. This is the only experiment we conducted that closed on a mean price higher than in all previous trading periods” (p. 1130–31). The reference at the end is to the observation that the price bubble did not burst as the finite horizon of the experiment was approaching. Another notable feature of this price bubble is that it was accompanied by heavy volume, unlike the price bubbles observed with experienced subjects.31 Although these subjects were not students, they were inexperienced in the use of the double auction experiments. Moreover, there is no presumption that their field experience was relevant for this type of asset market.

28 They also have a discussion of the role that these possible biases play in social psychology experiments, and how they have been addressed in the literature.
29 And either inexperienced, once experienced, or twice experienced in asset market trading.
30 There are only two reasons players may want to trade in this market. First, if players differ in their risk attitudes then we might see the asset trading below expected dividend value (since more-risk-averse players will pay less-risk-averse players a premium over expected dividend value to take their assets). Second, if subjects have diverse price expectations, we can expect trade to occur because of expected capital gains. This second reason for trading (diverse price expectations) can actually lead to contract prices above expected dividend value, provided some subject believes that there are other subjects who believe the price will go even higher.
31 Harrison (1992b) reviews the detailed experimental evidence on bubbles, and shows that very few significant bubbles occur with subjects who are experienced in asset market experiments in which there is a short-lived asset, such as those under study. A bubble is significant only if there is some nontrivial volume associated with it.
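The benchmark of "expected dividend value" used in these asset market experiments can be sketched for a finite-lived asset. In the sketch below the dividend values, the fifteen-period horizon, and the observed prices are hypothetical placeholders rather than the parameters or data of Smith, Suchanek, and Williams (1988), and discounting and any terminal payment are assumed away.

```python
def expected_dividend(dividends):
    """Mean of equally likely per-period dividend draws."""
    return sum(dividends) / len(dividends)

def fundamental_value(period, horizon, dividends):
    """Expected dividend value at the start of `period`: the expected
    per-period dividend times the number of dividend draws remaining."""
    remaining = horizon - period + 1
    return remaining * expected_dividend(dividends)

DIVIDENDS = [0, 8, 28, 60]   # hypothetical dividend draws, in cents
HORIZON = 15                 # hypothetical number of trading periods

# Hypothetical mean contract prices, used only to show the comparison.
observed_prices = {1: 300, 5: 320, 10: 340, 15: 30}
for period, price in observed_prices.items():
    value = fundamental_value(period, HORIZON, DIVIDENDS)
    print(period, value, price - value)
```

A risk-neutral subject with common price expectations would value the asset at this declining benchmark in every period, which is why trade, and especially trade at prices well above the benchmark, needs one of the explanations given in footnote 30.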

Artefactual field experiments have also made use of children and high school subjects. For example, William Harbaugh and Kate Krause (2000), Harbaugh, Krause, and Timothy Berry (2001), and Harbaugh, Krause, and Lise Vesterlund (2002) explore other-regarding preferences, individual rationality, and risk attitudes among children in school environments.

Joseph Henrich (2000) and Henrich and Richard McElreath (2002), and Henrich et al. (2001, 2004) have even taken artefactual field experiments to the true “wilds” of a number of peasant societies, employing the procedures of cultural anthropology to recruit and instruct subjects and conduct artefactual field experiments. Their focus was on the ultimatum bargaining game and measures of risk aversion.

5. Framed Field Experiments

5.1 The Nature of the Information Subjects Already Have

Auction theory provides a rich set of predictions concerning bidders’ behavior. One particularly salient finding in a plethora of laboratory experiments that is not predicted in first-price common-value auction theory is that bidders commonly fall prey to the winner’s curse. Only “super-experienced” subjects, who are in fact recruited on the basis of not having lost money in previous experiments, avoid it regularly. This would seem to suggest that experience is a sufficient condition for an individual bidder to avoid the winner’s curse. Harrison and List (2003) show that this implication is supported when one considers a natural setting in which it is relatively easy to identify traders that are more or less experienced at the task. In their experiments the experience of subjects is either tied to the commodity, the valuation task, and the use of auctions (in the field experiments with sports cards), or simply to the use of auctions (in the laboratory experiments with induced values). In all tasks, experience is generated in the field and not the lab. These results provide support for the notion that context-specific experience does appear to carry over to comparable settings, at least with respect to these types of auctions.

This experimental design emphasizes the identification of a naturally occurring setting in which one can control for experience in the way that it is accumulated in the field. Experienced traders gain experience over time by observing and surviving a relatively wide range of trading circumstances. In some settings this might be proxied by the manner in which experienced or super-experienced subjects are defined in the lab, but it remains an open question whether standard lab settings can reliably capture the full extent of the field counterpart of experience. This is not a criticism of lab experiments, just their domain of applicability.

The methodological lesson we draw is that one should be careful not to generalize from the evidence of a winner’s curse by student subjects that have no experience at all with the field context. These results do not imply that every field context has experienced subjects, such as professional sports-card dealers, that avoid the winner’s curse. Instead, they point to a more fundamental need to consider the field context of experiments before drawing general conclusions. It is not the case that abstract, context-free experiments provide more general findings if the context itself is relevant to the performance of subjects. In fact, one would generally expect such context-free experiments to be unusually tough tests of economic theory, since there is no control for the context that subjects might themselves impose on the abstract experimental task.

The main result is that if one wants to draw conclusions about the validity of theory in the field, then one must pay attention to the myriad of ways in which field context can affect behavior. We believe that conventional lab experiments, in which roles are exogenously assigned and defined in an abstract

manner, cannot ubiquitously provide reliable insights into field behavior. One might be able to modify the lab experimental design to mimic those field contexts more reliably, and that would make for a more robust application of the experimental method in general.

Consider, as an example, the effect of “insiders” on the market phenomenon known as the “winner’s curse.” For now we define an insider as anyone who has better information than other market participants. If insiders are present in a market, then one might expect that the prevailing prices in the market will reflect their better information. This leads to two general questions about market performance. First, do insiders fall prey to the winner’s curse? Second, does the presence of insiders mitigate the winner’s curse for the market as a whole?

The approach adopted by Harrison and List (2003) is to undertake experiments in naturally occurring settings in which the factors that are at the heart of the theory are identifiable and arise endogenously, and then to impose the remaining controls needed to implement a clean experiment. In other words, rather than impose all controls exogenously on a convenience sample of college students, they find a population in the field in which one of the factors of interest arises naturally, where it can be identified easily, and then add the necessary controls. To test their methodological hypotheses, they also implement a fully controlled laboratory experiment with subjects drawn from the same field population. We discuss some of their findings below.

5.2 The Nature of the Commodity

Many field experiments involve real, physical commodities and the values that subjects place on them in their daily lives. This is distinct from the traditional focus in experimental economics on experimenter-induced valuations on an abstract commodity, often referred to as “tickets” just to emphasize the lack of any field referent that might suggest a valuation. The use of real commodities, rather than abstract commodities, is not unique to the field, nor does one have to eschew experimenter-induced valuations in the field. But the use of real goods does have consequences that apply to both lab and field experiments.32

Abstraction Requires Abstracting. One simple example is the Tower of Hanoi game, which has been extensively studied by cognitive psychologists (e.g., J. R. Hayes and H. A. Simon 1974) and more recently by economists (Tanga McDaniel and Rutström 2001) in some fascinating experiments. The physical form of the game, as found in all serious Montessori classrooms and in Judea Pearl (1984, p. 28), is shown in figure 1.

The top picture shows the initial state, in which n disks are on peg 1. The goal is to move all of the disks to peg 3, as shown in the goal state in the bottom picture. The constraints are that only one disk may be moved at a time, and no disk may ever lie under a bigger disk. The objective is to reach the goal state in the least number of moves. The “trick” to solving the Tower of Hanoi is to use backwards induction: visualize the final, goal state and use the constraints to figure out what the penultimate state must have looked like (viz., the tiny disk on the top of peg 3 in the goal state would have to be on peg 1 or peg 2 by itself). Then work back from that penultimate state, again respecting the constraints (viz., the second smallest disk on peg 3 in the goal state would have to be on whichever of peg 1 or peg 2 the smallest disk is not on). One more step in reverse and the essential logic should be clear (viz., in order for the third largest disk on peg 3 to be off peg 3, one of peg 1 or peg 2 will have to be cleared, so the smallest disk should be on top of the second-smallest disk).

Observation of students in Montessori classrooms makes it clear how they (eventually) solve the puzzle, when confronted with

32 See Harrison, Ronald Harstad, and Rutström (2004) for a general treatment.

Figure 1. The Tower of Hanoi Game

the initial state. They shockingly violate the constraints and move all the disks to the goal state en masse, and then physically work backwards along the lines of the above thought experiment in backwards induction. The critical point here is that they temporarily violate the constraints of the problem in order to solve it “properly.”

Contrast this behavior with the laboratory subjects in McDaniel and Rutström (2001). They were given a computerized version of the game and told to try to solve it. However, the computerized version did not allow them to violate the constraints. Hence the laboratory subjects were unable to use the classroom Montessori method, by which the student learns the idea of backwards induction by exploring it with physical referents. This is not a design flaw of the McDaniel and Rutström (2001) lab experiments, but simply one factor to keep in mind when evaluating the behavior of their subjects. Without the physical analogue of the final goal state being allowed in the experiment, the subject was forced to visualize that state conceptually, and to likewise imagine conceptually the penultimate states. Although that may encourage more fundamental conceptual understanding of the idea of backwards induction, if attained, it is quite possible that it posed an insurmountable cognitive burden for some of the experimental subjects.

It might be tempting to think of this as just two separate tasks, instead of a real commodity and its abstract analogue. But we believe that this example does identify an important characteristic of commodities in ideal field experiments: the fact that they allow subjects to adopt the representation of the commodity and task that best suits their objective. In other words, the representation of the commodity by the subject is an integral part of how the subject solves the task. One simply cannot untangle them, at least not easily and naturally.

This example also illustrates that off-equilibrium states, in which one is not optimizing in terms of the original constrained optimization task, may indeed be critical to the attainment of the equilibrium state.33

33 This is quite distinct from the valid point made by Smith (1982, p. 934, fn. 17), that it is appropriate to design the experimental institution so as to make the task as simple and transparent as possible, providing one holds constant these design features as one compares experimental treatments. Such designs may make the results of less interest for those wanting to make field inferences, but that is a trade-off that every theorist and experimenter faces to varying degrees.
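To make the constrained puzzle concrete, here is a minimal Python sketch of the Tower of Hanoi as described above (one disk moved at a time, no disk ever under a bigger disk). The recursive solver is the standard textbook construction, not a procedure taken from McDaniel and Rutström (2001), and the names `solve_hanoi` and `is_legal` are hypothetical. The checker rejects any sequence that violates the constraints, which is roughly the restriction a computerized version imposes; the en masse Montessori move is exactly the kind of sequence it refuses.

```python
def solve_hanoi(n, source=1, target=3, spare=2):
    """Return a shortest list of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    # Clear the n-1 smaller disks onto the spare peg, move the largest disk,
    # then rebuild the smaller disks on top of it.
    return (solve_hanoi(n - 1, source, spare, target)
            + [(source, target)]
            + solve_hanoi(n - 1, spare, target, source))


def is_legal(n, moves, source=1, target=3):
    """Replay moves under the constraints; True only if the goal state is reached."""
    pegs = {1: [], 2: [], 3: []}
    pegs[source] = list(range(n, 0, -1))  # largest disk (n) at the bottom
    for src, dst in moves:
        if not pegs[src]:
            return False  # nothing on this peg to move
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # would place a disk on top of a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[target] == list(range(n, 0, -1))


moves = solve_hanoi(3)
print(len(moves), is_legal(3, moves))  # prints: 7 True

# Moving every disk straight to the goal peg violates the constraints.
print(is_legal(3, [(1, 3), (1, 3), (1, 3)]))  # prints: False

# Working backwards: the reversed solution is a legal path from goal to start.
backwards = [(dst, src) for src, dst in reversed(moves)]
print(is_legal(3, backwards, source=3, target=1))  # prints: True
```

The last check mirrors the thought experiment in the text: a legal solution read in reverse is itself a legal path from the goal state back to the initial state.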

Thus we should be mindful of possible field devices that allow subjects to explore off-equilibrium states, even if those states are ruled out in our null hypotheses.

Field Goods Have Field Substitutes. There are two respects in which “field substitutes” play a role whenever one is conducting an experiment with naturally occurring, or field, goods. We can refer to the former as the natural context of substitutes, and to the latter as an artificial context of substitutes. The former needs to be captured if reliable valuations are to be elicited; the latter needs to be minimized or controlled.

The first way in which substitutes play a role in an experiment is the traditional sense of demand theory: to some individuals, a bottle of scotch may substitute for a bible when seeking peace of mind. The degree of substitutability here is the stuff of individual demand elasticities, and can reasonably be expected to vary from subject to subject. The upshot of this consideration is, yet again, that one should always collect information on observable individual characteristics and control for them.

The second way in which substitutes play a role in an experiment is the more subtle issue of affiliation which arises in lab or field settings that involve preferences over a field good. To see this point, consider the use of repeated Vickrey auctions in which subjects learn about prevailing prices. This results in a loss of control, since we are dealing with the elicitation of homegrown values rather than experimenter-induced private values. To the extent that homegrown values are affiliated across subjects, we can expect an effect on elicited values from using repeated Vickrey auctions rather than a one-shot Vickrey auction.34 There are, in turn, two reasons why homegrown values might be affiliated in such experiments.

The first is that the good being auctioned might have some uncertain attributes, and fellow bidders might have more or less information about those attributes. Depending on how one perceives the knowledge of other bidders, observation of their bidding behavior35 can affect a given bidder’s estimate of the true subjective value to the extent that they change the bidder’s estimate of the lottery of attributes being auctioned.36 Note that what is being

34 The theoretical and experimental literature makes this point clearly by comparing real-time English auctions with sealed-bid Vickrey auctions: see Paul Milgrom and Robert Weber (1982) and Kagel, Harstad, and Levin (1987). The same logic that applies for a one-shot English auction applies for a repeated Vickrey auction, even if the specific bidding opponents were randomly drawn from the population in each round.

35 The term “bidding behavior” is used to allow for information about bids as well as non-bids. In the repeated Vickrey auction it is the former that is provided (for winners in previous periods). In the one-shot English auction it is the latter (for those who have not yet caved in at the prevailing price). Although the inferential steps in using these two types of information differ, they are each informative in the same sense. Hence any remarks about the dangers of using repeated Vickrey auctions apply equally to the use of English auctions.

36 To see this point, assume that a one-shot Vickrey auction was being used in one experiment and a one-shot English auction in another experiment. Large samples of subjects are randomly assigned to each institution, and the commodity differs. Let the commodity be something whose quality is uncertain; an example used by Cummings, Harrison and Rutström (1995) and Rutström (1998) might be a box of gourmet chocolate truffles. Amongst undergraduate students in South Carolina, these boxes present something of a taste challenge. The box is not large in relation to those found in more common chocolate products, and many of the students have not developed a taste for gourmet chocolates. A subject endowed with a diverse palate is faced with an uncertain lottery. If these are just ordinary chocolates dressed up in a small box, then the true value to the subject is small (say, $2). If they are indeed gourmet chocolates then the true value to the subject is much higher (say, $10). Assuming an equal chance of either state of chocolate, the risk-neutral subject would bid their true expected value (in this example, $6). In the Vickrey auction this subject will have an incentive to write down her reservation price for this lottery as described above. In the English auction, however, this subject is able to see a number of other subjects indicate that they are willing to pay reasonably high sums for the commodity. Some have not dropped out of the auction as the price has gone above $2, and it is closing on $6. What should the subject do? The answer depends critically on how knowledgeable he thinks the other bidders are as to the quality of the chocolates. If those who have dropped out are the more knowledgeable ones, then the correct inference is that the lottery is more heavily weighted towards these being common chocolates. If those remaining in the auction are the more knowledgeable ones, however, then the opposite inference is appropriate. In the former case the real-time observation should lead the subject to bid lower than in the Vickrey auction, and in the latter case the real-time observation should lead the subject to bid higher than in the Vickrey auction.
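The arithmetic in footnote 36 can be sketched in a few lines of Python. The $2 and $10 values and the equal prior come from the footnote; the two posterior probabilities (0.25 and 0.75) are hypothetical numbers chosen only to show the direction of the effect, and `expected_value` is an illustrative name.

```python
def expected_value(p_gourmet, low=2.0, high=10.0):
    """Risk-neutral value of the chocolate lottery given P(gourmet)."""
    return p_gourmet * high + (1.0 - p_gourmet) * low


# Sealed-bid Vickrey auction: nothing is learned, so the prior applies.
vickrey_bid = expected_value(0.5)

# English auction, hypothetical posteriors after watching others bid:
bid_if_informed_dropped_out = expected_value(0.25)  # lottery looks more "common"
bid_if_informed_stayed_in = expected_value(0.75)    # lottery looks more "gourmet"

print(vickrey_bid, bid_if_informed_dropped_out, bid_if_informed_stayed_in)
# prints: 6.0 4.0 8.0
```

In both cases the subject still reports a truthful expected value; what moves is the belief that feeds into that value.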

affected here by this knowledge is the subject’s best estimate of the subjective value of the good. The auction is still eliciting a truthful revelation of this subjective value; it is just that the subjective value itself can change with information on the bidding behavior of others.

The second reason that bids might be affiliated is that the good might have some extra-experimental market price. Assuming transaction costs of entering the “outside” market to be zero for a moment, information gleaned from the bidding behavior of others can help the bidder infer what that market price might be. To the extent that it is less than the subjective value of the good, this information might result in the bidder deliberately bidding low in the experiment.37 The reason is that the expected utility of bidding below the true value is clearly positive: if lower bidding results in somebody else winning the object at a price below the true value, then the bidder can (costlessly) enter the outside market anyway. If lower bidding results in the bidder winning the object, and market price and bids are not linked, then consumer surplus is greater than if the object had been bought in the outside market. Note that this argument suggests that subjects might have an incentive to strategically misrepresent their true subjective value.38

The upshot of these concerns is that unless one assumes that homegrown values for the good are certain and not affiliated across bidders, or can provide evidence that they are not affiliated in specific settings, one should avoid the use of institutions that can have uncontrolled influences on estimates of true subjective value and/or the incentive to truthfully reveal that value.

5.3 The Nature of the Task

Who Cares If Hamburger Flippers Violate EUT? Who cares if a hamburger flipper violates the independence axiom of expected utility theory in an abstract task? His job description, job evaluation, and job satisfaction do not hinge on it. He may have left some money on the table in the abstract task, but is there any sense in which his failure suggests that he might be poor at flipping hamburgers?

Another way to phrase this point is to actively recruit subjects who have experience in the field with the task being studied.39 Trading houses do not allow neophyte pit-traders to deviate from prescribed limits, in terms of the exposure they are allowed. A survival metric is commonly applied in the field, such that the subjects who engage in certain tasks of interest have specific types of training.

The relevance of field subjects and field environments for tests of the winner’s curse is evident from Douglas Dyer and Kagel (1996, p. 1464), who review how executives in the commercial construction industry avoid the winner’s curse in the field:

    Two broad conclusions are reached. One is that the executives have learned a set of situation-specific rules of thumb which help them to avoid the winner’s curse in the field, but which could not be applied in the laboratory markets. The second is that the bidding environment created in the laboratory and the theory underlying it are not fully representative of the field environment. Rather, the latter has developed escape mechanisms for avoiding the winner’s curse that are mutually beneficial to both buyers and sellers and which have not been incorporated into the standard one-shot auction theory literature.

These general insights motivated the design of the field experiments of Harrison

37 Harrison (1992a) makes this point in relation to some previous experimental studies attempting to elicit homegrown values for goods with readily accessible outside markets.

38 It is also possible that information about likely outside market prices could affect the individual’s estimate of true subjective value. Informal personal experience, albeit over a panel data set, is that higher-priced gifts seem to elicit warmer glows from spouses and spousal-equivalents.

39 The subjects may also have experience with the good being traded, but that is a separate matter worthy of study. For example, List (2004c) had sports-card enthusiasts trade coffee mugs and chocolates in tests of loss aversion, even though they had no experience in openly trading those goods.
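The outside-market argument on this page can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration: a second-price (Vickrey) auction against a single rival bid, a subjective value of $10, a known outside price of $6 with zero transaction costs, and rival bids equally likely on the grid 0, 1, ..., 12. The function names are hypothetical.

```python
def surplus(bid, rival_bid, value, outside_price):
    """Bidder's surplus in a second-price auction with a costless outside market."""
    if bid > rival_bid:
        return value - rival_bid              # win and pay the second-highest bid
    return max(value - outside_price, 0.0)    # lose, then buy outside if worthwhile


def average_surplus(bid, rival_bids, value, outside_price):
    """Average surplus over an equally likely set of rival bids."""
    total = sum(surplus(bid, q, value, outside_price) for q in rival_bids)
    return total / len(rival_bids)


value, outside_price = 10.0, 6.0              # hypothetical numbers
rival_bids = [float(q) for q in range(13)]    # 0, 1, ..., 12, equally likely

truthful = average_surplus(value, rival_bids, value, outside_price)
shaded = average_surplus(outside_price, rival_bids, value, outside_price)
print(shaded > truthful)  # prints: True
```

Under these assumptions, bidding the outside price does better than bidding the true value, because a truthful bid sometimes wins at a price above what the outside market would have charged. The bid no longer reveals the homegrown value, which is the loss of control the text warns about.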

and List (2003), mentioned earlier. They study the behavior of insiders in their field context, while controlling the “rules of the game” to make their bidding behavior fall into the domain of existing auction theory. In this instance, the term “field context” means the commodity with which the insiders are familiar, as well as the type of bidders they normally encounter.

This design allows one to tease apart the two hypotheses implicit in the conclusions of Dyer and Kagel (1996). If these insiders fall prey to the winner’s curse in the field experiment, then it must be40 that they avoid it by using market mechanisms other than those under study. The evidence is consistent with the notion that dealers in the field do not fall prey to the winner’s curse in the field experiment, providing tentative support for the hypothesis that naturally occurring markets are efficient because certain traders use heuristics to avoid the inferential error that underlies the winner’s curse.

This support is only tentative, however, because it could be that these dealers have developed heuristics that protect them from the winner’s curse only in their specialized corner of the economy. That would still be valuable to know, but it would mean that the type of heuristics they learn in their corner are not general and do not transfer to other settings. Hence, the complete design also included laboratory experiments in the field, using induced valuations as in the laboratory experiments of Kagel and Levin (1999), to see if the heuristic of insiders transfers. We find that it does when they are acting in familiar roles, adding further support to the claim that these insiders have indeed developed a heuristic that “travels” from problem domain to problem domain. Yet when dealers are exogenously provided with less information than their bidding counterparts, a role that is rarely played by dealers, they frequently fall prey to the winner’s curse. We conclude that the theory predicts field behavior well when one is able to identify naturally occurring field counterparts to the key theoretical conditions.

At a more general level, consider the argument that subjects who behave irrationally could be subjected to a “money-pump” by some arbitrager from hell. When we explain transitivity of preferences to undergraduates, the common pedagogy includes stories of intransitive subjects mindlessly cycling forever in a series of low-cost trades. If these cycles continue, the subject is pumped of money until bankrupt. In fact, the absence of such phenomena is often taken as evidence that contracts or markets must be efficient.

There are several reasons why this may not be true. First, it is only when certain consistency conditions are imposed that successful money-pumps provide a general indicator of irrationality, defeating their use as a sole indicator (Robin Cubitt and Robert Sugden 2001).

Second, and germane to our concern with the field, subjects might have developed simple heuristics to avoid such money-pumps: for example, never retrade the same objects with the same person.41 As John Conlisk (1996, p. 684) notes, “Rules of thumb are typically exploitable by ‘tricksters,’ who can in principle ‘money pump’ a person using such rules. … Although tricksters abound—at the door, on the phone, and elsewhere—people can easily protect themselves, with their pumpable rules intact, by such simple devices as slamming the door and hanging up the phone. The issue is again a matter of circumstance and degree.” The last point is important for our argument: only when the circumstance is natural might

40 This inference follows if one assumes that a dealer’s survival in the industry provides sufficient evidence that he does not make persistent losses.

41 Slightly more complex heuristics work against arbitragers from meta-hell who understand that this simple heuristic might be employed.
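The money-pump story, and the simple heuristic that defuses it, can be sketched as a toy simulation. This is an illustration under stated assumptions rather than a model from the literature cited: the agent has cyclic preferences over three objects, pays a fixed fee for each "upgrade," and faces a single arbitrager; `run_money_pump` and its parameters are hypothetical names.

```python
def run_money_pump(wealth, fee, rounds, refuse_retrades=False):
    """Simulate an agent with cyclic preferences (B over A, C over B, A over C)."""
    prefers = {"A": "B", "B": "C", "C": "A"}  # agent pays `fee` to swap key for value
    holding = "A"
    received = set()  # objects this arbitrager has already traded to the agent
    for _ in range(rounds):
        offer = prefers[holding]
        if wealth < fee:
            break  # bankrupt: cannot pay for another trade
        if refuse_retrades and offer in received:
            break  # heuristic: never retrade the same object with the same person
        received.add(offer)
        holding = offer
        wealth -= fee
    return wealth


print(run_money_pump(10, 1, 100))                        # prints: 0
print(run_money_pump(10, 1, 100, refuse_retrades=True))  # prints: 7
```

Without the heuristic the agent is pumped until bankrupt; with it, the loss is capped after one trip around the cycle, even though the underlying preferences remain intransitive.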

one reasonably expect the subject to be able to call upon survival heuristics that protect against such irrationality. To be sure, some heuristics might “travel,” and that was precisely the research question examined by Harrison and List (2003) with respect to the dreaded winner’s curse. But they might not; hence we might have sightings of odd behavior in the lab that would simply not arise in the wild.

Third, subjects might behave in a non-separable manner with respect to sequential decisions over time, and hence avoid the pitfalls of sequential money pumps (Mark Machina 1989; and Edward McClennen 1990). Again, the use of such sophisticated characterizations of choices over time might be conditional on the individual having familiarity with the task and the consequences of simpler characterizations, such as those employing intertemporal additivity. It is an open question if the richer characterization that may have evolved for familiar field settings travels to other settings in which the individual has less experience.

Our point is that one should not assume that heuristics or sophisticated characterizations that have evolved for familiar field settings do travel to the unfamiliar lab. If they do exist in the field, and do not travel, then evidence from the lab might be misleading.

“Context” Is Not a Dirty Word. One tradition in experimental economics is to use scripts that abstract from any field counterpart of the task. The reasoning seems to be that this might contaminate behavior, and that any observed behavior could not then be used to test general theories. There is logic to this argument, but context should not be jettisoned without careful consideration of the unintended consequences. Field referents can often help subjects overcome confusion about the task. Confusion may be present even in settings that experimenters think are logically or strategically transparent. If the subject does not understand what the task is about, in the sense of knowing what actions are feasible and what the consequences of different actions might be, then control has been lost at a basic level. In cases where the subject understands all the relevant aspects of the abstract game, problems may arise due to the triggering of different methods for solving the decision problem. The use of field referents could trigger the use of specific heuristics from the field to solve the specific problem in the lab, which otherwise may have been solved less efficiently from first principles (e.g., Gerd Gigerenzer et al. 2000). For either of these reasons (a lack of understanding of the task or a failure to apply a relevant field heuristic), behavior may differ between the lab and the field. The implication for experimental design is to just “do it both ways,” as argued by Chris Starmer (1999) and Harrison and Rutström (2001). Experimental economists should be willing to consider the effect in their experiments of scripts that are less abstract, but in controlled comparisons with scripts that are abstract in the traditional sense. Nevertheless, it must also be recognized that inappropriate choice of field referents may trigger uncontrolled psychological motivations. Ultimately, the choice between an abstract script and one with field referents must be guided by the research question.

This simple point can be made more forcefully by arguing that the passion for abstract scripts may in fact result in less control than context-ridden scripts. It is not the case that abstract, context-free experiments provide more general findings if the context itself is relevant to the performance of subjects. In fact, one would generally expect such context-free experiments to be unusually tough tests of economic theory, since there is no control for the context that subjects might themselves impose on the abstract experimental task. This is just one part of a general plea for experimental economists to take the psychological process of “task representation” seriously.

This general point has already emerged in several areas of research in experimental economics. Noticing large differences between contributions to another person and a charity in between-subjects experiments that were otherwise identical in structure and design, Catherine Eckel and Philip Grossman (1996, p. 188ff.) drew the following conclusion:

    It is received wisdom in experimental economics that abstraction is important. Experimental procedures should be as context-free as possible, and the interaction among subjects should be carefully limited by the rules of the experiment to ensure that they are playing the game we intend them to play. For tests of economic theory, these procedural restrictions are critical. As experimenters, we aspire to instructions that most closely mimic the environments implicit in the theory, which is inevitably a mathematic abstraction of an economic situation. We are careful not to contaminate our tests by unnecessary context. But it is also possible to use experimental methodology to explore the importance and consequence of context. Economists are becoming increasingly aware that social and psychological factors can only be introduced by abandoning, at least to some extent, abstraction. This may be particularly true for the investigation of other-regarding behavior in the economic arena.

Our point is simply that this should be a more general concern.

Indeed, research in memory reminds us that subjects will impose a natural context on a task even if it literally involves “nonsense.” Long traditions in psychology, no doubt painful to the subjects, involved detecting how many “nonsense syllables” a subject could recall. The logic behind the use of nonsense was that the researchers were not interested in the role of specific semantic or syntactic context as an aid to memory, and in fact saw those as nuisance variables to be controlled by the use of random syllables. Such experiments generated a backlash of sorts in memory research, with many studies focusing instead on memory within a natural context, in which cues and frames could be integrated with the specific information in the foreground of the task (e.g., Ulric Neisser and Ira Hyman 2000).42

At a more homely level, the “simple” choice of parameters can add significant field context to lab experiments. The idea, pioneered by Grether, Isaac, and Plott (1981, 1989), Grether and Plott (1984), and Hong and Plott (1982), is to estimate parameters that are relevant to field applications and take these into the lab.

5.4 The Nature of the Stakes

One often hears the criticism that lab experiments involve trivial stakes, and that they do not provide information about agents’ behavior in the field if they faced serious stakes, or that subjects in the lab experiments are only playing with “house money.”43 The immediate response to this

42 A healthy counter-lashing was offered by Mahzarin Banaji and Robert Crowder (1989), who concede that needlessly artefactual designs are not informative. But they conclude that “we students of memory are just as interested as anybody else in why we forget where we left the car in the morning or in who was sitting across the table at yesterday’s meeting. Precisely for this reason we are driven to laboratory experimentation and away from naturalistic observation. If the former method has been disappointing to some after about 100 years, so should the latter approach be disappointing after about 2,000. Above all, the superficial glitter of everyday methods should not be allowed to replace the quest for generalizable principles.” (p. 1193).

43 This problem is often confused with another issue: the validity and relevance of hypothetical responses in the lab. Some argue that hypothetical responses are the only way that one can mimic the stakes found in the field. Conlisk (1989) runs an experiment to test the Allais Paradox with small, real stakes and finds that virtually no subjects violated the predictions of expected utility theory. Subjects drawn from the same population did violate the “original recipe” version of the Allais Paradox with large, hypothetical stakes. Conlisk (1989; p. 401ff.) argues that inferences from this evidence confound hypothetical rewards with the reward scale, which is true. Of course, one could run an experiment with small, hypothetical stakes and see which factor is driving this result. Chinn-Ping Fan (2002) did this, using Conlisk’s design, and found that subjects given low, hypothetical stakes tended to avoid the Allais Paradox, just as his subjects with low, real stakes avoided it. Many of the experiments that find violations of the Allais Paradox in small, real stake settings embed these choices in a large number of tasks, which could affect outcomes.

point is perhaps obvious: increase the stakes in the lab and see if it makes a difference (e.g., Elizabeth Hoffman, Kevin McCabe, and Vernon Smith 1996), or have subjects earn their stakes in the lab (e.g., Rutström and Williams 2000; and List 2004a), or seek out lab subjects in developing countries for whom a given budget is a more substantial fraction of their income (e.g., Steven Kachelmeier and Mohamed Shehata 1992; Lisa Cameron 1999; and Robert Slonim and Alvin Roth 1998).

Colin Camerer and Robin Hogarth (1999) review the issues here, identifying many instances in which increased stakes are associated with improved performance or less variation in performance. But they also alert us to important instances in which increased stakes do not improve performance, so that one does not casually assume that there will be such an improvement.

Taking the Stakes to Subjects Who Are Relatively Poor. One of the reasons for running field experiments in poor countries is that it is easier to find subjects who are relatively poor. Such subjects are presumably more motivated by financial stakes of a given level than subjects in richer countries.

Slonim and Roth (1998) conducted bargaining experiments in the Slovak Republic to test for the effect of “high stakes” on behavior.44 The bargaining game they studied entails one person making an offer to the other person, who then decides whether to accept it. Bargaining was over a pie worth 60 Slovak Crowns (Sk) in one session, a pie worth 300 Sk in another session, and a pie worth 1500 Sk in a third session.45 At exchange rates to the U.S. dollar prevailing at the time, these stakes were $1.90, $9.70, and $48.40, respectively. In terms of average local monthly wages, they were equivalent to approximately 2.5 hours, 12.5 hours, and 62.5 hours of work, respectively.

They conclude that there was no effect on initial offer behavior in the first round, but that the higher stakes did have an effect on offers as the subjects gained experience with subsequent rounds. They also conclude that acceptances were greater in all rounds with higher payoffs, but that they did not change over time. Their experiment is particularly significant because they varied the stakes by a factor of 25 and used procedures that have been widely employed in comparable experiments.46 On the other hand, one might question if there was any need to go to the field for this treatment. Fifty subjects dividing roughly $50 per game is only $1,250, and this is quite modest in terms of most experimental budgets. But fifty subjects dividing the monetary equivalent of 62.5 hours is another matter. If we assume $10 per hour in the United States for lower-skilled blue-collar workers or students, that is $15,625, which is substantial but feasible.47

Similarly, consider the “high payoff” experiments from China reported by Kachelmeier and Shehata (1992) (KS). These involved subjects facing lotteries with prizes equal to 0.5 yuan, 1 yuan, 5 yuan, or 10 yuan, and being asked to state certainty-equivalent selling prices using the “BDM” mechanism due to Gordon Becker, Morris DeGroot, and Jacob Marschak (1964). Although 10 yuan only converted to about $2.50 at the time of the experiments, this represented a considerable amount of purchasing power in that

44 Their subjects were students from universities, so one could question how “nonstandard” this population is. But the design goal was to conduct the experiment in a country in which the wage rates were low relative to the United States (p. 569), rather than simply conduct the same experiment with students from different countries as in Roth et al. (1991).

45 Actually, the subjects bargained over points which were simply converted to currency at different exchange rates. This procedure seems transparent enough, and served to avoid possible focal points defined over differing cardinal ranges of currency.

46 Harrison (2005a) reconsiders their conclusions.

47 For July 2002 the Bureau of Labor Statistics estimated average private sector hourly wages in the United States at $16.40, with white-collar workers earning roughly $4 more and blue-collar workers roughly $2 less than that.
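The budget comparison for the Slonim and Roth (1998) design can be checked with a few lines of arithmetic. The stakes, hour equivalents, and the $10 wage are taken from the text; the reading that fifty subjects play in pairs, giving 25 games, is an inference from the $1,250 and $15,625 totals rather than something stated explicitly, and the variable names are illustrative.

```python
stakes_sk = [60, 300, 1500]          # pie sizes in Slovak Crowns
hours_of_work = [2.5, 12.5, 62.5]    # local wage equivalents reported in the text

# Ratio of the highest to the lowest stake.
scale = stakes_sk[-1] / stakes_sk[0]

# Assumption: fifty subjects bargain in pairs, so there are 25 games.
games = 50 // 2

# Cost of the top treatment at roughly $50 per game.
cost_at_dollar_stakes = games * 50

# Cost if the top stake is instead matched in U.S. hours at $10 per hour.
cost_at_us_wages = games * hours_of_work[-1] * 10

print(scale, cost_at_dollar_stakes, cost_at_us_wages)
# prints: 25.0 1250 15625.0
```

The same three numbers in the hours column also differ by a factor of 25, so the dollar and wage-equivalent scales agree on how much the stakes were varied.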

region of China, as discussed by KS (p. 1123). Their results support several conclusions. First, the coefficients for lotteries involving low win probabilities imply extreme risk loving. This is perfectly plausible given the paltry stakes involved in such lotteries using the BDM elicitation procedure. Second, “bad joss,” as measured by the fraction of random buying prices below the expected buying price of 50 percent of the prize, is associated with a large increase in risk-loving behavior.48 Third, experience with the general task increases risk aversion. Fourth, increasing the prize from 5 yuan to 10 yuan increases risk aversion significantly. Of course, this last result is consistent with non-constant RRA, and should not be necessarily viewed as a problem unless one insisted on applying the same CRRA coefficient over these two reward domains.

Again, however, the question is whether one needed to go to nonstandard populations in order to scale up the stakes to draw these conclusions. Using an elicitation procedure different than the BDM procedure, Holt and Laury (2002) undertake conventional laboratory experiments in which they scale up stakes and draw the same conclusions about experience and stake size. Their scaling factors are generally twenty compared to a baseline level, although they also conducted a handful of experiments with factors as high as fifty and ninety. The overall cost of these scale treatments was $17,000, although $10,000 was sufficient for their primary results with a scaling of twenty. These are not cheap experiments, but budgets of this kind are now standard for many experimenters.

Taking the Task to the Subjects Who Care About It. Bohm (1972; 1979; 1984a,b; 1994) has repeatedly stressed the importance of recruiting subjects who have some field experience with the task or who have an interest in the particular task. His experiments have generally involved imposing institutions on the subjects who are not familiar with the institution, since the objective of the early experiments was to study new ways of overcoming free-rider bias. But his choice of commodity has usually been driven by a desire to confront subjects with stakes and consequences that are natural to them. In other words, his experiments illustrate how one can seek out subject pools for whom certain stakes are meaningful.

Bohm (1972) is a landmark study that had a great impact on many researchers in the areas of field public-good valuation and experimentation on the extent of free-riding. The commodity was a closed-circuit broadcast of a new Swedish TV program. Six elicitation procedures were used. In each case except one, the good was produced, and the group was able to see the program, if aggregate WTP (willingness to pay) equaled or exceeded a known total cost. Every subject received SEK50 upon arrival at the experiment, broken down into standard denominations. Bohm employed five basic procedures for valuing his commodity.49 No formal theory is provided to

48 Although purely anecdotal, our own experience is that many subjects faced with the BDM task believe that the buying price depends in some way on their selling price. To mitigate such possible perceptions, we have tended to use physical randomizing devices that are less prone to being questioned.

49 In Procedure I the subject pays according to his stated WTP. In Procedure II the subject pays some fraction of stated WTP, with the fraction determined equally for all in the group such that total costs are just covered (and the fraction is not greater than one). In Procedure III the payment scheme is unknown to subjects at the time of their bid. In Procedure IV each subject pays a fixed amount. In Procedure V the subject pays nothing. For comparison, a quite different Procedure VI was introduced in two stages. The first stage, denoted VI:1, approximates a CVM, since nothing is said to the subject as to what considerations would lead to the good being produced or what it would cost him if it was produced. The second stage, VI:2, involves subjects bidding against what they think is a group of 100 for the right to see the program. This auction is conducted as a discriminative auction, with the ten highest bidders actually paying their bid and being able to see the program.
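The cost-sharing rule of Procedure II in footnote 49 can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function name `procedure_ii_payments` and the example bids are hypothetical, and the provision rule (produce the good only if aggregate stated WTP covers the known total cost) is taken from the description of Bohm (1972) in the text, not from Bohm's own materials.

```python
def procedure_ii_payments(stated_wtp, total_cost):
    """Sketch of a Procedure II style rule (hypothetical helper).

    Each subject pays the same fraction of his stated WTP, chosen so that
    total cost is just covered; the fraction is never greater than one.
    The good is produced only if aggregate stated WTP covers total cost.
    """
    aggregate = sum(stated_wtp)
    if aggregate < total_cost:
        # Aggregate WTP falls short: good not produced, nobody pays.
        return False, [0.0 for _ in stated_wtp]
    # Common fraction that just covers cost, capped at one.
    fraction = min(1.0, total_cost / aggregate)
    return True, [fraction * w for w in stated_wtp]


# Hypothetical bids (not Bohm's data): aggregate WTP of 60 against a cost of 30.
produced, payments = procedure_ii_payments([10, 20, 30], total_cost=30)
print(produced, payments)
```

Because each payment scales with the stated bid, a subject who lowers his bid lowers his own payment, which is where the asserted incentive to under-bid comes from.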

generate free-riding hypotheses for these procedures.50 The major result from Bohm’s study was that bids were virtually identical for all institutions, averaging between SEK7.29 and SEK10.33.

Bohm (1984a) uses two procedures that elicit a real economic commitment, albeit under different (asserted) incentives for free-riding. He implemented this experiment in the field with local government bureaucrats bidding on the provision of a new statistical service from the Central Bureau of Statistics.51 The two procedures are used to extract a lower and an upper bound, respectively, to the true average WTP for an actual good. Each agent in group 1 was to state his individual WTP, and his actual cost would be a percentage of that stated WTP such that costs for producing the good would be covered exactly. This percentage could not exceed 100 percent. Subjects in group 2 were asked to state their WTP. If the interval estimated for total stated WTP equaled or exceeded the (known) total cost, the good was to be provided and subjects in group 2 would pay only SEK500. Subjects bidding zero in group 1 or below SEK500 in group 2 would be excluded from enjoying the good.

In group 1 a subject has an incentive to understate only if he conjectures that the sum of the contributions of others in his group is greater than or equal to total cost minus his true valuation. Total cost was known to be SEK200,000, but the contributions of (many) others must be conjectured. It is not possible to say what the extent of free-riding is in this case without further information as to expectations that were not observed. In group 2 only those subjects who actually stated a WTP greater than or equal to SEK500 might have had an incentive to free-ride. Forty-nine subjects reported exactly SEK500 in group 2, whereas 93 reported a WTP of SEK500 or higher. Thus the extent of free-riding in group 2 could be anywhere from 0 percent (if those reporting SEK500 indeed had a true WTP of exactly that amount) to 53 percent (49 free-riders out of 93 possible free-riders).

The main result reported by Bohm (1984a) is that the average WTP interval between the two groups was quite small. Group 1 had an average WTP of SEK827 and group 2 an average WTP of SEK889, for an interval that is only 7.5 percent of the smaller average WTP of group 1. Thus the conclusion in this case must be that if free-riding incentives were present in this experiment, they did not make much of a difference to the outcome.

One can question, however, the extent to which these results generalize. The subjects were representatives of local governments, and it was announced that all reported WTP values would be published. This is not a feature of most surveys used to study public programs, which often go to great lengths to ensure subject confidentiality. On the other hand, the methodological point is clear: some subjects may simply care more about undertaking certain tasks, and in many field settings this is not difficult to identify. For example, Juan Cardenas (2003) collects experimental data on common pool extraction from participants that have direct, field experience extracting from a common pool

50 Procedure I is deemed the most likely to generate strategic under-bidding (p. 113), and procedure V the most likely to generate strategic over-bidding. The other procedures, with the exception of VI, are thought to lie somewhere in between these two extremes. Explicit admonitions against strategic bidding were given to subjects in procedures I, II, IV, and V (see p. 119, 127–29). Although no theory is provided for VI:2, it can be recognized as a multiple-unit auction in which subjects have independent and private values. It is well-known that optimal bids for risk-neutral agents can be well below the true valuation of the agent in a Nash Equilibrium, and will never exceed the true valuation (e.g., bidders truthfully reveal demand for the first unit, but understate demand for subsequent units to influence the price). Unfortunately there is insufficient information to be able to say how far below true valuations these optimal bids will be, since we do not know the conjectured range of valuations for subjects. List and Lucking-Reiley (2000) use a framed field experiment to test for demand reduction in the field and find significant demand reduction.

51 In addition, he conducted some comparable experiments in a more traditional laboratory setting, albeit for a non-hypothetical good (the viewing of a pilot of a TV show).
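The two headline figures reported for Bohm (1984a) above, the upper bound on free-riding in group 2 and the width of the WTP interval relative to group 1, follow from simple arithmetic. The sketch below just recomputes them from the counts and averages quoted in the text; the function name `bohm_1984a_bounds` is a hypothetical label introduced for illustration.

```python
def bohm_1984a_bounds():
    """Recompute the group 2 free-riding bound and the WTP interval
    from the figures quoted in the text (hypothetical helper)."""
    reported_exactly_500 = 49   # group 2 subjects stating exactly SEK500
    reported_at_least_500 = 93  # group 2 subjects stating SEK500 or more

    # Upper bound: everyone who stated exactly SEK500 is a free-rider.
    # The lower bound is zero: none of them is.
    max_free_riding = reported_exactly_500 / reported_at_least_500

    avg_wtp_group1 = 827  # SEK, lower-bound procedure
    avg_wtp_group2 = 889  # SEK, upper-bound procedure

    # Width of the interval as a share of the smaller (group 1) average.
    interval_share = (avg_wtp_group2 - avg_wtp_group1) / avg_wtp_group1
    return max_free_riding, interval_share


max_free_riding, interval_share = bohm_1984a_bounds()
print(f"free-riding in group 2: 0 to {max_free_riding:.0%}")
print(f"WTP interval: {interval_share:.1%} of the group 1 average")
```

The calculation bounds free-riding only in group 2; as the text notes, nothing comparable can be computed for group 1 without data on subjects' unobserved conjectures about others' contributions.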

resource. Similarly, Jeffrey Carpenter, Amrita Daniere, and Lois Takahashi (2003) conduct social dilemma experiments with urban slum dwellers who face daily coordination and collective action problems, such as access to clean water and solid waste disposal.

6. Natural Field Experiments

6.1 The Nature of the Environment

Most of the stimuli a subject encounters in a lab experiment are controlled. The laboratory, in essence, is a pristine environment where the only thing varied is the stressor in which one is interested.52 Indeed, some laboratory researchers have attempted to expunge all familiar contextual cues as a matter of control. This approach is similar to mid-twentieth-century psychologists who attempted to conduct experiments in “context-free” environments: egg-shaped enclosures where temperatures and sound were properly regulated (Loewenstein 1999, p. F30). This approach omits the context in which the stressor is normally considered by the subject. In the “real world” the individual is paying attention not only to the stressor, but also to the environment around him and various other influences. In this sense, individuals have natural tools to help cope with several influences, whereas these natural tools are not available to individuals in the lab, and thus the full effect of the stressor is not being observed.

An ideal field experiment not only increases external validity, but does so in a manner in which little internal validity is forgone.53 We consider here two potentially important parts of the experimental environment: the physical place of the actual experiment, and whether subjects are informed that they are taking part in an experiment.

Experimental Site. The relationship between behavior and the environmental context in which it occurs refers to one’s physical surroundings (viz., noise level, extreme temperatures, and architectural design) as well as the nature of the human intervention (viz., interaction with the experimental monitor). For simplicity and concreteness, we view the environment as a whole rather than as a bundle of stimuli. For example, a researcher interested in the labels attached to colors may expose subjects to color stimuli under sterile laboratory conditions (e.g., Brent Berlin and Paul Kay 1969). A field experimenter, and any artist, would argue that responses to color stimuli could very well be different from those in the real world, where colors occur in their natural context (e.g., Anna Wierzbicka 1996, ch. 10). We argue that, to fully examine such a situation, the laboratory should not be abandoned but supplemented with field research. Since it is often difficult to maintain proper experimental procedures in the field, laboratory work is often needed to eliminate alternatives and to refine concepts.

Of course, the emphasis on the interrelatedness of environment and behavior should not be oversold: the environment clearly constrains behavior, providing varying options in some instances, and influences behavior more subtly at other times. However, people also cope by changing their environments. A particular arrangement of space, or the number of windows in an office, may affect employee social interaction. One means of changing interaction is to change the furniture arrangement or window cardinality, which of course changes the environment’s effect on the employees. Environment-behavior relationships are more or less in flux continuously.

52 Of course, the stressor could be an interaction of two treatments.

53 We do not like the expression “external validity.” What is valid in an experiment depends on the theoretical framework that is being used to draw inferences from the observed behavior in the experiment. If we have a theory that (implicitly) says that hair color does not affect behavior, then any experiment that ignores hair color is valid from the perspective of that theory. But one cannot identify what factors make an experiment valid without some priors from a theoretical framework, which is crossing into the turf of “internal validity.” Note also that the “theory” we have in mind here should include the assumptions required to undertake statistical inference with the experimental data.

Experimental Proclamation. Whether subjects are informed that they are taking part in an experiment may be an important factor. In physics, the Heisenberg Uncertainty Principle reminds us that the act of measurement and observation alters that which is being measured and observed. In the study of human subjects, a related, though distinct, concept is the Hawthorne Effect. It suggests “… that any workplace change, such as a research study, makes people feel important and thereby improves their performance.”54

The notion that agents may alter their behavior when observed by others, especially when they know what the observer is looking for, is not novel to the Hawthorne Effect. Other terminology includes “interpersonal self-fulfilling prophecies” and the “Pygmalion Effect.”

Studies that claim to demonstrate the existence of the Hawthorne Effect include Phyllis Gimotty (2002), who used a treatment that reminded physicians to refer women for free mammograms. In this treatment she observed declining referral rates from the beginning of the twelve-month study to the end. This result led her to argue that the results were “consistent with the Hawthorne Effect where a temporary increase in referrals is observed in response to the initiation of the breast cancer control program.” Many other studies, ranging from asthma incidence to education to criminal justice, have attributed empirical evidence to support the concept of the Hawthorne Effect. For example, in an experiment in education research in the 1960s where some children were labeled as high performers and others low performers, when they had actually performed identically on achievement tests (R. Rosenthal and L. Jacobsen 1968), teachers’ expectations based on the labeling led to differences in student performance. Krueger (1999) offers a dissenting view, arguing that Hawthorne Effects are unlikely.

Project Star studied class sizes in Tennessee schools. Teachers in the schools with smaller classes were informed that if their students performed well, class sizes would be reduced statewide. If not, they would return to their earlier levels. In other words, Project Star’s teachers had a powerful incentive to improve student performance that would not exist under ordinary circumstances. Recent empirical results have shown that students performed better in smaller classrooms. Caroline Hoxby (2000) reported on a natural experiment using data from a large sample of Connecticut schools which was free from the bias of the experiment participants knowing about the study’s goal. She found no effect of smaller class sizes. Using data from the same natural experiment, Krueger (1999) did find a positive effect from small class sizes. Similarly, Angrist and Lavy (1999) find a positive effect in Israel, exploiting data from a natural experiment “designed” by ancient rabbinic dogma.

Who Makes the Decisions? Many decisions in life are not made by individuals. In some cases “households” arrive at a decision, which can be variously characterized as the outcome of some cooperative or noncooperative process. In some cases, groups, such as committees, make decisions. To the extent that experimenters focus on individual decision-making when group decision-making is more natural, there is a risk that the results will be misleading. Similarly, even if the decision is made by an individual, there is a possibility of social learning or “cheap talk” advice to aid the decision. Laboratory experimenters have begun to study this characteristic of field decision-making, in effect taking one of the characteristics of naturally occurring field environments back into the lab: for example, see Gary Bornstein and Ilan Yaniv (1998), James Cox and Stephen Hayne (2002), and T.

54 From P. G. Benson (2000, p. 688). The Hawthorne Effect was first demonstrated in an industrial/organizational psychological study by Professor Elton Mayo of the Harvard Business School at the Hawthorne Plant of the Western Electric Company in Cicero, Illinois, from 1927 to 1932. Researchers were confounded by the fact that productivity increased each time a change was made to the lighting no matter if it was an increase or a decrease. What brought the Hawthorne Effect to prominence in behavioral research was the publication of a major book in 1939 describing Mayo’s research by his associates F. J. Roethlisberger and William J. Dickson.

Parker Ballinger, Michael Palumbo, and Nathaniel Wilcox (2003).

6.2 Three Examples of Minimally Invasive Experiments

Committees in the Field. Michael Levine and Charles Plott (1977) report on a field experiment they conducted on members of a flying club in which Levine was a member.55 The club was to decide on a particular configuration of planes for the members, and Levine wanted help designing a fair agenda to deal with this problem. Plott suggested to Levine that there were many fair agendas, each of which would lead to a different outcome, and suggested choosing the one that got the outcome Levine desired. Levine agreed, and the agenda was designed using principles that Plott understood from committee experiments (but not agenda experiments, which had never been attempted at that stage). The parameters assumed about the field were from Levine’s impressions and his chatting among members. The selected agenda was implemented and Levine got what he wanted: the group even complimented him on his work.

A controversy at the flying club followed during the process of implementing the group decision. The club president, who did not like the choice, reported to certain decision-makers that the decision was something other than the actual vote. This resulted in another polling of the group, using a questionnaire that Plott was allowed to design. He designed it to get the most complete and accurate picture possible of member preferences. Computation and laboratory experiments, using induced values with the reported preferences, demonstrated that in the lab the outcomes were essentially as predicted.

Levine and Plott (1977) counts as a “minimally invasive” field experiment, at least in the ex ante sense, since there is evidence that the members did not know that the specific agenda was designed to generate Levine’s preferred outcome.

Plott and Levine (1978) took this field result back into the lab, as well as to the theory chalkboard. This process illustrates the complementarity we urge in all areas of research with lab and field experiments.

Betting in the Field. Camerer (1998) is a wonderful example of a field experiment that allowed the controls necessary for an experiment, but otherwise studied naturally occurring behavior. He recognized that computerized betting systems allowed bets to be placed and cancelled before the race was run. Thus he could try to manipulate the market by placing bets in certain ways to move the market odds, and then cancelling them. The cancellation keeps his net budget at zero, and in fact is one of the main treatments: to see if such a temporary bet affects prices appreciably. He found that it did not, but the methodological cleanliness of the test is remarkable. It is also of interest to see that the possibility of manipulating betting markets in this way was motivated in part by observations of such efforts in laboratory counterparts (p. 461).

The only issue is how general such opportunities are. This is not a criticism of their use: serendipity has always been a handmaiden of science. One cannot expect that all problems of interest can be addressed in a natural setting in such a minimally invasive manner.

Begging in the Field. List and Lucking-Reiley (2002) designed charitable solicitations to experimentally compare outcomes between different seed-money amounts and different refund rules by using three different seed proportion levels: 10 percent, 33 percent, or 67 percent of the $3,000 required to purchase a computer. These proportions were chosen to be as realistic as possible for an actual fundraising campaign while also satisfying the budget constraints they were given for this particular fundraiser. They also experimented with the use of a refund, which guarantees the individual her

55 We are grateful to Charles Plott for the following account of the events “behind the scenes.”

money back if the goal is not reached by the group. Thus, potential donors were assigned to one of six treatments, each funding a different computer. They refer to their six treatments as 10, 10R, 33, 33R, 67, and 67R, with the numbers denoting the seed-money proportion, and R denoting the presence of a refund policy.

In carrying out their field experiments, they wished to solicit donors in a way that matched, as closely as possible, the current state of the art in fundraising. With advice from fundraising companies Donnelley Marketing in Englewood, Colorado, and Caldwell in Atlanta, Georgia, they followed generally accepted rules believed to maximize overall contributions. First, they purchased the names and addresses of 3,000 households in the Central Florida area that met two important criteria: 1) annual household income above $70,000, and 2) household was known to have previously given to a charity (some had in fact previously given to the University of Central Florida). They then assigned 500 of these names to each of the six treatments. Second, they designed an attractive brochure describing the new center and its purpose. Third, they wrote a letter of solicitation with three main goals in mind: making the letter engaging and easy to read, promoting the benefits of a proposed Center for Environmental Policy Analysis (CEPA), and clearly stating the key points of the experimental protocol. In the personalized letter, they noted CEPA’s role within the Central Florida community, the total funds required to purchase the computer, the amount of seed money available, the number of solicitations sent out, and the refund rule (if any). They also explained that contributions in excess of the amount required for the computer would be used for other purposes at CEPA, noted the tax deductibility of the contribution, and closed the letter with contact information in case the donors had questions.

The text of the solicitation letter was completely identical across treatments, except for the variables that changed from one treatment to another. In treatment 10NR (the no-refund treatment with 10 percent seed money, labeled 10 above), for example, the first of two crucial sentences read as follows: “We have already obtained funds to cover 10 percent of the cost for this computer, so we are soliciting donations to cover the remaining $2,700.” In treatments where the seed proportion differed from 10 percent, the 10 percent and $2,700 numbers were changed appropriately. The second crucial sentence stated: “If we fail to raise the $2,700 from this group of 500 individuals, we will not be able to purchase the computer, but we will use the received funds to cover other operating expenditures of CEPA.” The $2,700 number varied with the seed proportion, and in refund treatments this sentence was replaced with: “If we fail to raise the $2,700 from this group of 500 individuals, we will not be able to purchase the computer, so we will refund your donation to you.” All other sentences were identical across the six treatments.

In this experiment the responses from agents were from their typical environments, and the subjects were not aware that they were participating in an experiment.

7. Social Experiments

7.1 What Constitutes a Social Experiment in Economics?

Robert Ferber and Warner Hirsch (1982, p. 7) define social experiments in economics as “… a publicly funded study that incorporates a rigorous statistical design and whose experimental aspects are applied over a period of time to one or more segments of a human population, with the aim of evaluating the aggregate economic and social effects of the experimental treatments.” In many respects this definition includes field experiments and even lab experiments. The point of departure for social experiments seems to be that they are part of a government agency’s attempt to evaluate programs by deliberate variations in agency policies. Thus they typically involve variations in the way that the agency does its normal business, rather than

de novo programs. This characterization fits well with the tradition of large-scale social experiments in the 1960s and 1970s, dealing with negative income taxes, employment programs, health insurance, electricity pricing, and housing allowances.56

In recent years the lines have become blurred. Government agencies have been using experiments to examine issues or policies that have no close counterpart, so that their use cannot be viewed as variations on a bureaucratic theme. Perhaps the most notable social experiments in recent years have been paired-audit experiments to identify and measure discrimination. These involve the use of “matched pairs” of individuals, who are made to look as much alike as possible apart from the protected characteristics (e.g., race). These pairs then confront the target subjects, who are employers, landlords, mortgage loan officers, or car salesmen. The majority of audit studies conducted to date have been in the fields of employment discrimination and housing discrimination (see P. A. Riach and J. Rich 2002 for a review).57

The lines have also been blurred by open lobbying efforts by private companies to influence social-policy change by means of experiments. Exxon funded a series of experiments and surveys, collected by Jerry Hausman (1993), to ridicule the use of the contingent valuation method in environmental damage assessment. This effort was in response to the role that such surveys potentially played in the criminal action brought by government trustees after the Exxon Valdez oil spill. Similarly, ExxonMobil funded a series of experiments and focus groups, collected in Cass Sunstein et al. (2002), to ridicule the way in which juries determine punitive damages. This effort was in response to the role that juries played in determining punitive damages in the civil lawsuits generated by the Exxon Valdez oil spill. It is also playing a major role in ongoing efforts by some corporations to effect “tort reform” with respect to limiting appeal bonds for punitive awards and even caps on punitive awards.

7.2 Methodological Lessons

The literature on social experiments has been the subject of sustained methodological criticism. Unfortunately, this criticism has created a false tension between the use of experiments and the use of econometrics applied to field data. We believe that virtually all of the criticisms of social experiments potentially apply in some form to field experiments unless they are run in an ideal manner, so we briefly review the important ones. Indeed, many of them also apply to conventional lab experiments.

Recruitment and the Evaluation Problem. Heckman and Smith (1995, p. 87) go to the heart of the role of experiments in a social-policy setting, when they note that “the strongest argument in favor of experiments is that under certain conditions they solve the fundamental evaluation problem that arises from the impossibility of observing what would happen to a given person in both the state where he or she receives a treatment (or participates in a program) and the state where he or she does not. If a person could be observed in both states, the impact of the treatment on that person could be calculated by comparing his or her outcomes in the two states, and the evaluation problem would be solved.” Randomization to treatment is the means by which social experiments solve this problem if one assumes that the act of randomizing subjects to treatment does not lead to a classic sample selection effect, which is to say that it does not “alter the pool of participants or their behavior” (p. 88).

Unfortunately, randomization could plausibly lead to either of these outcomes, which are not fatal but do necessitate the use of

56 See Ferber and Hirsch (1978, 1982) and Jerry Hausman and David Wise (1985) for wonderful reviews.

57 Some discrimination studies have been undertaken by academics with no social-policy evaluation (e.g., Chaim Fershtman and Uri Gneezy 2001 and List 2004b).
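The evaluation problem described by Heckman and Smith above can be illustrated with a small simulation. This is only a sketch on invented data: the function name `simulate_randomized_trial`, the outcome distributions, and the sample size are all assumptions made for illustration. Each simulated person has two potential outcomes, only one of which is ever observed, and random assignment lets the difference in group means approximate the average treatment effect, provided (as the text stresses) that randomization itself changes neither who participates nor how they behave.

```python
import random
from statistics import mean


def simulate_randomized_trial(n=10_000, seed=0):
    """Hypothetical potential-outcomes simulation (all numbers invented).

    Returns the true average treatment effect in the simulated sample and
    the difference-in-means estimate obtained under random assignment.
    """
    rng = random.Random(seed)
    # Outcome each person would have without treatment.
    y0 = [rng.gauss(10.0, 2.0) for _ in range(n)]
    # Person-specific gain from treatment (heterogeneous effects).
    gain = [rng.gauss(1.0, 0.5) for _ in range(n)]
    # Outcome each person would have with treatment.
    y1 = [base + g for base, g in zip(y0, gain)]

    # Random assignment: a fair coin flip per person.
    treated = [rng.random() < 0.5 for _ in range(n)]

    # Only one potential outcome is observed for each person.
    observed_treated = [y1[i] for i in range(n) if treated[i]]
    observed_control = [y0[i] for i in range(n) if not treated[i]]

    true_ate = mean(gain)
    estimate = mean(observed_treated) - mean(observed_control)
    return true_ate, estimate


true_ate, estimate = simulate_randomized_trial()
print(f"true average effect: {true_ate:.3f}, randomized estimate: {estimate:.3f}")
```

The sketch deliberately imposes the assumption that randomization leaves the subject pool and behavior unchanged; the recruitment and substitution concerns discussed in this section are exactly the ways that assumption can fail.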

“econometric(k)s.” We have discussed already the possibility that the use of randomization could attract to experiments subjects who are less risk-averse than the population, if the subjects rationally anticipate the use of randomization. It is well-known in the field of clinical drug trials that persuading patients to participate in randomized studies is much harder than persuading them to participate in nonrandomized studies (e.g., Michael Kramer and Stanley Shapiro 1984). The same problem applies to social experiments, as evidenced by the difficulties that can be encountered when recruiting decentralized bureaucracies to administer the random treatment (e.g., V. Joseph Hotz 1992). James Heckman and Richard Robb (1985) note that the refusal rate in one randomized job-training program was over 90 percent, with many of the refusals citing ethical concerns with administering a random treatment.

What relevance does this have for field or lab experiments? The answer is simple: we do not know, since it has not been systematically studied. On the other hand, field experiments have one major advantage if they involve the use of subjects in their natural environment, undertaking tasks that they are familiar with, since no sample selection is involved at the first level of the experiment. In conventional lab experiments there is sample selection at two stages: the decision to attend college, and then the decision to participate in the experiment. In artefactual field experiments, as we defined the term in section 1, the subject first selects into the naturally occurring environment and then decides to be in the experiment. So the artefactual field experiment shares this two-stage selection process with conventional lab experiments. However, the natural field experiment has only one source of possible selection bias: the decision to be in the naturally occurring market. Hence the bookies that accepted the contrived bets of Camerer (1998) had no idea that he was conducting an experiment, and did not select “for” the transaction with the experimenter. Of course, they were bookies, and hence selected for that occupation.

A variant on the recruitment problem occurs in settings where subjects are observed over a period of time, and attrition is a possibility. Statistical methods can be developed to use differential attrition rates as valuable information on how subjects value outcomes (e.g., see Philipson and Hedges 1998).

Substitution and the Evaluation Problem. The second assumption underlying the validity of social experiments is that “close substitutes for the experimental treatment are not readily available” (Heckman and Smith 1995, p. 88). If they are, then subjects who are placed in the control group could opt for the substitutes available outside the experimental setting. The result is that outcomes in the control no longer show the effect of “no treatment,” but instead the effect of “possible access to an uncontrolled treatment.” Again, this is not a fatal problem, but one that has to be addressed explicitly. In fact, it has arisen already in the elicitation of preferences over field commodities, as discussed in section 4.

Experimenter Effects. In social experiments, given the open nature of the political process, it is almost impossible to hide the experimental objective from the person implementing the experiment or the subject. The paired-audit experiments are perhaps the most obvious targets of this, since the “treatments” themselves have any number of ways to bring about the conclusion that is favored by the research team conducting the experiment. In this instance, the Urban Institute makes no bones about its view that discrimination is a widespread problem and that paired-audit experiments are a critical way to address it (e.g., a casual perusal of Michael Fix and Raymond Struyk 1993). There is nothing wrong with this, apart from the fact that it is hard to imagine how volunteer auditors would not see things similarly. Indeed, Heckman (1998, p. 104) notes that

“auditors are sometimes instructed on the ‘problem of discrimination in American society’ prior to sampling firms, so they may have been coached to find what audit agencies wanted them to find.” The opportunities for unobservables to influence the outcome are potentially rampant in this case.

Of course, simple controls could be designed to address this issue. One could have different test-pairs visit multiple locations to help identify the effect of a given pair on the overall measure of discrimination. The variability of measured discrimination across audit pairs is marked, and raises statistical issues, as well as issues of interpretation (e.g., see Heckman and Siegelman 1993). Another control could be to have an artificial location for the audit pair to visit, where their “unobservables” could be “observed” and controlled in later statistical analyses. This procedure is used in a standard manner in private business concerned with measuring the quality of customer relations in the field.

One stunning example of experimenter effects from Bohm (1984b) illustrates what can happen when the subjects see a meta-game beyond the experiment itself. In 1980 he undertook a framed field experiment for a local government in Stockholm that was considering expanding a bus route to a major hospital and a factory. The experiment was to elicit valuations from people who were naturally affected by this route, and to test whether their aggregate contributions would make it worthwhile to provide the service. A key feature of the experiment was that the subjects would have to be willing to pay for the public good if it was to be provided for a trial period of six months. Everyone who was likely to contribute was given information on the experiment, but when it came time for the experiment virtually nobody turned up! The reason was that the local trade unions had decided to boycott the experiment, since it represented a threat to the current way in which such services were provided. The union leaders expressed their concerns, summarized by Bohm (1984b, p. 136) as follows:

    They reported that they had held meetings of their own and had decided (1) that they did not accept the local government’s decision not to provide them with regular bus service on regular terms; (2) that they did not accept the idea of having to pay in a way that differs from the way that “everybody else” pays (bus service is subsidized in the area) – the implication being that they would rather go without this bus service, even if their members felt it would be worth the costs; (3) that they would not like to help in realizing an arrangement that might reduce the level of public services provided free or at low costs. It was argued that such an arrangement, if accepted here, could spread to other parts of the public sector; and (4) on these grounds, they advised their union members to abstain from participating in the project.

This fascinating outcome is actually more relevant for experimental economics in general than it might seem.58

When certain institutions are imposed on subjects, and certain outcomes tabulated, it does not necessarily follow that the outcomes of interest for the experimenter are the ones that are of interest to the subject.59 For example, Isaac and Smith (1985) observe virtually no instances of predatory pricing in a partial equilibrium market in which the prey had no alternative market to escape to at the first taste of blood. In a comparable multi-market setting in which subjects could choose to exit markets for other markets, Harrison (1988) observed many instances of predatory pricing.

7.3 Surveys that Whisper in the Ears of Princes

Field surveys are often undertaken to evaluate environmental injury. Many involve controlled treatments such as “scope tests” of

58 It is a pity that Bohm (1984b) himself firmly categorized this experiment as a failure, although one can understand that perspective.

59 See Philipson and Hedges (1998) for a general statistical perspective on this problem.

changes in the extent of the injury, or differences in the valuation placed on the injury.60 Unfortunately, such surveys suffer from the fact that they do not ask subjects to make a direct economic commitment, and that this will likely generate an inflated valuation report.61 However, many field surveys are designed to avoid the problem of hypothetical bias, by presenting the referenda as “advisory.” Great care is often taken in the selection of motivational words in cover letters, opening survey questions, and key valuation questions, to encourage the subject to take the survey seriously in the sense that their response will “count.”62 To the extent that they achieve success in this, these surveys should be considered social experiments.

Consider the generic cover letter advocated by Don Dillman (1978, pp. 165ff.) for use in mail surveys. The first paragraph is intended to convey something about the social usefulness of the study: that there is some policy issue that the study is attempting to inform. The second paragraph is intended to convince the recipient of their importance to the study. The idea here is to explain that their name has been selected as one of a small sample, and that for the sample to be representative they need to respond. The goal is clearly to put some polite pressure on the subject to make sure that their socio-economic characteristic set is represented.

The third paragraph ensures confidentiality, so that the subject can ignore any possible repercussion from responding one way or the other in a “politically incorrect” manner. Although seemingly mundane, this assurance can be important when the researcher interprets the subject as responding to the question at hand rather than uncontrolled perceptions of repercussions. It also serves to mimic the anonymity of the ballot box.

The fourth paragraph builds on the preceding three to drive home the usefulness of the survey response itself, and the possibility that it will influence behavior:

    The fourth paragraph of our cover letter reemphasizes the basic justification for the study – its social usefulness. A somewhat different approach is taken here, however, in that the intent of the researcher to carry through on any promises that are made, often the weakest link in making study results useful, is emphasized. In {an example cover letter in the text} the promise (later carried out) was made to provide results to government officials, consistent with the lead paragraph, which included a reference to bills being considered in the State Legislature and Congress. Our basic concern here is to make the promise of action consistent with the original social utility appeal. In surveys of particular communities, a promise is often made to provide results to the local media and city officials. (Dillman 1978, p. 171)

From our perspective, the clear intent and effect of these admonitions is to attempt to convince the subject that their response will have some probabilistic bearing on actual outcomes.

This generic approach has been used, for example, in the CVM study of the Nestucca oil spill by Rowe et al. (1991). Their cover

60 Scope treatments might be employed if there is some scientific uncertainty about the extent of the injury to the environment at the time of the valuation, as in the two scenarios used in the survey of the Kakadu Conservation Zone in Australia reported in David Imber, Gay Stevenson, and Leanne Wilks (1991). Or they may be used to ascertain some measure of the internal validity of the elicited valuations, as discussed by Carson (1997) and V. Kerry Smith and Laura Osborne (1996). Variations in the valuation are the basis for inferring the demand curve for the environmental curve, as discussed by Glenn Harrison and Bengt Kriström (1996).

61 See Cummings and Harrison (1994), Cummings, Harrison, and Rutström (1995), and Cummings et al. (1997).

62 There are some instances in which the agency undertaking the study is deliberately kept secret from the respondent. For example, this strategy was adopted by Carson et al. (1992) in their survey of the Exxon Valdez oil spill undertaken for the attorney-general of the state of Alaska. They in fact asked subjects near the end of the survey who they thought had sponsored the study, and only 11 percent responded correctly (p. 91). However, 29 percent thought that Exxon had sponsored the study. Although no explicit connection was made to suggest who would be using the results, it is therefore reasonable to presume that at least 40 percent of the subjects expected the responses to go directly to one or another of the litigants in this well-known case. Of course, that does not ensure that the responses will have a direct impact, since there may have been some (rational) expectation that the case would settle without the survey results being entered as evidence.

letter contained the following sentences in the opening and penultimate paragraphs:

    Government and industry officials throughout the Pacific Northwest are evaluating programs to prevent oil spills in this area. Before making decisions that may cost you money, these officials want your input. … The results of this study will be made available to representatives of state, provincial and federal governments, and industry in the Pacific Northwest. (emphasis added)

In the key valuation question, subjects are motivated by the following words:

    Your answers to the next questions are very important. We do not yet know how much it will cost to prevent oil spills. However, to make decisions about new oil spill prevention programs that could cost you money, government and industry representatives want to learn how much it is worth to people like you to avoid more spills.

These words reinforce the basic message of the cover letter: there is some probability, however small, that the response of the subject will have an actual impact.

More direct connections to policy impact occur when the survey is openly undertaken for a public agency charged with making the policy decision. For example, the Resource Assessment Commission of Australia was charged with making a decision on an application to mine in public lands, and used a survey to help it evaluate the issue. The cover letter, signed by the chairperson of the commission under the letterhead of the commission, spelled out the policy setting clearly:

    The Resource Assessment Commission has been asked by the Prime Minister to conduct an inquiry into the use of the resources of the Kakadu Conservation Zone in the Northern Territory and to report to him on this issue by the end of April 1991.63 … You have been selected randomly to participate in a national survey related to this inquiry. The survey will be asking the views of 2500 people across Australia. It is important that your views are recorded so that all groups of Australians are included in the survey. (Imber, Stevenson, and Wilks 1991, p. 102)

Although no promise of a direct policy impact is made, the survey responses are obviously valued in this instance by the agency charged with directly and publicly advising the relevant politicians on the matter.

It remains an open question if these “advisory referenda” actually motivate subjects to respond truthfully, although that is obviously something that could be studied systematically as part of the exercise or using controlled laboratory and field experiments.64

8. Natural Experiments

8.1 What Constitutes a Natural Experiment in Economics?

Natural experiments arise when the experimenter simply observes naturally occurring, controlled comparisons of one or more treatments with a baseline.65 The common feature of these experiments is serendipity: policy makers, nature, or television game-show producers66 conspire to

63 The cover letter was dated August 28, 1990.

64 Harrison (2005b) reviews the literature.

65 Good examples in economics include H. E. Frech (1976); Roth (1991); Jere Behrman, Mark Rosenzweig, and Paul Taubman (1994); Stephen Bronars and Jeff Grogger (1994); Robert Deacon and Jon Sonstelie (1985); Andrew Metrick (1995); Bruce Meyer, W. Kip Viscusi, and David Durbin (1995); John Warner and Saul Pleeter (2001); and Mitch Kunce, Shelby Gerking, and William Morgan (2002).

66 Smith (1982; p. 929) compared the advantages of laboratory experiments to econometric practice, noting that “Over twenty-five years ago, Guy Orcutt characterized the econometrician as being in the same predicament as the electrical engineer who has been charged with the task of deducing the laws of electricity by listening to a radio play. To a limited extent, econometric ingenuity has provided some techniques for conditional solutions to inference problems of this type.” Arguably, watching the television can be an improvement on listening to the radio, since TV game shows provide a natural avenue to observe real decisions in an environment with high stakes. J. B. Berk, E. Hughson, and K. Vandezande (1996) and Rafael Tenorio and Timothy Cason (2002) study contestants’ behavior on The Price Is Right to investigate rational decision theory and whether subjects play the unique subgame perfect Nash equilibrium. R. Gertner (1993) and R. M. W. J. Beetsma and P. C. Schotman (2001) make use of data from Card Sharks and Lingo to examine individual risk preferences. Steven Levitt (2003) and List (2003) use data from The Weakest Link and Friend or Foe to examine the nature and extent of disparate treatment among game-show contestants. And Metrick (1995) uses data from Jeopardy! to analyze behavior under uncertainty and players’ ability to choose strategic best-responses.

generate these comparisons. The main attraction of natural experiments is that they reflect the choices of individuals in a natural setting, facing natural consequences that are typically substantial. The main disadvantage of natural experiments derives from their very nature: the experimenter does not get to pick and choose the specifics of the treatments, and the experimenter does not get to pick where and when the treatments will be imposed. The first problem may result in low power to detect any responses of interest, as we illustrate with a case study in section 8.2 below. While there is a lack of control, we should obviously not look a random gift horse in the mouth when it comes to making inferences. There are some circumstances, briefly reviewed in section 8.3, when nature provides useful controls to augment those from theory or “manmade” experimentation.

8.2 Inferring Discount Rates by Heroic Extrapolation

In 1992, the United States Department of Defense started offering substantial early retirement options to nearly 300,000 individuals in the military. This voluntary separation policy was instituted as part of a general policy of reducing the size of the military as part of the “Cold War dividend.” John Warner and Saul Pleeter (2001) (WP) recognize how the options offered to military personnel could be viewed as a natural experiment with which one could estimate individual discount rates. In general terms, one option was a lump-sum amount, and the other option was an annuity. The individual was told what the cut-off discount rate was for the two to be actuarially equal, and this concept was explained in various ways. If an individual is observed to take the lump-sum, one could infer that his discount rate was greater than the threshold rate. Similarly, for those individuals that elected to take the annuity, one could infer that their discount rate was less than the threshold.67

This design is essentially the same as one used in a long series of laboratory experiments studying the behavior of college students.68 Comparable designs have been taken into the field, such as the study of the Danish population by Harrison, Lau, and Williams (2002). The only difference is that the field experiment evaluated by WP offered each individual only one discount rate: Harrison, Lau, and Williams offered each subject twenty different discount rates, ranging between 2.5 percent and 50 percent.

Five features of this natural experiment make it particularly compelling for the purpose of estimating individual discount rates. First, the stakes were real. Second, the stakes were substantial and dwarf anything that has been used in laboratory experiments with salient payoffs in the United States. The average lump-sum amounts were around $50,000 and $25,000 for officers and enlisted personnel, respectively.69 Third, the military went to some lengths to explain to everyone the financial implications of choosing one option over the other, making the comparison of personal and threshold discount rates relatively transparent. Fourth, the options were offered to a wide range of officers and enlisted personnel, such that there are substantial variations in key demographic variables such as income, age, race, and education. Fifth, the time horizon for the annuity differed in direct proportion to the years of military service of the individual, so that there are annuities between fourteen and thirty years in length. This facilitates evaluation of the hypothesis that discount rates are stationary over different time horizons.

WP conclude that the average individual discount rates implied by the observed separation choices were high relative to a priori expectations for enlisted personnel. In one model in which the after-tax interest rate offered to the individual appears in linear form, they predict average rates of 10.4 percent and 35.4 percent for officers and enlisted personnel, respectively. However, this model implicitly allows estimated discount rates to be negative, and indeed allows them to be arbitrarily negative. In an alternative model in which the interest rate term appears in logarithmic form, and one implicitly imposes the a priori constraint that an elicited individual discount rate be positive, they estimate average rates of 18.7 percent and 53.6 percent, respectively. We prefer the estimates that impose this prior belief, although nothing below depends on using them.70

We show that many of the conclusions about discount rates from this natural experiment are simply not robust to the sampling and predictive uncertainty of having to use an estimated model to infer discount rates. We use the same method as WP (2001, table 6, p. 48) to calculate estimated discount rates.71 In their table 3, WP calculate the mean predicted discount rate from a single-equation probit model, using only the discount rate as an explanatory variable, employing a shortcut formula that correctly evaluates the mean discount rate. After each probit equation is estimated, it is used to predict the probability that each individual would accept the lump-sum alternative at discount rates varying between 0 percent and 100 percent in increments of 1 percentage point. For example, consider a 5 percent discount rate offered to officers, and the results of the single-equation probit model. Of the 11,212 individuals in this case, 72 percent are predicted to have a probability of accepting the lump-sum of 0.5 or greater. The lowest predicted probability of acceptance for any individual at this rate is 0.207, and the highest is 0.983.

Similar calculations are undertaken for each possible discount rate between 0 percent and 100 percent, and the results tabulated. Once the predicted probabilities of acceptance are tabulated for each of the individuals offered the buy-out, and each possible discount rate between 0 percent and 100 percent, we loop over each individual and identify the smallest discount rate at which the lump-sum would be accepted. This smallest discount rate is precisely where the probit model predicts that this individual would be indifferent between the lump-sum and the annuity. This provides a distribution of estimated minimum discount rates, one for each individual in the sample.

In figure 2 we report the results of this calculation, showing the distribution of personal discount rates initially offered to the subjects and then the distributions implied by the single-equation probit

67 Warner and Pleeter (2001) recognize that one problem of interpretation might arise if the very existence of the scheme signaled to individuals that they would be forced to retire anyway. As it happens, the military also significantly tightened up the rules governing “progression through the ranks,” so that the probability of being involuntarily separated from the military increased at the same time as the options for voluntary separation were offered. This background factor could be significant, since it could have led to many individuals thinking that they were going to be separated from the military anyway and hence deciding to participate in the voluntary scheme even if they would not have done so otherwise. Of course, this background feature could work in any direction, to increase or decrease the propensity of a given individual to take one or the other option. In any event, WP allow for the possibility that the decision to join the voluntary separation process itself might lead to sample selection issues. They estimate a bivariate probit model, in which one decision is to join the separation process and the other decision is to take the annuity rather than the lump-sum.

68 See Coller and Williams (1999), and Shane Frederick, George Loewenstein, and Ted O’Donoghue (2002), for recent reviews of those experiments.

69 92 percent of the enlisted personnel accepted the lump-sum, and 51 percent of the officers. However, these acceptance rates varied with the interest rates offered, particularly for enlisted personnel.

70 Harrison (2005a) documents the detailed calculations involved, and examines the differences that arise with alternative specifications and samples.

71 John Warner kindly provided the data.
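
The two steps described in section 8.2, the threshold inference from a single observed choice and the grid search over candidate rates, can be sketched in a few lines. This is only an illustration under stated assumptions, not WP's procedure or the calculations documented in Harrison (2005a): the annuity is assumed to pay a level annual amount, the probit index is assumed to be linear in the offered rate with a negative slope (so the predicted probability of taking the lump-sum falls as the break-even rate rises), and every function name and numeric input is hypothetical.

```python
import math


def annuity_pv(payment, rate, years):
    """Present value of a level annual payment received for `years` years."""
    if rate == 0:
        return payment * years
    return payment * (1.0 - (1.0 + rate) ** -years) / rate


def break_even_rate(lump_sum, payment, years, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection for the rate at which the annuity is worth exactly the lump sum."""
    if not (annuity_pv(payment, hi, years) < lump_sum < annuity_pv(payment, lo, years)):
        raise ValueError("no break-even rate between lo and hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The annuity's present value falls as the rate rises.
        if annuity_pv(payment, mid, years) > lump_sum:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bound_from_choice(choice, threshold):
    """A single observed choice only bounds the personal rate on one side."""
    if choice == "lump_sum":
        return (threshold, math.inf)
    if choice == "annuity":
        return (-math.inf, threshold)
    raise ValueError("choice must be 'lump_sum' or 'annuity'")


def normal_cdf(z):
    """Standard normal CDF, used as the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def implied_rate(intercept, slope, grid=range(0, 101)):
    """First rate on a 0..100 percent grid at which the predicted probability
    of taking the lump sum drops below 0.5; None if it never does.
    `intercept` and `slope` are hypothetical probit coefficients."""
    for rate in grid:
        if normal_cdf(intercept + slope * rate) < 0.5:
            return rate
    return None


# Hypothetical example: a 20-year annuity of 1,000 per year priced at 18 percent.
threshold = break_even_rate(annuity_pv(1000.0, 0.18, 20), 1000.0, 20)
print(bound_from_choice("lump_sum", round(threshold, 4)))
print([implied_rate(a, -0.1) for a in (1.0, 1.87, 5.36)])
```

The point of the sketch is the asymmetry between the two steps. The observed choice yields only a one-sided bound at the single rate each person was offered, whereas the grid search produces a point value for every individual by leaning entirely on the fitted index, including at rates far from anything that was actually offered.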

Figure 2. Offered and Estimated Discount Rates in Warner and Pleeter Natural Experiments
[Figure not reproduced. Horizontal axis: Discount Rate in Percent (0 to 100). Vertical axes: Fraction Offered and Fraction Estimated. Series: Estimated, Offered.]

model used by WP.72 These results pool the data for all separating personnel. The grey histogram shows the after-tax discount rates that were offered, and the black histogram shows the discount rates inferred from the estimated “log-linear” model that constrains discount rates to be positive. Given the different shapes of the histograms, they use different vertical axes to allow simple visual comparisons.

The main result is that the distribution of estimated discount rates is much wider than the distribution of offered rates. Harrison (2005a) presents separate results for the samples of officers and enlisted personnel, and for the alternative specifications considered by WP. For enlisted personnel the distribution of estimated rates is almost entirely out-of-sample in comparison to the offered rates above it. The distribution for officers is roughly centered on the distribution of offered rates, but much more dispersed. There is nothing “wrong” with these differences between the offered and estimated discount rates, although they will be critical when we calculate standard errors on these estimated discount rates. Again, the estimated rates in figure 2 are based on the logic described above: no prediction error is assumed from the estimated statistical model when it is applied at the level of the individual to predict the threshold rate at which the lump-sum would be accepted.

The main conclusion of WP is contained in their table 6, which lists estimates of the average discount rates for various groups of their subjects. Using the model that imposes the a priori restriction that discount rates be positive, they report that the average discount rate for officers was 18.7 percent, and 53.6 percent for enlisted personnel. What are the standard errors on these means? There is reason to expect that they could be

72 Virtually identical results are obtained with the model that corrects for possible sample-selection effects.

quite large, due to constraints on the scope of the natural experiment.

Individuals were offered a choice between a lump-sum and an annuity. The before-tax discount rate that just equated the present value of the two instruments ranged between 17.5 percent and 19.8 percent, which is a very narrow range of discount rates. The after-tax equivalent rates ranged from a low of 14.5 percent up to 23.5 percent for those offered the separation option, but over 99 percent of the after-tax rates were between 17.6 percent and 20.4 percent. Thus the above inferences about average discount rates for enlisted personnel are “out of sample,” in the sense that they do not reflect direct observation of responses at those rates of 53.6 percent, or indeed at any rates outside the interval (14.5 percent, 23.5 percent). Figure 2 illustrates this point as well, since the right mode is entirely due to the estimates of enlisted personnel. The average for enlisted personnel therefore reflects, and relies on, the predictive power of the parametric functional forms fitted to the observed data. The same general point is true for officers, but the problem is far less severe.

Even if one accepted the parametric functional forms (probit), the standard errors of predictions outside of the sample range of break-even discount rates will be much larger than those within the sample range.73 The standard errors of the predicted response can be calculated directly from the estimated model. Note that this is not the same as the distribution shown in figure 2, which is a distribution over the sample of individuals at each simulated discount rate that assumes that the model provides a perfect prediction for each individual. In other words, the predictions underlying figure 2 just use the average prediction for each individual as the truth, so the sampling error reflected in the distributions only reflects sampling over the individuals. One can generate standard errors that also capture the uncertainty in the probit model coefficients.

Figure 3 displays the results of taking into account the uncertainty about the coefficients of the estimated model used by WP. Since it is an important dimension to consider, we show the time horizon for the elicited discount rates on the horizontal axis.74 The middle line shows a cubic spline through the predicted average discount rate. The top (bottom) line shows a cubic spline through the upper (lower) bound of the 95 percent confidence interval, allowing for uncertainty in the individual predictions due to reliance on an estimated statistical model to infer discount rates.75 Thus, in figure 3 we see that there is considerable uncertainty about the discount rates for enlisted personnel, and that it is asymmetric. On balance, the model implies a considerable skewness in the distribution of rates for enlisted personnel, with some individuals having extremely high implied discount rates. Turning to the results for officers, we find much less of an effect from model uncertainty. In this case the rates are relatively precisely inferred, particularly around the range of rates spanning the effective rates offered, as one would expect.76

We conclude that the results for enlisted personnel are too imprecisely estimated for

73 Relaxing the functional form also allows some additional uncertainty into the estimation of individual discount rates.

74 The time horizon of the annuity offered to individuals in the field varied directly with the years of military service completed. For each year of service the horizon on the annuity was two years longer. As a result, the annuities being considered by individuals were between fourteen and thirty years in length. With roughly 10 percent of the sample at each horizon, the average annuity horizon was around 22 years.

75 In fact, we calculate rates only up to 100 percent, so the upper confidence interval for the model is constrained to equal 100 percent for that reason. It would be a simple matter to allow the calculation to consider higher rates, but there would be little inferential value in doing so.

76 It is a standard result from elementary econometrics that the forecast interval widens as one uses the regression model to predict for values of the exogenous variables that are further and further away from their average (e.g., William Greene 1993, pp. 164–66).
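
The standard result cited in footnote 76 can be written out for the simple linear regression case. The notation below is an assumption introduced for illustration (a single regressor $x$, homoskedastic errors with variance $\sigma^2$, and a prediction at a new value $x_0$); it is the textbook linear formula, offered as an analogy for the probit setting rather than as the calculation behind figure 3.

```latex
\begin{align*}
y_i &= \alpha + \beta x_i + \varepsilon_i, \qquad \operatorname{Var}(\varepsilon_i) = \sigma^2, \quad i = 1, \dots, n \\
\hat{y}_0 &= \hat{\alpha} + \hat{\beta} x_0 \\
\operatorname{Var}(y_0 - \hat{y}_0) &= \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]
\end{align*}
```

The last term grows with the squared distance of $x_0$ from the sample mean $\bar{x}$ and shrinks with the spread of the observed $x_i$. By analogy, when the offered after-tax rates are concentrated in a narrow band, a prediction at a rate far outside that band combines a large numerator with a small denominator, which is the pattern the text describes for enlisted personnel.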

Figure 3. Implied Discount Rates Incorporating Model Uncertainty
[Figure not reproduced. Two panels: Officers and Enlisted Personnel. Horizontal axis in each panel: time horizon of annuity in years (15 to 30). Vertical axis in each panel: 0 to 100.]

them to be used to draw reliable inferences about the discount rates. However, the results for officers are relatively tightly estimated, and can be used to draw more reliable inferences. The reason for the lack of precision in the estimates for enlisted personnel is transparent from the design, which was obviously not chosen by the experimenters: the estimates rely on out-of-sample predictions, and the standard errors embodied in figure 3 properly reflect the uncertainty of such an inference.

8.3 Natural Instruments

Some variable or event is said to be a good instrument for unobserved factors if it is orthogonal to those factors. Many of the difficulties of “manmade” random treatments have been discussed in the context of social experiments. However, in recent years many economists have turned to “nature-made” random treatments instead, employing an approach to the evaluation of treatments that has come to be called the “natural natural experimental approach” by Rosenzweig and Wolpin (2000).

For example, monozygotic twins are effectively natural clones of each other at birth. Thus one can, in principle, compare outcomes for such twins to see the effect of differences in their history, knowing that one has a control for abilities that were innate at birth. Of course, a lot of uncontrolled and unobserved things can occur after birth and before humans get to make choices that are of any policy interest. So the use of such instruments obviously requires additional assumptions, beyond the a priori plausible one that the natural biological event that led to these individuals being twins was independent of the efficacy of their later educational and labor-market experiences. Thus the lure of “measurement without theory” is clearly illusory.

Another concern with the “natural instruments” approach is that it often relies on the assumption that only one of the explanatory variables is correlated with the unobserved
factors.77 This means that only one instrument is required, which is fortunate since nature is a stingy provider of such instruments. Apart from twins, natural events that have been exploited in this literature include birth dates, gender, and even weather events, and these are not likely to grow dramatically over time.

Both of these concerns point the way to a complementary use of different methods of experimentation, much as econometricians use a priori identifying assumptions as a substitute for data in limited information environments.

9. Thought Experiments

Thought experiments are extremely common in economics, and would seem to be fundamentally different from lab and field experiments. We argue that they are not, drawing on recent literature examining the role of statistical specifications of experimental tests of deterministic theories. Although it may surprise some, the comparison between lab experiments and field experiments that we propose has analogues to the way thought experiments have been debated in analytic philosophy and the view that thought experiments are just “attenuated experiments.” Finally, we consider the place of measures of the natural functioning of the brain during artefactual experimental conditions.

9.1 Where Are the Econometric Instructions to Test Theory?

To avoid product liability litigation, it is standard practice to sell commodities with clear warnings about dangerous use and operating instructions designed to help one get the most out of the product. Unfortunately, the same is not true of economic theories. When theorists undertake thought experiments about individual or market behavior, they are positing “what if” scenarios which need not be tethered to reality. Sometimes theorists constrain their propositions by the requirement that they be “operationally meaningful,” which only requires that they be capable of being refuted, and not that anyone has the technology or budget to actually do so.

Tests of expected utility theory have provided a dramatic illustration of the importance of thought experiments being explicitly linked to stochastic assumptions involved in their use. Several studies offer a rich array of different error specifications leading to very different inferences about the validity of expected utility theory, and particularly about what part of it appears to be broken: Ballinger and Wilcox (1997); Enrica Carbone (1997); David Harless and Camerer (1994); Hey (1995); John Hey and Chris Orme (1994); Graham Loomes, Peter Moffatt, and Sugden (2002); and Loomes and Sugden (1995, 1998). The methodological problem is that debates over the characterization of the residual have come to dominate the substantive issues, as crisply drawn by Ballinger and Wilcox (1997, p. 1102)78:

We know subjects are heterogeneous. The representative decision maker … restriction fails miserably both in this study and new ones …. Purely structural theories permit heterogeneity by allowing several preference patterns, but are mute when it comes to mean error rate variability between or within patterns (restrictions like CE) and within-pattern heterogeneity of choice probabilities (restrictions like CH and ZWC). We believe Occam’s Razor and the ‘Facts don’t kill theories, theories do’ cliches do not apply: CE, CH and ZWC are an atheoretical supporting cast in dramas about theoretical stars, and poor showings by this cast should be excused neither because they are simple nor because there are no replacements. It is time to audition a new cast.

In this instance, a lot has been learned about the hidden implications of alternative

77 Rosenzweig and Wolpin (2000, p. 829, fn. 4, and p. 873).
78 The notation in this quote does not need to be defined for the present point to be made.
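
A minimal numerical sketch may help show why the stochastic specification matters when a deterministic theory is taken to data. It assumes a hypothetical subject with square-root utility and compares two illustrative error stories: a constant "tremble" probability, and an additive normal (Fechner-type) error on the expected-utility difference. The function names, the utility function, the lotteries, and the parameter values are all assumptions made for illustration; they are not the specifications estimated in the studies cited above.

```python
import math


def expected_utility(lottery, utility=math.sqrt):
    """Expected utility of a lottery given as [(probability, prize), ...]."""
    return sum(p * utility(x) for p, x in lottery)


def prob_a_tremble(eu_a, eu_b, tremble=0.1):
    """Constant-error story: the higher-EU option is chosen except with a fixed tremble."""
    if eu_a == eu_b:
        return 0.5
    return 1.0 - tremble if eu_a > eu_b else tremble


def prob_a_fechner(eu_a, eu_b, sigma=0.5):
    """Fechner-type story: A is chosen when EU(A) - EU(B) + noise > 0, noise ~ N(0, sigma^2)."""
    z = (eu_a - eu_b) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


# Hypothetical lotteries (assumed values, chosen so the square roots are exact).
safe = [(1.0, 25.0)]                      # EU = 5.0
close_rival = [(0.5, 36.0), (0.5, 9.0)]   # EU = 4.5, a small gap to the safe option
far_rival = [(0.5, 4.0), (0.5, 1.0)]      # EU = 1.5, a large gap to the safe option

for label, rival in [("close", close_rival), ("far", far_rival)]:
    eu_a, eu_b = expected_utility(safe), expected_utility(rival)
    print(label, round(prob_a_tremble(eu_a, eu_b), 3), round(prob_a_fechner(eu_a, eu_b), 3))
```

Under the tremble story the predicted frequency of choosing the safe option is the same for both pairs, while under the Fechner-type story it depends on the size of the expected-utility gap. The same observed choice frequencies can therefore count for or against the theory depending on which error story is adopted.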
stochastic specifications for experimental tests of theory. But the point is that all of this could have been avoided if the thought experiments underlying the structural models had accounted for errors and allowed for individual heterogeneity in preferences from the outset. That relaxation does not rescue expected utility theory, nor is that the intent, but it does serve to make the experimental tests informative for their intended purpose of identifying when and where that theory fails.

9.2 Are Thought Experiments Just Slimmed-Down Experiments?

Roy Sorenson (1992) presents an elaborate defense of the notion that a thought experiment is really just an experiment “that purports to achieve its aim without the benefit of execution” (p. 205). This lack of execution leads to some practical differences, such as the absence of any need to worry about luck affecting outcomes. Another difference is that thought experiments actually require more discipline if they are to be valid. In his Nobel Prize lecture, Smith (2003, p. 465) notes that:

Doing experimental economics has changed the way I think about economics. There are many reasons for this, but one of the most prominent is that designing and conducting experiments forces you to think through the process rules and procedures of an institution. Few, like Einstein, can perform detailed and imaginative mental experiments. Most of us need the challenge of real experiments to discipline our thinking.

There are, of course, other differences between the way that thought experiments and actual experiments are conducted and presented. But these likely have more to do with the culture of particular scholarly groups than anything intrinsic to each type of experiment.

The manner in which thought experiments can be viewed as “slimmed-down experiments—ones that are all talk and no action” (Sorenson 1992, p. 190) is best illustrated by example. We choose an example in which there have been actual (lab and field) experiments, but where the actual experiments could have been preceded by a thought experiment. Specifically, consider the identification of “trust” as a characteristic of an individual’s utility function. In some studies this concept is defined as the sole motive that leads a subject to transfer money to another subject in an investment game.79 For example, Joyce Berg, John Dickhaut, and McCabe (1995) use the game to measure “trust” by the actions of the first player and hence “trustworthiness” from the responses of the second player.80

But “trust” measured in this way obviously suffers from at least one confound: aversion to inequality, or “other-regarding preferences.” The idea is that someone may be averse to seeing different payoffs for the two players, since roles and hence endowments in the basic version are assigned at random. This is one reason that almost all versions of the experiments have given each player the same initial endowment to start, so that the first player does not invest money with the second player just to equalize their payoffs. But it is possible that the first player would like the other player to have more, even if it means having more than the first player.

Cox (2004) proposes that one pair the investment game with a dictator game81 to identify how much of the observed transfer from the first player is due to “trust” and how much is due to “other-regarding preferences.” Since there is strong evidence that

79 Player 1 transfers some percentage of an endowment to player 2, that transfer is tripled, and then player 2 decides how much of the expanded pie to return.
80 This game has been embedded in many other settings before and after Berg, Dickhaut, and McCabe (1995). We do not question the use of this game in the investigation of broader assessments of the nature of “social preferences,” which is an expression that subsumes many possible motives for the observed behavior, including the ones discussed below.
81 The first player transfers money to the second player, who is unable to return it or respond in any way. Martin Dufwenberg and Gneezy (2000) also compare the trust and dictator games directly.
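
The two games described in footnotes 79 and 81, and the pairing proposed by Cox (2004), can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the common endowment of 10, and the transfer amounts are hypothetical, and the dictator-game multiplier is left as a parameter whose default of 1 is an assumption. The simple subtraction treats the dictator-game transfer as the part attributable to other-regarding preferences and labels whatever remains as "trust".

```python
def investment_game_payoffs(endowment, sent, returned, multiplier=3):
    """Payoffs (player 1, player 2) when both players start with `endowment`."""
    if not 0 <= sent <= endowment:
        raise ValueError("sent must lie between 0 and the endowment")
    received = multiplier * sent  # the transfer is tripled on its way to player 2
    if not 0 <= returned <= received:
        raise ValueError("returned must lie between 0 and the amount received")
    return endowment - sent + returned, endowment + received - returned


def dictator_game_payoffs(endowment, sent, multiplier=1):
    """Payoffs when player 2 cannot return anything or respond in any way."""
    if not 0 <= sent <= endowment:
        raise ValueError("sent must lie between 0 and the endowment")
    return endowment - sent, endowment + multiplier * sent


def decompose_transfer(sent_investment, sent_dictator):
    """Split the investment-game transfer into (other-regarding part, residual 'trust')."""
    other_regarding = min(sent_dictator, sent_investment)
    return other_regarding, sent_investment - other_regarding


# Hypothetical subject: sends 6 of 10 in the investment game and 2 of 10 as dictator.
print(investment_game_payoffs(10, 6, returned=9))  # (13, 19)
print(dictator_game_payoffs(10, 2))                # (8, 12)
print(decompose_transfer(6, 2))                    # (2, 4)
```

In this hypothetical case, reading the full transfer of 6 as "trust" would overstate it relative to the residual of 4 that remains once the dictator-game transfer is netted out.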
subjects appear to exhibit substantial aversion to inequality in experiments of this kind, do we need to actually run the experiment in which the same subject participates in a dictator game and an investment game to realize that “trust” is weakly overestimated by the executed trust experiments? One might object that we would not be able to make this inference without having run some prior experiments in which subjects transfer money in a dictator game, so this design proposal of Cox (2004) does not count as a thought experiment. But imagine counterfactually82 that Cox (2004) left it at that, and did not actually run an experiment. We would still be able to draw the new inference from his design that trust is weakly overestimated in previous experiments if one accounts for the potential confound of inequality aversion.83 Thus, in what sense should we view the thought experiment of the proposed design of Cox (2004) as anything other than an attenuated version of the ordinary experiment that he actually designed and executed?

One trepidation with treating a thought experiment as just a slimmed-down experiment is that it is untethered by the reality of “proof by data” at the end. But this has more to do with the aims and rhetorical goals of doing experiments. As Sorenson (1992, p. 205) notes:

The aim of any experiment is to answer or raise its question rationally. As stressed (earlier …), the motives of an experiment are multifarious. One can experiment in order to teach a new technique, to test new laboratory equipment, or to work out a grudge against white rats. (The principal architect of modern quantum electrodynamics, Richard Feynman, once demonstrated that the bladder does not require gravity by standing on his head and urinating.) The distinction between aim and motive applies to thought experiments as well. When I say that an experiment ‘purports’ to achieve its aim without execution, I mean that the experimental design is presented in a certain way to the audience. The audience is being invited to believe that contemplation of the design justifies an answer to the question or (more rarely) justifiably raises its question.

In effect, then, it is caveat emptor with thought experiments, but the same homily surely applies to any experiment, even if executed.

9.3 That’s Not a Thought Experiment … This Is!

We earlier defined the word “field” in the following manner: “used attributively to denote an investigation, study, etc., carried out in the natural environment of a given material, language, animal, etc., and not in the laboratory, study, or office.” Thus, in an important sense, experiments that employ methods to measure neuronal activity during controlled tasks would be included, since the functioning of the brain can be presumed to be a natural reaction to the controlled stimulus. Neuroeconomics is the study of how different parts of the brain light up when certain tasks are presented, such as exposure to randomly generated monetary gain or loss in Hans Breiter et al. (2001), the risk elicitation tasks of Kip Smith et al. (2002) and Dickhaut et al. (2003), the trust games of McCabe et al. (2001), and the ultimatum bargaining games of Alan Sanfey et al.

82 A thought experiment at work.
83 As it happens, there are two further confounds at work in the trust design, each of which can be addressed. One is risk attitudes, at least as far as the interpretation of the behavior of the first player is concerned. Sending money to the other player is risky. If the first player keeps all of his endowment, there is no risk. So a risk-loving player would invest, just for the thrill. A risk-averse player would not invest for this reason. But if there are other motives for investing, then risk attitudes will exacerbate or temper them, and need to be taken into account when identifying the residual as trust. Risk attitudes play no role for the second player’s decision. The other confound, in the proposed design of Cox (2004), is that the “price of giving” in his proposed dictator game is $1 for $1 transferred, whereas it is $1 for $3 transferred in the investment game. Thus one would weakly understate the extent of other-regarding preferences in his design, and hence weakly overstate the residual “trust.” The general point is even clearer: after these potential confounds are taken into account, what faith does one have that a reliable measure of trust has been identified statistically in the original studies?
(2003). In many ways these methods are extensions of the use of verbal protocols (speaking out loud as the task is performed) used by K. Anders Ericsson and Herbert Simon (1993) to study the algorithmic processes that subjects were going through as they solved problems, and the use of mouse-tracking technology by Eric Johnson et al. (2002) to track sequential information search in bargaining tasks. The idea is to monitor some natural mental process as the experimental treatment is administered, even if the treatment is artefactual.

10. Conclusion

We have avoided drawing a single, bright line between field experiments and lab experiments. One reason is that there are several dimensions to that line, and inevitably there will be some trade-offs between those. The extent of those trade-offs will depend on where researchers fall in terms of their agreement with the argument and issues we raise.

Another reason is that we disagree on where the line would be drawn. One of us (Harrison), bred in the barren test-tube setting of classroom labs sans ferns, sees virtually any effort to get out of the classroom as constituting a field experiment to some useful degree. The other (List), raised in the wilds amidst naturally occurring sports-card geeks, would include only those experiments that used free-range subjects. Despite this disagreement on the boundaries between one category of experiments and another category, however, we agree on the characteristics that make a field experiment differ from a lab experiment.

Using these characteristics as a guide, we propose a taxonomy of field experiments that helps one see their connection to lab experiments, social experiments, and natural experiments. Many of the differences are illusory, such that the same issues of control apply. But many of the differences matter for behavior and inference, and justify the focus on the field.

The main methodological conclusion we draw is that experimenters should be wary of the conventional wisdom that abstract, imposed treatments allow general inferences. In an attempt to ensure generality and control by gutting all instructions and procedures of field referents, the traditional lab experimenter has arguably lost control to the extent that subjects seek to provide their own field referents. The obvious solution is to conduct experiments both ways: with and without naturally occurring field referents and context. If there is a difference, then it should be studied. If there is no difference, one can conditionally conclude that the field behavior in that context travels to the lab environment.

REFERENCES

Angrist, Joshua D.; Guido W. Imbens and Donald B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables,” J. Amer. Statist. Assoc. 91:434, pp. 444–55.
Angrist, Joshua D. and Alan B. Krueger. 2001. “Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments,” J. Econ. Perspect. 15:4, pp. 69–85.
Angrist, Joshua D. and Victor Lavy. 1999. “Using Maimonides’ Rule to Estimate the Effect of Class Size on Scholastic Achievement,” Quart. J. Econ. 114:2, pp. 553–75.
Ballinger, T. Parker; Michael G. Palumbo and Nathaniel T. Wilcox. 2003. “Precautionary Saving and Social Learning Across Generations: An Experiment,” Econ. J. 113:490, pp. 920–47.
Ballinger, T. Parker and Nathaniel T. Wilcox. 1997. “Decisions, Error and Heterogeneity,” Econ. J. 107:443, pp. 1090–105.
Banaji, Mahzarin R. and Robert G. Crowder. 1989. “The Bankruptcy of Everyday Memory,” Amer. Psychol. 44, pp. 1185–93.
Bateman, Ian; Alistair Munro, Bruce Rhodes, Chris Starmer and Robert Sugden. 1997. “Does Part-Whole Bias Exist? An Experimental Investigation,” Econ. J. 107:441, pp. 322–32.
Becker, Gordon M.; Morris H. DeGroot and Jacob Marschak. 1964. “Measuring Utility by a Single-Response Sequential Method,” Behav. Sci. 9:July, pp. 226–32.
Beetsma, R. M. W. J. and P. C. Schotman. 2001. “Measuring Risk Attitudes in a Natural Experiment: Data from the Television Game Show Lingo,” Econ. J. 111:474, pp. 821–48.
Behrman, Jere R.; Mark R. Rosenzweig and Paul Taubman. 1994. “Endowments and the Allocation of Schooling in the Family and in the Marriage Market: The Twins Experiment,” J. Polit. Econ. 102:6, pp. 1131–74.
Benson, P. G. 2000. “The Hawthorne Effect,” in The
Corsini Encyclopedia of Psychology and Behavioral Science. Vol. 2, 3rd ed. W. E. Craighead and C. B. Nemeroff, eds. NY: Wiley.
Berg, Joyce E.; John Dickhaut and Kevin McCabe. 1995. “Trust, Reciprocity, and Social History,” Games Econ. Behav. 10, pp. 122–42.
Berk, J. B.; E. Hughson and K. Vandezande. 1996. “The Price Is Right, but Are the Bids? An Investigation of Rational Decision Theory,” Amer. Econ. Rev. 86:4, pp. 954–70.
Berlin, Brent and Paul Kay. 1969. Basic Color Terms: Their Universality and Evolution. Berkeley: UC Press.
Binswanger, Hans P. 1980. “Attitudes Toward Risk: Experimental Measurement in Rural India,” Amer. J. Ag. Econ. 62:3, pp. 395–407.
———. 1981. “Attitudes Toward Risk: Theoretical Implications of an Experiment in Rural India,” Econ. J. 91:364, pp. 867–90.
Blackburn, McKinley; Glenn W. Harrison and E. Elisabet Rutström. 1994. “Statistical Bias Functions and Informative Hypothetical Surveys,” Amer. J. Ag. Econ. 76:5, pp. 1084–88.
Blundell, R. and M. Costa-Dias. 2002. “Alternative Approaches to Evaluation in Empirical Microeconomics,” Portuguese Econ. J. 1, pp. 91–115.
Blundell, R. and Thomas MaCurdy. 1999. “Labor Supply: A Review of Alternative Approaches,” in Handbook of Labor Economics, Vol. 3C. O. Ashenfelter and D. Card, eds. Amsterdam: Elsevier Science BV.
Bohm, Peter. 1972. “Estimating the Demand for Public Goods: An Experiment,” Europ. Econ. Rev. 3:2, pp. 111–30.
———. 1979. “Estimating Willingness to Pay: Why and How?” Scand. J. Econ. 81:2, pp. 142–53.
———. 1984a. “Revealing Demand for an Actual Public Good,” J. Public Econ. 24, pp. 135–51.
———. 1984b. “Are There Practicable Demand-Revealing Mechanisms?” in Public Finance and the Quest for Efficiency. H. Hanusch, ed. Detroit: Wayne State U. Press.
———. 1994. “Behavior under Uncertainty without Preference Reversal: A Field Experiment,” in Experimental Economics. J. Hey, ed. Heidelberg: Physica-Verlag.
Bohm, Peter and Hans Lind. 1993. “Preference Reversal, Real-World Lotteries, and Lottery-Interested Subjects,” J. Econ. Behav. Org. 22:3, pp. 327–48.
Bornstein, Gary and Ilan Yaniv. 1998. “Individual and Group Behavior in the Ultimatum Game: Are Groups More Rational Players?” Exper. Econ. 1:1, pp. 101–08.
Breiter, Hans C.; Itzhak Aharon, Daniel Kahneman, Anders Dale and Peter Shizgal. 2001. “Functional Imaging of Neural Responses to Expectancy and Experience of Monetary Gains and Losses,” Neuron 30:2, pp. 619–39.
Bronars, Stephen G. and Jeff Grogger. 1994. “The Economic Consequences of Unwed Motherhood: Using Twin Births as a Natural Experiment,” Amer. Econ. Rev. 84:5, pp. 1141–56.
Burns, Penny. 1985. “Experience and Decision Making: A Comparison of Students and Businessmen in a Simulated Progressive Auction,” in Research in Experimental Economics, Vol. 3. V. L. Smith, ed. Greenwich, CT: JAI Press.
Camerer, Colin F. 1998. “Can Asset Markets Be Manipulated? A Field Experiment with Racetrack Betting,” J. Polit. Econ. 106:3, pp. 457–82.
Camerer, Colin and Robin Hogarth. 1999. “The Effects of Financial Incentives in Experiments: A Review and Capital-Labor Framework,” J. Risk Uncertainty 19, pp. 7–42.
Cameron, Lisa A. 1999. “Raising the Stakes in the Ultimatum Game: Experimental Evidence from Indonesia,” Econ. Inquiry 37:1, pp. 47–59.
Carbone, Enrica. 1997. “Investigation of Stochastic Preference Theory Using Experimental Data,” Econ. Letters 57:3, pp. 305–11.
Cardenas, Juan C. 2003. “Real Wealth and Experimental Cooperation: Evidence from Field Experiments,” J. Devel. Econ. 70:2, pp. 263–89.
Carpenter, Jeffrey; Amrita Daniere and Lois Takahashi. 2004. “Cooperation, Trust, and Social Capital in Southeast Asian Urban Slums,” J. Econ. Behav. Org. 55:4, pp. 533–51.
Carson, Richard T. 1997. “Contingent Valuation Surveys and Tests of Insensitivity to Scope,” in Determining the Value of Non-Marketed Goods: Economic, Psychological, and Policy Relevant Aspects of Contingent Valuation Methods. R. J. Kopp, W. Pommerehne and N. Schwarz, eds. Boston: Kluwer.
Carson, Richard T.; Robert C. Mitchell, W. Michael Hanemann, Raymond J. Kopp, Stanley Presser and Paul A. Ruud. 1992. A Contingent Valuation Study of Lost Passive Use Values Resulting From the Exxon Valdez Oil Spill. Anchorage: Attorney Gen. Alaska.
Chamberlin, Edward H. 1948. “An Experimental Imperfect Market,” J. Polit. Econ. 56:2, pp. 95–108.
Coller, Maribeth and Melonie B. Williams. 1999. “Eliciting Individual Discount Rates,” Exper. Econ. 2, pp. 107–27.
Conlisk, John. 1989. “Three Variants on the Allais Example,” Amer. Econ. Rev. 79:3, pp. 392–407.
———. 1996. “Why Bounded Rationality?” J. Econ. Lit. 34:2, pp. 669–700.
Cox, James C. 2004. “How To Identify Trust and Reciprocity,” Games Econ. Behav. 46:2, pp. 260–81.
Cox, James C. and Stephen C. Hayne. 2002. “Barking Up the Right Tree: Are Small Groups Rational Agents?” work. pap. Dept. Econ. U. Arizona.
Cubitt, Robin P. and Robert Sugden. 2001. “On Money Pumps,” Games Econ. Behav. 37:1, pp. 121–60.
Cummings, Ronald G.; Steven Elliott, Glenn W. Harrison and James Murphy. 1997. “Are Hypothetical Referenda Incentive Compatible?” J. Polit. Econ. 105:3, pp. 609–21.
Cummings, Ronald G. and Glenn W. Harrison. 1994. “Was the Ohio Court Well Informed in Their Assessment of the Accuracy of the Contingent Valuation Method?” Natural Res. J. 34:1, pp. 1–36.
Cummings, Ronald G.; Glenn W. Harrison and Laura L. Osborne. 1995. “Can the Bias of Contingent Valuation Be Reduced? Evidence from the
Laboratory,” econ. work. pap. B-95-03, College Business Admin., U. South Carolina.
Cummings, Ronald G.; Glenn W. Harrison and E. Elisabet Rutström. 1995. “Homegrown Values and Hypothetical Surveys: Is the Dichotomous Choice Approach Incentive Compatible?” Amer. Econ. Rev. 85:1, pp. 260–66.
Cummings, Ronald G. and Laura O. Taylor. 1999. “Unbiased Value Estimates for Environmental Goods: A Cheap Talk Design for the Contingent Valuation Method,” Amer. Econ. Rev. 89:3, pp. 649–65.
Deacon, Robert T. and Jon Sonstelie. 1985. “Rationing by Waiting and the Value of Time: Results from a Natural Experiment,” J. Polit. Econ. 93:4, pp. 627–47.
Dehejia, Rajeev H. and Sadek Wahba. 1999. “Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs,” J. Amer. Statist. Assoc. 94:448, pp. 1053–62.
———. 2002. “Propensity Score Matching for Nonexperimental Causal Studies,” Rev. Econ. Statist. 84, pp. 151–61.
Dickhaut, John; Kevin McCabe, Jennifer C. Nagode, Aldo Rustichini and José V. Pardo. 2003. “The Impact of the Certainty Context on the Process of Choice,” Proceed. Nat. Academy Sci. 100:March, pp. 3536–41.
Dillman, Don. 1978. Mail and Telephone Surveys: The Total Design Method. NY: Wiley.
Duddy, Edward A. 1924. “Report on an Experiment in Teaching Method,” J. Polit. Econ. 32:5, pp. 582–603.
Dufwenberg, Martin and Uri Gneezy. 2000. “Measuring Beliefs in an Experimental Lost Wallet Game,” Games Econ. Behav. 30:2, pp. 163–82.
Dyer, Douglas and John H. Kagel. 1996. “Bidding in Common Value Auctions: How the Commercial Construction Industry Corrects for the Winner’s Curse,” Manage. Sci. 42:10, pp. 1463–75.
Eckel, Catherine C. and Philip J. Grossman. 1996. “Altruism in Anonymous Dictator Games,” Games Econ. Behav. 16, pp. 181–91.
Ericsson, K. Anders and Herbert A. Simon. 1993. Protocol Analysis: Verbal Reports as Data, rev. ed. Cambridge, MA: MIT Press.
Fan, Chinn-Ping. 2002. “Allais Paradox in the Small,” J. Econ. Behav. Org. 49:3, pp. 411–21.
Ferber, Robert and Werner Z. Hirsch. 1978. “Social Experimentation and Economic Policy: A Survey,” J. Econ. Lit. 16:4, pp. 1379–414.
———. 1982. Social Experimentation and Economic Policy. NY: Cambridge U. Press.
Fershtman, Chaim and Uri Gneezy. 2001. “Discrimination in a Segmented Society: An Experimental Approach,” Quart. J. Econ. 116, pp. 351–77.
Fix, Michael and Raymond J. Struyk, eds. 1993. Clear and Convincing Evidence: Measurement of Discrimination in America. Washington, DC: Urban Institute Press.
Frech, H. E. 1976. “The Property Rights Theory of the Firm: Empirical Results from a Natural Experiment,” J. Polit. Econ. 84:1, pp. 143–52.
Frederick, Shane; George Loewenstein and Ted O’Donoghue. 2002. “Time Discounting and Time Preference: A Critical Review,” J. Econ. Lit. 40:2, pp. 351–401.
Gertner, R. 1993. “Game Shows and Economic Behavior: Risk-Taking on Card Sharks,” Quart. J. Econ. 108:2, pp. 507–21.
Gigerenzer, Gerd; Peter M. Todd and the ABC Research Group. 2000. Simple Heuristics That Make Us Smart. NY: Oxford U. Press.
Gimotty, Phyllis A. 2002. “Delivery of Preventive Health Services for Breast Cancer Control: A Longitudinal View of a Randomized Controlled Trial,” Health Services Res. 37:1, pp. 65–85.
Greene, William H. 1993. Econometric Analysis, 2nd ed. NY: Macmillan.
Grether, David M.; R. Mark Isaac and Charles R. Plott. 1981. “The Allocation of Landing Rights by Unanimity among Competitors,” Amer. Econ. Rev. Pap. Proceed. 71:May, pp. 166–71.
———. 1989. The Allocation of Scarce Resources: Experimental Economics and the Problem of Allocating Airport Slots. Boulder: Westview Press.
Grether, David M. and Charles R. Plott. 1984. “The Effects of Market Practices in Oligopolistic Markets: An Experimental Examination of the Ethyl Case,” Econ. Inquiry 22:Oct., pp. 479–507.
Haigh, Michael and John A. List. 2004. “Do Professional Traders Exhibit Myopic Loss Aversion? An Experimental Analysis,” J. Finance 59, forthcoming.
Harbaugh, William T. and Kate Krause. 2000. “Children’s Altruism in Public Good and Dictator Experiments,” Econ. Inquiry 38:1, pp. 95–109.
Harbaugh, William T.; Kate Krause and Timothy R. Berry. 2001. “GARP for Kids: On the Development of Rational Choice Behavior,” Amer. Econ. Rev. 91:5, pp. 1539–45.
Harbaugh, William T.; Kate Krause and Lise Vesterlund. 2002. “Risk Attitudes of Children and Adults: Choices Over Small and Large Probability Gains and Losses,” Exper. Econ. 5, pp. 53–84.
Harless, David W. and Colin F. Camerer. 1994. “The Predictive Utility of Generalized Expected Utility Theories,” Econometrica 62:6, pp. 1251–89.
Harrison, Glenn W. 1988. “Predatory Pricing in a Multiple Market Experiment,” J. Econ. Behav. Org. 9, pp. 405–17.
———. 1992a. “Theory and Misbehavior of First-Price Auctions: Reply,” Amer. Econ. Rev. 82:5, pp. 1426–43.
———. 1992b. “Market Dynamics, Programmed Traders, and Futures Markets: Beginning the Laboratory Search for a Smoking Gun,” Econ. Record 68, Special Issue Futures Markets, pp. 46–62.
———. 2005a. “Field Experiments and Control,” in Field Experiments in Economics. J. Carpenter, G. W. Harrison and J. A. List, eds. Research in Exper. Econ. Vol. 10. Greenwich, CT: JAI Press.
———. 2005b. “Experimental Evidence on Alternative Environmental Valuation Methods,” Environ. Res. Econ. 23, forthcoming.
Harrison, Glenn W.; Ronald M. Harstad and E. Elisabet Rutström. 2004. “Experimental Methods and Elicitation of Values,” Exper. Econ. 7:June, pp. 123–40.
Harrison, Glenn W. and Bengt Kriström. 1996. “On the Interpretation of Responses to Contingent Valuation Surveys,” in Current Issues in Environmental Economics. P. O. Johansson, B. Kriström and K. G. Mäler, eds. Manchester: Manchester U. Press.
Harrison, Glenn W.; Morten Igel Lau and Melonie B. Williams. 2002. “Estimating Individual Discount Rates for Denmark: A Field Experiment,” Amer. Econ. Rev. 92:5, pp. 1606–17.
Harrison, Glenn W. and James C. Lesley. 1996. “Must Contingent Valuation Surveys Cost So Much?” J. Environ. Econ. Manage. 31:1, pp. 79–95.
Harrison, Glenn W. and John A. List. 2003. “Naturally Occurring Markets and Exogenous Laboratory Experiments: A Case Study of the Winner’s Curse,” work. pap. 3-14, Dept. Econ., College Bus. Admin., U. Central Florida.
Harrison, Glenn W.; Thomas F. Rutherford and David G. Tarr. 1997. “Quantifying the Uruguay Round,” Econ. J. 107:444, pp. 1405–30.
Harrison, Glenn W. and Elisabet Rutström. 2001. “Doing It Both Ways—Experimental Practice and Heuristic Context,” Behav. Brain Sci. 24:3, pp. 413–14.
Harrison, Glenn W. and H. D. Vinod. 1992. “The Sensitivity Analysis of Applied General Equilibrium Models: Completely Randomized Factorial Sampling Designs,” Rev. Econ. Statist. 74:2, pp. 357–62.
Hausman, Jerry A. 1993. Contingent Valuation. NY: North-Holland.
Hausman, Jerry A. and David A. Wise. 1985. Social Experimentation. Chicago: U. Chicago Press.
Hayes, J. R. and H. A. Simon. 1974. “Understanding Written Problem Instructions,” in Knowledge and Cognition. L. W. Gregg, ed. Hillsdale, NJ: Erlbaum.
Heckman, James J. 1998. “Detecting Discrimination,” J. Econ. Perspect. 12:2, pp. 101–16.
Heckman, James J. and Richard Robb. 1985. “Alternative Methods for Evaluating the Impact of Interventions,” in Longitudinal Analysis of Labor Market Data. J. Heckman and B. Singer, eds. NY: Cambridge U. Press.
Heckman, James J. and Peter Siegelman. 1993. “The Urban Institute Audit Studies: Their Methods and Findings,” in Clear and Convincing Evidence: Measurement of Discrimination in America. M. Fix and R. J. Struyk, eds. Washington, DC: Urban Institute Press.
Heckman, James J. and Jeffrey A. Smith. 1995. “Assessing the Case for Social Experiments,” J. Econ. Perspect. 9:2, pp. 85–110.
Henrich, Joseph. 2000. “Does Culture Matter in Economic Behavior? Ultimatum Game Bargaining Among the Machiguenga,” Amer. Econ. Rev. 90:4, pp. 973–79.
Henrich, Joseph; Robert Boyd, Samuel Bowles, Colin Camerer, Herbert Gintis, Richard McElreath and Ernst Fehr. 2001. “In Search of Homo Economicus: Experiments in 15 Small-Scale Societies,” Amer. Econ. Rev. 91:2, pp. 73–79.
Henrich, Joseph; Robert Boyd, Samuel Bowles, Colin Camerer, Ernst Fehr and Herbert Gintis, eds. 2004. Foundations of Human Sociality. NY: Oxford U. Press.
Henrich, Joseph and Richard McElreath. 2002. “Are Peasants Risk-Averse Decision Makers?” Current Anthropology 43:1, pp. 172–81.
Hey, John D. 1995. “Experimental Investigations of Errors in Decision Making Under Risk,” Europ. Econ. Rev. 39, pp. 633–40.
Hey, John D. and Chris Orme. 1994. “Investigating Generalizations of Expected Utility Theory Using Experimental Data,” Econometrica 62:6, pp. 1291–326.
Hoffman, Elizabeth; Kevin A. McCabe and Vernon L. Smith. 1996. “On Expectations and the Monetary Stakes in Ultimatum Games,” Int. J. Game Theory 25:3, pp. 289–301.
Holt, Charles A. and Susan K. Laury. 2002. “Risk Aversion and Incentive Effects,” Amer. Econ. Rev. 92:5, pp. 1644–55.
Hong, James T. and Charles R. Plott. 1982. “Rate Filing Policies for Inland Water Transportation: An Experimental Approach,” Bell J. Econ. 13:1, pp. 1–19.
Hotz, V. Joseph. 1992. “Designing an Evaluation of JTPA,” in Evaluating Welfare and Training Programs. C. Manski and I. Garfinkel, eds. Cambridge: Harvard U. Press.
Hoxby, Caroline M. 2000. “The Effects of Class Size on Student Achievement: New Evidence From Population Variation,” Quart. J. Econ. 115:4, pp. 1239–85.
Imber, David; Gay Stevenson and Leanne Wilks. 1991. A Contingent Valuation Survey of the Kakadu Conservation Zone. Canberra: Austral. Govt. Pub., Resource Assess. Com.
Isaac, R. Mark and Vernon L. Smith. 1985. “In Search of Predatory Pricing,” J. Polit. Econ. 93:2, pp. 320–45.
Johnson, Eric J.; Colin F. Camerer, Sankar Sen and Talia Rymon. 2002. “Detecting Failures of Backward Induction: Monitoring Information Search in Sequential Bargaining,” J. Econ. Theory 104:1, pp. 16–47.
Kachelmeier, Steven J. and Mohamed Shehata. 1992. “Examining Risk Preferences Under High Monetary Incentives: Experimental Evidence from the People’s Republic of China,” Amer. Econ. Rev. 82:5, pp. 1120–41.
Kagel, John H.; Raymond C. Battalio and Leonard Green. 1995. Economic Choice Theory: An Experimental Analysis of Animal Behavior. NY: Cambridge U. Press.
Kagel, John H.; Raymond C. Battalio and James M. Walker. 1979. “Volunteer Artifacts in Experiments in Economics: Specification of the Problem and Some Initial Data from a Small-Scale Field Experiment,” in Research in Experimental Economics. Vol. 1. V. L. Smith, ed. Greenwich, CT: JAI Press.
Kagel, John H.; Ronald M. Harstad and Dan Levin. 1987. “Information Impact and Allocation Rules in Auctions with Affiliated Private Values: A Laboratory Study,” Econometrica 55:6, pp. 1275–304.
Kagel, John H. and Dan Levin. 1986. “The Winner’s Curse and Public Information in Common Value Auctions,” Amer. Econ. Rev. 76:5, pp. 894–920.
———. 1999. “Common Value Auctions with Insider Information,” Econometrica 67:5, pp. 1219–38.
———. 2002. Common Value Auctions and the Winner’s Curse. Princeton: Princeton U. Press.
Kagel, John H.; Don N. MacDonald and Raymond C. Battalio. 1990. “Tests of ‘Fanning Out’ of Indifference Curves: Results from Animal and Human Experiments,” Amer. Econ. Rev. 80:4, pp. 912–21.
Kramer, Michael and Stanley Shapiro. 1984. “Scientific Challenges in the Application of Randomized Trials,” J. Amer. Medical Assoc. 252:19, pp. 2739–45.
Krueger, Alan B. 1999. “Experimental Estimates of Production Functions,” Quart. J. Econ. 114:2, pp. 497–532.
Kunce, Mitch; Shelby Gerking and William Morgan. 2002. “Effects of Environmental and Land Use Regulation in the Oil and Gas Industry Using the Wyoming Checkerboard as an Experimental Design,” Amer. Econ. Rev. 92:5, pp. 1588–93.
Lalonde, Robert J. 1986. “Evaluating the Econometric Evaluations of Training Programs with Experimental Data,” Amer. Econ. Rev. 76:4, pp. 604–20.
Levine, Michael E. and Charles R. Plott. 1977. “Agenda Influence and Its Implications,” Virginia Law Rev. 63:May, pp. 561–604.
Levitt, Steven D. 2003. “Testing Theories of Discrimination: Evidence from the Weakest Link,” NBER work. pap. 9449.
Lichtenstein, Sarah and Paul Slovic. 1973. “Response-Induced Reversals of Gambling: An Extended Replication in Las Vegas,” J. Exper. Psych. 101, pp. 16–20.
List, John A. 2001. “Do Explicit Warnings Eliminate the Hypothetical Bias in Elicitation Procedures? Evidence from Field Auctions for Sportscards,” Amer. Econ. Rev. 91:5, pp. 1498–507.
———. 2003. “Friend or Foe: A Natural Experiment of the Prisoner’s Dilemma,” unpub. manuscript, U. Maryland Dept. Ag. Res. Econ.
———. 2004a. “Young, Selfish and Male: Field Evidence of Social Preferences,” Econ. J. 114:492, pp. 121–49.
———. 2004b. “The Nature and Extent of Discrimination in the Marketplace: Evidence from the Field,” Quart. J. Econ. 119:1, pp. 49–89.
———. 2004c. “Neoclassical Theory Versus Prospect Theory: Evidence from the Marketplace,” Econometrica 72:2, pp. 615–25.
———. 2004d. “Field Experiments: An Introduction and Survey,” work. pap., U. Maryland Dept. Ag. Res. Econ. and Dept. Econ.
———. 2004e. “Testing Neoclassical Competitive Theory in Multi-Lateral Decentralized Markets,” J. Polit. Econ. 112:5, pp. 1131–56.
List, John A. and David Lucking-Reiley. 2000. “Demand Reduction in a Multi-Unit Auction: Evidence from a Sportscard Experiment,” Amer. Econ. Rev. 90:4, pp. 961–72.
———. 2002. “The Effects of Seed Money and Refunds on Charitable Giving: Experimental Evidence from a University Capital Campaign,” J. Polit. Econ. 110:1, pp. 215–33.
Loewenstein, George. 1999. “Experimental Economics from the Vantage-Point of Behavioral Economics,” Econ. J. 109:453, pp. F25–F34.
Loomes, Graham; Peter G. Moffatt and Robert Sugden. 2002. “A Microeconometric Test of Alternative Stochastic Theories of Risky Choice,” J. Risk Uncertainty 24:2, pp. 103–30.
Loomes, Graham and Robert Sugden. 1995. “Incorporating a Stochastic Element Into Decision Theories,” Europ. Econ. Rev. 39, pp. 641–48.
———. 1998. “Testing Different Stochastic Specifications of Risky Choice,” Economica 65, pp. 581–98.
Lucking-Reiley, David. 1999. “Using Field Experiments to Test Equivalence Between Auction Formats: Magic on the Internet,” Amer. Econ. Rev. 89:5, pp. 1063–80.
Machina, Mark J. 1989. “Dynamic Consistency and Non-Expected Utility Models of Choice Under Uncertainty,” J. Econ. Lit. 27:4, pp. 1622–68.
McCabe, Kevin; Daniel Houser, Lee Ryan, Vernon Smith and Theodore Trouard. 2001. “A Functional Imaging Study of Cooperation in Two-Person Reciprocal Exchange,” Proceed. Nat. Academy Sci. 98:20, pp. 11832–35.
McClennan, Edward F. 1990. Rationality and Dynamic Choice. NY: Cambridge U. Press.
McDaniel, Tanga M. and E. Elisabet Rutström. 2001. “Decision Making Costs and Problem Solving Performance,” Exper. Econ. 4:2, pp. 145–61.
Metrick, Andrew. 1995. “A Natural Experiment in ‘Jeopardy!’” Amer. Econ. Rev. 85:1, pp. 240–53.
Meyer, Bruce D.; W. Kip Viscusi and David L. Durbin. 1995. “Workers’ Compensation and Injury Duration: Evidence from a Natural Experiment,” Amer. Econ. Rev. 85:3, pp. 322–40.
Milgrom, Paul R. and Robert J. Weber. 1982. “A Theory of Auctions and Competitive Bidding,” Econometrica 50:5, pp. 1089–122.
Neisser, Ulric, and Ira E. Hyman, Jr. eds. 2000. Memory Observed: Remembering in Natural Contexts. 2nd ed. NY: Worth Publishers.
Pearl, Judea. 1984. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Reading, MA: Addison-Wesley.
Philipson, Tomas and Larry V. Hedges. 1998. “Subject Evaluation in Social Experiments,” Econometrica 66:2, pp. 381–408.
Plott, Charles R. and Michael E. Levine. 1978. “A Model of Agenda Influence on Committee Decisions,” Amer. Econ. Rev. 68:1, pp. 146–60.
Riach, P. A. and J. Rich. 2002. “Field Experiments of Discrimination in the Market Place,” Econ. J. 112:483, pp. F480–F518.
Rosenbaum, P. and Donald Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika 70, pp. 41–55.
———. 1984. “Reducing Bias in Observational Studies Using Multivariate Matched Sampling Methods that Incorporate the Propensity Score,” J. Amer. Statist. Assoc. 79, pp. 39–68.
Rosenthal, R. and L. Jacobson. 1968. Pygmalion in the Classroom. NY: Holt, Rinehart & Winston.
Rosenzweig, Mark R. and Kenneth I. Wolpin. 2000. “Natural ‘Natural Experiments’ in Economics,” J. Econ. Lit. 38:4, pp. 827–74.
Roth, Alvin E. 1991. “A Natural Experiment in the Organization of Entry-Level Labor Markets: Regional Markets for New Physicians and Surgeons in the United Kingdom,” Amer. Econ. Rev. 81:3, pp. 415–40.
Roth, Alvin E. and Michael W. K. Malouf. 1979. “Game-Theoretic Models and the Role of Information in Bargaining,” Psych. Rev. 86, pp. 574–94.
Roth, Alvin E.; Vesna Prasnikar, Masahiro Okuno-Fujiwara and Shmuel Zamir. 1991. “Bargaining and Market Behavior in Jerusalem, Ljubljana, Pittsburgh, and Tokyo: An Experimental Study,” Amer. Econ. Rev. 81:5, pp. 1068–95.
Rowe, R. D.; W. Schulze, W. D. Shaw, L. D. Chestnut and D. Schenk. 1991. “Contingent Valuation of Natural Resource Damage Due to the Nestucca Oil Spill,” report to British Columbia Ministry of Environment.
Rutström, E. Elisabet. 1998. “Home-Grown Values and the Design of Incentive Compatible Auctions,” Int. J. Game Theory 27:3, pp. 427–41.
Rutström, E. Elisabet and Melonie B. Williams. 2000. “Entitlements and Fairness: An Experimental Study of Distributive Preferences,” J. Econ. Behav. Org. 43, pp. 75–80.
Sanfey, Alan G.; James K. Rilling, Jessica A. Aronson, Leigh E. Nystrom and Jonathan D. Cohen. 2003. “The Neural Basis of Economic Decision-Making in the Ultimatum Game,” Science 300:5626, pp. 1755–58.
Slonim, Robert and Alvin E. Roth. 1998. “Learning in High Stakes Ultimatum Games: An Experiment in the Slovak Republic,” Econometrica 66:3, pp. 569–96.
Smith, Jeffrey and Petra Todd. 2000. “Does Matching Address LaLonde’s Critique of Nonexperimental Estimates?” unpub. man., Dept. Econ. U. Western Ontario.
Smith, Kip; John Dickhaut, Kevin McCabe and José V. Pardo. 2002. “Neuronal Substrates for Choice under Ambiguity, Risk, Gains, and Losses,” Manage. Sci. 48:6, pp. 711–18.
Smith, V. Kerry and Laura Osborne. 1996. “Do Contingent Valuation Estimates Pass a Scope Test? A Meta Analysis,” J. Environ. Econ. Manage. 31, pp. 287–301.
Smith, Vernon L. 1962. “An Experimental Study of Competitive Market Behavior,” J. Polit. Econ. 70, pp. 111–37.
———. 1982. “Microeconomic Systems as an Experimental Science,” Amer. Econ. Rev. 72:5, pp. 923–55.
———. 2003. “Constructivist and Ecological Rationality in Economics,” Amer. Econ. Rev. 93:3, pp. 465–508.
Smith, Vernon L.; G. L. Suchanek and Arlington W. Williams. 1988. “Bubbles, Crashes, and Endogenous Expectations in Experimental Spot Asset Markets,” Econometrica 56, pp. 1119–52.
Sorenson, Roy A. 1992. Thought Experiments. NY: Oxford U. Press.
Starmer, Chris. 1999. “Experiments in Economics: Should We Trust the Dismal Scientists in White Coats?” J. Exper. Method. 6, pp. 1–30.
Sunstein, Cass R.; Reid Hastie, John W. Payne, David A. Schkade and W. Kip Viscusi. 2002. Punitive Damages: How Juries Decide. Chicago: U. Chicago Press.
Tenorio, Rafael and Timothy Cason. 2002. “To Spin or Not To Spin? Natural and Laboratory Experiments from The Price is Right,” Econ. J. 112, pp. 170–95.
Warner, John T. and Saul Pleeter. 2001. “The Personal Discount Rate: Evidence from Military Downsizing Programs,” Amer. Econ. Rev. 91:1, pp. 33–53.
Wierzbicka, Anna. 1996. Semantics: Primes and Universals. NY: Oxford U. Press.
Winkler, Robert L. and Allan H. Murphy. 1973. “Experiments in the Laboratory and the Real World,” Org. Behav. Human Perform. 10, pp. 252–70.
