Ecological Rationality: Intelligence in the World (Oxford University Press, 2012)
Peter M. Todd
Gerd Gigerenzer
and the ABC Research Group
Oxford University Press, Inc., publishes works that further
Oxford University’s objective of excellence
in research, scholarship, and education.
Dedicated to Herbert Simon and Reinhard Selten, who pioneered
the study of rationality in the real world.
“More information is always better, full information is best. More computation is always better, optimization is best.” More-is-better
ideals such as these have long shaped our vision of rationality. The
philosopher Rudolf Carnap (1947), for instance, proposed the
“principle of total evidence,” which is the recommendation to use
all the available evidence when estimating a probability. The statis-
tician I. J. Good (1967) argued, similarly, that it is irrational to make
observations without using them. Going back further in time, the
Old Testament says that God created humans in his image (Genesis
1:26), and it might not be entirely accidental that some form of
omniscience (including knowledge of all relevant probabilities
and utilities) and omnipotence (including the ability to compute
complex functions in a blink) has sneaked into models of human
cognition. Many theories in the cognitive sciences and economics
have recreated humans in this heavenly image—from Bayesian
models to exemplar models to the maximization of expected utility.
Yet as far as we can tell, humans and other animals have always
relied on simple strategies or heuristics to solve adaptive problems,
ignoring most information and eschewing much computation
rather than aiming for as much as possible of both. In this book,
we argue that in an uncertain world, more information and com-
putation is not always better. Most important, we ask why and
when less can be more. The answers to this question constitute the
idea of ecological rationality, how we are able to achieve intelli-
gence in the world by using simple heuristics in appropriate con-
texts. Ecological rationality stems in part from the nature of those
4 THE RESEARCH AGENDA
Making Money
In 1990, Harry Markowitz received the Nobel Prize in Economics
for his path-breaking work on optimal asset allocation. He addressed
a vital investment problem that everyone faces in some form or
other, be it saving for retirement or earning money on the stock
market: How to invest your money in N available assets. It would
be risky to put everything in one basket; therefore, it makes sense
to diversify. But how? Markowitz (1952) derived the optimal rule
for allocating wealth across assets, known as the mean–variance
portfolio, because it maximizes the return (mean) and minimizes
the risk (variance). When considering his own retirement invest-
ments, we could be forgiven for imagining that Markowitz used his
award-winning optimization technique. But he did not. He relied
instead on a simple heuristic:
make better decisions. Yet our point is not that simple heuristics
are better than optimization methods, nor the opposite, as is typi-
cally assumed. No heuristic or optimizing strategy is the best in all
worlds. Rather, we must always ask, in what environments does a
given heuristic perform better than a complex strategy, and when is
the opposite true? This is the question of the ecological rationality
of a heuristic. The answer requires analyzing the information-
processing mechanism of the heuristic, the information structures
of the environment, and the match between the two. For the choice
between 1/N and the mean–variance portfolio, the relevant envi-
ronmental features include (a) degree of uncertainty, (b) number
N of alternatives, and (c) size of the learning sample.
It is difficult to predict the future performance of funds because
uncertainty is high. The size of the learning sample is the estima-
tion window, with 5 to 10 years of data typically being used to cali-
brate portfolio models in investment practice. The 1/N rule tends to
outperform the mean–variance portfolio if uncertainty is high, the
number of alternatives is large, and the learning sample is small.
This qualitative insight allows us to ask a quantitative question: If
we have 50 alternatives, how large a learning sample do we need so
that the mean–variance portfolio eventually outperforms the simple
heuristic? The answer is: 500 years of stock data (DeMiguel et al.,
2009). Thus, if you started keeping track of your investments now,
in the 26th century optimization would finally pay off, assuming
that the same funds, and the stock market, are still around.
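The contrast between the two allocation strategies can be sketched in a few lines of code. This is a minimal illustration, not the DeMiguel et al. setup: the returns below are made-up random numbers, and the mean–variance weights are the simplest unconstrained plug-in estimate, normalized to sum to 1.

```python
import numpy as np

def one_over_n(n_assets):
    """The 1/N heuristic: spread wealth equally over all assets."""
    return np.full(n_assets, 1.0 / n_assets)

def mean_variance_weights(returns):
    """Plug-in mean-variance weights estimated from a sample of returns.

    Uses the sample mean and covariance; real implementations add
    constraints and shrinkage, which are omitted here.
    """
    mu = returns.mean(axis=0)             # estimated mean return per asset
    cov = np.cov(returns, rowvar=False)   # estimated covariance matrix
    raw = np.linalg.solve(cov, mu)        # proportional to Sigma^-1 mu
    return raw / raw.sum()                # normalize to a full-investment portfolio

rng = np.random.default_rng(0)
sample = rng.normal(0.05, 0.2, size=(60, 5))   # 5 hypothetical funds, 60 months

print(one_over_n(5))                  # [0.2 0.2 0.2 0.2 0.2]
print(mean_variance_weights(sample))  # noisy weights, driven by sampling error
```

With only 60 observations, the estimated weights swing from sample to sample; this estimation error is exactly what lets 1/N win when uncertainty is high, N is large, and the learning sample is small.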
Catching Balls
Now let us think about sports, where players are also faced with
challenging, often emotionally charged problems. How do players
catch a fly ball? If you ask professional players, they may well stare
at you blankly and respond that they have never thought about it—
they just run to the ball and catch it. But how do players know
where to run? A standard account is that minds solve such complex
problems with complex algorithms. An obvious candidate complex
algorithm is that players unconsciously estimate the ball’s trajec-
tory and run as fast as possible to the spot where the ball will hit
the ground. How else could it work? In The Selfish Gene, biologist
Richard Dawkins (1989, p. 96) discusses exactly this:
When a man throws a ball high in the air and catches it again, he
behaves as if he had solved a set of differential equations in pre-
dicting the trajectory of the ball. He may neither know nor care
what a differential equation is, but this does not affect his skill
with the ball. At some subconscious level, something function-
ally equivalent to the mathematical calculation is going on.
Gaze heuristic: Fixate your gaze on the ball, start running, and
adjust your running speed so that the angle of gaze remains
constant.
The angle of gaze is the angle between the eye and the ball, rela-
tive to the ground. Players who use this rule do not need to measure
wind, air resistance, spin, or the other causal variables. They can
get away with ignoring all these pieces of causal information. All
the relevant facts are contained in only one variable: the angle of
gaze. Note that players using the gaze heuristic are not able to compute the point at which the ball will land, as experimental results demonstrate. But the heuristic nevertheless leads them
to the landing point in time to make the catch.
Like the 1/N rule, the gaze heuristic is successful in a particular
class of situations, not in all cases, and the study of its ecological
rationality aims at identifying that class. As many ball players say,
the hardest ball to catch is the one that heads straight at you, a situ-
ation in which the gaze heuristic is of no use. As mentioned before,
the gaze heuristic works in situations where the ball is already high
WHAT IS ECOLOGICAL RATIONALITY? 7
in the air, but it fails if applied right when the ball is at the begin-
ning of its flight. However, in this different environmental condi-
tion, players do not need a completely new heuristic—just a slightly
modified one, with a different final step (McBeath, Shaffer, & Kaiser,
1995; Shaffer, Krauchunas, Eddy, & McBeath, 2004):
Modified gaze heuristic: Fixate your gaze on the ball, start run-
ning, and adjust your running speed so that the image of the
ball rises at a constant rate.
What Is a Heuristic?
Table 1-1: Twelve Well-Studied Heuristics With Evidence of Use in the Adaptive Toolbox of Humans

Gaze heuristic (Gigerenzer, 2007; McBeath, Shaffer, & Kaiser, 1995)
Definition: To catch a ball, fix your gaze on it, start running, and adjust your running speed so that the angle of gaze remains constant.
Ecologically rational if: the ball is coming down from overhead.
Surprising findings: balls will be caught while running, possibly on a curved path.

1/N rule (DeMiguel, Garlappi, & Uppal, 2009)
Definition: Allocate resources equally to each of N alternatives.
Ecologically rational if: unpredictability is high, the learning sample is small, and N is large.
Surprising findings: can outperform optimal asset allocation portfolios.

Default heuristic (Johnson & Goldstein, 2003; chapter 16)
Definition: If there is a default, follow it.
Ecologically rational if: the values of those who set defaults match those of the decision maker, and the consequences of a choice are hard to foresee.
Surprising findings: explains why advertising has little effect on organ donor registration; predicts behavior when trait and preference theories fail.

Tit-for-tat (Axelrod, 1984)
Definition: Cooperate first and then imitate your partner’s last behavior.
Ecologically rational if: the other players also play tit-for-tat.
Surprising findings: can lead to a higher payoff than “rational” strategies (e.g., by backward induction).

Imitate the majority (Boyd & Richerson, 2005)
Definition: Determine the behavior followed by the majority of people in your group and imitate it.
Ecologically rational if: the environment is stable or changes only slowly, and information search is costly or time consuming.
Surprising findings: a driving force in bonding, group identification, and moral behavior.

Imitate the successful (Boyd & Richerson, 2005)
Definition: Determine the most successful person and imitate his or her behavior.
Ecologically rational if: individual learning is slow, and information search is costly or time consuming.
Surprising findings: a driving force in cultural evolution.

Note. For formal definitions and conditions concerning ecological rationality and surprising findings, see the references indicated and related chapters in this book.
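Tit-for-tat from Table 1-1 is simple enough to state as code. Below is a sketch using the conventional prisoner's dilemma payoffs (T=5, R=3, P=1, S=0), which are standard textbook values rather than numbers taken from Axelrod.

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then copy the partner's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds):
    """Iterated prisoner's dilemma with conventional payoffs."""
    payoffs = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
               ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strategy_a(hist_b)    # each player sees the other's past moves
        b = strategy_b(hist_a)
        hist_a.append(a)
        hist_b.append(b)
        pa, pb = payoffs[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat, 10))    # (30, 30): stable mutual cooperation
print(play(tit_for_tat, always_defect, 10))  # (9, 14): exploited once, then defends
```

Against itself, tit-for-tat locks into mutual cooperation; against a defector it loses only the first round, illustrating why it can outscore more "rational" strategies over repeated play.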
Evolved Capacities
Building blocks of heuristics are generally based on evolved cap-
acities. For instance, in the gaze heuristic, to keep the gaze angle
constant an organism needs the capacity to track an object visually
against a noisy background—something that no modern robot or
computer vision system can do as well as organisms (e.g., humans)
that have evolved to follow targets. When we use the term evolved
capacity, we refer to a product of nature and nurture—a capacity
that is prepared by the genes of a species but usually needs experi-
ence to be fully expressed. For instance, 3-month-old babies spon-
taneously practice holding their gaze on moving targets, such as
mobiles hanging over their crib. Evolved capacities are one reason
why simple heuristics can perform so well: They enable solutions
to complex problems that are fundamentally different from the
mathematically inspired ideal of humans and animals somehow
optimizing their choices. Other capacities underlying heuristic
building blocks include recognition memory, which the recogni-
tion heuristic and fluency heuristics exploit, and counting and
recall, which take-the-best and similar heuristics can use to esti-
mate cue orders.
April 8, 1779
If you doubt, set down all the Reasons, pro and con, in
opposite Columns on a Sheet of Paper, and when you have
considered them two or three Days, perform an Operation
similar to that in some questions of Algebra; observe what
Reasons or Motives in each Column are equal in weight, one
to one, one to two, two to three, or the like, and when you
have struck out from both Sides all the Equalities, you will see
in which column remains the Balance.… This kind of Moral
Algebra I have often practiced in important and dubious
Concerns, and tho’ it cannot be mathematically exact, I have …

… to ignore much of the available information and use fast and frugal
heuristics. And yet this approach is often resisted: When a forecast-
ing model does not predict a criterion, such as the performance of
funds, as well as hoped, the gut reaction of many people, experts
and novices alike, is to do the opposite and call for more informa-
tion and more computation. The possibility that the solution may
lie in eliminating information and fancy computation is still
unimaginable for many and hard to digest even after it has been
demonstrated again and again (see chapter 3).
Sample Size. In general, the smaller the sample size of available data
in the environment, the larger the advantage for simple heuristics.
One of the reasons is that complex statistical models have to esti-
mate their parameter values from past data, and if the sample size is
small, then the resulting error due to “variance” can exceed the error
due to “bias” in competing heuristics (see chapter 2). What consti-
tutes a small sample size depends on the degree of uncertainty, as
can be seen in the investment problem, where uncertainty is high:
In this case, a sample size of hundreds of years of stock data is
needed for the mean–variance portfolio to surpass the accuracy of
the 1/N rule.
There are many other important types of environment struc-
ture relevant for understanding ecological rationality. Two of
the major ones also considered in this book are redundancy and
variability.
(e.g., a disastrous hurricane) and other people (e.g., the public reac-
tion to a disaster). Each of the heuristics in Table 1-1 can be applied
to social objects (e.g., whom to hire, to trust, to marry) as well
as to physical objects (e.g., what goods to buy). As an example, the
recognition heuristic (see chapters 5 and 6) exploits environment
structures in which lack of recognition is valuable information and
aids inferences about, say, what microbrew to order and where to
invest, but also whom to talk to and whom to trust (“don’t ride with
a stranger”). Similarly, a satisficing heuristic can be used to select a
pair of jeans but also choose a mate (Todd & Miller, 1999), and the
1/N rule can help investors to diversify but also guide parents in
allocating their time and resources equally to their children.
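The recognition heuristic mentioned above reduces to a few lines of code; the city names and the knowledge state below are hypothetical, chosen only to illustrate when recognition does and does not discriminate.

```python
def recognition_heuristic(a, b, recognized):
    """If exactly one of two objects is recognized, infer that it has
    the higher criterion value; otherwise recognition cannot decide."""
    if (a in recognized) != (b in recognized):
        return a if a in recognized else b
    return None  # both or neither recognized: fall back on another strategy

recognized = {"Berlin", "Hamburg", "Munich"}   # hypothetical knowledge state

print(recognition_heuristic("Munich", "Herne", recognized))    # Munich
print(recognition_heuristic("Munich", "Hamburg", recognized))  # None
```

The heuristic is only applicable when recognition discriminates; when it returns None, the decision maker must recruit knowledge or another heuristic from the adaptive toolbox.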
Environment structures are also deliberately created by institu-
tions to influence behavior. Sometimes this is felicitous, as when
governments figure out how to get citizens to donate organs by
default, or design traffic laws for intersection right-of-way in a hier-
archical manner that matches people’s one-reason decision mecha-
nisms (chapter 16). In other cases, institutions create environments
that do not fit well with people’s cognitive processes and instead
cloud minds, accidentally or deliberately. For instance, informa-
tion about medical treatments is often represented in ways that
make benefits appear huge and harms inconsequential (chapter 17),
casinos set up gambling environments with cues that make gam-
blers believe the chance of winning is greater than it really is
(chapter 16), and store displays and shopping websites are crowded
with long lists of features of numerous products that can confuse cus-
tomers with information overload (Fasolo, McClelland, & Todd, 2007).
But there are ways to fix such problematic designs and make new
ones that people can readily find their way through, as we will see.
Finally, environment structure can emerge without design
through the social interactions of multiple decision makers. For
instance, people choosing a city to move to are often attracted by
large, vibrant metropolises, so that the “big get bigger,” which can
result in a J-shaped (or power-law) distribution of city populations
(a few teeming burgs, a number of medium-sized ones, and numer-
ous smaller towns). Such an emergent distribution, which is
seen in many domains ranging from book sales to website visits,
can in turn be exploited by heuristics for choice or estimation
(chapter 15). Similarly, drivers seeking a parking space using a par-
ticular heuristic create a pattern of available spots that serves as the
environment for future drivers to search through with their own
strategies, which may or may not fit that environment structure
(chapter 18). In these cases, individuals are, through the effects
of their own choices, shaping the environment in which they
and others must make further choices, creating the possibility of a
co-adapting loop between mind and world.
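The "big get bigger" feedback can be sketched as a simple urn-style simulation in which each newcomer picks a city with probability proportional to its current size. This is a crude stand-in for richer growth models (it omits, for example, the founding of new cities), and all parameters are arbitrary.

```python
import random

def settle(n_people=20000, n_cities=50, seed=1):
    """Each mover chooses a city with probability proportional to its
    current population, so early leads snowball."""
    rng = random.Random(seed)
    sizes = [1] * n_cities             # every city starts with one founder
    for _ in range(n_people):
        city = rng.choices(range(n_cities), weights=sizes)[0]
        sizes[city] += 1
    return sorted(sizes, reverse=True)

sizes = settle()
top_share = sum(sizes[:5]) / sum(sizes)
print(f"five biggest cities: {sizes[:5]} ({top_share:.0%} of the population)")
```

Even with identical starting conditions, the feedback produces a few teeming burgs and many small towns, the kind of skewed distribution that heuristics for choice and estimation can exploit.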
This view starts from the dictum that more is always better, as
described at the beginning of this chapter—more information and
computation would result in greater accuracy. But since in the real
world, so the argument goes, information is not free and computa-
tion takes time that could be spent on other things (Todd, 2001),
there is a point where the costs of further search exceed the
benefits. This assumed trade-off underlies optimization-under-
constraints theories of decision making, in which information
search in the external world (e.g., Stigler, 1961) or in memory (e.g.,
Anderson, 1990) is terminated when the expected costs exceed its
benefits. Similarly, the seminal analysis of the adaptive decision
maker (Payne et al., 1993) is built around the assumption that heu-
ristics achieve a beneficial trade-off between accuracy and effort,
where effort is a function of the amount of information and compu-
tation consumed. And indeed, as has been shown by Payne et al.’s
research and much since, heuristics can save effort.
The major discovery, however, is that saving effort does not nec-
essarily lead to a loss in accuracy. The trade-off is unnecessary.
Heuristics can be faster and more accurate than strategies that use
more information and more computation, including optimization
techniques. Our analysis of the ecological rationality of heuristics
goes beyond the incorrect universal assumption of effort–accuracy
trade-offs to ask empirically where less information and computa-
tion leads to more accurate judgments—that is, where less effortful
heuristics are more accurate than more costly methods.
These less-is-more effects have been popping up in a variety of
domains for years, but have been routinely ignored, as documented
in chapter 3. Now, though, a critical mass of instances is being
assembled, as shown throughout this book. For instance, in an age
in which companies maintain databases of their customers, com-
plete with historical purchase data, a key question becomes pre-
dicting which customers are likely to purchase again in a given
timeframe and which will be inactive. Wübben and Wangenheim
Methodology
of the gaze heuristic and its variants (Saxberg, 1987; Shaffer &
McBeath, 2005; Todd, 1981). Furthermore, the predicted process of
trajectory computation implies that players will calculate where a
ball will land, whereas the gaze heuristic makes no such predic-
tion. Comparing these process-level predictions can help explain
an apparent fallacy on the part of expert players—that they are
not able to say where a ball will come down (e.g., Saxberg, 1987).
When using the gaze heuristic, players would not have this ability,
because they would not need it to catch the ball. Such an analysis
of heuristics and their ecological rationality can thus help research-
ers to avoid misjudging adaptive behavior as fallacies (Gigerenzer,
2000).
There are a number of useful methodological considerations that
are prompted by the study of ecological rationality. First, research
should proceed by means of testing multiple models of heuristics
(or other strategies) comparatively, determining which perform best
in a particular environment and which best predict behavior
observed in that environment. This enables finding better models than those that already exist, rather than assessing a single model in isolation and proclaiming that it fits the data or does not. Second,
given the evidence discussed earlier for individual differences in
the use of heuristics, the tests of predictive accuracy should be
done at the level of each individual’s behavior, not in terms of
sample averages that may represent few or none of the individuals.
Finally, because individuals may vary in their own use of heuristics
as they explore a new problem, experiments should leave individu-
als sufficient time to learn about the alternatives and cues, and
researchers should not mistake trial-and-error exploration at the beginning of an experiment for evidence of weighting and adding of all information.
Several studies of heuristics exemplify these methodological
criteria. For instance, Bergert and Nosofsky (2007) formulated a
stochastic version of take-the-best and tested it against an additive-
weighting model at the individual level. They concluded that the
“vast majority of subjects” (p. 107) adopted the take-the-best strat-
egy. Another study by Nosofsky and Bergert (2007) compared take-
the-best with both additive-weighting and exemplar models of
categorization and concluded that “most did not use an exemplar-
based strategy” but followed the response time predictions of take-
the-best. There are also examples where not following some of these
criteria has led to results that are difficult to interpret. For instance,
if a study on how people learn about and use cues does not provide
enough trials for subjects to explore and distinguish those cues,
then lack of learning cannot be used as evidence of inability to learn
or failure to use a particular heuristic (e.g., Gigerenzer, Hertwig, &
34 UNCERTAINTY IN THE WORLD
decision maker gets more accuracy along with less effort. Thus,
there is a second answer to the question we started with: People
also rely on simple heuristics in situations where there is no effort–
accuracy trade-off. These results call for a different, more general
account of why it is rational to use simple heuristics, one that
includes both situations in which the effort–accuracy trade-off
holds and those where it does not. The surprising situation of
no trade-offs leads to another question, which we address in this
chapter: How can heuristics that ignore part of the available infor-
mation make more accurate inferences about the world than strate-
gies that do not ignore information?
To find answers to this new question, we first identify a useful
metaphor for the adaptive relationship between mind and environ-
ment. We then provide an analytical framework to understand how
cognition without trade-offs can work.
Figure 2-1: Model fits for temperature data. (a) Mean daily tem-
perature in London for the year 2000. Two polynomial models are
fitted to this data, one of degree 3 and one of degree 12. (b) Model
performance for London temperatures in 2000. For the same data,
mean error in fitting the observed samples decreases as a function
of polynomial degree. Mean error in predicting the whole popula-
tion of the entire year’s temperatures using the same polynomial
models is minimized by a degree-4 polynomial.
HOW HEURISTICS HANDLE UNCERTAINTY 37
Out-of-Sample Robustness
There can be negative consequences of using too many free param-
eters in our models. To see this, we can frame the task as one of
estimating model parameters using only a sample of the observa-
tions and then test how well such models predict the entire popula-
tion of instances. This allows us to get closer to estimating how
well different models can predict the future, based on the past, even
though here we are “predicting” past (but unseen) outcomes. If the
model performs well at this task, we can be more confident that
it captures systematic patterns in the data, rather than accidental
patterns. For example, if we observe the temperature on 50 ran-
domly selected days in the year 2000 and then fit a series of poly-
nomial models of varying degree to this sample, we can measure
how accurately each model goes on to predict the temperature on
every day of the year 2000, including those days we did not observe.
This is an indication of the generalization ability of a model. As a
function of the degree of the polynomial model, the mean error in
performing this prediction task is plotted in Figure 2-1b. The model
with the lowest mean error (with respect to many such samples of
size 50) is a degree-4 polynomial—more complexity is not better.
Contrast this generalization performance for predicting unseen
data with the objective of selecting the model with the lowest
error in fitting the observed sample, that is, producing the correct
temperature on days we have observed. For this task, Figure 2-1b
tells us that error decreases as a function of the degree of the polynomial, which means that the best-predicting model would not be found if we selected models merely by checking how well they fit the observations. Notice also that the best-predicting polynomials in this example are close to a theoretically reasonable lower
bound of between degree 3 and degree 4. This lower bound on the
problem exists because we should expect temperatures at the end
of the year to continue smoothly over to the predictions for tem-
peratures at the beginning of the next year. A degree-2 polynomial
cannot readily accommodate this smooth transition from one year to
the next, but degree-3 or degree-4 polynomials can. This prediction
task considers the out-of-sample robustness of the models, which is
the degree to which they are accurate at predicting outcomes for the
entire population when estimated from the contents of samples of
that population. Here, the most predictive model is very close to the
lower bound of complexity, rather than at some intermediate or high
level. This example illustrates that simpler models can cope better
with the problem of generalizing from samples.
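The fitting-versus-prediction contrast above can be reproduced with a short script. The data below are a synthetic stand-in (one smooth seasonal cycle plus noise), not the actual London series, and the noise level and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

days = np.arange(365)
truth = 50 + 15 * np.sin(2 * np.pi * (days - 100) / 365)  # stand-in seasonal curve
temps = truth + rng.normal(0, 2, size=365)                # add observation noise
x = (days - 182) / 182.0     # rescale days to roughly [-1, 1] for numerical stability

obs = rng.choice(365, size=50, replace=False)             # the 50 observed days

fit_err, pred_err = {}, {}
for degree in (1, 2, 3, 4, 6, 8, 10):
    coeffs = np.polyfit(x[obs], temps[obs], degree)
    fit_err[degree] = np.mean((np.polyval(coeffs, x[obs]) - temps[obs]) ** 2)
    pred_err[degree] = np.mean((np.polyval(coeffs, x) - temps) ** 2)
    print(f"degree {degree:2d}: fit error {fit_err[degree]:7.2f}, "
          f"prediction error {pred_err[degree]:7.2f}")
```

Fitting error can only fall as the degree grows, while prediction error over all 365 days bottoms out at a moderate degree and tends to climb again as the extra parameters start fitting noise.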
Out-of-Population Robustness
A more realistic test of the models estimated from a sample of measurements is to consider how well they go on to predict events in the future, such as, in this example, the temperature on each day of the year 2001. What we are predicting now lies out-
side the population used to estimate the model parameters. The
two populations may differ because factors operating over longer
time scales come into play, such as climate change. The difference
between the two populations could range from negligible to severe.
For example, Figure 2-2 shows how well the models estimated
[Figure 2-2: Prediction error as a function of polynomial degree for London (2000, 2001, 2002) and Paris (2001, 2002) temperature data.]
[Figure 2-3: Mean predictive accuracy (% correct) as a function of sample size n for take-the-best, greedy take-the-best, nearest neighbor, C4.5, and CART across several environments, including (c) biodiversity.]
robustness may actually stem from its search rule. The greedy
version of take-the-best, which has the same stopping rule but a dif-
ferent search rule, differs considerably in robustness from take-the-
best but is indistinguishable from the other complex models, both
when they are inferior to take-the-best (Figure 2-3) and when they
are superior (Figure 2-6). This implicates the search rule itself as a
46 UNCERTAINTY IN THE WORLD
Let us revisit the daily temperature example but change the rules of the
game. Nobody knows the “true” underlying function behind
London’s mean daily temperatures, but we will now put ourselves
in the position of grand planner with full knowledge of the under-
lying function for the mean daily temperatures in some fictional
location. We denote this degree-3 polynomial function h(x) and
define it as
h(x) = 37 + 15(x/365) + 120(x/365)² + 130(x/365)³, where 0 ≤ x ≤ 364.
Figure 2-4a plots this underlying trend for each day of the
year. We will also assume that when h(x) is sampled, our observa-
tions suffer from normally distributed measurement error with
μ = 0, σ² = 4 (which corresponds to the noise component in the
bias–variance decomposition above). A random sample of 30 obser-
vations of h(x) with this added error is shown on top of the under-
lying trend in Figure 2-4a.
If we now go on to fit a degree-p polynomial to this sample of
observations, and measure its error in approximating the function
h(x), can we draw a conclusion about the ability of degree-p poly-
nomials to fit our “true” temperature function in general? Not really,
because the sample we drew may be unrepresentative: It could
result in a lucky application of our fitting procedure that identifies
the underlying polynomial h(x), or an unlucky one incurring high
error. Thus, this single sample may not reflect the true performance
of degree-p polynomials for the problem at hand. A more reliable
test of a model is to measure its accuracy for many different sam-
ples, by taking k random samples of size n, fitting a degree-p poly-
nomial model to each one, and then considering this ensemble of
models denoted by y_1(x), y_2(x), …, y_k(x). Figure 2-4b shows five
polynomials of degree 2 resulting from k = 5 samples of n = 30
observations of h(x). From the perspective of the organism, these
samples can be likened to separate encounters with the environ-
ment, and the fitted polynomials likened to the responses of the
organism to these encounters.
The question now is how well a given type of model—here poly-
nomials of degree 2—captures the underlying function h(x), which
we can estimate by seeing how well the induced models perform
on average, given their individual encounters with data samples.
First, consider the function ȳ(x), which for each x gives the mean response of the ensemble of k polynomials:

ȳ(x) = (1/k) ∑_{i=1}^{k} y_i(x).
[Figure 2-4: Temperature (°F) versus day of year. (a) The underlying function h(x) with a noisy sample; (b) h(x) with five fitted degree-2 polynomials y_i(x); (c) h(x) with the mean response ȳ(x).]
The bias of the model is the sum squared difference between this
mean function and the true underlying function. Our omniscience
is important now, because to measure the bias we need to know the
underlying function h(x). More precisely, bias is given by
(bias)² = ∑_n {ȳ(x_n) − h(x_n)}²,

and the variance is

variance = ∑_n (1/k) ∑_{i=1}^{k} {y_i(x_n) − ȳ(x_n)}².
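The bias–variance decomposition can be estimated numerically by brute force: draw many samples, fit a model to each, and compare the ensemble of fitted curves with the true function. The degree-3 function below is an arbitrary stand-in (not the chapter's h(x)), and k, n, and the noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def h(x):
    """Stand-in 'true' degree-3 function (arbitrary coefficients)."""
    u = x / 365.0
    return 37 + 15 * u - 60 * u**2 + 40 * u**3

X = np.arange(365)
k, n, sigma = 200, 30, 2.0     # k samples of n noisy observations each

def bias_variance(degree):
    preds = np.empty((k, X.size))
    for i in range(k):
        xs = rng.choice(365, size=n, replace=False)
        ys = h(xs) + rng.normal(0, sigma, size=n)
        coeffs = np.polyfit(xs / 365.0, ys, degree)
        preds[i] = np.polyval(coeffs, X / 365.0)
    mean_pred = preds.mean(axis=0)                 # the mean response over samples
    bias2 = np.sum((mean_pred - h(X)) ** 2)        # (bias)^2
    variance = np.sum(((preds - mean_pred) ** 2).mean(axis=0))
    return bias2, variance

results = {d: bias_variance(d) for d in (1, 2, 3, 5, 8)}
for d, (b2, var) in results.items():
    print(f"degree {d}: (bias)^2 = {b2:8.1f}, variance = {var:8.1f}")
```

Models simpler than the true degree-3 function show large bias; richer models show growing variance, reproducing the pattern described in the text.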
more complex models, we say that these models are overfitting the
data, fitting not just the underlying function but also the noise
inherent in each particular data sample. The two properties of bias
and variance reveal that the inductive inference of models involves
a fundamental trade-off. We can try using a general purpose learn-
ing algorithm, such as a feed-forward neural network, that employs
a wide and rich space of potential models, which more or less guar-
antees low bias. But problems start when we have a limited number
of observations, because the richness of the model space can incur a
cost in high variance: The richer the model space, the greater the pos-
sibility that the learning algorithm will induce a model that captures
unsystematic variation. To combat high variance, we can place
restrictions on the model space and thereby limit the sensitivity of
the learning algorithm to the vagaries of samples. But these restric-
tions run counter to the objective of general purpose inference, since
they will necessarily cause an increase in bias for some problems.
This is the bias–variance dilemma. All cognitive systems face
this dilemma when confronted with an uncertain world. The bal-
ancing act required to achieve both low variance and low bias is
plain to see in Figure 2-5, which decomposes the error arising from
polynomials from degree 1 (a straight line) to degree 10 at predict-
ing our temperature function h(x) from samples of size 30. For each
polynomial degree we have plotted the bias (squared) of this type of
model, its variance, and their sum. The polynomial degree that
minimizes the total error is, not surprisingly, 3, because h(x) is a
degree-3 polynomial. Polynomial models of less than degree 3
suffer from bias, since they lack the ability to capture the underly-
ing pattern. Polynomials of degree 3 or more have zero bias, as we
would expect. But for polynomials of degree 4 or more, the problem
of overfitting kicks in and the variance begins to rise due to their
excess complexity. None of the models achieve zero error. This is
due to the observation error we added when sampling, which cor-
responds to the noise term in the bias–variance decomposition.
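This decomposition can be reproduced numerically. The Python sketch below uses an invented cubic as a stand-in for the chapter's degree-3 temperature function h(x) (the actual polynomial is not reproduced here), fits polynomials of several degrees to repeated noisy samples, and estimates (bias)² and variance from the fitted curves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the chapter's degree-3 function h(x);
# the book's actual temperature polynomial is not reproduced here.
def h(x):
    return x * (x - 0.5) * (x - 1.0)

def bias2_and_variance(degree, n_samples=200, n_points=30, noise=0.05):
    """Estimate (bias)^2 and variance for polynomial fits of a given degree:
    fit many noisy samples, average the fitted curves to get ybar(x), then
    compare ybar(x) to h(x) (bias) and the individual fits to ybar(x)
    (variance)."""
    x_test = np.linspace(0, 1, 50)
    fits = np.empty((n_samples, x_test.size))
    for i in range(n_samples):
        x = rng.uniform(0, 1, n_points)
        y = h(x) + rng.normal(0.0, noise, n_points)       # noisy sample
        fits[i] = np.polyval(np.polyfit(x, y, degree), x_test)
    mean_fit = fits.mean(axis=0)                           # ybar(x_n)
    bias2 = np.sum((mean_fit - h(x_test)) ** 2)
    variance = np.sum(((fits - mean_fit) ** 2).mean(axis=0))
    return bias2, variance

for degree in (1, 3, 9):
    b2, var = bias2_and_variance(degree)
    print(f"degree {degree}: bias^2={b2:.4f}, variance={var:.4f}")
```

As in the text, the too-simple degree-1 model shows high bias, the degree-3 model matches the underlying function, and the over-complex degree-9 model shows inflated variance.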
[Figure 2-5: (bias)², variance, and their sum (error, 0–20,000) plotted against degree of polynomial (0–10); the total error is minimized at degree 3.]
w_i \ge \sum_{j>i} w_j \quad \text{where } 1 \le i \le (m - 1).
52 UNCERTAINTY IN THE WORLD
Note. The cue values of each object (A–H) are used to code a binary representation
of its integer-valued criterion. The cues are uncorrelated and have noncompensatory
weights.
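To make the noncompensatory condition concrete, here is a small Python check (the function name is mine): each weight must be at least the sum of all smaller weights, so that no combination of lower-ranked cues can outweigh a higher-ranked one.

```python
def is_noncompensatory(weights):
    """Check the condition w_i >= sum_{j>i} w_j for every i: no combination
    of lower-ranked cue weights can overturn a higher-ranked cue."""
    w = sorted(weights, reverse=True)
    return all(w[i] >= sum(w[i + 1:]) for i in range(len(w) - 1))

print(is_noncompensatory([4, 2, 1]))        # binary-style weights: True
print(is_noncompensatory([0.4, 0.3, 0.3]))  # compensatory: False
```

Weights of the form 1/2, 1/4, 1/8, … (as in a binary representation) are the canonical noncompensatory case.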
[Figure: (a) predictive accuracy (%, 50–80) of take-the-best, nearest neighbor, C4.5, CART, and greedy take-the-best as a function of sample size n (0–70); (b) (bias)², variance, and their sum (error, 0–3,000) for take-the-best; (c) the same decomposition for greedy take-the-best.]
Note. The cue values of each object (A–F) are used to code the criterion value using
the Guttman Scale. The cues are maximally correlated with the criterion, all with an
ecological validity of 1.
[Figure: (a) predictive accuracy (%, 50–80) of take-the-best, nearest neighbor, C4.5, CART, and greedy take-the-best as a function of sample size n (0–30) in the Guttman scale environment; (b) (bias)², variance, and their sum (error, 0–350) for take-the-best; (c) the same decomposition for greedy take-the-best.]
There are three main reasons for this: (a) Researchers believe that
complex systems or problems require complex solutions; (b) new
ideas and methods, which are often simpler, can be resisted just for
being new; and (c) it is sometimes difficult to know when simplic-
ity works. Figuring out when simple methods succeed or fail is
challenging and can itself be complex.
This chapter is organized as follows. I first point out that deci-
sion makers—and students of judgment and decision making—are
not unique in failing to adapt to conceptual innovations that imply
greater simplicity. Indeed, the history of science is replete with
many examples. I then discuss the four cases drawn from the
decision-making literature. These are, first, the findings that predic-
tions of “clinical” judgment are inferior to actuarial models; second,
how simple methods in time series forecasting have proven superior to more sophisticated and "theoretically correct" methods
advocated by statisticians; third, how in combining information for
prediction, equal weighting of variables is often more accurate than
trying to estimate differential weights; and fourth, the observation
that, on occasion, decisions can be improved when relevant infor-
mation is deliberately discarded. I follow this by examining the
rationale for the fourth case in greater depth.
In a fascinating review, Barber (1961) documented many cases
of failure to accept new concepts involving scientific giants operat-
ing in the physical sciences where, one might suppose, hard evi-
dence would be difficult to overcome. Among the various sources
of resistance to new ideas, Barber gives as examples difficulties
understanding substantive concepts, different methodological con-
ceptions, religious ideas, professional standing (e.g., failure to
accept discoveries by young scientists), professional specialization
(e.g., work by people outside a discipline), and the dysfunctional
role sometimes played by professional societies. He goes on to
quote Max Planck, who, frustrated by the fact that his own ideas
were not always accepted, stated that “a new scientific truth does
not triumph by convincing its opponents and making them see the
light, but rather because its opponents eventually die, and a new
generation grows up that is familiar with it” (Barber, 1961, p. 597).
In this chapter, I discuss this phenomenon with respect to the
field of judgment and decision making. There are two reasons why
this field provides an interesting setting for this issue. First, for sci-
entists concerned with how decisions are and should be made, one
might imagine that there would be little resistance to adopting
methods that improve decision making by increasing accuracy,
reducing effort, or both. Second, the studies in which these new
methods were discovered are empirical and often supported
by analytical rationales. A priori, it is not a question of dubious
evidence.
WHEN SIMPLE IS HARD TO ACCEPT 63
were fewer forecasts but these were conducted in real time (e.g.,
participants were asked to provide a forecast for next year).
Moreover, forecasters could obtain background and qualitative data
on the series they were asked to forecast (a criticism of the
M-competition was that experts lacked access to important contex-
tual information). Finally, in the M3-competition (Makridakis &
Hibon, 2000), forecasts were prepared for several models using
3,003 time series drawn from various areas of economic activity
and for different forecast horizons. All of these M-competitions
(along with similar studies by other scholars) essentially replicated
the earlier findings of Makridakis and Hibon, namely, that simple methods frequently matched or outperformed their statistically sophisticated counterparts.
One might imagine that, with this weight of evidence, the aca-
demic forecasting community would have taken notice and devel-
oped models that could explain the interaction between model
performance and task characteristics. However, there seems to be
little evidence of this occurring. For example, Fildes and Makridakis
(1995) used citation analysis in statistical journals to assess the
impact of empirical forecasting studies on theoretical work in
time-series analysis. Basically, their question was whether the con-
sistent out-of-sample performance of simple forecasting models
had led to theoretical work on illuminating this phenomenon. The
answer was a resounding "no."
During their studies, most social scientists learn the statistical technique of multiple regression. Given observations on a dependent variable y_i (i = 1, . . ., n) and k independent or predictor variables x_ij (j = 1, . . ., k), the budding scientists learn that the "best" predictive equation for y expressed as a linear function of the xs is obtained by
the well-known least-squares algorithm. The use of this technique
(and more complex adaptations of it) is probably most common in
hypothesis testing. Is the overall relationship statistically signifi-
cant (i.e., is population R2 > 0?). What are the signs and relative
sizes of the different regression coefficients? Which are most impor-
tant? And so on.
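As a concrete illustration of the technique, here is a toy Python sketch with invented data (not any dataset from the studies discussed): fit a least-squares linear model and compute its R².

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: n observations on k predictors, with known true weights.
n, k = 40, 3
X = rng.normal(size=(n, k))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Least-squares solution, with an intercept column appended.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

# R^2 on the fitting sample.
y_fit = X1 @ beta_hat
r2 = 1 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)
print(beta_hat.round(2), round(r2, 3))
```

On a fresh hold-out sample, the same weights would typically achieve a lower R² — the "shrinkage" discussed below.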
In addition to fitting data, another important function of multiple
regression is to make predictions. Given a new so-called hold-out
sample of xs, what are the associated predicted y values? In using a
regression equation in this manner, most researchers appreciate
that the R2 achieved on initial fit of the model will not be matched
in the predictive sample due to “shrinkage” (the smaller the ratio
n/k, the greater the shrinkage). However, they do not question that
the regression weights initially calculated on the “fitting sample”
are the best that could have been obtained and thus that this is still
the optimal method of prediction. They should.
In 1974, Dawes and Corrigan reported the following interesting
experiment: Instead of using weights in a linear model that have
been determined by the least-squares algorithm, use weights that
are chosen at random (between 0 and 1) but have the appropriate
sign. The results of this experiment were most surprising to scien-
tists brought up in the tradition of least-squares modeling. The
predictions of the quasi-random linear models were quite good
and, in fact, on all four datasets Dawes and Corrigan analyzed, they
were better than the predictions made by human judges who had
been provided with the same data (i.e., values of the predictor
variables). This result, however, did not impress referees at the
Psychological Review who rejected the paper. It was deemed “pre-
mature.” In addition, the authors were told that, despite their
the ISI Web of Knowledge, the Dawes and Corrigan paper was cited
more than 600 times in the 20 years following its publication.
Moreover, a number of studies in the decision-making literature
have exploited the results. However, the implications of this work
have had surprisingly little impact on the methods of scientists
who make great use of regression analysis.
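The Dawes and Corrigan scheme is easy to restate in code. The sketch below uses a synthetic environment (an assumption on my part; their analyses used real datasets with human judges) and compares least-squares weights against quasi-random positive weights on a hold-out sample:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic environment (invented; Dawes and Corrigan used real data):
# a criterion that is a noisy linear function of three positive predictors.
n, k = 30, 3
X = rng.normal(size=(2 * n, k))
y = X @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=1.0, size=2 * n)
X_fit, y_fit = X[:n], y[:n]
X_hold, y_hold = X[n:], y[n:]

# Least-squares weights estimated on the fitting sample.
beta, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)

def holdout_corr(w):
    """Correlation between model predictions and the hold-out criterion."""
    return np.corrcoef(X_hold @ w, y_hold)[0, 1]

ols_r = holdout_corr(beta)
# Random weights in (0, 1) with the appropriate (positive) sign,
# averaged over many draws, as in the Dawes-Corrigan experiment.
rand_r = np.mean([holdout_corr(rng.uniform(0, 1, k)) for _ in range(500)])
print(f"least squares r = {ols_r:.3f}, random-weight mean r = {rand_r:.3f}")
```

The point is not that random weights beat least squares here, but that with correct signs they predict respectably — far better than intuition about "optimal" weights would suggest.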
Economists, for example, are among the most sophisticated
users of regression analysis. I therefore sampled five standard text-
books in econometrics to assess whether young economists are
taught about ambiguity in regression weights and the use of bench-
marks of equal or unit-weighting models for prediction. The spe-
cific textbooks were by Goldberger (1991), Greene (1991), Griffiths,
Hill, and Judge (1993), Johnston (1991), and Mittelhammer, Judge,
and Miller (2000). The answer was an overwhelming “no.” The
major concern of the texts seems to lie in justifying parameter esti-
mates through appropriate optimization procedures. The topic of
prediction is given little attention, and when it is, emphasis is
placed on justifying the “optimal” regression coefficients in the
prediction equations that have been estimated on the data avail-
able. None of the books gives any attention to equal- or unit-weight-
ing models. In addition, in a handbook whose contributors were
leading econometricians, I located a chapter entitled “Evaluating
the predictive accuracy of models” (Fair, 1986), but even this chap-
ter showed no awareness of the equal-weight findings.
In psychology, on the other hand, the statistical theory underly-
ing the development of tests draws the attention of students to
the properties and use of equally weighted composite variables
(cf. Ghiselli, Campbell, & Zedeck, 1981). Indeed, the third edition
of Nunnally and Bernstein’s Psychometric Theory (1994) explicitly
devotes a section of a chapter (p. 154) to equal weighting—citing,
among others, Dawes and Corrigan (1974) and Wainer (1976). It is
notable that they emphasize the use of equal weights when ques-
tions center on prediction in applied problems.
How does one explain the relative lack of interest in equal
weights in economics when the case against naively accepting esti-
mates of regression coefficients has been made on both empirical
and analytical grounds? Perhaps the reason is that there is a huge
“industry” propagating the use of regression analysis involving
textbooks, computer software, and willing consumers who accept
analytical results with little critical spirit, somewhat similar in
manner to the use of significance tests in reports of psychological
experiments (cf. Gigerenzer, 1998b, 2004a). Just because ideas are
“good” does not mean that they will be presented in textbooks
and handed down to succeeding generations of scientists (see, for
example, the discussion by Dhami, Hertwig, & Hoffrage, 2004,
z_j = \mu + \delta_j + \varepsilon_j \qquad (1)
but it does in the cumulative sense (across the cues); that is, 1 ≥ 1;
1 + 1 > 1 + 0; and 1 + 1 + 0 ≥ 1 + 0 + 1. Baucells et al. showed that
cumulative dominance is quite pervasive in choice situations
involving binary cues and that any decision rule that makes
choices in accordance with cumulative dominance will perform
well. Because weights in take-the-best (and DEBA) are ordered from
large to small, it follows that take-the-best and DEBA both comply
with cumulative dominance, and this explains, in part, their effectiveness.3
In short, there is now ample theoretical analysis showing that
take-the-best will make effective choices in error-free environments
where the importance of cues is unknown. What happens in the
presence of error and when there is uncertainty about the true
importance of cues? And to what extent does the success of take-
the-best depend on the fact that it uses binary cues as inputs?
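For reference, the take-the-best choice rule under discussion can be sketched in a few lines of Python (assuming, as the heuristic does, that cues are already ordered from most to least valid):

```python
def take_the_best(cues_a, cues_b):
    """Take-the-best for a paired comparison on binary cues.
    Cues are assumed already ordered from most to least valid.
    Returns 'A', 'B', or 'guess' if no cue discriminates."""
    for a, b in zip(cues_a, cues_b):
        if a != b:                 # first discriminating cue decides
            return 'A' if a > b else 'B'
    return 'guess'

print(take_the_best([1, 0, 1], [1, 1, 0]))  # second cue discriminates -> 'B'
```

Because search stops at the first discriminating cue, lower-ranked cues can never overturn the decision — the lexicographic structure that complies with cumulative dominance.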
To study these questions, Karelaia and I developed theoretical
models that can be used to compare and contrast the performance
of different heuristics across different environments using cues
that are both binary and continuous in nature (Hogarth & Karelaia,
2005a, 2006a, 2007). We examined environments where a criterion
was generated by a linear function of several cues and asked to
what extent different simple models could be expected to choose
correctly (i.e., choose the highest criterion alternative) between
two or more alternatives. That is, by characterizing the statistical
properties of environments (see below), one should be able to pre-
dict when particular simple rules would and would not work
well.
The outcome of this work is that, given the statistical description
of an environment in terms of correlations between normally
distributed cues and the criterion, as well as correlations among the
cues themselves, precise theoretical predictions can be made as to
how well different heuristics will perform.4 What we find is that, in
general, when the characteristics of heuristics match those of the
environment, they tend to predict better. Indeed, our own summary
noting differences in performance between TTB (take-the-best), SV
RETHINKING COGNITIVE BIASES AS ENVIRONMENTAL CONSEQUENCES 81
Figure 4-1: The cognitive system infers that the dots in the left pic-
ture are curved inward (concave), away from the viewer, while the
dots in the right picture are curved outward (convex), toward the
viewer. If you turn the book upside down, the inward dots will
pop out and vice versa. The right picture is identical to the left but
rotated 180 degrees.
An Ecological Perspective
[Figures (including Figure 4-4): scatterplots of estimated versus actual number of deaths per year on log-log axes (1 to 1,000,000), for causes of death such as all accidents, all disease, motor vehicle accidents, all cancer, heart disease, stroke, homicide, stomach cancer, pregnancy, diabetes, flood, TB, tornado, asthma, botulism, electrocution, smallpox vaccination, firearm accidents, and excess cold; panel (b) also shows the actual frequency predicted by the estimated frequency.]
1. A close inspection of Figure 4-4b shows that the variance of the esti-
mated frequencies in the Hertwig et al. (2005) study is smaller than that of
the actual frequencies, unlike in the statistical model. This indicates that
regression accounts for most but not all of the primary bias. Stephen M.
Stigler (personal communication) suggested that the smaller variance of
subjective estimates could indicate that the participants were quite prop-
erly estimating the actual rates by a form of shrinkage estimation, which
has a firm Bayesian justification (Stigler, 1990).
Base-Rate Fallacy
Imagine there are two kinds of people in the world—say, engineers
and lawyers. When we encounter someone new, how can we decide
whether that person is an engineer or a lawyer? We can gather
and use some cues about the object that are associated with each
category, for example, style of dress, or we can use the mean of
the distribution of objects—here equivalent to the more common of
the two types, or the one with the higher base rate—or we can
Overconfidence
Confidence in one’s knowledge is typically studied using questions
of the following kind:
People choose what they believe to be the correct answer and then
rate their confidence that the answer is correct. The participants in
studies of such judgments are called “realistic” if the difference
between their mean confidence and their proportion of correct
answers is zero. The typical finding, however, is that mean confi-
dence tends to exceed the proportion of correct answers. For exam-
ple, if the mean of the confidence ratings assigned to the correctness
of all selected answers is 70%, but the mean proportion correct is
60%, the confidence judgments are higher than the proportion cor-
rect and the participants are said to be overconfident (the over/under-
confidence measure would in this case be 70% − 60% = 10 percentage
points). This systematic discrepancy between confidence judgments
and the proportion of correct answers has been termed the overcon-
fidence bias (e.g., Lichtenstein, Fischhoff, & Phillips, 1982).
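The measure itself is a one-line computation; the sketch below reproduces the worked example from the text (mean confidence of 70% with 60% of answers correct):

```python
def over_underconfidence(confidences, correct):
    """Mean confidence (in %) minus percentage of correct answers.
    Positive values indicate overconfidence, negative underconfidence."""
    mean_conf = sum(confidences) / len(confidences)
    pct_correct = 100.0 * sum(correct) / len(correct)
    return mean_conf - pct_correct

# Mean confidence of 70% with 6 of 10 answers correct:
score = over_underconfidence([70] * 10, [1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
print(score)  # -> 10.0 percentage points of overconfidence
```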
Early explanations of this phenomenon were sought in deficient
cognitive processing, such as a confirmation bias in memory search
(Koriat, Lichtenstein, & Fischhoff, 1980). That is, after an alterna-
tive is chosen, the mind searches for information that confirms the
choice, but not for information that could falsify it. Despite the
[Figure: calibration curves plotting accuracy (%) against confidence (%; 50–100 in panel a, 0–100 in panel b); in panel (b), the diagonal separates regions of overconfidence and underconfidence.]
Contingency Illusions
Contingencies quantify the degree to which an outcome is more
likely, given one condition rather than another. One frequently used
definition is the Δ rule, which states that the relative impact of a
cause (e.g., therapy) on an effect (e.g., healing) can be described by
the contingency p(healing | therapy) − p(healing | no therapy), that
is, the difference between the likelihoods of healing given therapy
and healing given no therapy. More generally, in hypothesis
tests, the degree of evidence in favor of a focal hypothesis can be
described by the contingency Δ = p(confirmation | focal hypothesis) −
p(confirmation | alternative hypothesis) (Fiedler, Walther, & Nickel,
1999). A contingency assessment may be distorted or misleading
when the samples used to estimate the two probabilities differ in
size and reliability. Thus, the confirmation rate for two hypotheses,
H1 and H2, may be equally high, but one researcher is mainly con-
cerned with H1 and is therefore exposed to larger samples of infor-
mation on H1, whereas another researcher is concerned with H2 and
is therefore exposed to denser information about H2. As a conse-
quence, the two researchers could end up with different estimates
of the overall contingency. Sample size is a crucial environmental
determinant of the variability of sampling distributions, which can
impact subsequent probability judgments.
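The sample-size point can be made concrete with a small simulation (the numbers are hypothetical): both hypotheses are confirmed at the same true rate of 75%, but the focal hypothesis is sampled far more densely, so the estimated contingency scatters widely around its true value of zero.

```python
import random

random.seed(0)

def delta(confirm_focal, n_focal, confirm_alt, n_alt):
    """Delta rule: p(confirmation | focal) - p(confirmation | alternative)."""
    return confirm_focal / n_focal - confirm_alt / n_alt

def sample_delta(n_focal=100, n_alt=8, p=0.75):
    """One researcher's estimate: both hypotheses have the same true
    confirmation rate p, but the alternative is sampled far less often."""
    cf = sum(random.random() < p for _ in range(n_focal))
    ca = sum(random.random() < p for _ in range(n_alt))
    return delta(cf, n_focal, ca, n_alt)

estimates = [sample_delta() for _ in range(2000)]
mean_delta = sum(estimates) / len(estimates)
spread = max(estimates) - min(estimates)
# The true contingency is 0, yet individual estimates scatter widely
# because the small sample for the alternative hypothesis is unreliable.
print(f"mean estimate = {mean_delta:.3f}, spread = {spread:.2f}")
```

Averaged over many researchers the estimate is unbiased, but any single researcher with an unbalanced sample can observe a sizable spurious contingency.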
The impact on contingency assessment of the number of obser-
vations or sample size was investigated by Fiedler et al. (1999). In
an active information search paradigm, participants were asked to
test the hypothesis that male aggression tends to be overt, whereas
female aggression tends to be covert. Participants could check, in a
computer database, whether a variety of behaviors representing
overt or covert aggression had been observed in a male (Peter) and
female (Heike) target person. The computer was programmed to
confirm all questions about overt and covert aggression in Peter
and Heike at the same constant rate of 75%. However, participants
typically asked more questions that matched the hypotheses to be
tested (male overt aggression/female covert aggression) than the
Figure 4-6: Can it be that most drivers are better than average? When
the distribution of number of accidents is symmetrical as shown in
(a), where drivers below average (below the mean) are indicated by
gray bars, this cannot happen. However, when the distribution is
skewed as in (b), it is possible. As a result, most drivers (63 out of
100) have fewer accidents than the mean (Gigerenzer, 2002).
Ecological Cognition
(Simon, 1990; see also chapter 1). One condition that should govern
whether this strategy will be used is whether the environment is
appropriately structured (meaning, as we will define later, that there
is a high recognition validity). When the environment is not appro-
priate for using the recognition heuristic, decision makers may
ignore recognition, oppose recognition, or factor in sources of infor-
mation beyond recognition, as we will see later in this chapter.
The exploitable relation between subjective recognition and
some other (not directly accessible) criterion results from a process
by which the criterion influences object recognition through
mediators, such as mentions in newspapers, on the Internet, on
radio or television, by word of mouth, and so on. This process
applies primarily to the proper names of objects, and consequently
most studies of the recognition heuristic have involved name rec-
ognition; however, it could also apply to visual or aural images of
individual objects, locations, or people. To illustrate, the size of a
city (the criterion) is typically correlated with recognition of the
city because large cities are mentioned more often in the media.
Frequent mentions increase the likelihood that a city name will be
recognized, and as a result, recognition becomes correlated with
the size of a city. In line with these assumed connections, Goldstein
and Gigerenzer (2002) found a high correlation between the number
of inhabitants of particular German cities and how often each city
was mentioned in the American media. This, in turn, was highly
correlated with the probability that the city would be recognized
by Americans. This two-step chain can thus explain how and why
American recognition rates of German cities were highly correlated
with city size. Pachur and Hertwig (2006) and Pachur and Biele
(2007), looking at domains of diseases and sports teams, provided
further support for the assumption that the correlation between a
criterion and recognition is mediated through the quantity of men-
tions in the media.
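This two-step chain is straightforward to simulate. The Python sketch below (all numbers invented for illustration) generates a criterion, derives noisy media mentions from it, and derives recognition probabilistically from mentions; recognition then ends up correlated with the criterion via the mediator:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical mediator chain: a criterion (e.g., city size) drives media
# mentions, and mentions drive the probability of recognizing the name.
n = 200
criterion = rng.lognormal(mean=11, sigma=1, size=n)       # e.g., population
mentions = criterion ** 0.7 * rng.lognormal(0, 0.3, n)    # noisy link 1
p_recognize = mentions / (mentions + np.median(mentions))
recognized = rng.random(n) < p_recognize                  # noisy link 2

# Recognition ends up correlated with the criterion via the mediator.
r = np.corrcoef(np.log(criterion), recognized.astype(float))[0, 1]
print(round(r, 2))
```

With both links intact, recognition alone carries usable information about the inaccessible criterion — the precondition for the recognition heuristic to be ecologically rational.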
Our goal in this chapter is to give an overview of empirical
research on the recognition heuristic since Goldstein and Gigerenzer
(1999, 2002) first specified it (see also Gigerenzer & Goldstein, 2011;
Pachur, Todd, Gigerenzer, Schooler, & Goldstein, 2011). We start by
describing and clarifying the basic characteristics and assumptions
of the heuristic. For this purpose, we trace how the notion
of the heuristic developed, and we locate recognition knowledge
in relation to other knowledge about previous encounters with
an object, such as the context of previous encounters, their
frequency, and their ease of retrieval from memory—that is, their
fluency. Next, we provide an overview of empirical evidence sup-
porting answers to two important questions: In what environments
is the recognition heuristic ecologically rational? And do people
rely on the recognition heuristic in these environments? We then
116 CORRELATIONS BETWEEN RECOGNITION AND THE WORLD
\alpha = \frac{R}{R + W},
previously studied plus unstudied items and pick out the ones that
were on the original list. In other words, in these experiments
typically none of the items are actually novel, because they are
commonly used words. Therefore, the “mere” (or semantic) recog-
nition that the recognition heuristic employs is insufficient to
identify the correct items in this task, and knowledge about the
context (i.e., episodic knowledge) in which the previously studied
items were originally presented is required. The recognition heu-
ristic does not require such episodic knowledge, because semantic
recognition alone differentiates novel from previously encountered
objects. Moreover, recognition in Goldstein and Gigerenzer’s sense
is not independent of a reference class. A German participant may
know that she has heard of Paris, France but not Paris, Tennessee
(population ca. 10,000), and not treat Paris as recognized on a test
of U.S. cities. In addition to recognition being sensitive to a per-
son’s conception of the reference class, recognition validity
and even the decision to apply the recognition heuristic hinge on it,
as well.
A second important distinction is between (semantic) recogni-
tion and frequency information, that is, knowledge about the
number of times an object has been encountered in the past (e.g.,
Hintzman & Curran, 1994). The recognition heuristic does not dis-
tinguish between objects one has encountered 10 times and those
encountered 60 times (as long as both are recognized or unrecog-
nized). This is one element that makes the recognition heuristic
different from the availability heuristic (Tversky & Kahneman,
1973), which makes use of ease of retrieval, quality of recalled
items, or frequency judgments (for a discussion of the different
notions of availability see Hertwig, Pachur, & Kurzenhäuser, 2005).
To make an inference, one version of the availability heuristic
retrieves instances of the target events, such as the number of people
one knows who have cancer compared to the number of people
who have suffered from a stroke. The recognition heuristic, by con-
trast, bases an inference simply on the ability (or lack thereof) to
recognize the names of the event categories (cf. Pachur & Hertwig,
2006). In addition, the recognition heuristic is formally specified as
an algorithm and so can make precise predictions (such as the less-
is-more effect), while the availability heuristic in its original form
was too loosely defined for such predictions (for formal approaches
to different forms of the availability heuristic, see Dougherty, Gettys,
& Ogden, 1999; Hertwig et al., 2005; Pachur, Hertwig, & Rieskamp,
in press; Sedlmeier, Hertwig, & Gigerenzer, 1998).
A recognition assessment, which feeds into the recognition heu-
ristic, unfolds over time. The speed with which this recognition
assessment is made—fluency—can itself be informative and can be
used to infer other facts, for instance, how frequently an object has
2. An ironic exception to this statement is the fact that in the pair San
Diego and San Antonio, a commonly used example (Goldstein & Gigerenzer,
1999), San Diego now has fewer inhabitants than San Antonio within their
respective city limits, though by metropolitan area, San Diego remains
much larger.
[Figure: mentions in the media (low to high) plotted against disease frequency (from rare/severe to frequent/mild).]
other tests compared the fictional cities to places known for specific
reasons, such as Nantucket (limerick), Chernobyl (nuclear disaster)
or Timbuktu (expression). Since a reference class was not provided,
and because it is hard to think of a natural reference class from which
places like these would constitute a representative sample, partici-
pants may correctly infer that they are in an artificial environment. In
a clearly manipulated environment, such as that of trick questions,
recognition validity may be unknown, unknowable, or inestimable.
Unable to assess the ecological validity of the recognition heuristic,
people may elect alternative response strategies.
[Figure: proportion of choices in line with the recognition heuristic (.4–1.0) plotted against recognition validity (.4–1.0) for individual studies: A Hertwig et al. (2007, music artists); B Serwe & Frings (2006, amateurs); C Snook & Cullen (2006); D Serwe & Frings (2006, laypeople); E Pachur & Biele (2007); F Hertwig et al. (2007, companies); G Goldstein & Gigerenzer (2002); H Pohl (2006, Exp. 1); I Pohl (2006, Exp. 2); J Pachur & Hertwig (2006, Study 1); K Pohl (2006, Exp. 1).]
validity vs. 54% for low). These results suggest that the overall rec-
ognition validity in a particular domain is an important factor for
whether the heuristic is applied or not.3 However, both Pohl
(Experiments 1 and 4, but see Experiment 2) and Pachur and
Hertwig (2006) found that, looking across participants in the same
domain, participants did not seem to match their recognition heu-
ristic use directly to their individual recognition validity for that
domain (specifically, the individual proportions of choices in line
with the heuristic were not correlated with the individual α). This
interesting result suggests that people know about validity differ-
ences between environments, but not about the exact validity of
their own recognition knowledge in particular environments.
Supporting this conclusion, Pachur et al. (2008) found that although
the mean of participants’ estimates of the validity of their own rec-
ognition knowledge (to predict the size of British cities) matched
the mean of their actual recognition validities perfectly (.71 for
both), the individual estimates and recognition validities were
uncorrelated (r = −.03).
3. Some results, however, suggest that people only decide not to follow
recognition in domains with low recognition validity when they have
alternative knowledge available that has a higher validity than recognition
(Hertwig et al., 2008; Pachur & Biele, 2007).
WHEN IS THE RECOGNITION HEURISTIC AN ADAPTIVE TOOL? 133
use seems to involve (at least) two distinct processes. The first is an
assessment of whether recognition is a useful indicator in the given
judgment task, and the second is judging whether an object is rec-
ognized or not. A brain imaging study by Volz and colleagues (2006)
obtained evidence for the neural basis of these two processes. When
a decision could be made based on recognition, there was activa-
tion in the medial parietal cortex, attributed to contributions of rec-
ognition memory. In addition, there were independent changes in
activation in the anterior frontomedial cortex (aFMC), a brain area
involved in evaluating internal states, including self-referential
processes and social-cognitive judgments (e.g., relating an aspect of
the external world to oneself). The processes underlying this latter
activation may be associated with evaluating whether recognition is
a useful cue in the current judgment situation. Moreover, the aFMC
activity deviated more from the baseline (i.e., reflected more cogni-
tive effort) when a decision was made against recognition, suggest-
ing that making a decision in line with recognition is the default.
[Figure: percentage of inferences in accordance with the recognition heuristic (0–100) for 28 individual participants, (a) with no additional cues and (b) with one contradicting cue.]
Estimation
The decisions considered so far involved simple categorical judg-
ments about the environment, such as, Which is larger: A or B? Is
the statement X true or false? But often we have to make an absolute
estimate regarding some aspect of an object and come up with a
numerical value (e.g., the number of inhabitants of a city). Is infor-
mation about whether one has heard of an object also used for esti-
mation? This possibility has been discussed by Brown (2002), who
observed in studies on estimation of dates of events and country
Conclusion
Portions of this chapter are adapted from Schooler & Hertwig (2005)
and Hertwig, Herzog, Schooler, & Reimer (2008), with permission from the
American Psychological Association.
HOW SMART FORGETTING HELPS HEURISTIC INFERENCE 145
How and why forgetting might be functional has also been the focus
of an extensive analysis conducted by Anderson and colleagues
(Anderson & Milson, 1989; Anderson & Schooler, 1991, 2000; Schooler
& Anderson, 1997). On the basis of their rational analysis of memory,
they argued that much of memory performance, including forgetting,
might be understood in terms of adaptation to the structure of the envi-
ronment. The rational analysis of memory assumes that the memory
system acts on the expectation that environmental stimuli tend to
reoccur in predictable ways. For instance, the more recently a stimu-
lus has been encountered, the higher the expectation that it will be
encountered again and information about that stimulus will be needed.
Conversely, the longer it has been since the stimulus was encountered,
the less likely it is to be needed soon, and so it can be forgotten.
A simple time-saving feature found in many word processors can
help illustrate how recency can be used to predict the need for infor-
mation. When a user prepares to open a document file, some pro-
grams present a “file buffer,” a list of recently opened files from which
the user can select. Whenever the desired file is included on the list,
the user is spared the effort of either remembering in which folder
the file is located or searching through folder after folder. For this
mechanism to work efficiently, however, the word processor must
provide users with the files they actually want. It does so by “forget-
ting” files that are considered unlikely to be needed on the basis of
the assumption that the time since a file was last opened is negatively
correlated with its likelihood of being needed now. The word proces-
sor uses the heuristic that the more recently a file has been opened,
the more likely it is to be needed again now. In the rest of this chap-
ter, we show how human memory bets on the same environmental
regularity, and how this bet can enable simple heuristics, including
the recognition and fluency heuristics, to operate effectively.
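The recency bet behind such a recent-files list can be sketched in a few lines of code. This is a hypothetical illustration, not any particular word processor's implementation: the list "forgets" the least recently opened file once capacity is exceeded, betting that recency predicts which file is needed now.

```python
from collections import OrderedDict

class RecentFiles:
    """A 'forgetful' recent-files list: keeps only the k most recently
    opened files, on the bet that recency predicts the need for a file now."""

    def __init__(self, k):
        self.k = k
        self._files = OrderedDict()

    def open(self, name):
        self._files.pop(name, None)       # re-opening refreshes recency
        self._files[name] = True
        if len(self._files) > self.k:
            self._files.popitem(last=False)  # forget the least recently opened

    def listing(self):
        return list(reversed(self._files))   # most recent first

menu = RecentFiles(k=3)
for f in ["a.txt", "b.txt", "c.txt", "a.txt", "d.txt"]:
    menu.open(f)
print(menu.listing())  # → ['d.txt', 'a.txt', 'c.txt']
```

Re-opening a file moves it back to the front of the list, so the ordering tracks recency of need rather than order of first use.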
[Figures (plot data omitted): recognition rate as a function of years since cancellation; probability of a word's occurrence as a function of days since it last occurred; and occurrence probability over the retention interval in days for massed versus distributed occurrences (see the Figure 6-4 caption below).]
Figure 6-4: Probability of a word being used in the New York Times
headlines as a function of number of days since it was last used,
given that the word was used just twice in the previous 100 days.
The steeper curve shows words whose two uses in the headlines
were massed near in time to each other, and the shallower curve
shows words whose occurrences were distributed farther apart
(data from Anderson & Schooler, 1991).
but with the interval from the last mention to February 5 now being
66 days. One way to characterize the results in Figure 6-4 is that
when words are encountered in a massed way there is an immedi-
ate burst in the likelihood of encountering them again, but that this
likelihood drops precipitously. In contrast, words encountered in a
more distributed fashion do not show this burst, but their likeli-
hood of being encountered in the future remains relatively con-
stant. The difference is akin to that between the patterns with which
one needs a PIN (personal identification number) for the safe in a
hotel room and the PIN for one’s bank account. While on vacation,
one will frequently need the safe’s PIN, but over an extended period
one is more likely to need the PIN for the bank account. The idea is
that the memory system figures the relative values of the codes over
the short and long run, based on the pattern with which they are
retrieved. So one can think about cramming for an exam as an
attempt to signal to the memory system that the exam material will
likely be highly relevant in the short term, but not so useful further
in the future.
These isomorphisms between regularities in memory and in the
statistical structure of environmental events exemplify the thesis
that human memory uses the recency, frequency, and spacing with
which information has been needed in the past to estimate how
likely that information is to be needed now. Because processing
unnecessary information is cognitively costly, a memory system
able to prune away little-needed information by forgetting it is
better off. In what follows, we extend the analysis of the effects of
forgetting on memory performance to its effects on the performance
of simple inference heuristics. To this end, we draw on the research
program on fast and frugal heuristics (Gigerenzer, Todd, & the ABC
Research Group, 1999) and the ACT-R research program (Adaptive
Control of Thought–Rational—see Anderson & Lebiere, 1998). The
two programs share a strong ecological emphasis.
The research program on fast and frugal heuristics examines
simple strategies that exploit informational structures in the envi-
ronment, enabling the mind to make surprisingly accurate deci-
sions without much information or computation. The ACT-R
research program also strives to develop a coherent theory of cog-
nition, specified to such a degree that phenomena from perceptual
search to the learning of algebra might be modeled within the
same framework. In particular, ACT-R offers a plausible model of
memory that is tuned, according to the prescriptions of the rational
analysis of memory, to the statistical structure of environmental
events. This model of memory will be central to our implementa-
tion of the recognition heuristic (Goldstein & Gigerenzer, 2002)
and the fluency heuristic (Hertwig, Herzog, Schooler, & Reimer,
2008), both of which depend on phenomenological assessments of
When two objects to be decided between are both recognized, the flu-
ency heuristic (see, e.g., Jacoby & Brooks, 1984; Toth & Daniels, 2002;
Whittlesea, 1993) can be applied. It can be expressed as follows:
B_i = \ln\left(\sum_{k=1}^{n} t_k^{-d}\right),
where the record has been encountered n times in the past at lags of
t1, t2,. . .,tn. Finally, d is a decay parameter that captures the amount
of forgetting in declarative memory and thus determines how much
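The base-level activation formula, B_i = ln(Σ_k t_k^{-d}), can be sketched numerically. The lags below are hypothetical; the sketch simply shows how the decay parameter d governs how much old encounters still contribute:

```python
import math

def activation(lags, d):
    """Base-level activation B_i = ln(sum_k t_k ** -d), where t_k is the
    time since the k-th encounter and d is the decay parameter."""
    return math.log(sum(t ** -d for t in lags))

# A record encountered at lags of 1, 10, and 100 time units:
lags = [1, 10, 100]
print(activation(lags, d=0.5))  # stronger decay: older encounters contribute little
print(activation(lags, d=0.1))  # weaker decay: older encounters still contribute
```
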
Figure 6-7: The validity of the recognition heuristic and of the flu-
ency heuristic (the proportion of correct inferences that each heu-
ristic makes when it can be applied) as a function of memory decay
rate, d. Maxima are marked with dots. (Adapted from Schooler &
Hertwig, 2005.)
[Figures: retrieval time (ms) as a function of activation; proportion of correct inferences as a function of differences in recognition latencies (ms) for cities, athletes, companies, music artists, and billionaires; and accordance to the fluency heuristic as a function of differences in recognition latencies (ms) for cities, music artists, and companies.]
This analysis also revealed two distinct reasons for why forget-
ting and heuristics can work in tandem. In the case of the recogni-
tion heuristic, intermediate amounts of forgetting maintain the
systematic partial ignorance on which the heuristic relies, increas-
ing the probability that it correctly picks the higher criterion
object. In the case of the fluency heuristic, intermediate amounts of
forgetting boost the heuristic’s performance by maintaining activa-
tion levels corresponding to retrieval latencies that can be more
easily discriminated. In what follows, we discuss how the
fluency heuristic relates to the availability heuristic and whether it
is worthwhile to maintain the distinction between the fluency and
recognition heuristics, and we conclude by examining whether for-
getting plausibly could have evolved to serve heuristic inference.
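The core of the fluency heuristic can be sketched as follows. The just-noticeable difference of 100 ms is illustrative only (following the latency-difference bins discussed above), and the function name is hypothetical:

```python
import random

def fluency_heuristic(latency_a, latency_b, jnd=100):
    """If both objects are recognized, infer that the more fluently retrieved
    one (faster by at least a just-noticeable difference in ms; the 100-ms
    threshold is an illustrative assumption) has the higher criterion value;
    guess when the two latencies are not discriminable."""
    if abs(latency_a - latency_b) < jnd:
        return random.choice(["A", "B"])
    return "A" if latency_a < latency_b else "B"

print(fluency_heuristic(300, 650))  # → A (retrieved noticeably faster)
```
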
higher mortality rate). We have no objection to the idea that the flu-
ency heuristic falls under the broad rubric of availability. In fact,
we believe that our implementation of the fluency heuristic offers a
definition of availability that interprets the heuristic as an ecologi-
cally rational strategy by rooting fluency in the informational struc-
ture of the environment. This precise formulation transcends the
criticism that availability has been only vaguely sketched (e.g.,
Fiedler, 1983; Gigerenzer & Goldstein, 1996; Lopes & Oden, 1991).
In the end, how one labels the heuristic that we have called fluency
is immaterial because, as Hintzman (1990) observed, “the explana-
tory burden is carried by the nature of the proposed mechanisms
and their interactions, not by what they are called” (p. 121).
Conclusion
[Figure 7-1: individual accuracy (% correct) as a function of the number of objects recognized (n), for knowledge validity β ranging from .5 to 1.0.]
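The individual accuracy curves in Figure 7-1 follow from the accuracy formula for the recognition heuristic (Goldstein & Gigerenzer, 2002). A sketch, with α the recognition validity and β the knowledge validity:

```python
def individual_accuracy(n, N, alpha, beta):
    """Expected proportion correct when n of N objects are recognized:
    pairs with exactly one object recognized are decided by recognition
    (validity alpha), pairs with both recognized by knowledge (validity
    beta), and pairs with neither recognized by guessing (0.5)."""
    pairs = N * (N - 1) / 2
    one = n * (N - n)                    # exactly one object recognized
    both = n * (n - 1) / 2               # both recognized
    neither = (N - n) * (N - n - 1) / 2  # neither recognized
    return (one * alpha + both * beta + neither * 0.5) / pairs

# With alpha = .8 and beta = .6, accuracy peaks at intermediate n and
# exceeds accuracy under full recognition (a less-is-more effect):
curve = [individual_accuracy(n, 100, 0.8, 0.6) for n in range(101)]
print(max(curve), curve[100])
```

At n = 100 every pair is decided by knowledge alone, so accuracy equals β; the peak at intermediate n is the less-is-more effect.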
Note, however, that just as for the majority rule, this rule does
not necessarily describe the entire process leading to a group deci-
sion. For example, a minority may speak up before the majority
finally overwhelms them, and similarly, people who cannot use the
recognition heuristic may do the same before name recognition
ultimately gets its way.
We now propose two rules that assume that members who can
use their knowledge are more influential in the combination of
inferences than members who can use the recognition heuristic.
[Figure 7-2: group accuracy (% correct) as a function of the number of objects recognized (n), for the recognition-first rule, recognition-based majority rule, knowledge-first rule, simple majority rule, and knowledge-based rule.]
have α = .8 and β = .6. Imagine that there are 101 triplets of girls,
each triplet with its own n, and each girl in the triplet
recognizing n cities. Note that for n = 0 the predictions of all rules
coincide because no city is recognized by any sister in the triplet
and the group guesses on all pairs. For n = 100, the predictions
of all but the recognition-based majority rule coincide because all
cities are recognized by all sisters and the group ends up choosing
the knowledge-based majority. The recognition-based majority
rule falls behind in accuracy in this situation because it guesses on
all pairs.
The first thing we note in Figure 7-2 is that a strong less-is-more
effect is predicted for all rules save the knowledge-based majority
rule. Furthermore, the effect is more pronounced than in the indi-
vidual case (e.g., the β = .6 line in Figure 7-1) in the sense that
there is more accuracy gained at the peak of the curve compared to
the point of full recognition at n = 100. While the middle sister
individually was more accurate than the eldest sister by 8 percent-
age points, if triplets use the simple majority rule, the middle trip-
let is more accurate than the eldest triplet by 10 percentage points.
The difference increases to 14 percentage points for the recogni-
tion-first rule. Partially ignorant groups thus have it even better
than partially ignorant individuals!
This finding is an illustration of a statistical theorem, the
so-called Condorcet jury theorem, due to the inventor of "social
mathematics," Marquis de Condorcet (1785; Grofman & Owen, 1986).
This theorem states that the accuracy of a majority increases with
the number of voters when voters are accurate more than half of
the time. Condorcet presented this statement on the eve of the
French Revolution, but it was not formally proven until the second
half of the 20th century. Both Condorcet and modern writers have
seen the jury theorem, and its extensions, as a formal justification
for using the majority rule.
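The theorem is easy to check numerically. A sketch for odd group sizes, summing the binomial probabilities of a correct majority:

```python
from math import comb

def majority_accuracy(n, p):
    """Probability that a majority of n independent voters (n odd), each
    correct with probability p, is correct (Condorcet jury theorem)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# With individual accuracy .6, majority accuracy grows with group size:
for n in (1, 3, 11, 101):
    print(n, round(majority_accuracy(n, 0.6), 3))
```

Note the flip side of the theorem: when individuals are accurate less than half of the time, majorities amplify the error rather than correct it.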
In fact, there are further benefits to belonging to a group. Whenever
less-is-more effects occur for groups, they are at least as prevalent
as for individuals: Recall that when α = .8 and β = .6, p = 33% for
individuals. We found the same prevalence for the majority rule.
This is not a coincidence but can be deduced: Under the majority
rule, group accuracy increases with the number of individuals who
are correct, which in turn increases with individual accuracy. Thus
the shape of the group and individual curves is the same, and this
guarantees equal prevalence. For other group rules producing a
less-is-more effect, the effect can be more prevalent than for the
individuals in the group, so that in those cases the group essentially
amplifies the benefits of ignorance.
The prevalence of the less-is-more effect increases when mem-
bers who use the more accurate (α = .8) recognition heuristic are
HOW GROUPS USE PARTIAL IGNORANCE TO MAKE GOOD DECISIONS 177
[Figure: prediction accuracy (% correct) of the recognition-based and knowledge-based majority rules for 28 groups.]
[Figure: group accuracy (% correct) as a function of the average number of cities recognized by group members.]
Jörg Rieskamp
Anja Dieckmann
188 REDUNDANCY AND VARIABILITY IN THE WORLD
Characteristics of Environments
positive or a zero cue value c_m (i.e., 1 or 0). Each cue has a specific
validity. The validity v_m is defined as the conditional probability
of making a correct inference based on cue m alone, given that
cue m discriminates, that is, that one object has a positive cue
value (c_m = 1) and the other a value of zero (c_m = 0).
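This definition of validity can be computed directly. The helper below is a hypothetical sketch (it assumes binary cue values and no criterion ties):

```python
from itertools import combinations

def cue_validity(objects, criterion, m):
    """Validity v_m: among object pairs where cue m discriminates (one
    object has cue value 1, the other 0), the fraction of pairs in which
    the object with cue value 1 also has the higher criterion value."""
    correct = discriminating = 0
    for (a, ca), (b, cb) in combinations(zip(objects, criterion), 2):
        if a[m] != b[m]:                 # cue m discriminates this pair
            discriminating += 1
            higher = a if ca > cb else b
            correct += higher[m]         # 1 if the cue points to the higher object
    return correct / discriminating

# Three objects with one cue; the cue always points to the larger criterion:
print(cue_validity([(1,), (1,), (0,)], [3, 2, 1], m=0))  # → 1.0
```
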
We are interested here in two particular characteristics of the
environment: information redundancy and the dispersion of the
validity of information. The overall redundancy in information
conveyed by the different cues in the environment can be measured
as the mean correlation between all pairs of cues assessed across
all the objects. We compute the correlation between two cues on
the basis of the object pair comparisons each cue makes; that is, we
first calculate the cue difference vector for each pair of objects
and then correlate the differences for one cue across all object
pairs with the differences for the other cue across all object pairs.
The mean correlation over all pairs of cues can vary from a high
value of 1, where all cues are the same and so are completely redun-
dant (and hence where only one cue ever needs to be considered
for an inference), to a low value of 0, where each cue provides
independent information. Environments with a positive mean cue
correlation near 1 can be called “friendly” with respect to the deci-
sion maker (Shanteau & Thomas, 2000), because the cues tend to
point toward the same decision, while environments with indepen-
dent cues and correlations nearer 0 have been called “unfriendly,”
because their cues often provide contradictory information.
The dispersion of the validity of information in an environment
can be characterized in terms of the range of the cues’ validities—
that is, how much the validities of the cues differ. For instance,
if cue validities differ widely from .55 to .95, this is a high-
dispersion environment, whereas if all cues have similar validities
between .80 and .85, this is a low-dispersion environment. The dis-
persion of cues’ validities and the cues’ redundancy in a particular
environment can both influence a strategy’s performance in that
environment. For instance, in a situation with low information
redundancy and low validity dispersion, after seeing the most
valid cue it is worthwhile to consider another cue that offers non-
redundant information and still has a validity near that of the
first cue. In contrast, in a situation with high information redun-
dancy and high validity dispersion, after seeing the most valid cue
it could be of little benefit to look up another cue that offers only
redundant and less valid information. Hogarth and Karelaia (2005a)
found that under high information redundancy and high validity
dispersion a heuristic relying on only one single cue outperformed
multiple regression in making new inferences.
In the next sections we follow a standard approach to study-
ing ecological rationality (Todd & Gigerenzer, 2000), first using
REDUNDANCY: STRUCTURE THAT HEURISTICS CAN EXPLOIT 191
\ln\left(\frac{p_k(A,B)}{1 - p_k(A,B)}\right) = b_1 d_1 + \ldots + b_m d_m + \ldots + b_M d_M + b_0,    (1)
(e.g., 10^6 for the most valid of six cues). Again, when the right-hand
sum is positive, object A is selected, otherwise, B. It needs to be
stressed that this computational representation is very different
from the process predicted by take-the-best, with its sequential and
limited information search.
The sixth and last strategy in our competition, which we call
take-two, builds a bridge between the compensatory strategies
and take-the-best (cf. Dieckmann & Rieskamp, 2007): It searches for
the cues in the order of their validity and stops searching when it
finds two cues that favor the same object, which is then selected
regardless of whether, during search, a cue was found that favored
the other object (see chapter 10 on two-reason stopping). If take-two
does not find two cues that favor the same object, it selects the
object that is favored by the cue with the highest validity (or else
picks randomly if no cue discriminates). The strategy follows the
idea that people sometimes do not want to base their decision on
one single cue but nevertheless may want to limit their information
search; take-two satisfies both goals. Take-two has the interesting
property of being able to produce intransitive choices. Since the
predictions of logistic regression (and also of all the other strate-
gies) are always transitive, take-two is the only strategy in our com-
petition that cannot be represented as a special case of Equation 1.
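A minimal sketch of take-two as described above (the function name is hypothetical; cue values are 0/1):

```python
import random

def take_two(cues_a, cues_b, validities):
    """Search cues in order of validity; stop as soon as two cues favor the
    same object, regardless of any cue favoring the other object in between.
    Otherwise fall back on the highest-validity discriminating cue, or guess
    if no cue discriminates."""
    order = sorted(range(len(validities)), key=lambda m: -validities[m])
    favors = []  # +1 = favors A, -1 = favors B, in search order
    for m in order:
        if cues_a[m] != cues_b[m]:
            favors.append(1 if cues_a[m] > cues_b[m] else -1)
            if favors.count(favors[-1]) == 2:   # two cues agree: stop search
                return "A" if favors[-1] == 1 else "B"
    if favors:  # no two cues agreed: use the most valid discriminating cue
        return "A" if favors[0] == 1 else "B"
    return random.choice(["A", "B"])

print(take_two([1, 0, 1], [0, 1, 0], validities=[0.9, 0.8, 0.7]))  # → A
```

In the example, the second cue favors B, but search continues until two cues (the first and third) agree on A.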
Accuracy of Inferences
Figures 8-1 and 8-2 show the performance (fitting and generalization)
of the different strategies in our artificial high- and low-redundancy
environments, respectively. For each redundancy condition, we
plot the average percentage of correct inferences in the test set made
by the different strategies with (a) low dispersion, and (b) high dis-
persion of the cue validities, for training sets with sizes varying
between 10% and 100% of the environment. The 100% sample
shows the strategies’ accuracies when trained on the entire envi-
ronment, that is, the pure data-fitting case.
Consistent with previous results (e.g., Czerlinski et al., 1999;
Gigerenzer & Goldstein, 1996), take-the-best, the simplest strategy
under consideration, performs very well under high information
redundancy. In particular, in the condition with low dispersion
of the cue validities, take-the-best is the best strategy for all except
the 80–100% training sizes (see Figure 8-1a). Logistic regression,
the benchmark model, is strongly influenced by the size of the
training set: In this condition, if the set is relatively small, less than
40% of the environment, its accuracy drops substantially below
the average accuracy of the other strategies, apparently over-
fitting. However, logistic regression’s accuracy increases with larger
training sets.
With high dispersion of the cue validities, though, logistic
regression substantially outperforms the other strategies (see
Figure 8-1b). Take-the-best is still the second best strategy. The
remaining four compensatory strategies perform at relatively simi-
lar levels. The more complex strategies, naïve Bayes and Franklin’s
rule, outperform the simpler strategies, take-two and Dawes’s rule,
but not by much.
For the low redundancy environments, where the different cues
convey different (independent) information, the results look very
different: Take-the-best is now outperformed by the compensatory
strategies. In particular, when the cue validities have low disper-
sion, take-the-best performs poorly (see Figure 8-2a). However,
when the dispersion of cue validities is high, take-the-best still
reaches accuracies close to those of Dawes’s rule and take-two
(see Figure 8-2b). Logistic regression’s accuracy is again strongly
[Figure 8-1: percentage of correct inferences in the high-redundancy environments as a function of training-set size (10%–100%), with (a) low and (b) high dispersion of cue validities, for take-the-best, take-two, Dawes's rule, Franklin's rule, naïve Bayes, and logistic regression.]
[Figure 8-2: percentage of correct inferences in the low-redundancy environments as a function of training-set size (10%–100%), with (a) low and (b) high dispersion of cue validities, for take-the-best, take-two, Dawes's rule, Franklin's rule, naïve Bayes, and logistic regression.]
other strategies is when cues are low in redundancy and have sim-
ilar validities.
The results of this simulation allow us to specify part of the
ecological rationality of take-the-best: Environments that are
characterized by high information redundancy are exploitable by,
and hence friendly to, take-the-best. But even when redundancy is
low, as long as validities are widely dispersed, take-the-best can
perform at a level close to that of compensatory strategies. In con-
trast, environments with low information redundancy and low
validity dispersion are hostile for take-the-best in comparison to
compensatory strategies. These results appear reasonable: Take-
the-best often makes an inference by relying on the information
of a highly valid cue, which leads to high accuracy relative to com-
pensatory strategies when the remaining cues do not offer much
new information anyhow. In contrast, compensatory strategies gain
an advantage in low-redundancy situations in which different cues
offer new information, particularly if take-the-best cannot rely on
high-validity cues (i.e., in the low-dispersion environment).
A compensatory strategy can do better than take-the-best when
the combined information in the cues that are not considered by
take-the-best leads to a better decision, which requires that the
weights given by the compensatory strategy to the remaining cues
allow for compensation (i.e., overruling the decision of take-the-
best). To see just how often this compensation among cues actually
happens for our benchmark logistic regression model, we calcu-
lated a compensation index, defined as the proportion of all possi-
ble pair comparisons between the objects in one environment in
which the set of weights for logistic regression (or other models)
allows for a compensation. For example, a compensation index of
10% for a particular set of cue weights says that over all possible
cue value settings with those weights, a preliminary decision that
is based on the first discriminating cue (searching through the cues
in weight order, large to small) will be compensated (overruled) in
10% of all cases by the remaining cues with smaller weights. To
put these results in perspective we first determined the theoretical
maximum value for the compensation index. To do so, we constructed
all possible cue configurations (i.e., 2^6 = 64 different configurations),
formed all possible comparisons between them, and applied
a unit weight strategy (i.e., Dawes’s rule) to decide between them.
This procedure results in a compensation index of 27%, meaning
that no compensation will occur in 73% of all cases. Compensatory
strategies that weight cues unequally cannot achieve a higher com-
pensation index, because later cue weights (coming in order of
decreasing magnitude) will by definition be smaller than Dawes’s
rule’s equal weights and so will lead less often to compensation.
We determined the compensation index for logistic regression,
Strategy Frugality
Beyond accuracy, another important characteristic of a strategy is
the cost of applying it. Here we ignore computational costs and
focus only on frugality, that is, the percentage of the available
cues looked up for making an inference; this is in any case likely to
be the most pressing cost for most decision makers (Todd, 2001).
As described above we defined limited information search for
Franklin’s rule, naïve Bayes, and logistic regression by assuming
that they look up cues in the order of their importance (i.e., validi-
ties, log odds, or regression weights), or randomly for Dawes’s rule,
and stop search when a decision on the basis of the information
acquired so far cannot be overruled by any additional information
that might yet be looked up.
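For a weighted linear strategy, this stopping rule amounts to halting once the evidence gathered so far outweighs everything that could still come. A minimal sketch with hypothetical names (weights assumed sorted in decreasing order, cue values 0/1):

```python
def frugal_weighted_choice(cues_a, cues_b, weights):
    """Look up cues in decreasing weight order; stop as soon as the current
    weighted difference can no longer be overruled by the unseen cues."""
    total, looked_up = 0.0, 0
    remaining = sum(weights)
    for w, x, y in zip(weights, cues_a, cues_b):
        total += w * (x - y)
        remaining -= w
        looked_up += 1
        if abs(total) > remaining:   # no later cues can reverse the decision
            break
    choice = "A" if total > 0 else ("B" if total < 0 else "guess")
    return choice, looked_up

# A dominant first cue settles the decision after one look-up:
print(frugal_weighted_choice([1, 1, 0], [0, 0, 1], weights=[5, 3, 1]))  # → ('A', 1)
```
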
Figure 8-3 shows the percentage of cues looked up by the
strategies to reach a decision. Since the strategies’ frugality did not
Figure 8-3: Frugality of the six strategies in the four kinds of deci-
sion environments, in terms of percentage of cues needed to make a
decision.
differ between the training set and the test set, we only present fru-
gality based on the whole environments as samples. Take-the-best
required, on average, only 36% of the cues before reaching a deci-
sion, which is substantially less information than the compensa-
tory strategies that use on average 74% of the cues, even with the
limited information search assumed. Comparing different environ-
ment conditions, take-the-best required less information under
low information redundancy than under high information redun-
dancy. This was different for most of the compensatory strategies,
which required slightly more information under low compared
with high information redundancy.
How can these contradictory results for compensatory versus
noncompensatory strategies be explained? Under high information
redundancy, cues are positively correlated with each other such
that the cues a decision maker checks for will often support the
same object. Therefore, a second discriminating cue will very
often point to the same object as the first discriminating cue,
making it unlikely that a preliminary decision based on the cues
gathered so far could be changed by the remaining cues. Search is
therefore stopped relatively early by the search-stopping mecha-
nism we defined for compensatory strategies. But high information
redundancy also implies that when one cue does not discriminate
between two objects, a second cue is likely not to discriminate
between the objects either. Thus, take-the-best on average has to
search longer before encountering a discriminating cue under high
information redundancy than under low redundancy, where the
chance of finding a discriminating cue right after a nondiscrim-
inating cue is larger. This difference between take-the-best and
compensatory strategies provides an interesting prediction for
experimental tests: Participants favoring a compensatory strategy
should search for less information under high (versus low) infor-
mation redundancy, while participants favoring a noncompensa-
tory strategy should search for more.
Among the compensatory strategies, the simple Dawes’s rule
requires the most cues. This is not surprising since it does not give
larger weights to early cues, so they can be outvoted by later cues
right to the end. Franklin’s rule also requires many cues, in particu-
lar compared to naïve Bayes. Franklin’s rule uses the validities
as weights, which vary considerably less than in the weighting
structure used by naïve Bayes, whose high weight variation leads
it to require the least information among the compensatory strate-
gies (64%). The dispersion of the validities affects only naïve
Bayes’s frugality. This is the case because naïve Bayes’s weighting
structure depends on the validities and becomes extremely skewed
when cues with a relatively high validity exist, as is the case in the
high validity dispersion condition.
Environment Description
(Continued )
204 REDUNDANCY AND VARIABILITY IN THE WORLD
3. Selling prices of Predicting the selling price of 27 houses in Erie, Penn. (Narula
houses & Wellington, 1977), described by the following nine cues:
original price, number of fireplaces, current taxes, lot size,
living space, number of garage spaces, number of rooms,
number of bedrooms, and age of house
4. Salary of Predicting the salary of 52 college professors (Rice, 1995),
professors described by the following five cues: sex, highest degree,
rank, years in current rank, and year degree was earned
5. Rent for Predicting the rent per acre for 67 land units in different
farmland counties in Minnesota used for alfalfa plantations (Weis-
berg, 1985), described by the following four cues: liming
requirement, average rent for tillable land, density of dairy
cows, and proportion of farmland used as pasture
6. Lifespan of Predicting the lifespan of 58 mammals (Allison & Cicchetti,
mammals 1976), described by the following nine cues: body weight,
brain weight, slow wave sleep, paradoxical sleep, total
sleep, gestation time, predation index, sleep exposure
index, and overall danger index
7. Oxidants Predicting the number of oxidants in 30 observations in Los
Angeles (Rice, 1995), described by the following four cues:
wind speed, temperature, humidity, and insulation
8. Absorption of Predicting the amount of oxygen absorbed by dairy wastes in
oxygen 20 observations (Weisberg, 1985), described by the follow-
ing six cues: biological oxygen demand, Kjeldahl nitrogen,
total solids, total volatile solids, chemical oxygen demand,
and day of the week
9. Car accident rates Predicting the accident rate (per million vehicle miles) for
39 observed segments of highways (Weisberg, 1985),
described by the following 12 cues: federal aid interstate
highway, principal arterial highway, major arterial high-
way, length of segment, daily traffic, truck volume, speed
limit, lane width, width of outer shoulder, freeway-type
interchanges, interchanges with signals, and access point
10. Amount of rain- Predicting the amount of rainfall after cloud seeding for
fall after cloud 24 weather observations (Woodley, Simpson, Biondini, &
seeding Berkeley, 1977), described by the following six cues: action,
days after experiment, suitability for seeding, percentage
of cloud cover on day of experiment, pre-wetness, and
echo motion
REDUNDANCY: STRUCTURE THAT HEURISTICS CAN EXPLOIT 205
11. Obesity: Predicting the leg circumference at age 18 for 58 men and women (Tuddenham & Snyder, 1954), described by the following 11 cues: sex, weight at age 2, height at age 2, weight at age 9, height at age 9, leg circumference at age 9, strength at age 9, weight at age 18, height at age 18, strength at age 18, and somatotype
12. Number of species on the Galapagos Islands: Predicting the number of species for 29 Galapagos islands (Johnson & Raven, 1973), described by the following six cues: endemics, area, elevation, distance to next island, distance to coast, and area of adjacent island
13. Fuel: Predicting the average motor fuel consumption (per person in gallons) of the 48 contiguous United States (Weisberg, 1985), described by the following seven cues: population, motor fuel tax, number of licensed drivers, per capita income, miles of highway, percent of population with driver's licenses, and percent of licensed drivers
14. Homelessness: Predicting the rate of homelessness in 50 U.S. cities (Tucker, 1987), described by the following six cues: percentage of population in poverty, unemployment rate, public housing, mean temperature, vacancy rates, and population
15. Total costs of firms: Predicting the total costs of 158 firms (Christensen & Greene, 1976), described by the following seven cues: total output, wage rate, cost share for labor, capital price index, cost share for capital, fuel price, and cost share for fuel
16. Costs of U.S. airlines: Predicting 90 observations of the costs of six different U.S. airlines (Greene, 2003), described by the following three cues: revenue passenger miles, fuel price, and load factor
17. Output of transportation firms: Predicting the output of transportation firms in 25 U.S. states (Zellner & Revankar, 1970), described by the following three cues: capital input, labor input, and number of firms
18. People's income: Predicting the income of 100 people (Greene, 1992), described by the following five cues: credit card application accepted, average monthly credit card expenditure, age, owns or rents home, and self-employed
19. U.S. manufacturing costs: Predicting total manufacturing costs for the U.S. from 25 yearly observations (1947–1971; Berndt & Wood, 1975), described by the following eight cues: capital cost share, labor cost share, energy cost share, materials cost share, capital price, labor price, energy price, and materials price
206 REDUNDANCY AND VARIABILITY IN THE WORLD
20. Cost of electricity producers: Predicting the total costs of 181 electricity producers (Nerlove, 1963), described by the following seven cues: total output, wage rate, cost share for labor, capital price index, cost share for capital, fuel price
21. Program effectiveness: Predicting the effectiveness of a new teaching method program for performance in a later intermediate macroeconomics course using 32 observations (Spector & Mazzeo, 1980), described by the following three cues: grade point average, economic pre-test score, and participation in the new teaching method program
22. Mileage of cars: Predicting the mileage of 398 cars (Asuncion & Newman, 2007), described by the following four cues: displacement, horsepower, weight, and acceleration
23. Liver disorders: Predicting the liver disorders (i.e., mean corpuscular volume) of 345 patients (Asuncion & Newman, 2007), described by the following five cues: alkaline phosphatase, alanine aminotransferase, aspartate aminotransferase, gamma-glutamyl transpeptidase, and number of half-pint equivalents of alcoholic beverages drunk per day
24. CPU performance: Predicting the relative performance of the central processing unit (i.e., machine cycle time in nanoseconds) of 209 different CPUs (Asuncion & Newman, 2007), described by the following seven cues: minimum main memory in kilobytes, maximum main memory in kilobytes, cache memory in kilobytes, minimum channels in units, maximum channels in units, published relative performance, and estimated relative performance
25. Refractivity of glass: Predicting the refractivity of 214 different types of glass (Asuncion & Newman, 2007), described by the following six cues: sodium, magnesium, aluminum, silicon, potassium, and calcium
26. Alcohol level of wine: Predicting the alcohol level of 178 kinds of wine (Asuncion & Newman, 2007), described by the following 12 cues: malic acid, ash, alkalinity of ash, magnesium, total phenols, flavanoids, nonflavanoids, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline
27. Populations of African countries: Predicting the number of inhabitants of 54 African countries, described by the following seven cues: part of the Sahel zone, area size, belongs to OPEC, media citations in 2004, per capita income, number of inhabitants of capital, and illiteracy rate; data assembled on the basis of our own research, partly based on the World Factbook (Central Intelligence Agency, 2005)
[Figure: difference in percent correct inferences as a function of the size of the training set (%)]
[Figure: proportion of trials consistent with the take-the-best stopping rule versus continued search, across six blocks of the learning and decision phases (four panels)]
Conclusions
Arndt Bröder
THE QUEST FOR TAKE-THE-BEST 217
Box 9-1: How We Conducted Experiments and Why We Did It This Way
If we want to know the manner in which people integrate cue information for induc-
tive inferences (i.e., their decision strategies), we must first know which cues people
use. One way to be sure of this in an experiment is to give people the cues to use
explicitly. We provided our participants with four (or five) binary cues (either seen on a
computer screen or learned in training for later recall and use in the experiment) and
cue validities (either by telling them directly or letting them acquire the knowledge
indirectly via frequency learning) and then had them make inferences by choosing
between two or three objects. The pattern of decisions allowed us to draw conclusions
about the strategy probably employed by each participant, using a maximum like-
lihood classification principle (see Bröder & Schiffer, 2003a, for details). We used
domains without much preexisting knowledge to prevent participants from relying on
cues they might bring in from outside the experiment. The tasks we used were:
Stock broker game: Participants inferred which one of multiple shares had the best
prospects for profit by considering different cues about the associated firms, such as
turnover growth (Experiments 5–13).
Criminal case: Participants were detectives judging which of two suspects was
more likely to have committed a murder, based on evidence found at the scene
of the crime. The features (cues) of the suspects had to be retrieved from memory
(Experiments 14–20).
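The maximum likelihood classification principle mentioned in the box can be sketched in a few lines. This is our illustrative reconstruction, not Bröder and Schiffer's (2003a) exact procedure: it assumes one fixed error rate rather than estimating it, and the function names are our own.

```python
def strategy_likelihood(observed, predicted, error_rate):
    # Likelihood of the observed choices if the participant followed the
    # strategy's predictions except for unsystematic response errors.
    matches = sum(o == p for o, p in zip(observed, predicted))
    misses = len(observed) - matches
    return (1 - error_rate) ** matches * error_rate ** misses

def classify(observed, strategy_predictions, error_rate=0.1):
    # Assign the participant to whichever strategy makes the observed
    # choice pattern most likely.
    return max(strategy_predictions,
               key=lambda name: strategy_likelihood(
                   observed, strategy_predictions[name], error_rate))
```

For example, a participant whose choices match take-the-best's predictions on every trial but deviate from a weighting strategy on some trials would be classified as a (probable) take-the-best user.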
did not specify how generally they thought it would apply: Did
they expect all people to use take-the-best whenever possible, or
all people to use it sometimes, or some people to use it always, or
even only some to use it sometimes? (At the time that I conducted
my first experiments, the notion that take-the-best is only one tool
in the mind’s adaptive toolbox had not been spelled out.) Hence,
our initial research question in approaching take-the-best empiri-
cally was, Is take-the-best a universal theory of inductive infer-
ences, that is, always used by everyone?
In the first three experiments I conducted with 130 participants
in total, I assumed either that all people use take-the-best all the
time (i.e., deterministic use with no errors, Experiment 1) or that
all people use it, but they occasionally make errors (Experiments 2
and 3). Both versions of the hypothesis were clearly rejected: First,
only 5 of the 130 participants used take-the-best all the time (in 15
or 24 trials; see Lee & Cummins, 2004, for a comparable result).
Second, for the other participants, choices were clearly influenced
by other cues than just the most valid discriminating one that
take-the-best would use; this systematic influence clearly showed
that the deviations from take-the-best’s predictions could not be
explained away as random response errors.
We could have stopped here and declared the heuristic a dead
end (some authors with similar critical results came close to this
conclusion, e.g., Lee & Cummins, 2004; Newell & Shanks, 2003;
Newell et al., 2003). However, we felt that this would be a prema-
ture burial, since no theory of decision making predicts behavior
correctly 100% of the time. A more realistic version of the theory
would probably allow for both (unsystematic) response errors
and a heterogeneous population of decision makers. For instance, a
small minority of people relying on other heuristics, averaged
together with a group of predominantly take-the-best users, could
have led to my results, as we will see in the next section.
Obvious conclusions of these first experiments were that (a) not
everybody uses take-the-best in every probabilistic inference task,
and (b) if some people do use take-the-best, one has to allow for
unsystematic response errors as psychologists routinely do in other
areas. Thus, I had a definitive—and negative—answer to my initial
research question about take-the-best’s universality, but I began to
doubt that it had been a good question in the first place! Before
claiming that take-the-best was not a reasonable cognitive model, I
thought it worthwhile to confront a more realistic version of the
hypothesis instead of a universal, deterministic straw man.
[Figure: data plotted against the ratio of expected payoffs for take-the-best vs. Franklin's rule (from .4 to 1.6), spanning compensatory and noncompensatory environments]
that we again asked the wrong question. Rather than asking about
the relation between personality and default strategy use, the
adaptive question would be whether there is a correlation between
individual capacities and the ability to choose an appropriate strat-
egy in different environments.
One result of Experiment 11 left us somewhat puzzled and
helped us aim our individual differences question in a new direc-
tion: In addition to the personality measures, participants completed
several scales of an intelligence test (Berliner Intelligenz-Struktur-
Test—Jäger, Süß, & Beauducel, 1997; see Bröder, 2005, for details
of the subscales used), and the intelligence score was slightly, but
significantly, correlated with selected strategies (R2 = .10). However,
contrary to our expectation, it was the clever ones who used take-
the-best!
Figure 9-2: Stimuli presented during the learning trials for the
criminal case game to investigate memory-based decisions in the
pictorial (top) and verbal (bottom) conditions of Experiments 18
and 19. (Adapted from Bröder & Schiffer, 2006b; original stimuli
were in color, with labels in German.)
Figure 9-3: Mean decision times (and SE) of participants with dif-
ferent outcome-based strategy classifications aggregated across
Experiments 15 to 19. The x-axis denotes different classes of deci-
sion trials in which the most valid cue discriminates (labeled cue
1), the second most valid cue (but not the first) discriminates (cue
2), and so forth. The decision time patterns roughly fit the process
descriptions of take-the-best, Franklin’s rule, Dawes’s rule, and
guessing (see text).
[Figure 9-4: mean decision times (seconds) for take-the-first users (N=5) and take-the-best users (N=32), by most valid discriminating cue (cue 1 to cue 4)]
and ordered cue retrieval by validity. Figure 9-4 shows the mean
decision times of both sets of participants. The order of decision
time means for take-the-first users exactly follows the expected
3–1–4–2 cue sequence, indicating that these people retrieved cues
in the order of retrieval ease and stopped search when a discrimi-
nating cue was found. This hardly seems like a coincidence—we
believe it is evidence that the decision times reflect part of the learn-
ing process, particularly showing its sequential nature. The impor-
tant insight is that this reaction time evidence indicates that both
take-the-first users and take-the-best users process cues sequen-
tially and ignore further information in a lexicographic fashion.
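The shared lexicographic structure of the two heuristics can be sketched in a few lines; only the cue order differs (validity order for take-the-best, retrieval-ease order for take-the-first). A minimal sketch under our own naming, assuming cue values for which a larger value signals a larger criterion:

```python
def lexicographic_choice(object_a, object_b, cue_order):
    # Check cues one at a time in the given order and stop at the first
    # cue that discriminates between the two objects; guess otherwise.
    for cue in cue_order:
        if object_a[cue] != object_b[cue]:
            return 'A' if object_a[cue] > object_b[cue] else 'B'
    return 'guess'  # no cue discriminates
```

Passing a validity-ranked order yields take-the-best; passing a retrieval-ease order yields take-the-first. The stopping rule is the `break`-like early `return`: all later cues are ignored.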
Fight or Run?
Each year in autumn, small but loyal groups of people across
Europe listen to a spectacular concert: the roaring of male red deer
in their rutting season. It might be a matter of taste whether a person
enjoys listening to this performance (also available on CD, and
broadcast live on the Internet via webcam), but for the red deer
stags fighting for control of a harem of mates, it is a matter of genetic
survival. Typically, the roaring is only the first in a sequence of
contests in which the harem holder competes against challengers.
A male red deer has to be in good physical shape to be able to
roar at great volume for some time. If the harem holder roars
more impressively, the challenger may already give up at this point
and walk away. But if the challenger roars with comparable endur-
ance, the next contest is initiated, parallel walking. This contest
allows the competitors to assess each other’s physical fitness at a
closer distance. If this also fails to produce a clear winner, the
third contest is started: head butting. This is the riskiest activity, as
it can result in dangerous injuries (Clutton-Brock & Albon, 1979).
This step-by-step information search allows competitors that are clearly different in strength to terminate the competition at an early stage, sparing the inferior male the risk of injury and the superior male an exhausting quarrel, so that the latter can save its energy for more serious challengers (Enquist & Leimar, 1990).
Trust or Refuse?
New York City taxi drivers are much more likely to be murdered
than the average worker. When drivers are hailed in the Bronx or
other dangerous neighborhoods, they need to screen potential
passengers to decide whether they are trustworthy. Many drivers
report that in cases of doubt, they first drive past the person to check
him or her out before pickup. If they pick up the wrong client, they
may be robbed or even killed. If they refuse a harmless client, they
will lose money. How do drivers make up their minds? Unlike
the red deer, they must decide rather quickly, and different drivers
can use different cues depending on their past experience. One
New York driver said that a hailer in a suit and tie could be trusted
whereas “people that mostly do the robbing have tattoos on . . .
roughneck dressing with some hoodie, trying to hide some part of
their face.” Another, in contrast, was wary of overdressing, because
“a well-dressed man in a bad area, something has to be wrong”
(Gambetta & Hamill, 2005, pp. 158–159). Some cues for trust, how-
ever, were shared by virtually all drivers, including older over
younger, female over male, and self-absorbed over inquisitive.
Whatever cues were used, they had to be assessed rapidly on the
fly, putting a premium on searching for a few, accurate, readily
observed attributes.
Shoot or Pass?
Handball players face a constant stream of decisions about what to
do with the ball. Pass, shoot, lob, or fake? They have to search in
memory for past experiences to generate appropriate present
options. The question, then, is when to stop generating options and
act upon one of them. In an experiment designed to find out,
skilled players stood in front of a video screen, where 10-second
scenes from top games were shown, ending in a freeze-frame
(Johnson & Raab, 2003). The players were then asked to imagine
that they were the player with the ball and to choose the best play
option as quickly as possible. After this, they could inspect the
still-frozen frame for another 45 seconds and then choose a play
again. With this additional time and information, about half of the
players changed their mind about the best option. But which
was better: the quick intuitive judgment or the one after extended
reflection? On average, the first option that came to players’ minds
was best. If they continued to search in their memory to generate
further options, they were likely to come up with second-best and
third-best options and end up acting on one of those instead.
Sometimes, searching for fewer options is not only faster, but also
better.
EFFICIENT COGNITION THROUGH LIMITED SEARCH 243
Kinds of Search
The red deer, the cab driver, and the handball player all search for
information––in the outside world and inside memory. Human
brains seem to have a large appetite for information, and for that
reason Homo sapiens have been baptized as informavores (Dennett,
1991). In this chapter, we explore two aspects of information
foraging: how people search for information, and when they stop
searching.
Principles of searching and stopping are candidates for the basic
elements of cognition, a set of mental operations that can be
combined to produce the myriad of feats the mind is capable of
performing (Simon, 1979a). One might expect that psychologists
have been busy studying the question of how informavores forage
for information. Instead, there has been a puzzling preference for
theories that ignore search and stopping. In research on categoriza-
tion and inference, for instance, almost all theories––from exem-
plar models to Bayesian models––assume that all of the relevant
information about cues is already given (see Berretty, Todd, &
Martignon, 1999). Similarly, in judgment and decision-making
research, most approaches, including expected utility theories,
prospect theory, and multiple cue probability learning, do not
model information search. This is understandable given that most
experiments conveniently lay out all the information in front of the
experimental participant, obviating search (see Cooksey, 1996;
Fishburn, 2001; Holzworth, 2001). The question asked is, how do
people process given information—in an additive, multiplicative,
Bayesian, or other way? Yet in the real world, the required informa-
tion is not usually handed to the decision maker on a tray (or a
computer screen). Physicians must decide when to stop diagnostic
testing and administer treatment, just as the cab driver must decide
when to stop looking for more cues and then trust or refuse. People
search on the Internet for information about digital cameras, think
about which friend to ask for advice on personal matters, or try to
find a good reason to justify buying a Porsche, and in each case
have to ultimately stop their search and make a choice. These kinds
of search for information are usually sequential. The male deer does
not simultaneously listen to the competitor’s roar, check him out
while running alongside him, and assess the force of impact of
his head, and then weight and add the information to decide
whether to run away or stay. Instead, each cue is checked in order, and only if it is needed to make the decision. The advantage of such
sequential search is that the process can be stopped at any time,
which can save energy, costs, and lives.
Why then is search often neglected in psychological research?
One possible answer lies in the kinds of statistical models
Cues or Alternatives
The goal of search can be cues or alternatives (Table 10-2) or both.
The deer has two essentially predetermined alternatives, to con-
tinue challenging the opponent or to run away, and must just search
for cues as to which option is more promising. Similarly, the taxi
driver has two alternatives, to trust or refuse a potential passenger,
and also just needs to look for cues. How do they search for the cues
they need, and when do they stop? One possibility is with lexico-
graphic heuristics such as take-the-best (Gigerenzer & Goldstein,
1996) and elimination-by-aspects (Tversky, 1972), which model
search and stopping over cues, assuming that the alternatives are
given. Now consider a search for alternatives. The handball player
must search in memory for alternative plays in a game situation.
The hotel guest who keeps pressing the television remote control
is searching for alternatives consisting of the subjectively most
interesting programs. How do athletes and hotel guests search for
alternatives and when do they stop? A second group of heuristics,
including satisficing search using aspiration levels, describes this
sequential search for alternatives (Simon, 1955a; see also Seale &
Rapoport, 1997; Selten, 2001; Todd & Miller, 1999).
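Satisficing search over alternatives can be sketched as follows. This is a minimal illustration of Simon's (1955a) idea rather than any specific published model; the aspiration level is assumed fixed here, although it may also be adjusted during search:

```python
def satisfice(alternatives, evaluate, aspiration):
    # Examine alternatives sequentially; stop at the first one whose
    # evaluated worth meets the aspiration level.
    for alternative in alternatives:
        if evaluate(alternative) >= aspiration:
            return alternative
    return None  # search exhausted without meeting the aspiration
```

Note the structural parallel to lexicographic cue search: both stop as soon as one item (a discriminating cue, or a good-enough alternative) clears a threshold.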
This distinction between search for cues and alternatives is not a
mutually exclusive one. Often, both kinds of search are involved
in solving a problem: A challenger deer could look for more cues
about the strength of the current harem holder or could leave this
contest and look for another harem holder to challenge. Similarly,
when people search through alternatives such as potential houses
or spouses, they also collect information about their attributes
(Saad, Eba, & Sejean, 2009). And a channel surfer searching for
better alternatives also has to search for cues while sampling a few
seconds or minutes of each program in order to infer how interest-
ing it might be. Some research has been done that combines both
kinds of search, such as experiments where the information can be
uncovered from a matrix of alternatives and their cues. This design
allows researchers to determine to what degree a person employs
alternative-wise search, that is, first looking up all the cue values
of one alternative before examining the second alternative, or cue-
wise search, that is, looking up the value of all alternatives on
one cue before proceeding to the next cue (e.g., Bettman, Johnson,
Luce, & Payne, 1993; Payne, Bettman, & Johnson, 1988; Russo &
Dosher, 1983).
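The two lookup patterns can be made concrete as traversal orders over an information matrix indexed by alternatives and cues; a small sketch with names of our own:

```python
def alternative_wise(n_alternatives, n_cues):
    # Look up all cue values of one alternative before moving to the next.
    return [(a, c) for a in range(n_alternatives) for c in range(n_cues)]

def cue_wise(n_alternatives, n_cues):
    # Look up one cue's value for all alternatives before the next cue.
    return [(a, c) for c in range(n_cues) for a in range(n_alternatives)]
```

Comparing a participant's observed sequence of lookups (alternative, cue) against these two orders indicates which search pattern dominates.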
Next, consider an optimal rule for stopping search that has this
form:
This stopping rule aims at the best stopping point, not just a good
one. Models for sequential search with optimal stopping rules have
been proposed in economics (e.g., Stigler, 1961) and psychology
(e.g., Anderson, 1990; Busemeyer & Rapoport, 1988). For instance,
in one of Anderson’s models, search in memory stops when costs of
retrieving the next record (in terms of retrieval time, etc.) exceed
the expected benefit of retrieving it.
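In the spirit of such a cost-benefit stopping rule (not Anderson's exact formulation; the cost and benefit functions here are placeholders), memory retrieval might be sketched as:

```python
def search_memory(records, expected_benefit, retrieval_cost):
    # Retrieve records in order until the cost of retrieving the next one
    # exceeds its expected benefit, then stop.
    retrieved = []
    for record in records:
        if retrieval_cost(record) > expected_benefit(record):
            break
        retrieved.append(record)
    return retrieved
```

The difficulty discussed next is that, outside small experimental worlds, the decision maker rarely knows these costs and benefits in advance.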
Optimization can be feasible in the small world of an experiment
with three or four independent variables, yet it may become sci-
ence fiction for a mind embedded in the large, uncertain world.
Many interesting problems are out of reach of optimization meth-
ods, because optimization is computationally intractable, too slow,
or too expensive (Michalewicz & Fogel, 2000). For instance, this
computational limitation applies to the problem of finding the opti-
mal order of a set of cues for use with a lexicographic decision
heuristic: This problem is NP-complete (Martignon & Hoffrage,
1999), meaning that when there are many cues, finding the solution
becomes computationally intractable. Similarly, finding the opti-
mal stopping point for search can be impossible in real-world set-
tings, unlike in small experimental worlds where the costs of future
search are specified by the experimenter. For instance, economic
theories of optimization under constraints typically assume that
people behave as if they are omniscient, that is, have perfect knowl-
edge of all benefits and costs of further search, and can solve the
differential equations needed to determine the ideal stopping
point where the costs of further search exceed its benefits. To their
credit, proponents of this approach usually make clear that these
optimization models make “the agents in our models more like the
econometricians who estimate and use them” (Sargent, 1993, p. 4).
But when optimization is out of reach, real people can employ heu-
ristic rather than optimal search and stopping rules.
Computational intractability is one constraint on the ideal of
optimal search or stopping rules. Another constraint is estimation
error, or alternatively the need for robustness. Consider once more
the task of estimating the optimal order of cues. In a computer sim-
ulation of a real environment with nine cues for inferring population
sizes of cities, Martignon and Hoffrage (1999) determined the opti-
mal cue order. This involved evaluating 9! = 362,880 orders, which
is computationally tractable (for a computer at least), meaning that
the best order could be found (in this case by exhaustive search).
A heuristic search rule (validity vi, described shortly) led to a cue
order that was better than 98% of all possible orders, although
the optimal order was by definition better. Yet, how well did these
heuristic and optimal orders fare in a new sample, that is, when it
came to foresight (prediction or generalization) rather than hind-
sight (data fitting)? Martignon and Hoffrage split the set of cities
into two halves and for one half (the training set) calculated the
optimal cue order as well as the heuristic (vi) order. This procedure
was repeated 100 times to control for random sampling effects. In
each case, the two orders were tested for their fitting accuracy on
the training set, as well as their ability to make accurate predictions
on the second half of the cities (the test set), a procedure known as
cross-validation. The surprising result was that the heuristic order
led to higher predictive accuracy than the optimal order did. Because
the loss of accuracy when comparing fitting to prediction was lower
for the heuristic order (a decrease from 75% for fitting to 73% for
predicting) than for the optimal order (from 77% to 72%), the heuris-
tic search rule is referred to as being more robust in this situation.
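The comparison of an exhaustively fitted "optimal" cue order with the heuristic validity order can be sketched as below. This is our illustrative reconstruction, not Martignon and Hoffrage's code; each pair is a triple (object A's cues, object B's cues, whether A has the larger criterion value), larger cue values are assumed to signal the larger criterion, and undecided pairs are scored as coin flips.

```python
import itertools

def lex_accuracy(order, pairs):
    # Fraction of pairs decided correctly by a lexicographic rule using
    # cues in the given order; undecided pairs count as coin flips.
    correct = 0.0
    for a, b, a_larger in pairs:
        for cue in order:
            if a[cue] != b[cue]:
                correct += ((a[cue] > b[cue]) == a_larger)
                break
        else:
            correct += 0.5
    return correct / len(pairs)

def validity(cue, pairs):
    # Ecological validity v_i = R_i / D_i on the given pairs.
    right = disc = 0
    for a, b, a_larger in pairs:
        if a[cue] != b[cue]:
            disc += 1
            right += ((a[cue] > b[cue]) == a_larger)
    return right / disc if disc else 0.0

def validity_order(cues, pairs):
    # Heuristic search rule: rank cues by validity, best first.
    return sorted(cues, key=lambda c: validity(c, pairs), reverse=True)

def optimal_order(cues, pairs):
    # Exhaustive search over all n! cue orders (9! = 362,880 for nine
    # cues): the order that fits the given pairs best.
    return max(itertools.permutations(cues),
               key=lambda order: lex_accuracy(order, pairs))
```

Splitting the pairs into a training half and a test half, computing both orders on the training half, and comparing their `lex_accuracy` on the test half reproduces the logic of the cross-validation described above.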
The general point is that an optimal order for one sample of
observations is not necessarily the optimal order for a new sample
(called out-of-sample prediction—see chapter 2). In this sense, a
heuristic search rule can perform “better than optimally” in predic-
tion, although not in fitting. Robustness is also crucial if the train-
ing and test sets are not random samples from the same population
but from two populations that differ in unknown respects (out-of-
population prediction). Out-of-population prediction is the rule
rather than the exception in medicine, for instance, where a diag-
nostic system with various cues or predictors is developed using a
sample of patients in one hospital and then applied to patients in
other hospitals and geographic locations, who belong to popula-
tions that differ in unknown ways (see chapter 14 for a discussion
of this in the context of building robust medical decision trees).
Scope
This chapter presents models of how people search for cues to make
a decision when optimization is out of reach. The decision task we
focus on is an inference between two alternatives. An inference
has a clear criterion—that is, the decision can be proven to be right
or wrong. The male deer’s decision to fight can be proven right if
he wins and wrong if he loses, and the New York taxi driver will
soon find out whether someone is honest or dodgy after deciding
that the person looks trustworthy. We do not deal with preferential
choices where direct feedback about right or wrong does not exist,
such as whether to marry or not, or whether to have chocolate or
vanilla ice cream for dessert. It is nevertheless possible that the
search rules we consider for inference also apply to preference.
Experimental research conducted to test search and stopping
rules is more mundane than the risky real-life decisions of taxi
drivers. Typically, participants have to use a set of binary cues to
infer which of two alternatives has the higher value on some crite-
rion, such as which of two shares will earn more money, which of
two suspects committed a murder, or which baseball team will win
a game. But a number of the search and stopping rules we discuss
here have been generalized to other tasks as well, such as classifica-
tion (Berretty et al., 1999; see also chapter 14) and estimation
(Hertwig, Hoffrage, & Martignon, 1999; see also chapter 15).
Search rules and stopping rules are two of the building blocks of
heuristics. Particular building blocks in a given heuristic specify
what information to look for (search), how long to search for it
(stopping), and what to do with the pieces of information found
chapter 9), and the use of this stopping rule predicts decision times
in individual judgments (Bröder & Gaissmaier, 2007).
In this chapter, we analyze adaptive decision making at the
level of search and stopping rules rather than of entire heuristics.
We will not deal with decision rules, except for the constraints
that particular stopping rules impose on decision rules. In what fol-
lows, we first describe several alternative search and stopping
rules and then address their ecological rationality and the empiri-
cal evidence for their use in different situations.
Models of Search
Random Search
An elementary form of search is:
a good order of cues, learning could be slow and some good cues
might get “stuck” low in the cue order so that they would seldom
be used (Todd & Dieckmann, 2005; see chapter 11). One way to
alleviate these problems would be to start with random rather than
ordered search, which promotes learning about all cues equally,
and not just those that happened to be ranked high in the initial
orderings. After a head start with random search, the physician
should then switch to ordered search. Random search corresponds
to an exploration phase, to be distinguished from an exploitation
phase, in which a heuristic exploits the cue structure in the envi-
ronment. Similarly, when learning samples are small and many cue
values are unknown, random search can lead to predictions as
accurate as those produced by search by validity and substantially
more accurate than those of multiple regression (Gigerenzer &
Goldstein, 1999).
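The switch from random exploration to validity-ordered exploitation can be sketched as follows; the threshold of 50 trials, the function name, and the use of estimated (rather than true) validities are arbitrary illustrative choices of ours:

```python
import random

def choose_cue_order(cues, estimated_validities, trials_so_far,
                     exploration_trials=50):
    # Exploration phase: a fresh random order each trial, so every cue
    # gets sampled about equally often while validities are learned.
    if trials_so_far < exploration_trials:
        order = list(cues)
        random.shuffle(order)
        return order
    # Exploitation phase: order cues by the validities estimated so far.
    return sorted(cues, key=lambda c: estimated_validities[c], reverse=True)
```

During exploration every cue gets looked up, so validity estimates improve for all cues, not just the ones that happened to start near the top of the order.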
For the second condition, it is easy to see that if cue validities are
about equal, then order of cues does not matter for accuracy. This is
one of the conditions that make Dawes’s rule ecologically rational
and enable it to predict more accurately than multiple regression
(Hogarth & Karelaia, 2005a, 2006b, 2007; Martignon & Hoffrage,
2002; see also chapter 3).
v_i = R_i / (R_i + W_i) = R_i / D_i

u_i = v_i × d_i

u_i = (R_i / D_i) × (D_i / P) = R_i / P
s_i = [R_i + (P − D_i)/2] / P = u_i + (1 − d_i)/2
where P−Di is the number of pairs in which a cue i does not dis-
criminate. The corresponding search rule is:
The reason for conditions (1) and (2) can be intuitively under-
stood by considering their converse: If cue validities were all equal,
then ordering them would be pointless (as in the condition for
random search earlier), and if all cues were independent of each
other, then relying on one reason alone would result in inferior
predictions compared to combining multiple independent pieces
of information. In environments where (1) and (2) hold, search by
validity, combined with one-reason stopping, typically generates
more accurate predictions than multiple regression and other linear
strategies that weight and add all pieces of information (Czerlinski,
Gigerenzer, & Goldstein, 1999; Dieckmann & Rieskamp, 2007; see
more on the effects of redundancy in chapter 8). Note, however,
that condition (1) only benefits ordered search when the decision
maker knows the order of validities. If this order has to be estimated
from small samples (as discussed in chapters 2 and 11), then ordered
search in combination with one-reason stopping runs the risk of
betting on the wrong cue. Condition (3) has been frequently dis-
cussed in the literature (e.g., Payne et al., 1993). When search is
limited, it is important to rank cues by validity, discrimination rate,
or a combination of both in order to save costs by increasing the
chance of finding a good cue first, or a cue that allows for a quick
decision. We present a formal definition of costs later in the section
on stopping rules.
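The validity, usefulness, and success measures given above follow directly from the pair counts; a minimal sketch, with a function name of our own:

```python
def cue_statistics(right, wrong, total_pairs):
    # right (R_i) and wrong (W_i) count correct and incorrect inferences
    # on the pairs where cue i discriminates; total_pairs is P.
    disc = right + wrong               # D_i: discriminating pairs
    v = right / disc                   # validity
    d = disc / total_pairs             # discrimination rate
    u = v * d                          # usefulness, equal to R_i / P
    s = u + (1 - d) / 2                # success: nondiscriminating pairs
                                       # credited as 50:50 guesses
    return v, d, u, s
```

For example, a cue that is right on 6 and wrong on 2 of 20 pairs has v = .75, d = .4, u = .3, and s = .6.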
When are usefulness and success search rules more appropriate
than validity alone? Note that they only differ from validity when
the discrimination rates vary. If an organism’s goal is solely accu-
racy, these search orders cannot beat validity order. But when fru-
gality, defined as the number of cues looked up before search is
stopped, and decision speed matter in addition to accuracy, then
search by usefulness and by success could be preferable to search
by validity if
Recency Search
The search rules proposed so far are adapted to environments that
are relatively stable. This is the case for many tasks, such as judging
the trustworthiness of potential passengers. But if a New York
taxi driver were to move his business to Belfast or Cairo, some of
the relevant cues or their order would probably change. How can
search adapt to such a changing environment? When environments
change quickly, social learning of cues, by advice or imitation, is
one option (Boyd & Richerson, 1985). Individual learning could be
possible by resetting the reference class, from New York to, say,
Belfast, and starting fresh. Another way an individual could learn
would be to exploit a cognitive limitation, namely, a short memory
window produced by forgetting (Anderson & Schooler, 2000; see
chapter 6). The simplest such strategy is to search through the most
recent experiences in the following way:
Fluency Search
Related to recency search is ordering cues based on their fluency,
that is, by how quickly they come to mind. Fluency is in some
sense a passive rather than an active ordering principle driven
by the experiences that an individual has had. As illustrated earlier
with the example of the handball players (Johnson & Raab, 2003),
there are situations in which experts are well advised to rely on the
first option they think of (Gigerenzer, 2007). The idea that fluency
of recall can be used as a cue in inferential judgment goes back to
research on the availability heuristic (Tversky & Kahneman, 1974),
which assumes that people use the ease of retrieving instances, or
the frequency of the instances they retrieve, to assess the probabil-
ity of events. The availability heuristic has been criticized because
its underlying processes are not precisely defined (e.g., Fiedler,
1983; Gigerenzer, 1996). But Schooler and Hertwig (2005) demon-
strated a way forward, using the ACT-R framework (Anderson et al.,
2004) to specify the related fluency heuristic and thereby produce
testable predictions about the efficacy of using fluency information
in different environments (see chapter 6 for details). The fluency of
retrieving information is often informative because it correlates
with the frequency and recency of encountering information in the
environment (Anderson & Schooler, 1991). This has primarily
been studied regarding comparisons between alternatives, such as
which music artists have higher album sales (Hertwig, Herzog,
Schooler, & Reimer, 2008) or which of two cities is larger (Schooler
& Hertwig, 2005). Gaissmaier (2008) extended the idea of informa-
tive fluency use from alternatives to the ordering of cues.
An important distinction needs to be made between fluency
search and the ordering principles described so far, in that fluency
is an attribute of particular cue values and not of cues per se. That
is, while other orders assume that the values of two alternatives are
Search by Accessibility
Fluency search refers to the situation where memory access orders
the cues for us. When cues are available externally rather than in
memory, often the environment orders information for us. Some
cues, for instance, could be more readily accessible to the senses
than others, as the case of the red deer illustrates. The red deer
checks cues in a fixed order, following the reach of the senses:
Acoustic signals are available first (particularly in a forest environ-
ment), visual cues require closer proximity, and tactile signals
necessitate physical contact. This sensory order is also adaptive
because it correlates with the risk of being hurt while assessing the
cues. In mate choice, speed of accessibility can order cues: Physical
attractiveness of a potential mate is easy and quick to assess, while
it takes time and effort to find out about intelligence and sexual
fidelity (Miller & Todd, 1998). Although cues frequently vary in
their accessibility, we know of no studies that systematically
EFFICIENT COGNITION THROUGH LIMITED SEARCH 261
I = c/g
1. Another way to see this is to start with the requirement that the
minimum gains minus costs for checking one cue, g−c, must be greater
than the expected reward from guessing, g/2, or g−c > g/2. Then g/2 > c,
1/2 > c/g = I, and I < 1/2.
to use one-reason decision making but at the same time set themselves a
limit as to how much information they will maximally purchase, such as
“Stop when a discriminating cue is found, but only look for a maximum
of m cues. If no discriminating cue is found by that point, then stop search
and guess.”
3. Again we can see this by starting with the requirement that the maxi-
mum possible gain after checking two cues, g−2c, must be greater than
the gain from just guessing (without checking any cues), g/2, or g−2c > g/2,
g/2 > 2c, g > 4c, 1/4 > c/g = I.
Can we use the heuristic rules and their match with specific envi-
ronmental structures, as defined in the previous sections, to predict
what heuristics people will use in particular environments? The
logic is first to analyze the match between various search and stop-
ping rules and particular experimental settings and then to see
whether people use these in an adaptive, ecologically rational way.
The ideal study of the adaptive use of search and stopping rules
would implement different relevant environmental structures as
independent variables and then analyze whether the distribution of
search and stopping rules used by people in those environments
changes as predicted by their ecological rationality. Such studies
exist, but the majority have tested only one rule or one heuristic
(often just take-the-best) in one or two environments. Therefore,
part of the evidence concerning the adaptive use of search and
stopping rules is indirect, based on comparisons between experi-
ments. We look first at experiments pitting inferences requiring
search against inferences from givens, and then turn to experiments
involving search for cues in environments with particular types of
structure.
in the former there are still noticeable costs associated with deter-
mining what information to seek next and then actually obtaining
it, even if this just means clicking on it with a mouse, rather than
merely casting one’s eyes over a table of cues already laid out. These
appreciable costs lead to a hypothesis that exactly parallels the one
for memory search:
Noncompensatory Environments
We have seen that in a noncompensatory environment, search by
validity and one-reason stopping are ecologically rational when the
order of cue validities is known (as opposed to being estimated
from samples—see Gigerenzer & Brighton, 2009). This result leads
to the following hypothesis:
Cue Redundancy
As discussed earlier, one-reason stopping is adaptive relative to
compensatory strategies when the redundancy (e.g., correlation)
between cues is high. This suggests the following hypothesis:
Costs of Cues
What influence do monetary costs have on stopping rules? In all
studies we are aware of, all available cues have had the same cost
(unlike in the red deer’s situation), so we restrict our analysis to
Time Pressure
Direct monetary costs are not the only means of favoring frugality
in information search. Time pressure should also increase the use
of a stopping rule that ends search quickly. Thus, we hypothesize:
SIMPLE RULES FOR ORDERING CUES IN ONE-REASON DECISION MAKING 275
Search rule: Search through cues in some order. For the mini-
malist heuristic, order is random, while for take-the-best,
order is in terms of ecological validity, that is, the propor-
tion of correct decisions made by a cue out of all the times
that cue discriminates between pairs of options.
Stopping rule: Stop search as soon as one cue is found that
discriminates between the two options.
Decision rule: Select the option to which the discriminating
cue points, that is, the option that has the cue value associ-
ated with higher criterion values.
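Put together, these building blocks can be sketched as follows (an illustrative implementation, assuming each alternative is a dict of binary cue values in which 1 is the value associated with larger criterion values):

```python
import random

def take_the_best(cues_a, cues_b, cue_order):
    """One-reason decision making: cue_order lists cues by descending
    ecological validity (for minimalist, shuffle the list instead)."""
    for cue in cue_order:               # search rule
        if cues_a[cue] != cues_b[cue]:  # stopping rule: first discriminating cue
            # decision rule: choose the alternative with the positive cue value
            return "A" if cues_a[cue] > cues_b[cue] else "B"
    return random.choice(["A", "B"])    # no cue discriminates: guess
```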
Figure 11-1: Mean final offline accuracy and frugality after 100
learning trials for various cue order learning rules. In gray, all
362,880 possible search orders for the city comparison task are plot-
ted in terms of their frugality and accuracy. The open star indicates
the performance of ecological validity ordering in take-the-best and
the black star shows random cue ordering in minimalist, corre-
sponding to the mean cue order where all learning rules begin. The
mean offline performance of all of the learning rules has improved
after 100 trials in comparison to that benchmark (greater frugality
and mostly higher accuracy).
order; and (c) count, in which a tally is kept of the number of times
each record is retrieved, and the list is reordered in decreasing
order of this tally after each retrieval. Because count rules require
storing additional information, more attention has focused on the
memory-free transposition and move-to-front rules. Analytic and
simulation results (reviewed in Bentley & McGeoch, 1985) have
shown that while transposition rules can come closer to the
optimal order asymptotically, in the short run move-to-front rules
converge more quickly (as can count rules). This may make
move-to-front (and count) rules more appealing as models of cue
order learning by humans facing small numbers of decision trials.
Furthermore, move-to-front rules are more responsive to local
structure in the environment (e.g., able to capitalize immediately
on a particular record becoming temporarily “popular”), while
transposition can result in very poor performance under some cir-
cumstances (e.g., when neighboring pairs of “popular” records get
trapped at the far end of the list by repeatedly swapping places
with each other).
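The two memory-free updating schemes amount to simple list operations (a minimal sketch; function names are illustrative):

```python
def move_to_front(order, item):
    """Move the just-retrieved item to the front of the list."""
    order.remove(item)
    order.insert(0, item)

def transpose(order, item):
    """Swap the just-retrieved item with its predecessor, if any."""
    i = order.index(item)
    if i > 0:
        order[i - 1], order[i] = order[i], order[i - 1]

order = ["a", "b", "c", "d"]
move_to_front(order, "c")   # -> ["c", "a", "b", "d"]
transpose(order, "b")       # -> ["c", "b", "a", "d"]
```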
It is important to note, however, that there are some critical dif-
ferences between the self-organizing sequential search problem
and the cue-ordering problem we address here. First, when a record
is sought that matches a particular key, search proceeds until
the correct record is found. In contrast, when a decision is made
lexicographically and the list of cues is searched through, there
is no one “correct” cue to find—search stops at the first cue that
discriminates between the decision alternatives (i.e., allows a deci-
sion to be made), and there may be several such cues. Furthermore,
once a discriminating cue is found, it may not even make the
correct decision (the lower its validity, the more likely it is to indi-
cate the wrong choice). Thus, given feedback about whether a deci-
sion was right or wrong, a discriminating cue could potentially be
moved up or down, respectively, in the ordered list. This dissocia-
tion between making a decision or not (based on the cue discrimi-
nation rates, that is, the proportion of all decisions on which
the cue makes a distinction between alternatives), and making a
right or wrong decision (based on the cue validities), means that
there are two performance criteria in our problem—frugality and
accuracy—as opposed to the single criterion of search time for
records. Because record search time corresponds to cue frugality,
the heuristics that work well for the self-organizing sequential
search task are likely to produce orders that emphasize frugality
(reflecting cue discrimination rates) over accuracy when they are
applied to the cue-ordering task. With this tendency in mind,
these heuristics offer a useful starting point for exploring cue-
ordering rules.
made by a cue so far (in all the times that the cue was looked up)
and a separate count of all the correct discriminations (i.e., those
decisions where the cue discriminated and indicated the alterna-
tive with the higher criterion value). Therefore, its memory load
is comparatively high. The validity of each cue is determined
by dividing its current correct discrimination count by its total
discrimination count. Based on these values computed after each
decision, the rule reorders the whole set of cues from highest to
lowest validity.
The tally rule only keeps one count per cue, storing the differ-
ence between the number of correct decisions and incorrect
decisions made by that cue so far. If a cue discriminates correctly
on a given trial, one point is added to its tally, and if it leads to an
incorrect decision, one point is subtracted. The tally rule is thus
less demanding than the validity rule in terms of both memory and
computation: Only one count is kept, and no division is required.
Note that the tally rule with its single count is sensitive to the
number of discriminations while the validity rule is not. For
instance, the validity rule would rank a cue that made 5 discrimina-
tions, 4 of them correct and 1 incorrect, the same as a cue that made
25 discriminations, 20 correct and 5 incorrect (because 4/5 = 20/25),
while the tally rule would rank the latter higher (4 − 1 < 20 − 5).
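The worked example can be checked directly in code (illustrative function names):

```python
def validity_score(correct, incorrect):
    """Validity: correct discriminations over all discriminations."""
    return correct / (correct + incorrect)

def tally_score(correct, incorrect):
    """Tally: correct minus incorrect discriminations."""
    return correct - incorrect

# Cue 1: 5 discriminations, 4 correct.  Cue 2: 25 discriminations, 20 correct.
# validity_score(4, 1) == validity_score(20, 5)   (both 0.8: tied)
# tally_score(4, 1) < tally_score(20, 5)          (3 < 15: cue 2 ranked higher)
```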
The simple swap rule uses the transposition rather than the
count approach. This rule has no memory of cue performance other
than an ordered list of all cues and just moves a cue up one position
in this list whenever it leads to a correct decision, and down if it
leads to an incorrect decision. In other words, a correctly deciding
cue swaps positions with its nearest neighbor upward in the cue
order, and an incorrectly deciding cue swaps positions with its
nearest neighbor downward.
The tally swap rule is a hybrid of the simple swap rule and the
tally rule. It keeps a tally of correct minus incorrect discriminations
per cue so far (so memory load is high) but only moves cues by
swapping: When a cue makes a correct decision and its discrimina-
tion tally is greater than or equal to that of its upward neighbor, the
two cues swap positions. When a cue makes an incorrect decision
and its tally is smaller than or equal to that of its downward neigh-
bor, the two cues also swap positions. Otherwise, the tallies of the
neighboring cues suggest that the current cue order is reasonable
and no change is made, providing a degree of stabilizing inertia.
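Both swap rules reduce to neighbor exchanges on the current order (a sketch; we assume the tally is updated before the neighbor comparison, which is one plausible reading of the rule):

```python
def simple_swap(order, cue, correct):
    """Move cue up one position after a correct decision, down one
    position after an incorrect one."""
    i = order.index(cue)
    j = i - 1 if correct else i + 1
    if 0 <= j < len(order):
        order[i], order[j] = order[j], order[i]

def tally_swap(order, tallies, cue, correct):
    """Swap with the upward (downward) neighbor only when the tallies
    justify it, giving the order some stabilizing inertia."""
    tallies[cue] += 1 if correct else -1
    i = order.index(cue)
    if correct and i > 0 and tallies[cue] >= tallies[order[i - 1]]:
        order[i - 1], order[i] = order[i], order[i - 1]
    elif not correct and i < len(order) - 1 and tallies[cue] <= tallies[order[i + 1]]:
        order[i], order[i + 1] = order[i + 1], order[i]
```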
We also test two types of rules that move cues to the top of the
rank order. First, the move-to-front rule moves the last discriminat-
ing cue (i.e., whichever cue was found to discriminate for the cur-
rent decision) to the front of the order. This is equivalent to the
cue-ordering building block employed by the take-the-last heuristic
(Gigerenzer & Goldstein, 1996, 1999), which uses a memory of cues
that discriminated most recently in the past to determine cue search
order for subsequent decisions. Second, selective move-to-front
moves the last (most recent) discriminating cue to the front of the
order only if it correctly discriminated; otherwise, the cue order
remains unchanged. This rule thus takes accuracy as well as dis-
crimination-based frugality into account.
Finally, we consider an associative learning rule that uses the
delta rule (Widrow & Hoff, 1960) to update cue weights according
to whether they make correct or incorrect discriminations and then
reorders all cues in decreasing order of this weight after each deci-
sion. This corresponds to a simple network with K (in our dataset,
9) input units encoding the difference in cue value between the two
objects (A and B) being compared (i.e., ini = −1 if cuei(A) < cuei(B),
1 if cuei(A) > cuei(B), and 0 if cuei(A) = cuei(B) or cuei was not
checked), and one output unit whose target value encodes the
correct decision (t = 1 if criterion(A) > criterion(B), otherwise −1).
The weights between inputs and output are updated according to
Δwi = lr · (t − Σk ink · wk) · ini, summing k from 1 to K, with learning rate lr = 0.1. We expect this
rule to behave similarly to selective move-to-front initially (moving
a correctly discriminating cue to the front of the list by giving it the
largest weight when weights are small) and to tally swap later on
(moving cues only a short distance in the list once weights are larger).
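A single weight update of this kind can be sketched as follows (illustrative; weights and inputs are plain lists, with the inputs and target coded as described above):

```python
def delta_update(weights, inputs, target, lr=0.1):
    """One delta-rule step (Widrow & Hoff, 1960): nudge each weight in
    proportion to the prediction error (target minus weighted input sum)
    times that weight's own input."""
    error = target - sum(w * x for w, x in zip(weights, inputs))
    return [w + lr * error * x for w, x in zip(weights, inputs)]

# inputs encode cue-value differences between objects A and B (-1, 0, or 1)
w = delta_update([0.0, 0.0, 0.0], [1, -1, 0], target=1)
# error = 1 - 0 = 1, so w = [0.1, -0.1, 0.0]
```

Cue order is then read off as the descending order of these weights after each decision.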
To test the performance of these cue-order learning rules when
applied to small samples of data, we used the German cities data
set (Gigerenzer & Goldstein, 1996, 1999) consisting of the 83 largest
German cities (those with more than 100,000 inhabitants in 1990)
described on 9 cues that give some information about population
size. We present results averaged over 10,000 learning sequences
for each rule, starting from random initial cue orders. Each sequence
consisted of 100 comparisons to decide the larger of two randomly
selected cities. For each decision, the current cue order was used
to look up cues until a discriminating cue was found, which was
used to make the decision (employing a lexicographic one-reason
stopping rule and decision rule as in take-the-best). After each
decision, the cue order was updated using the particular order-
learning rule. We consider two measures of accuracy: The cumula-
tive accuracy (i.e., online or amortized performance—Bentley &
McGeoch, 1985) of the rules is defined as the total percentage of
correct decisions made so far at any point in the learning process,
which captures the essence of learning-while-doing. The contrast-
ing measure of offline accuracy indicates how well the current
learned cue order would do if it were applied to the entire test set
(also known as batch learning).
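The learning-while-doing loop behind these measures can be sketched generically (illustrative: decide stands in for the lexicographic decision step and returns the choice plus the cue used, and update_rule is whichever order-learning rule is being tested):

```python
def run_learning(pairs, initial_order, decide, update_rule):
    """Make each decision with the current cue order, then let the
    learning rule reorder cues from feedback. Returns the cumulative
    (online) accuracy after each trial."""
    order = list(initial_order)
    n_correct, online = 0, []
    for t, (a, b, truth) in enumerate(pairs, start=1):
        choice, used_cue = decide(a, b, order)
        correct = (choice == truth)
        n_correct += correct
        online.append(n_correct / t)           # learning-while-doing measure
        update_rule(order, used_cue, correct)  # e.g., tally, swap, move-to-front
    return online
```

Offline accuracy would instead be measured by freezing the learned order after training and scoring it once against the whole test set.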
[Figure: Proportion correct (top panel) and frugality in number of cues used (bottom panel) over 100 decisions, for each cue-order learning rule: TTB, delta rule, validity, tally, tally swap, simple swap, selective move-to-front, move-to-front, and random order.]
accuracy and at the same time also beat minimalist’s random cue
selection in terms of frugality.
In fact, it could be that the frugality-determining discrimination
rates of cues generally exert more of a pull on cue order than
validity. One reason to expect this is the fact that in the city data
set we used for the simulations (as in other natural environments;
see Gigerenzer & Goldstein, 1999), the validities and discrimi-
nation rates of cues are negatively correlated. A cue with a low
discrimination rate along with a high validity has little chance
of being used and hence, of demonstrating its high validity.
Whatever learning rule is used, if such a cue is displaced down-
ward to the lower end of the order by other cues, it may never be
able to escape to the higher ranks where it belongs. The problem is
that when a decision pair is finally encountered for which that
cue would lead to a correct decision, it is unlikely to be checked
because other, more discriminating though less valid, cues are
looked up first and already bring about a decision. In this regard,
our learning rules, when combined with one-reason decision
making, are sensitive to the order of experiences, an effect described
in the incremental learning literature (e.g., Langley, 1995). Because
one-reason decision making is intertwined with the learning
mechanism in learning-while-doing scenarios, and so influences
which cues can be learned about, across these learning rules what
mainly makes a cue come early in the order is producing a high
absolute surplus of correct over incorrect decisions (which the tally
rule in particular is tracking) and not so much a high ratio of correct
discriminations to total discriminations regardless of base rates
(which validity tracks).
Overall, the tally and tally swap rules emerge as a good compro-
mise between performance, computational requirements, learning
speed, and psychological plausibility considerations. Remember
that the tally and tally swap rules assume a memory store of the
counts of correct minus incorrect decisions made by each cue so
far. But this does not make them implausible for use by natural
minds, even though computer scientists were reluctant to adopt
such counting approaches for their artificial systems in the past
because of their extra memory requirements. There is considerable
evidence that people are actually very good at remembering the
frequencies of events—even human babies and nonhuman animals
seem sensitive to differences in the frequency of observed or expe-
rienced events, at least for small numbers. For instance, Hauser,
Feigenson, Mastro, and Carey (1999) showed that both 10- to
12-month-old babies and rhesus monkeys preferred containers with
more food items in them after they had observed the experimenter
putting in the items one after another. In this situation, babies and
monkeys could discriminate between a container with up to three
items and a container with four items. Other studies have shown
that after extensive training, animals can even learn to discriminate
between much larger numbers. Rilling and McDiarmid (1965)
trained pigeons to repetitively peck one illuminated lever, and
when the light went out, to change to one of two adjacent levers
depending on the number of pecks they had made. In this way,
the pigeons were shown to be able to discriminate between 35 and
50 pecks. Hasher and Zacks (1984) concluded from a wide range
of studies that frequencies are encoded in an automatic way,
implying that people are sensitive to this information without
intention or special effort. This capacity is usually demonstrated in
experiments that involve tracking the frequency of many different
items (e.g., Flexser & Bower, 1975; Underwood, Zimmerman, &
Freund, 1971; Zacks, Hasher, & Sanft, 1982; for reviews see also
Nieder & Dehaene, 2009; Sedlmeier & Betsch, 2002).
Consequently, the tally-based rules seem simple enough for a
wide range of organisms—including college students in their role
as experimental participants—to implement easily. In comparison,
the simple swap and move-to-front rules may not be much simpler,
because storing a cue order may be about as demanding as storing a
set of tallies. The tally-based rules are also computationally simple
because they do not have to keep track of base rates or perform
divisions, as does the validity rule.
Estes (1976) provided empirical evidence for the use of tally-
based strategies, arguing that people often base decisions on raw
frequencies rather than converting them into base-rate-adjusted
probabilities. In a series of experiments, participants first observed
outcomes of an imaginary survey about people’s preferences for a
number of products. They saw pairs of products and were told
which one was preferred by a fictional consumer. By showing
participants different pairs of products (e.g., A vs. B, C vs. D, etc.)
with varying frequency, Estes could pit the probability of a product
being preferred against the number of times it was preferred. In the
subsequent test phase, critical pairs were formed (e.g., A vs. C, with
A having a higher probability of preference, say, in 8 out of 10 pairs,
and C having the higher absolute frequency of preference, say, in
12 out of 24 pairs). Participants then had to indicate which product
was more likely to be preferred by a new sample of people from the
same population. In this test phase, participants showed a strong
tendency to predict that the winner would be the product that had
been preferred more frequently in the observation phase, even when
it had a lower probability of preference (i.e., C over A in our exam-
ple). This result supports the idea that people may keep track of
the number of correct discriminations that a cue makes rather
than utilizing a conditional measure such as its validity when
determining a cue order to use. We next turn to a set of experiments
Before we ask how people learn cue orders, we should ask what
cue orders they actually end up using when making decisions.
This question has been partly addressed in research on the use of
the take-the-best heuristic, with its fixed validity-ordered cue
search. In situations where information must be searched for
sequentially in the external environment, particularly when there
are direct search costs for accessing each successive cue, con-
siderable use of take-the-best has been demonstrated (Bröder,
2000a, Experiments 3 & 4; Bröder, 2003; see also chapter 9). Take-
the-best is also employed when there are indirect costs, such as
from time pressure (Rieskamp & Hoffrage, 1999) or from internal
search in memory (Bröder & Schiffer, 2003b). The particular search
order used by people in these experiments has not always been
tested separately, but when such an analysis has been done, search
by cue validity order has been found (Newell & Shanks, 2003;
Newell, Weston, & Shanks, 2003).
However, none of these experiments tested whether people were
ever using search orders other than validity. A closer look into
the experimental designs of the studies cited above reveals that
they would not even have been able to show the use of many other
search orders: They all used systematically constructed environ-
ments in which the discrimination rates of the cues were held con-
stant. Such fixed discrimination rates make several alternative
ordering criteria that combine discrimination rate and validity all
lead to the same cue order, namely, just validity again. Examples of
such criteria (see Martignon & Hoffrage, 1999; chapter 10) are suc-
cess, which is the proportion of correct discriminations that a cue
makes plus the proportion of correct decisions expected from guessing on the nondiscriminated trials [i.e., success = v · d + 0.5(1 − d)],
(e.g., Läge, Hausmann, & Christen, 2005; Newell, Rakow, Weston, &
Shanks, 2004). But these and all the other studies on cue order use
remain silent about how any cue order could possibly be learned by
participants.
In sum, despite accumulating evidence for the use of one-reason
decision-making heuristics, the learning processes that underlie
people’s search through information when employing such heuris-
tics remain a mystery. Additionally, in most previous experimental
studies on the use of take-the-best, cue-order learning was at best
greatly simplified—if not totally obviated—by encouraging par-
ticipants to use cues in order of their validity either directly, by
informing them about cue validities or the validity order (Bröder,
2000a, Experiments 3 and 4; Bröder, 2003; Bröder & Schiffer, 2003b;
Newell & Shanks, 2003; Newell et al., 2003; Rieskamp & Hoffrage,
1999), or indirectly, through the presentation of graphs that depicted
cue validities (Bröder, 2000a, Experiments 1 and 2). Thus, to find
out how people construct and adjust cue search orders in unfamil-
iar task environments, we had to design a new experiment.
In our experimental setup, we carefully controlled what infor-
mation participants had access to from the beginning. First, as it is
the cue-order learning process we are mainly interested in, we did
not tell people what the cue validities were in our task. Second,
many of the existing experiments on take-the-best framed the task
as a choice between differentially profitable shares or stocks from
companies that were described on several cues indicative of their
profitability (Bröder, 2000a, Experiments 3 and 4; Bröder, 2003;
Newell & Shanks, 2003; Newell et al., 2003; Rieskamp & Hoffrage,
1999). Because of the potential existence of rather strong initial
preferences for certain cues in this familiar domain, we instead
created a task about a subject most people know very little about:
oil mining. Participants had to find out how cues differed in their
usefulness for making correct decisions about where to drill for
oil. And finally, to highlight the importance of searching for the
right information in the right order, participants had to pay for each
cue they wanted to consider in making their decision. Using this
setup, we aimed to find out how people build and adjust their cue
orders as a result of feedback over the course of several decisions,
and how well their final learned cue orders would perform.
Different types of cue orders are appropriate for different types of
environments; for instance, as mentioned above, in an environment
in which all cues have the same discrimination rate but different
validities, a validity-based ordering makes sense. To study how
environmental structure might influence the cue-ordering process,
we constructed three different environments, each consisting of
100 decision pairs that could be decided on the basis of five cues
about the two alternatives (locations to drill for oil) being compared.
Participants were further told that the tests differed in how reliable
they were (i.e., their validity) and in how often they discriminated
between sites (i.e., their discrimination rate). To facilitate memori-
zation, the stronger adjective (e.g., “big,” “strong,” “fast,” etc.) was
consistently used as the positive cue value, indicating more oil.
Before the actual decisions started, participants were asked to
rank the five tests according to how useful they thought the tests
were going to be in the experiment. This was done to be able to
check for effects of any preexisting ideas about cue orders. The def-
inition of the word “usefulness” was left open intentionally.
Participants had to choose between two new oil sites, X and Y,
based on the values of test cues that they chose to see. Cue values
were always revealed pairwise, that is, simultaneously for both
alternatives. Participants had to conduct at least one test (i.e., one
cue had to be selected and revealed). After a test had been con-
ducted, participants could either go on with testing or decide in
favor of one of the sites right away by clicking on the “X” or “Y”
button. As soon as a decision between the sites had been made
and entered, outcome feedback was given: Either a box appeared
displaying the word “correct” and the chosen alternative was
circled in green, or a box appeared that said “wrong” and the
chosen alternative was crossed out in red. Furthermore, a cumula-
tive account of the participant’s earnings in petros so far was
displayed on the screen throughout the decision phase, updated
with each cue purchase and correct decision. A screenshot is shown
in Figure 11-4.
Finally, after the 100 decisions had been completed, participants
were asked again to rank the tests according to their usefulness.
Depending on the order they entered, they could increase their
gains by up to 20,000 petros (i.e., €2). Participants were told about
this opportunity for extra reward at the beginning of the experi-
ment to additionally motivate cue-order learning. The actual payoff
was determined by computing the correlation between the par-
ticipants’ final stated rank order and the order that yielded the
highest payoff in the particular environment they experienced and
multiplying this correlation by 20,000 petros. Negative payoffs
were treated as zero.
fewer cues were bought on average (2.2) than in the low-cost condi-
tions (2.8), although even with this frugality, participants earned
less in the more challenging high-cost conditions. On the majority
of trials, search was stopped immediately after having found one
discriminating cue, as specified by one-reason decision mecha-
nisms such as take-the-best: The proportion of one-reason stopping
was substantially higher in the high-cost conditions (at 70%) com-
pared to the low-cost conditions (51%) but did not differ between
environments. Participants made choices in accordance with
take-the-best’s decision rule, deciding in line with the first dis-
criminating cue they encountered, on 87% of the trials (including
cases where they went on searching beyond the first discriminating
cue). Both the stopping and decision patterns indicate the strong
impact of the first discriminating cue on the choice that was
ultimately made, and thus both also point to the importance of
the order in which cues are considered. What orders did people
end up using, and were they matched to the structure of the differ-
ent environments?
294 REDUNDANCY AND VARIABILITY IN THE WORLD
What Cue Orders Do People Use? As an indicator of the search rule partici-
pants actually used by the end of 100 decisions, we focus on the
cue-order-ranking participants explicitly stated after the decision
phase. First, we checked whether the initial explicit ranking par-
ticipants were asked for was reflected in the final explicit cue order.
The correlation between the first stated and last stated cue order
was on average low (mean r = .27). Participants did not even start to
search cues in the order they initially stated—the correlation
between this and the order in which participants initially looked
up cues was only r = –.05. The correlation between the last stated
cue order and the cue positions on the screen from left to right was
also low (mean r = .10). It can thus be concluded that neither initial
ideas about cue usefulness nor the order in which cues were dis-
played on the screen had a major impact on the search order that
participants used.
At a minimum, we expected participants’ final stated cue orders,
when used in one-reason decision making, to beat looking up
cues in random order. This is indeed the case for all environments
except the most challenging one that combined high cost and a
trade-off between validity and discrimination rate. The average
performance of each participant’s final stated cue order if applied
to all decision pairs they had seen, assuming one-reason stopping
and deciding, is summarized in Table 11-2.
Overall, the analysis of the general performance of the stated cue
orders supports the notion that many participants were able to
learn an adaptive search order. As a next step, we correlated par-
ticipants’ cue orders with four search orders previously proposed
in the literature—validity, discrimination rate, usefulness, and
success—to see if participants approached the expected order in
each environment. However, the average rank-order correlations
are quite low, and sometimes even negative. Only in the first envi-
ronment (VAL) where discrimination rate was kept constant—and
high—while validity varied were participants’ search orders moder-
ately correlated on average with the ecological validity order (mean
rho = .36 in the low-cost and .30 in the high-cost condition).
Of course, participants did not look up all cues on all trials, lim-
iting their ability to estimate orderings by ecological validity,
discrimination rate, success, and usefulness. Also, they checked
different cues unequally often. By taking these different base rates—
the frequency with which a cue has been checked—into account,
we computed the subjective validity (Gigerenzer & Goldstein, 1996),
discrimination rate, success, and usefulness experienced by each
participant as that person chose which cues (and hence cue values)
to observe during decision making. However, there were even
lower correlations (and overlaps) between these subjective mea-
sures and participants’ stated final cue orders.
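The subjective-validity computation just described can be sketched as follows; a minimal sketch assuming subjective validity is the proportion of correct inferences among the trials on which the participant actually checked the cue and it discriminated (the data structure is illustrative).

```python
def subjective_validity(lookups):
    """Subjective validity of one cue for one participant.

    lookups: one (discriminated, pointed_to_correct) pair per trial on
    which the participant actually checked this cue.

    Validity = correct discriminations / discriminations, computed only
    over checked trials. Returns None if the cue never discriminated
    when checked."""
    discriminations = [correct for disc, correct in lookups if disc]
    if not discriminations:
        return None
    return sum(discriminations) / len(discriminations)

# A cue checked on five trials: it discriminated four times,
# pointing to the correct alternative on three of those.
history = [(True, True), (True, False), (False, False),
           (True, True), (True, True)]
print(subjective_validity(history))  # 0.75
```

Because participants check different cues unequally often, these estimates rest on very different base rates, which is exactly the complication noted in the text.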
SIMPLE RULES FOR ORDERING CUES IN ONE-REASON DECISION MAKING 295
Note. Values in brackets refer to numbers that are not expected to be different from
random cue order, because cue validity and discrimination rate were held constant
in the first and second environment, respectively. VAL: Environment with varying
validities and equal discrimination rates; DR: environment with varying discrimina-
tion rates and equal validities; VAL*DR: environment with varying validities and
discrimination rates.
How Do People Construct Their Cue Order? To get an idea of when and how
participants move cues around in their current cue order, we look
for changes in the order used from one decision trial to the next.
On any given trial t, we assume that participants have a current
cue order, which we infer in the following way: The cues used on
the present trial t are put in a list in the order in which they were
checked. Any missing cues (not checked in the present trial) are
added to the end of the list in order of most recent use, so, for
instance, if cue 4 was used on trial t–1 but cue 2 had not been used
since trial t–3, then cue 4 would be followed by cue 2 in the con-
structed order list. Then we look at the N cues used in trial t+1
and see if they are ordered differently from the first N cues in
the current cue order list. If so, we relate these cue order changes
to the cue values and decision outcome seen on trial t, update the
current assumed cue order for trial t+1, and proceed to consider
trial t+2.
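The order-inference procedure above can be sketched directly; a sketch under the stated assumptions, with a `last_used` record tracking the most recent trial on which each cue was checked (names are illustrative).

```python
def infer_cue_order(checked_this_trial, last_used, trial):
    """Infer the participant's current cue order on trial t.

    checked_this_trial: cue ids in the order they were checked on trial t.
    last_used: dict mapping each cue id to the last trial on which it was
               checked before trial t (e.g., float('-inf') if never).
    trial: index of the current trial t.

    Checked cues come first, in check order; unchecked cues follow,
    most recently used first."""
    unchecked = [c for c in last_used if c not in checked_this_trial]
    unchecked.sort(key=lambda c: last_used[c], reverse=True)
    order = list(checked_this_trial) + unchecked
    for c in checked_this_trial:  # update recency bookkeeping for trial t+1
        last_used[c] = trial
    return order

# Trial 4: cues 1 and 3 checked; cue 4 was last used on trial 3,
# cue 2 back on trial 1 -- so cue 4 precedes cue 2 in the inferred order.
last_used = {1: 2, 2: 1, 3: 2, 4: 3}
print(infer_cue_order([1, 3], last_used, trial=4))  # [1, 3, 4, 2]
```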
The foremost pattern that emerges from this analysis is that cue
order usually does not change. On 60% of the trials across all par-
ticipants, no change in cue position was observed, regardless of
the previous decision outcome. Some participants did make many
more changes than others, though—the rate of cue-order change
ranged from 1% to 98% of trials for individuals. This is congruent
with a tendency of some participants to converge more quickly and
others less quickly to a particular cue order and then use it for the
remaining trials, mostly without further influence from feedback.
We will come back to this point below.
When cues are used in a different order, to what extent does their
direction of movement follow from their impact on the previous
trial? We only considered cues that were checked at the third posi-
tion in the search order, because for these, both upward and down-
ward movements are equally possible.
When a cue that was looked up at the third position discrimi-
nated and indicated a correct decision, it was 1.5 times more likely to
move up in the order (so it would be checked sooner) than to move
down. In other words, it moved up 28% of the time, stayed in place
54% of the time, and moved down 18% of the time. In contrast,
after wrong discriminations a cue was 1.4 times more likely to move
down. When the third cue was checked but did not discriminate, it
was also more likely (1.6 times) to move down.
How far do moving cues travel in the order? We again concen-
trated on cues that were checked at the third position in the search
order. We found that after correct discriminations, a step size of
+1 is the most frequent (besides a step size of 0, i.e., no movement),
at 17%. After nondiscriminations, a step size of −1 is most fre-
quently observed (21%) and the same holds for wrong discrimina-
tions (21%). Step sizes of +2 and −2 are observed rarely (in 8% and
6% of the cases, respectively, on average across correct, wrong, and
nondiscriminations).
These descriptive analyses provide initial hints that people
might respond to outcome feedback via adaptive changes to the cue
order, that is, moving cues up in the order after they make correct
discriminations, and down after wrong discriminations or after
they failed to discriminate. The finding that there is most often no
change in a cue’s position in the search order, regardless of what
kind of impact the cue had on the previous trial, potentially speaks
against the use of swapping and move-to-front rules and instead
supports rules that converge to (relatively) stable orders. Because
tally and tally swap rules count up correct decisions or discrimina-
tions across all decisions made so far, the relative impact of the
single current decision decreases over the course of the decision
phase, so that cues move less and cue orders become more stable
over trials. As a consequence, these rules might, as we predicted
based on our simulation results, fit behavior better than the simple
swap rule. In addition, the relatively high prevalence of step size +1
after correct discriminations and −1 after wrong discriminations
could be a hint that tally swap rules might fit behavior somewhat
better than complete-reordering tally rules. We find out if that is the
case by next testing the fit of particular learning rules to partici-
pants’ cue search data.
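The contrast between complete-reordering tally rules and one-step tally swap rules can be sketched in code; a sketch under our reading of the rules as described here, counting correct decisions only (variants also penalize wrong decisions), with the exact update details labeled as assumptions.

```python
def tally_update(order, tallies, cue, correct):
    """Tally rule (sketch): increment the deciding cue's tally of correct
    decisions, then fully re-sort the order by tallies. The sort is
    stable, so tied cues keep their current relative positions."""
    tallies[cue] += 1 if correct else 0
    order.sort(key=lambda c: -tallies[c])
    return order

def tally_swap_update(order, tallies, cue, correct):
    """Tally swap rule (sketch): update the tally, then move the deciding
    cue at most one position -- up if its tally now exceeds the cue
    above it, down if it falls below the cue beneath it. This yields the
    step sizes of +1/-1 that dominate the data."""
    tallies[cue] += 1 if correct else 0
    i = order.index(cue)
    if i > 0 and tallies[cue] > tallies[order[i - 1]]:
        order[i - 1], order[i] = order[i], order[i - 1]
    elif i < len(order) - 1 and tallies[cue] < tallies[order[i + 1]]:
        order[i], order[i + 1] = order[i + 1], order[i]
    return order

# Cue "c" makes a correct discrimination and overtakes its neighbor "b".
order, tallies = ["a", "b", "c"], {"a": 2, "b": 1, "c": 1}
print(tally_swap_update(order, tallies, "c", True))  # ['a', 'c', 'b']
```

Because tallies accumulate over all decisions so far, a single new outcome moves cues less and less as the session proceeds, which is how both rules converge to stable orders.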
much, this could also explain why tally and tally swap rules based
on a count of correct decisions alone showed a higher average fit
than rules that count correct minus wrong decisions per cue.) In
future experiments, it can be tested whether the validity learning
rules achieve a higher fit when a wrong decision would entail a
loss of money just as a correct decision would involve a gain, and
accuracy would thus become even more important.
Third, evolution may have determined some environmental
domains in which it is important to learn valid cues. Such domains
will involve decision problems of adaptive relevance but where
there could also be environment-specific variation that requires
individual learning (as opposed to more stable environments
where knowledge of the cue order itself could be “built in” by evo-
lution). These could include food choice (where cues to what is
edible or poisonous could vary regionally and seasonally) and
avoidance of dangerous animals (where predator prevalence can
vary over space and time). For example, rhesus monkeys can
quickly learn to associate a snake-shape cue, even in a snake-shaped
toy, with a fear response, which then strongly supports the decision
not to approach an animal with that form (Cook & Mineka, 1989,
1990). Note that in these domains motivational and emotional
responses play a role, possibly making the cues more powerful or
even establishing a noncompensatory cue structure that allows
quick decisions based on little information—a quick and hence
adaptive design in high-risk decision domains. However, it is not
easy to tell whether the cues that are used in these decisions really
follow an order by validity, because validity for criteria relevant in
our evolutionary past can be difficult to determine in an objective
way in some of these cases. Moreover, validity may not be the prime
concern in these domains, but rather making quick decisions or
avoiding costly mistakes (Bullock & Todd, 1999).
Fourth, individuals could also learn a validity ordering from
others (or records created by others), in environments that enable
social learning or cultural transmission. In many cases people can
just look up indications of highly valid cues in books or on the
Internet or can directly ask experts. Especially in important and
high-stakes domains, it is likely that someone already has taken the
effort to compute validities based on large data sets, such as the
predictive accuracies of diverse diagnostic cues in medicine, or, as
in our experimental task, the validity of various potential indica-
tors of oil deposits (though such information is unlikely to be made
publicly available). However, for such important decisions, when
the decision maker will probably be held accountable and have to
justify the choice, people are less likely to engage in one-reason
decision making and more likely to gather additional information
before making a decision (Siegel-Jacobs & Yates, 1996; Tetlock &
Conclusions
Craig R. M. McKenzie
Valerie M. Chase
Imagine that you have just moved to a desert town and are trying
to determine if the local weather forecaster can accurately predict
whether it will be sunny or rainy. The forecaster often predicts sun-
shine and rarely predicts rain. On one day, you observe that the
forecaster predicts sunshine and is correct. On another day, she
predicts rain and is correct. Which of these correct predictions
would leave you more convinced that the forecaster can accurately
predict the weather? According to a variety of information-theoretic
accounts, including Bayesian statistics, the more informative of the
two observations is the correct prediction of rain (Horwich, 1982;
Howson & Urbach, 1989). As we show in more detail later, this is
because a correct prediction of sunshine is not surprising in the
desert, where it is sunny almost every day. That is, even if the
forecaster knew only that the desert is sunny, you would expect her
to make lots of correct predictions of sunshine just by chance.
Because rainy days are rare in the desert, a correct prediction of
rain is less likely to occur by chance and therefore provides stron-
ger evidence that the forecaster can distinguish between future
sunny and rainy days. The same reasoning applies to incorrect pre-
dictions: Those that are least likely to occur by chance alone are
most informative with respect to the forecaster’s (in)accuracy.
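The argument can be made concrete with the joint probabilities used in the chapter's example (p(rain) = p(predict rain) = .1, ρ = .5 under H1); the rain-row cells below are derived from those stated marginals and the correlation, so treat them as an illustration rather than the chapter's figure.

```python
from math import log2

# Joint probabilities p(prediction, event) under each hypothesis.
# H0: forecaster's prediction independent of the weather.
# H1: prediction correlated with the weather (rho = .5).
H0 = {("rain", "rain"): .010, ("rain", "sun"): .090,
      ("sun",  "rain"): .090, ("sun",  "sun"): .810}
H1 = {("rain", "rain"): .055, ("rain", "sun"): .045,
      ("sun",  "rain"): .045, ("sun",  "sun"): .855}

def informativeness(cell):
    """Log-likelihood ratio (in bits) of observing one prediction-event
    pair: how strongly that observation favors H1 over H0."""
    return log2(H1[cell] / H0[cell])

# A correct rain prediction carries far more evidence than a correct
# sun prediction, because it is much less likely to occur by chance.
print(informativeness(("rain", "rain")))  # ~2.46 bits
print(informativeness(("sun", "sun")))    # ~0.08 bits
```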
In short, rarity is valuable. Whether your expectations are
confirmed or violated as a result, observing a rare conjunction of
events is more revealing than observing a common one. Trying to
assess the desert forecaster’s accuracy by checking the weather only
after she predicts sunshine would be like looking for the proverbial
310 RARITY AND SKEWNESS IN THE WORLD
[Figure: surfaces showing the informativeness (log-likelihood ratio, LLR) of each cell of the prediction-by-event table, plotted as a function of p(predict rain) and p(rain). The accompanying tables contrast H0, prediction and event independent (ρ = 0), with H1, prediction and event dependent (ρ = .5); for a sun prediction, the joint probabilities are .09 (rain) and .81 (sun) under H0, and .045 and .855 under H1, with p(predict sun) = .9 in both cases.]
Covariation Assessment
Note. Sample columns indicate number of fictional people in each sample with
indicated factors present or absent. Participants considered the sample in which
the large frequency corresponded to Cell A (rather than Cell D) to provide the
strongest evidence of a relationship—except in Condition 4, where participants
knew that Cell A observations were common. In that condition, participants
considered the large Cell D sample to provide the strongest support.
WHY RARE THINGS ARE PRECIOUS 319
These concrete participants were told that they had access to the
records of former students in order to find out if there was a rela-
tionship between students’ emotional status and high school out-
come. Half of these participants were told that each record listed
whether the student was emotionally disturbed (yes or no) and
whether the student dropped out (yes or no). Thus, the presence
(i.e., the “yes” level) of each variable was rare, making a Cell A
observation rarer than a Cell D observation. When presented with
the two samples of nine observations (see Condition 3 in Table
12-1), one with many Cell A observations and one with many Cell
D observations, the Bayesian account predicts the same results that
have been found in earlier covariation studies, including the ones
reported above: Because presence is rare in this condition, partici-
pants should see the large Cell A sample as providing stronger
evidence of a relationship between emotional health and high
school outcome. Indeed, this is what McKenzie and Mikkelsen
(2007) found: Table 12-1 shows that more than 70% of participants
selected the large Cell A sample.
The key condition was the one remaining: Some participants
were presented with the same concrete scenario but simply had the
labeling reversed, just as in the abstract condition (see Condition 4
in Table 12-1). Rather than indicating whether each student was
emotionally disturbed and dropped out, the records indicated
whether each was emotionally healthy (yes or no) and whether each
graduated (yes or no). Thus, the absence of each of these variables
was rare, making Cell A more common than Cell D. The Bayesian
perspective leads to a prediction for this condition that is the oppo-
site of all previous covariation findings: Participants will find Cell D
information most informative. McKenzie and Mikkelsen (2007) again
found that the results were consistent with the Bayesian account. As
shown in Table 12-1, only 33% of these participants selected the
sample with the large Cell A frequency as providing stronger sup-
port; that is, most found the large Cell D sample most supportive.
This is the first demonstration of such a reversal of which we
are aware. The results provide strong evidence for the hypothesis
that the robust Cell A bias demonstrated over the past four decades
stems from (a) participants’ ecological approach to the task (consis-
tent with the Bayesian perspective), and (b) their default assump-
tion (perhaps implicit) that presence is rare. When there is good
reason to believe that absence is rare, Cell D is deemed more
informative, just as the Bayesian approach predicts. Note that the
behavior of both the concrete and the abstract groups is explained
in terms of their sensitivity to rarity: The former exploited real-
world knowledge about which observations were rare, and the
latter exploited knowledge about how labeling indicates what is
(usually) rare (see also McKenzie, 2004a).
[Figure: percentage of participants choosing the rare observation as most informative, shown separately for hypotheses mentioning the rare versus the common observation, across four conditions: Abstract, Abstract + Statistics, Concrete, and Concrete + Statistics.]
Indeed, one can make normative sense out of the default strategy
if, when testing X1 → Y1, X1 and Y1 (the mentioned events) are
assumed to be rare relative to X2 and Y2 (the unmentioned events).
If this were so, then the mentioned confirming observation
would be normatively more informative than the unmentioned
confirming observation. In other words, it would be adaptive to
treat mentioned observations as most informative if hypotheses
tend to be phrased in terms of rare events. Do laypeople tend to
phrase conditional hypotheses in terms of rare events?
Consider the following scenario: A prestigious college receives
many applications but admits few applicants. Listed in Table 12-2
is information regarding five high school seniors who applied last
year. Next to each applicant is a rating from the college in five
categories. In each category, one candidate was rated “high” and
the other four were rated “low.” On the far right is shown that only
one of the five candidates was accepted. Given the information,
how would you complete the statement: “If applicants ________,
then ________”?
You probably noticed that only SAT scores correlate perfectly
with whether the applicants were rejected or accepted. Importantly,
however, a choice still remains as to how to complete the state-
ment. You could write, “If applicants have high SAT scores, then
they will be accepted” or “If applicants have low SAT scores, then
they will be rejected.” Both are accurate, but the former phrasing
targets the rare events, and the latter targets the common ones.
McKenzie, Ferreira, Mikkelsen, McDermott, and Skrable (2001)
presented such a task to participants, and 88% filled in the condi-
tional with, “If applicants have high SAT scores, then they will be
accepted”—that is, they mentioned the rare rather than the common
events. Another group was presented with the same task, but the
college was said to be a local one that did not receive many appli-
cations and admitted most applicants. “Accepted” and “rejected”
[Table: H0, cause and effect independent (ρ = 0), versus H1, cause and effect dependent (ρ = .1); with p(cause) = .1 and p(effect) = .01, the no-cause row has Cell C = .009 and Cell D = .891 (row total .9) under H0, and Cell C = .006 and Cell D = .894 under H1.] The expected informativeness of the cause test is

E(cause test) = p(A | cause test) log₂[p(A | H1 ∩ cause test) / p(A | H0 ∩ cause test)]
              + p(B | cause test) log₂[p(B | H1 ∩ cause test) / p(B | H0 ∩ cause test)]
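The expected informativeness of the cause test can be computed from the chapter's cholera example (p(cause) = .1, p(effect) = .01, ρ = .1 under H1); a sketch in which, as an assumption of ours, the outcome probabilities are averaged over the two hypotheses, treated as equally likely a priori.

```python
from math import log2

# Probabilities of the two outcomes of the "cause test" (checking a
# well-drinker for cholera), conditional on each hypothesis. Cell A:
# effect present; Cell B: effect absent. Derived from p(cause) = .1,
# p(effect) = .01, and rho = .1 under H1.
pA = {"H0": .001 / .1, "H1": .004 / .1}   # effect present, given cause test
pB = {"H0": .099 / .1, "H1": .096 / .1}   # effect absent, given cause test

def expected_informativeness():
    """Expected log-likelihood ratio (bits) from one cause test, with
    outcome probabilities averaged over equiprobable H0 and H1 (our
    assumption; the chapter does not spell this step out)."""
    eA = (pA["H0"] + pA["H1"]) / 2
    eB = (pB["H0"] + pB["H1"]) / 2
    return (eA * log2(pA["H1"] / pA["H0"])
            + eB * log2(pB["H1"] / pB["H0"]))

print(expected_informativeness())  # ~0.0067 bits per observation
```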
In a task similar to those used by Chase (1999), for example, Green and
Over (2000) asked participants to test the hypothesis that drinking
from a particular well causes cholera. Participants could choose
one or more of all four tests: the cause test, the effect test, the not-
cause test, and the not-effect test. The probabilities of people’s
drinking from the well and having cholera were manipulated
between participants with the verbal labels “most” and “few” (e.g.,
“Most people drink from the well”). Consistent with the evidence
already reviewed, Green and Over found that participants’ likeli-
hood of choosing a test increased with the test’s expected informa-
tiveness. Taken together, the results indicate that people are
sensitive to rarity not only when making inferences on the basis of
known data, but also when deciding what data to seek.
Conclusion
Torsten Reimer
Ulrich Hoffrage
There are several ways that one can extend the approach of fast
and frugal heuristics to a group context (Todd & Gigerenzer, 1999).
ECOLOGICAL RATIONALITY FOR TEAMS AND COMMITTEES 339
Figure 13-2: The social-combination approach to group decision
making via the majority rule. Group members (here, M1 to M4) decide
among alternatives (A1 to A3) that are described by the values (V)
these alternatives have on cues (C1 to C20). First, each group member
makes an individual decision, and then the group integrates the
individual choices using the majority rule.
Cue values were coded such that the cues were positively correlated with the crite-
rion, that is, a positive cue value indicated a higher criterion value
than a negative cue value did. The cube displayed in Figure 13-2
represents the knowledge a group might have about a given triplet
of candidates. It can be cut into four slices, where each slice repre-
sents a member’s knowledge of candidates’ cue values.
Compensatory strategies
Tallying (alternatively, the unit weight model or “Dawes’s rule”) sums up the (equally
weighted) cue values of each candidate and chooses the candidate with the highest sum
score.
WADD (the weighted additive model or “Franklin’s rule”) proceeds like tallying, except
that cue values are weighted (multiplied) by their Goodman–Kruskal validities before
they are summed.
Noncompensatory heuristics
Minimalist looks up a randomly chosen cue. If one candidate has a positive value and
the remaining two candidates have negative values on this cue then information
search stops and the candidate with the positive value is chosen. If two candidates
have positive values and one negative, the latter is excluded from the choice set and
new cues are randomly drawn until one is found that discriminates between the two
remaining candidates, and the one with the positive value is chosen. If all cues have
been looked up and there is still more than one candidate left, choice between the
remaining candidates is made randomly.
Social-combination rule
The majority rule chooses the candidate with the most votes. If a tie arises because
two candidates are each favored by two members, one of these is chosen at random.
If there is a tie with respect to the number of votes (two for one candidate
and two for another), then the decision of one randomly chosen
group member is adopted.
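The individual strategies and the majority rule described above can be sketched together; a minimal sketch assuming +1/−1 cue coding over three candidates, with minimalist and WADD omitted for brevity.

```python
import random

def tallying(cues):
    """Tallying: sum the (+1/-1) cue values per candidate; the candidate
    with the highest sum wins, ties broken at random."""
    scores = [sum(c) for c in cues]
    best = max(scores)
    return random.choice([i for i, s in enumerate(scores) if s == best])

def take_the_best(cues, validity_order):
    """Check cues in order of validity; candidates with a negative value
    are dropped whenever at least one remaining candidate is positive,
    and search stops once a single candidate remains (one-reason
    decision making over three alternatives)."""
    candidates = list(range(len(cues)))
    for j in validity_order:
        pos = [i for i in candidates if cues[i][j] > 0]
        if pos and len(pos) < len(candidates):
            candidates = pos
        if len(candidates) == 1:
            return candidates[0]
    return random.choice(candidates)  # no cue fully discriminated

def majority_rule(votes):
    """Choose the candidate with the most votes; break ties at random."""
    counts = {v: votes.count(v) for v in set(votes)}
    top = max(counts.values())
    return random.choice([v for v, n in counts.items() if n == top])

# Three candidates described on three (+1/-1) cues; four omniscient
# members all apply take-the-best with the same validity order.
cues = [(+1, -1, +1), (+1, +1, -1), (-1, -1, -1)]
votes = [take_the_best(cues, [0, 1, 2]) for _ in range(4)]
print(majority_rule(votes))  # all members agree on candidate 1
```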
[Figure: cue validity plotted against rank in the cue hierarchy for the four simulated environments: L-high (M = 0.80, SD = 0.06), L-low (M = 0.60, SD = 0.06), J-flat (M = 0.56, SD = 0.09), and J-steep (M = 0.60, SD = 0.12).]
[Figure: accuracy (% correct) of tallying, WADD, minimalist, and take-the-best across the four distributions of cue validities (L-high, L-low, J-flat, J-steep); chance performance is 33.3%.]
Simulation 1: Does the Distribution of Cue Validities Matter When All Group
Members Are Omniscient?
Accuracy of the decision strategies was determined by producing
all possible triplets of candidates in each of the four environments
and by counting how often the simulated groups made a correct
decision (see Reimer & Hoffrage, 2006, for details). As expected, the
performance of the strategies depended on the environment in
which they were evaluated—see Figure 13-5. Except in the linear-
high condition, in which cues had a much higher validity on aver-
age than in the other three environments (see Figure 13-4), the
strategies that took cue validity into account had a higher accuracy
than the corresponding strategies that ignored cue validities:
Among the linear strategies, the WADD strategy outperformed tally-
ing, and among the limited-search heuristics, take-the-best outper-
formed minimalist in these three environments. Thus, overall,
assigning higher weights to better cues paid off unless most cues
had a relatively high validity.
The comparison between the compensatory strategies (tallying,
WADD) and their corresponding noncompensatory heuristics (min-
imalist, take-the-best) revealed an interesting pattern. For the strat-
egies that ignored cue weights or cue validities, we observed that
tallying outperformed minimalist in each environment. In contrast,
Simulation 2: Does the Amount of Missing Information Matter When the Individual
Group Members Have Incomplete Knowledge?
In the second simulation, we allowed missing information by deter-
mining randomly which group member had access to which cue
values. Importantly, the information that was known to the groups
as a whole was always held constant: Each cue value for each
alternative was known to at least one group member in every case.
This manipulation created heterogeneous groups, in which mem-
bers had different knowledge, thus capturing an important aspect
of group decision making. Introducing missing information and
thereby individual differences provides an interesting test of the
strategies’ ecological rationality: To what extent does their perfor-
mance depend on the amount of missing information and, con-
versely, shared knowledge? Intuitively, one would expect that
group performance declines when less information is available,
and, as a consequence, when less information is shared. But would
all strategies suffer to the same degree, and how is their loss in per-
formance modified by the information structure of the environment
in which the strategies are tested? In other words, are the differ-
ences between the strategies and their dependence on environmen-
tal structures that we observed in simulation 1 robust across
different percentages of missing information?
We tested two conditions: In the first, each of the four members
received 15 (25%) of the 60 cue values, and thus no single piece of
information was shared by group members. In the second condi-
tion, each group member knew 30 (50%) of the 60 cue values,
and thus a given piece of information was shared, on average, by
two group members. In addition, we compared the results of the
respective conditions from simulation 1, in which all members
knew all 60 cue values (100%) and each piece of information was
shared by all four members (Reimer & Hoffrage, 2006).
As indicated in Table 13-1, missing information somewhat
impaired group performance, but much less than we had expected.
On average, across all decision strategies and environments, the
simulated groups performed 6 percentage points worse if each
member knew only 25% of the information and thus no cue value
was shared, as compared to the full knowledge condition of simula-
tion 1. Such a drop in performance may matter; however, compared
to studies on the hidden-profile effect in which information is dis-
tributed in a biased way and in which, as a consequence, up to 80%
of groups change their decisions when members do not have access
to all available information, the drop we observed seems surpris-
ingly small (Reimer, Hoffrage, & Katsikopoulos, 2007; Stasser &
Titus, 1985). If one considers that a reduction of accessible cue
values from 100% to 25%, that is, by 75 percentage points, leads to
a reduction in performance of only 6 percentage points, the strate-
gies’ performances appear to be quite robust against this manipula-
tion. The relationships between the decision strategies for the four
distributions of cue validities reported in the first simulation also
remained relatively stable and robust across different amounts of
missing information.
Note. (A) All cues equally likely to be known; (B) More information known on
10 most valid cues; (C) 3 group members share all values on 10 most valid cues;
(D) 3 group members share all values on 3 most valid cues. See text for details.
When the best alternative has a hidden profile in this way, with the information about this
alternative being unshared, no single group member is likely to
infer that this is the best choice. As a consequence, when members
integrate their individual opinions on the basis of a majority rule,
this tendency to miss the hidden best alternative will be accentu-
ated and groups will be even less accurate than the average indi-
vidual. Simulation studies as well as empirical studies indicate
that groups are better off when they use a communication-based
strategy in such a situation (Reimer & Hoffrage, 2005; Reimer,
Kuendig, Hoffrage, Park, & Hinsz, 2007; Reimer, Reimer, & Hinsz,
2010).
Even though we held the social combination rule constant in our
investigations, some of our results on the individual group mem-
bers’ accuracies are relevant to the question of when groups should
use the majority rule and when they should use another combina-
tion rule. The basic insight is that a cue is for the individual member
The trees we will introduce are naïve, fast, and frugal. By “naïve”
we mean that the trees ignore conditional dependencies between
cues when ordering the cues. “Frugal” means that the trees do
not necessarily use all cues; as a result, these trees are also “fast”
(a more precise definition of these terms will be given later). The
trees implement one-reason classification in analogy with heuris-
tics for one-reason decision making (Gigerenzer & Goldstein,
1999). In sum, the trees are naïve in construction and fast and
frugal in execution, and we will show that their accuracy for clas-
sification is surprisingly high in both fitting and predictive general-
ization, when compared to far more complex standard models.
In this chapter, we will first discuss the problem of classification
and how trees can be used to solve it. We will then discuss
how naïve fast and frugal trees can be constructed and we will com-
pare them analytically with more complex classification models.
Finally, we will test the ecological rationality of different types of
trees in a simulation across a wide range of environments, to see
when each type works well.
[Figure: a three-level tree of yes/no questions, beginning with “ST segment elevated?”]
Figure 14-2: Green and Mehr’s (1997) fast and frugal tree for
classifying patients as having a high or a low risk of heart disease.
[Figure: plot with x-axis “Proportion Incorrectly Assigned (False-Positive Rate)” ranging from 0 to 1.]
[Figure: a yes/no question with the exits “Prescribe macrolides” and “No macrolides.”]
Figure 14-4: Fischer et al.’s (2002) fast and frugal tree for deciding
whether to prescribe antibiotic treatment to young children suffer-
ing from community acquired pneumonia.
The tree in Figure 14-2 is fast and frugal: First, the ST segment
is checked and if it is elevated a classification of the patient’s
condition is immediately made and the tree is exited with a specific
action (CCU) based on that cue’s value; but if the ST segment is not
elevated, next chest pain is checked, which again provides the
opportunity to make a classification and exit; and finally other
symptoms are checked only if necessary, and appropriate exit and
action is taken at that point. If a second question were asked for
patients with elevated ST segment, rather than exiting immediately
at that point, then the tree would not be fast and frugal.
The labels fast and frugal have precise meanings (Gigerenzer &
Goldstein, 1996): The frugality of a tree for classifying a set of
objects is the mean number of cue values (across objects) it uses for
making a classification and hence reaching a decision—fewer cues
used means greater frugality. The speed of a tree is the mean number
of basic operations—arithmetic and logical—needed for making a
classification. It is clear that, with these definitions, replacing ques-
tion nodes with exit nodes makes a tree faster and more frugal. Fast
and frugal trees are also “minimal” among those constructed for a
given set of cues, with the fewest number of question nodes made
using the cues when tested one at a time.
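As a concrete illustration of the frugality measure just defined, the following sketch (ours, not the chapter's) counts cue lookups for the Green and Mehr tree of Figure 14-2, with cue profiles given as (ST elevated, chest pain, other conditions):

```python
# Sketch (our illustration): frugality of the Green and Mehr (1997) tree of
# Figure 14-2, computed as the mean number of cue values used across objects.
from itertools import product

def cues_used(st, cp, oc):
    """Number of cues the tree checks before reaching an exit."""
    if st:
        return 1        # elevated ST segment: immediate exit
    if not cp:
        return 2        # no chest pain: exit after the second cue
    return 3            # otherwise the third cue is always checked

def frugality(profiles):
    return sum(cues_used(*p) for p in profiles) / len(profiles)

# Over all eight binary cue profiles the tree uses 1.75 cues on average.
print(frugality(list(product([0, 1], repeat=3))))
```

Replacing a question node with an exit node can only lower this average, which is the sense in which such a replacement makes the tree more frugal.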
Figure 14-5 illustrates how the fast and frugal tree of Green and
Mehr (1997) is related to the natural frequency tree.
368 RARITY AND SKEWNESS IN THE WORLD
Figure 14-5: The natural frequency tree of the Green and Mehr
(1997) study. Shaded nodes correspond to people with infarctions
(and nonshaded nodes to people without). Bold branches outline
the fast and frugal tree of Figure 14-2. The bold vertical bar corre-
sponds to the split of cue profiles, according to the fast and frugal
tree, into those with high risk of heart disease (left of bar) and those
with low risk of heart disease (right of bar).
Result 1: For every fast and frugal tree f there exists a unique
cue profile S(f)—called the tree's splitting profile—such that
f assigns x to C1 if and only if x >_l S(f), where >_l is the
lexicographic order on cue profiles. Conversely, for every cue
profile S there exists a unique fast and frugal tree f such that
S(f) = S.
3. Another model for classification, RULEX, has also been linked to one-
reason heuristics: “We find the parallels between RULEX and these one
reason decision-making algorithms to be striking. Both models suggest that
human observers may place primary reliance on information from single
dimensions” (Nosofsky & Palmeri, 1998, p. 366).
NAÏVE, FAST, AND FRUGAL TREES FOR CLASSIFICATION 371
Fast and frugal trees are also connected to linear models. In linear
models for classification, each cue c_i has a weight w_i > 0, and for
each cue profile x = [x_1, x_2, . . ., x_n] the score

R(x) = ∑_{i=1}^{n} x_i w_i

is compared with a threshold h: x is assigned to C1 if R(x) > h,
and to C0 otherwise.
Result 2: For every fast and frugal tree f there exist a threshold
h > 0 and weights w_i > 0 with

w_i > ∑_{k>i} w_k   for i = 1, 2, …, n − 1,

such that f makes identical classifications with the linear model
with weights w_i and threshold h. Conversely, for every linear
model with weights w_i > 0 satisfying

w_i > ∑_{k>i} w_k   for i = 1, 2, …, n − 1,

and a threshold h > 0, there exists a fast and frugal tree f that
makes identical categorizations.
For example, the Green and Mehr (1997) tree in Figure 14-2 makes
identical classifications with the linear model with R(x) = 4x_1 +
2x_2 + x_3 and h = 2 (they both assign [0, 0, 0], [0, 0, 1], and
[0, 1, 0] to C0 and all other cue profiles to C1).
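The equivalence in this example can be checked mechanically. The sketch below (ours, not the chapter's) compares the Green and Mehr tree with that noncompensatory linear model over all eight cue profiles:

```python
# Sketch (our illustration): the Green and Mehr (1997) tree and the
# noncompensatory linear model with weights 4, 2, 1 and threshold h = 2
# make identical classifications on every cue profile.
from itertools import product

def tree(st, cp, oc):
    if st:
        return 1                 # high risk: exit on the first cue
    if not cp:
        return 0                 # low risk: exit on the second cue
    return 1 if oc else 0        # the last cue decides

def linear(st, cp, oc, h=2):
    return 1 if 4 * st + 2 * cp + 1 * oc > h else 0

assert all(tree(*x) == linear(*x) for x in product([0, 1], repeat=3))
print("identical classifications on all 8 profiles")
```

Note that the weights 4, 2, 1 satisfy the noncompensatory condition: each weight exceeds the sum of all later ones.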
Linear models with w_i > ∑_{k>i} w_k are called noncompensatory
(Einhorn, 1970; Martignon & Hoffrage, 2002). Result 2 says that fast and
frugal trees are equivalent to noncompensatory linear models in the
sense that the two make the same classifications. Note, however,
that result 2 does not imply that it is impossible to distinguish
empirically between fast and frugal trees and noncompensatory
linear models. The process predictions of fast and frugal trees,
including ordered and limited information search, are distinct from
those of linear models, which use all available information in no
specified order.
To summarize our results so far, fast and frugal trees are a simple,
heuristic way of implementing one-reason classification. They can
be represented as lexicographic classifiers with a fixed splitting
profile or as noncompensatory linear models for classification.
Also, the trees form a family of transparent models for performing
classification with limited information, time, and computation. But
how can we build accurate fast and frugal trees? Our next step is to
present construction rules for ordering cues in fast and frugal trees.
But MaxVal can run into problems: If for every cue the positive
validity is higher than the negative validity (or vice versa), then the
resulting tree will be a rake, and if in addition the number of
cues is high, this means that nearly all objects will be classified as
belonging to the same category. To avoid such possible extreme
cases, we also consider a construction rule that strikes a balance
between the categories that objects are classified into.
The alternative approach, called Zig-Zag, produces trees that
follow a zig-zag pattern: the direction of the exit nodes alternates
between positive and negative classifications, and correspondingly
the cue with the greatest positive or greatest negative validity is
chosen at each step. This procedure is implemented starting
at the top level and proceeding downward until the last remaining
cue is assigned to the last level with exits for both categories. If
the distribution of objects according to the two criterion values is
more or less even, as in the Green and Mehr data, this procedure
seems both natural and reasonable. If the distribution of objects is
extremely uneven, a couple of extra steps may be incorporated by
the Zig-Zag rule to even out the asymmetries (or, in the jargon of
data mining, to “tame the distribution”; see Martignon et al., 2008,
for the technical details).
In sum, both the MaxVal and Zig-Zag ranking procedures
ignore conditional dependencies of cues given outcome values,
and both base their rankings simply on positive and negative cue
validities (and for the Zig-Zag method, possibly also on the relative
size of the object classes).
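A minimal sketch of the two ranking procedures, under our reading of the text (function names and data layout are ours): positive validity is taken as the proportion of category-1 objects among those with cue value 1, and negative validity as the proportion of category-0 objects among those with cue value 0.

```python
# Sketch (our reading of the text) of the MaxVal and Zig-Zag cue-ordering
# rules. An object is (cue_values, category) with category 0 or 1.

def validities(objects, i):
    """(positive validity, negative validity) of cue i."""
    pos = [c for x, c in objects if x[i] == 1]
    neg = [c for x, c in objects if x[i] == 0]
    v_pos = sum(pos) / len(pos) if pos else 0.0
    v_neg = (len(neg) - sum(neg)) / len(neg) if neg else 0.0
    return v_pos, v_neg

def maxval_order(objects, n_cues):
    # Rank cues by the larger of their two validities, ignoring dependencies.
    return sorted(range(n_cues),
                  key=lambda i: max(validities(objects, i)), reverse=True)

def zigzag_order(objects, n_cues):
    # Alternate: highest positive validity, then highest negative, and so on.
    remaining, order, want_pos = list(range(n_cues)), [], True
    while remaining:
        key = 0 if want_pos else 1
        best = max(remaining, key=lambda i: validities(objects, i)[key])
        order.append(best)
        remaining.remove(best)
        want_pos = not want_pos
    return order
```

Both procedures inspect each cue in isolation, which is exactly the "naïve" property: no conditional dependencies between cues are taken into account.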
As an application, consider the Green and Mehr data in Figure 14-5.
Using the formulas in Table 14-1, the positive and negative validities
for the three cues, ST, CP, and OC, are as shown in Table 14-2. Given
these validities, MaxVal creates a rake, while Zig-Zag builds the
zig-zag tree proposed by Green and Mehr (1997; see Figure 14-2).
Table 14-2: Positive and Negative Validities of the Three Cues for
the Green and Mehr (1997) Dataset (Shown in Figure 14-5)
Cue ST CP OC
Positive validity .39 .24 .22
Negative validity .96 .92 .93
Note. ST: Elevated ST segment; CP: chest pain; OC: other conditions
Simple heuristics can perform surprisingly well across many
environments, as the other chapters in this book attest. How about fast and frugal
trees? In this section we use computer simulations to test their
accuracy and robustness in various environments, compared to
other classification models, which we describe next.
Logistic regression is a typical statistical regression model for
binomially distributed dependent variables. It is a generalized
linear model that classifies by means of comparing a weighted sum
of cue values with a fixed threshold. Logistic regression is exten-
sively applied in the medical and social sciences.
CART (Breiman et al., 1993) builds classification and regression
trees for predicting numerical dependent variables (regression) or
categorical dependent variables (classification). The shape of the
trees it constructs is determined by a collection of rules that are
selected based on how well they can differentiate observations, in
the sense of information gain. No further rules are applied when
CART establishes that no further information gain can be made.
CART shares common features with fast and frugal trees because
its strict rules for construction lead, in general, to trees that have
fewer nodes than the natural frequency tree. Yet, the construction
of CART trees is computationally intense because information gain
has to be assessed conditionally on previous rule applications.
We tested logistic regression and CART against our MaxVal and
Zig-Zag tree construction methods on 30 datasets, mostly from the
UC Irvine Machine Learning Repository (Asuncion & Newman,
2007). We included very different problem domains (from medi-
cine to sports to economics), with widely varying numbers of
objects (from 50 to 4,052) and cues (from 4 to 69). The accuracy of
each model was evaluated in four cases—fitting all the data, and
generalizing to new data (prediction) when trained on 15%, 50%,
or 90% of all objects (for estimating model parameters)—and tested
on the remaining objects (see Figure 14-6).
We also compared the models on a restricted set of just 11 of the
30 data sets from Figure 14-6 that were from medical domains,
which could better match the conditions discussed earlier for
batch learning situations. The performance of the naïve fast and
frugal trees on these medical data sets was markedly better than on
the 30 data sets overall, as shown in Figure 14-7.
As was to be expected, the more complex models (logistic regres-
sion and CART) were the best performers in fitting. But a good
classification strategy needs to generalize to predict unknown
cases, not (just) explain the past by hindsight. In prediction, the
simple trees built by Zig-Zag match or come close to the accuracy
of CART and logistic regression while MaxVal lags a few percentage
points behind. CART appears to overfit the data, losing 17 percent-
age points of accuracy from the fitting to the 15% training situation
in the 30 data sets of Figure 14-6.

[Figure 14-6: Percentage of correct inferences of CART, logistic regression, Zig-Zag, and MaxVal in fitting and in prediction from 90%, 50%, and 15% training sets, across the 30 data sets.]

[Figure 14-7: Percentage of correct inferences of CART, logistic regression, Zig-Zag, and MaxVal in fitting and in prediction from 90%, 50%, and 15% training sets, across the 11 medical data sets.]
Conclusion
[Figure: (a) Fortune (in billion $) against rank order according to fortune; (b) the same distribution on log–log axes, Fortune (in log billion $) against rank order according to fortune (log).]
Fermi was awaiting the same detonation from a few thousand yards
away. As he sheltered behind a low blast-wall, he tore up sheets of
paper into little pieces, which he tossed into the air when he saw
the flash. After the shock wave passed, he paced off the distance
traveled by the paper shreds, performed a quick back-of-the-enve-
lope calculation, and arrived at an approximately accurate figure
for the explosive yield of the bomb (Logan, 1996). For Fermi, one of
the most important skills a physicist ought to have is the ability
to quickly derive estimates of diverse quantities. He was so con-
vinced of its importance that he used to challenge his students with
problems requiring such estimates—the fabled canonical Fermi
problem was the question: “How many piano tuners are there in
Chicago?”
Being able to make a rough estimate quickly is important not
only for solving odd Fermi problems. There is ample opportunity
and need for people to rely on quick and easy estimates while
navigating through daily life (e.g., how long will it take to get
through this checkout line?). How do people arrive at quick quanti-
tative estimates? For instance, how do they swiftly estimate the
population size of Chicago—a likely first step toward an estimate of
the number of piano tuners in Chicago? Previously, we have argued
that cognitive estimation strategies, specifically, the QuickEst heu-
ristic, may have evolved to exploit the predictable imbalance of
real-world domains so as to reduce the computational effort and
informational demands needed to come up with competitively
accurate estimates (Hertwig, Hoffrage, & Martignon, 1999). In this
chapter, we analyze the ecological rationality of this heuristic in
more precise terms: First, we quantify the degree of imbalance
across a total of 20 real-world domains using the parameter q, the
slope of the straight line fitting the log–log rank–size distribution.
Second, we analyze to what extent this degree of imbalance and
other statistical properties of those environments hinder or foster
the accuracy of the QuickEst heuristic. Before we turn to this analy-
sis, we describe QuickEst in more detail.
The QuickEst heuristic is a model of quantitative inferences
from memory (Gigerenzer & Goldstein, 1996; Gigerenzer, Hoffrage,
& Goldstein, 2008), that is, inferences based on cue information
retrieved from memory. It estimates quantities, such as the size
of Chicago or the number of medals that Russia won at the most
recent Olympic summer games. In general, it estimates the value of
an item a, an element of a set of N alternatives (e.g., objects, people,
events), on a quantitative criterion dimension (e.g., size, age,
frequency). The heuristic’s estimates are based on M binary cues
(1, 2, . . ., i, . . ., M), where the cue values are coded such that 0
and 1 tend to indicate lower and higher criterion values, respec-
tively. As an illustration, consider the reasoning of a job candidate
who is subjected to a brainteaser interview by a company recruiter.
HOW ESTIMATION CAN BENEFIT FROM AN IMBALANCED WORLD 385
3. When the heuristic is initially set up, only as many cues (of all those
available) will be used in the cue order as are necessary to estimate the cri-
terion of four-fifths of the objects in the training set. The remaining one-fifth
of the objects will be put in the catchall category.
4. By building in spontaneous numbers, the heuristic models the obser-
vation that when asked for quantitative estimates (e.g., the number of wind-
mills in Germany), people provide relatively coarse-grained estimates (e.g.,
30,000, i.e., 3 × 104, rather than 27,634). Albers (2001) defined spontaneous
numbers as numbers of the form a × 10i, where a ∈ {1, 1.5, 2, 3, 5, 7} and
i is a natural number.
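The spontaneous numbers of footnote 4 are easy to operationalize. The rounding sketch below is our illustration of how a QuickEst-style estimate could be coarsened (the function name is ours, not the chapter's):

```python
# Sketch (our illustration): round a raw estimate to the nearest spontaneous
# number in Albers's (2001) sense, i.e., a * 10^i with a in {1, 1.5, 2, 3, 5, 7}.
import math

SPONTANEOUS_A = (1, 1.5, 2, 3, 5, 7)

def nearest_spontaneous(x):
    """Spontaneous number closest to x (x > 0)."""
    i = math.floor(math.log10(x))
    candidates = [a * 10 ** e for e in (i - 1, i, i + 1) for a in SPONTANEOUS_A]
    return min(candidates, key=lambda c: abs(c - x))

print(nearest_spontaneous(27634))   # 30000, matching the 3 x 10^4 example
```

This reproduces the footnote's example: a precise count of 27,634 is reported at the coarse grain of 30,000.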
the basis of the training set. The test set, in turn, provided the
test bed for the strategies’ robustness. The training samples con-
sisted of 10%, 20%, . . ., 90%, and 100% of the 82 cities, comprising
their population sizes and their values on eight cues indicative of
population size. The test set encompassed the complete environ-
ment of 82 cities. That is, the test set included all cities in the
respective training set, thereby providing an even harder test for
QuickEst, because parameter-fitting models like multiple regres-
sion are likely to do relatively better when tested on objects they
were fitted to.
In the environment of German cities, QuickEst, on average, con-
sidered only 2.3 cues per estimate as opposed to 7.3 cues used by
multiple regression and 7.1 (out of 8) used by the estimation tree.
Despite relying on only about a third of the cues used by the other
strategies, QuickEst nonetheless exceeded the performance of
multiple regression and the estimation tree when the strategies
had to rely on quite limited knowledge, with training sets ranging
between 10% and 40%. The 10% training set exemplified the
most pronounced scarcity of information. Faced with such dire
conditions, QuickEst’s estimates in the test set were off by an aver-
age of about 132,000 inhabitants, about half the size of the average
German city in the constructed environment. Multiple regression
and the estimation tree, in contrast, erred on average by about
303,000 and 153,000 inhabitants, respectively.
When 50% or more of the cities were first learned by the strate-
gies, multiple regression began to outperform QuickEst. The edge in
performance, however, was small. To illustrate, when all cities were
known, the estimation errors of multiple regression and QuickEst
were 93,000 and 103,000 respectively, whereas the estimation tree
did considerably better (65,000).5 Based on these results, Hertwig
et al. (1999) concluded that QuickEst is a psychologically plausible
estimation heuristic, achieving a high level of performance under
the realistic circumstances of limited learning and cue use.
5. In fact, when the training set (100%) equals the generalization set, the
estimation tree achieves the optimal performance. Specifically, the optimal
solution is to memorize all cue profiles and collapse cities with the same
profile into the same size category. In statistics, this optimal solution is
known as true regression. Under the circumstances of complete knowl-
edge, the estimation tree is tantamount to true regression.
[Figure: Mean number of cues considered by the estimation tree, multiple regression, and QuickEst as a function of the size of the training set (10%–100%).]

Across all environments, 7.7 cues,
on average, are available. QuickEst considers, on average, only
two cues (i.e., 26%) per estimate—a figure that remains relatively
stable across various sizes of training set size. In contrast, multiple
regression (which here uses only those cues whose beta weights
are significantly different from zero) and the estimation tree use
more and more cues with increasing training sets. Across all train-
ing set sizes, they use an average of 5.1 (67%) and 5.9 (77%) of all
available cues, respectively.
[Figure: Estimation error (%, standardized within strategy) of the estimation tree, multiple regression, and QuickEst as a function of the size of the training set.]

[Figure: Estimation error (%, standardized with respect to QuickEst) of the estimation tree, multiple regression, and QuickEst as a function of the size of the training set.]
The first property is the coefficient of variation, which is the
ratio of the standard deviation (SD) of the set of object criterion
values to its mean value.
The next property, skewness, captures how asymmetric or
imbalanced a distribution is, for instance, how much of a “tail” it
has to one side or the other. Skewness can be measured in terms of
the parameter q, estimated with the following method (Levy &
Solomon, 1997): We sort and rank the objects in each environment
according to their criterion values, and fit a straight line to each
rank–size distribution (plotted on log–log axes). We then use the
slope q of this fitted regression line as an estimate of the environ-
ment’s skewness.
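Both properties can be computed directly. The sketch below (ours) follows the stated method, using an ordinary least-squares fit on log–log axes and the population standard deviation:

```python
# Sketch (our implementation of the stated method): coefficient of variation,
# and the skewness parameter q as the OLS slope of the log-log rank-size plot.
import math

def coefficient_of_variation(values):
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sd / mean

def skewness_q(values):
    sizes = sorted(values, reverse=True)                  # rank by criterion
    xs = [math.log10(r) for r in range(1, len(sizes) + 1)]
    ys = [math.log10(s) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den                                      # OLS slope = q

# A perfect power law s(r) = 1000 / r has slope q = -1 on log-log axes.
print(round(skewness_q([1000 / r for r in range(1, 51)]), 6))
```

More negative values of q thus indicate more pronounced imbalance, with criterion values falling off steeply across ranks.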
The final property in our analysis is the object-to-cue ratio (i.e.,
the ratio between the number of objects and number of cues in an
environment), which has been found to be important in the analy-
sis of inferential heuristics such as take-the-best (see Czerlinski
et al., 1999; Hogarth & Karelaia, 2005a). To assess the relationship
between the statistical properties of the environments and the
differences in the strategies’ performance, we first describe the
results regarding skewness for two environments in detail, before
considering all 20 environments.
Two Distinct Environments: U.S. Fuel Consumption and Oxygen in Dairy Waste
Does an environment that exhibits predictable imbalance, or
skew, such that few objects have large criterion values and most
have small ones, foster the accuracy of QuickEst?
[Figure: Criterion value (log) against rank (log) for the U.S. fuel consumption and oxygen in dairy waste environments.]
[Figure: Estimation error (%, standardized with respect to QuickEst) as a function of the size of the training set, for the two environments.]
[Figure: Performance differences plotted against environmental skewness (q), for environments including body fat, oxidant, fuel consumption, homelessness, house price, rainfall, biodiversity, mammals' sleep, and oxygen.]
8. We are grateful to Bettina von Helversen and Jörg Rieskamp for their
valuable input on the following sections.
QuickEst predicted best when the criterion was skewed, whereas
the mapping heuristic predicted best when the criterion was uniformly
distributed. In addition, the mapping heuristic performed
better than the regression model in both types of environments
and thus was less dependent on the distribution of the criterion
than QuickEst.
Conclusion
Will M. Bennis
Konstantinos V. Katsikopoulos
Daniel G. Goldstein
Anja Dieckmann
Nathan Berg
410 DESIGNING THE WORLD
1. The law is the Gesetz über die Spende, Entnahme und Übertragung
von Organen, BGBI 1997, Article 2631. A German government website
(www.organspende-kampagne.de/) provides an official form that one
can use for the purpose of changing donor status. The official form is not
required, however, nor any formal registration. In some cases where rela-
tives have been clearly informed of the individual’s wish to become an
organ donor should the occasion arise, verbal consent may even substitute
for written consent.
DESIGNED TO FIT MINDS: INSTITUTIONS AND ECOLOGICAL RATIONALITY 411
[Bars show rates for countries including Germany, the Netherlands, the United Kingdom, Austria, Belgium, France, Hungary, Poland, Portugal, and Sweden.]
Figure 16-1: Population rates of potential organ donors by country.
The first four bars indicate explicit consent countries, where indi-
viduals are assumed not to be organ donors but can take action to
opt in to organ donor status. The remaining bars indicate presumed
consent countries, where the default presumes that individuals
are organ donors while allowing them to opt out if they choose.
(Adapted from Johnson & Goldstein, 2003.)
Imagine that you just moved to a new state and must get a new
driver’s license. As you complete the application, you come
across the following. Please read and respond as you would if
you were actually presented this choice today. We are inter-
ested in your honest response: In this state every person is
Determining Right-of-Way
Ancient Rome was a city of perhaps a million people, but it lacked
traffic signs (let alone stoplights) to guide the many pedestrians,
horse riders, and chariots on its roads. Right-of-way was deter-
mined by wealth, political status, and reputation. In case of ambi-
guity about which of these cues was more important, the issue was
decided by how loudly accompanying slaves could yell, or by
physical force. This led to much confusion and conflict on the
roads of Rome. Historian Michael Grant even controversially
hypothesized that traffic chaos pushed Nero over the edge, leading
him to burn the city in the year 64 A.D. with hopes of subse-
quently building a more efficient road system (Gartner, 2004).
In contrast to the compensatory system of Nero’s time that required
simultaneous consideration of multiple factors, right-of-way through-
out most of the world is now governed by noncompensatory lexico-
graphic rules that leave far less room for ambiguity, although the
details differ between countries. In Germany, for example, the right-
of-way rules for deciding which of two cars approaching an inter-
section gets to go through first include the following hierarchy:
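Such a hierarchy is lexicographic: cues are checked one at a time, and the first cue that discriminates decides. The sketch below is ours; the particular cue order (police signal, traffic light, priority sign, default right-before-left) is an illustrative assumption, not the book's exact list:

```python
# Sketch (our illustration) of a noncompensatory, lexicographic right-of-way
# rule. The cue order here is an assumption for illustration only.
def right_of_way(situation):
    order = ("police_signal", "traffic_light", "priority_sign",
             "right_before_left")
    for cue in order:
        decision = situation.get(cue)    # "go", "yield", or None if cue silent
        if decision is not None:
            return decision              # one-reason decision: stop searching
    return "yield"                       # default to caution

# A green light decides even if a sign lower in the hierarchy says yield.
print(right_of_way({"traffic_light": "go", "priority_sign": "yield"}))
```

Because higher cues cannot be outvoted by lower ones, no weighing or adding of factors is ever required, which is what removes the ambiguity of the Roman system.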
3. In 1982 the winner of a game was allocated two points (not three as is
the case at the time of this writing).
4. Again, note that this set of cues is not exactly the same as that used in
some of the World Cups we analyzed.
[Figure: Frequency with which 1 through 7 cues were looked up: 471, 38, 16, 3, 0, 0, and 1, respectively.]
Beliefs About Winning on Slot Machines: It’s Not All in the Players’ Heads
(Bennis, 2004; Smith & Preston, 1984; Wagenaar, Keren, & Pleit-
Kuiper, 1984; Zola, 1963). Nonetheless, although other sources of
utility besides expected winnings are undoubtedly part of what
motivates gamblers, there is abundant evidence that many people
gamble because they have false beliefs about their ability to win.
Often this is a belief that they have an advantage over the casino,
but casino gamblers also systematically overestimate their chances
of winning, overestimate the role of skill in games that are largely
determined by chance, and use gambling strategies that do not work
(Ladouceur, 1993; Lambos & Delfabbro, 2007; Miller & Currie, 2008;
Sundali & Croson, 2006; Wagenaar, 1988; Walker, 1992b). Thus, at
least part of why people gamble seems to stem from a systematic
failure to estimate their expected payoffs correctly.
Theories attempting to account for this faulty payoff estimation
fall into two broad categories. The first, and far more common, type
of theory identifies the source of the problem as originating inside
gamblers’ minds. According to such theories, people gamble
because of shortcomings in how they think and reason, including,
among other things, a failure to understand the nature of probabil-
ity and randomness (Gaboury & Ladouceur, 1988, 1989; Ladouceur
& Dubé, 1997; Ladouceur, Dubé, Giroux, Legendre, & Gaudet, 1995;
Lambos & Delfabbro, 2007; Metzger, 1985; Steenbergh, Meyers,
May, & Whelan, 2002; Sundali & Croson, 2006; Wagenaar, 1988;
Walker, 1990, 1992a).
The second type of explanation, to which we subscribe, focuses
on factors in the external environment: While acknowledging that
gamblers may sometimes have false beliefs about their chances of
winning and use the wrong heuristics, we argue that the source of
these shortcomings lies not so much in biased or irrational think-
ing, but rather in the gamblers’ environment and their interactions
with it (see, e.g., Bennis, 2004; Dickerson, 1977; Griffiths & Parke,
2003a; Harrigan, 2007, 2008; Parke & Griffiths, 2006). Specifically,
there is a mismatch between the (otherwise usually adaptive) heu-
ristics used by gamblers on the one hand, and the structure of the
casino environment on the other—the opposite of the ecologically
rational match between heuristics and environments explored
extensively elsewhere in this book.
Why does this mismatch come about? Because it is in the casi-
nos’ interest for this mismatch to exist, and they construct the
gamblers’ environment so that it does. The degree to which casinos
intentionally design games to exploit normally adaptive heuristics,
or alternatively simply select the games that end up garnering
the greatest profits and which turn out to be the ones that promote
this mismatch, is an open question. But the result is a wide range
of casino games exquisitely designed to exploit otherwise adaptive
heuristics to the casinos’ advantage. They produce representations
in the environment that provide the cues that the gamblers’ heuris-
tics rely on; as we will see, these cues are about the success and
failure of gambling heuristics and about the ways machines
operate. (This is similar to how companies exploit the often-
adaptive use of recognition to lead people to buy the products that
they recognize through advertisement—see Goldstein & Gigerenzer,
1999, 2002.) Unlike the organ-donor example, in which some envi-
ronments were inadvertently designed in a way that discouraged
organ donation, the casino industry has a powerful incentive to
design environments that contribute to false beliefs and a corre-
sponding maladaptive application of heuristics, since their eco-
nomic success stems from their ability to get and keep people
gambling.
We focus here on slot machine environments constructed by
Las Vegas resort casinos to encourage use of misleading cues
(Bennis, 2004). In the standard economic model, logically equiva-
lent representations of information are irrelevant, because deduc-
tive logic, which is equally capable of utilizing information in any
format, is assumed to underlie behavior. But psychologically, dif-
ferent representations of the same information can have a large
impact on how people use it to reach decisions (see, e.g., chapter 17
on the impact of different representations of medical information).
Thus, the casinos’ ability to influence gambling through the strate-
gic representation of information becomes understandable only
when the economic model is revised to incorporate psychologically
realistic theories of cognition.
5. Coin and token payouts are rapidly being replaced with paper
vouchers such that this method of manipulating subjective experience
may soon be a thing of the past.
Consider the inner workings of slot machines. Until the 1960s, slot machines
worked much as their exterior design suggests. A machine had
three reels covered with symbols, each with around 20 possible
stop positions where the reel could come to rest showing one of the
symbols, and each stop had an equal probability of occurring
(Cardoza, 1998; Kiso, 2004; Nestor, 1999). Given this design, there
would be 20³ (i.e., 8,000) possible outcomes, and a jackpot requiring
a unique combination of three symbols would occur with
probability 1 in 8,000, or .000125. After observing the pay line (i.e.,
the payoff-determining three symbols shown when the reels stop
spinning) on several spins on an old machine, along with a view
of the symbols above and below the pay line, savvy players could
estimate the actual number of stops and the frequency of each
symbol on each reel. They could then compare this assessment
with the payout chart for winning combinations to determine the
expected value of playing a particular machine.
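The savvy player's calculation can be sketched as follows; all symbol counts and payouts below are invented for illustration, not actual machine data:

```python
# Sketch (our illustration) of the expected-value calculation a savvy player
# could perform on an old-style machine. Symbol counts per 20-stop reel and
# the payout chart are invented numbers.
from itertools import product

reels = [
    {"bell": 1, "bar": 4, "cherry": 15},
    {"bell": 1, "bar": 5, "cherry": 14},
    {"bell": 1, "bar": 6, "cherry": 13},
]
payouts = {("bell", "bell", "bell"): 500, ("bar", "bar", "bar"): 20}

def expected_value(reels, payouts, cost=1):
    """Net expected coins per play, all 20^3 stop combinations equally likely."""
    total = 20 ** 3
    ev = 0.0
    for combo in product(*(r.keys() for r in reels)):
        ways = 1
        for reel, sym in zip(reels, combo):
            ways *= reel[sym]            # count of stops showing this symbol
        ev += payouts.get(combo, 0) * ways / total
    return ev - cost

print(expected_value(reels, payouts))
```

With these made-up numbers the player's net expectation is roughly −0.64 coins per play, the kind of figure that would tell a savvy player to walk away.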
Figure 16-3 shows an old and a new slot machine side by side.
On the surface, new slot machines look very much like older
machines, but their internal mechanics are entirely different. New
slot machines use digital random number generators rather than
physically spinning reels to determine wins and misses. Never-
theless, contemporary machines continue to display spinning reels,
providing nonrepresentative cues meant to distort the true payoff-
generating process. If, for example, the largest jackpot requires
Figure 16-3: Left: The “Liberty Bell,” the father of the contempo-
rary slot machine (image courtesy of Marshall Fey), released to the
public in 1899 (Legato, 2004). Right: A contemporary 25¢ banking
slot machine with a siren light on top (image courtesy of Paul and
Sarah Gorman).
DESIGNING RISK COMMUNICATION IN HEALTH 429
Error Rates A test can err in one of two ways. It can produce a
“healthy” result when there is in fact a disease (a false-negative
result or “miss”), and it can indicate a disease where there is none
(a false-positive result or “false alarm”). If the proportion of false-
negative test results is low, a test is said to have a high sensitivity
(false-negative rate and sensitivity add up to 1), and if the propor-
tion of false-positive results is low, it has a high specificity (false-
positive rate and specificity add up to 1). It is not possible to increase
sensitivity and specificity of a given test at the same time. If, for
instance, the critical value that determines whether a specific test
value on a continuous scale is classified as a positive or negative test
result is changed such that this test becomes more sensitive (to reduce
the number of misses), its rate of false-positive results necessarily
goes up.
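The trade-off can be made concrete with a toy threshold test; all measurement values below are illustrative:

```python
# Sketch (our illustration) of the sensitivity/specificity trade-off: moving
# the critical value that defines a "positive" result trades misses for
# false alarms. Measurement values are invented.
def sens_spec(diseased, healthy, threshold):
    sensitivity = sum(v >= threshold for v in diseased) / len(diseased)
    specificity = sum(v < threshold for v in healthy) / len(healthy)
    return sensitivity, specificity

diseased = [4, 5, 6, 7, 8]   # test values for diseased cases (illustrative)
healthy = [1, 2, 3, 4, 5]    # test values for healthy cases (illustrative)

print(sens_spec(diseased, healthy, 6))  # strict threshold: fewer false alarms
print(sens_spec(diseased, healthy, 4))  # lenient threshold: fewer misses
```

Lowering the threshold from 6 to 4 raises sensitivity from .6 to 1.0, while specificity falls from 1.0 to .6, the pattern the text describes.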
In a large American study with more than 26,000 women between
the ages of 30 and 79 years who participated in a first mammogra-
phy screening, the sensitivity was 90% and the specificity was
93.5% (Kerlikowske, Grady, Barclay, Sickles, & Ernster, 1996). A
meta-analysis over several systematic screening programs found—
over all age groups and for a 1-year interval—sensitivities between
83% and 95% and specificities between 94% and 99% (Mushlin,
Kouides, & Shapiro, 1998). The statistical properties of a test, espe-
cially sensitivity, depend on the age of the women, due to changes
in breast tissue (higher sensitivity in older women), but also on the
radiological criteria being used and on the training and experience
of the radiologists (Mühlhauser & Höldke, 1999).
Predictive Values The probability with which a positive test result cor-
rectly predicts the presence of a disease is called the positive pre-
dictive value of a test. Accordingly, the negative predictive value is
the probability with which a negative test result correctly predicts
the absence of the disease.
a woman with a positive test not to worry and just to have a follow-
up test, or to start thinking about treatment and life with the dis-
ease.
The difficulties that people have in reasoning with conditional
probabilities are often presented as if they were the natural conse-
quence of flawed mental software (e.g., Bar-Hillel, 1980). This view,
however, overlooks the fundamental fact that the human mind
processes information through external representations, and that
using particular representations can improve or impair our ability
to draw correct conclusions based on statistical information. How
can the different perspective of ecological rationality help us to
construct the information environment in a way that fits human
decision mechanisms?
Natural Frequencies
Studies that previously found that physicians (Berwick, Fineberg,
& Weinstein, 1981) and laypeople (see Koehler, 1996b) have great
difficulties in understanding the predictive value of test results
typically presented information in terms of probabilities and per-
centages, as in Problem 1 above. Now consider the following alter-
native representation:
[Figure 17-1: Percentage of correct Bayesian inferences (0–100%) by laypeople, medical students, and physicians when statistical information was presented as probabilities versus natural frequencies.]
p(BC | positive M)
  = p(BC) p(positive M | BC) / [p(BC) p(positive M | BC) + p(no BC) p(positive M | no BC)]
  = (.01 × .8) / (.01 × .8 + .99 × .1) ≈ .075    (1)

p(BC | positive M)
  = (BC & positive M) / [(BC & positive M) + (no BC & positive M)]
  = 8 / (8 + 99)    (2)

[Figure: frequency trees for 1,000 women, split into 10 with BC and 990 with no BC, illustrating how normalization relates natural frequencies to probabilities.]
2. These are the numbers used in the studies cited in Figure 17-1. As
we mentioned above (Equation 1), the positive predictive value resulting
from this input is 7.5%. With more recent estimates for the prevalence
(0.6%), sensitivity (90%), and false-positive rate (6%), the positive predic-
tive value would be 8.3%.
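The two computations, Bayes' rule with probabilities (Equation 1) and natural frequencies (Equation 2), can be checked in a few lines; variable names are ours, and the numbers are those from the studies cited in Figure 17-1 (1% prevalence, 80% sensitivity, 10% false-positive rate):

```python
prevalence, sensitivity, false_positive_rate = 0.01, 0.80, 0.10

# Probability format (Equation 1)
ppv_prob = (prevalence * sensitivity) / (
    prevalence * sensitivity + (1 - prevalence) * false_positive_rate)

# Natural frequency format (Equation 2): think of 1,000 concrete women
n = 1000
with_bc = round(n * prevalence)                      # 10 women with BC
bc_and_pos = round(with_bc * sensitivity)            # 8 of them test positive
no_bc_and_pos = round((n - with_bc) * false_positive_rate)  # 99 false positives
ppv_freq = bc_and_pos / (bc_and_pos + no_bc_and_pos)  # 8 / (8 + 99)

print(round(ppv_prob, 3), round(ppv_freq, 3))  # 0.075 0.075
```

Both routes give the same positive predictive value of about 7.5%, but the natural frequency version needs only one count divided by another.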
438 DESIGNING THE WORLD
Relative Risk Reduction: What Does a 25% Chance of a Treatment Benefit Mean?
In addition to single-event probabilities and conditional probabili-
ties, there is a third type of statistical information that frequently
leads to misunderstandings in communicating risk: relative risk
reduction. What is the benefit of mammography screening with
respect to the risk of dying from breast cancer? Women who ask
this question often hear the following answer: By undergoing rou-
tine mammography screening, women over 40 years of age reduce
their risk of dying from breast cancer by 25%. This number is a
relative risk reduction, which is the relative decrease in the number
of breast cancer deaths among women who participate in mammog-
raphy screening compared to the number of breast cancer deaths
among women who do not participate. As a relative value (more
precisely, a ratio of two ratios), this number is silent about the
underlying absolute frequencies. One source for estimating these
absolute frequencies is a set of four Swedish randomized controlled
trials that included women between 40 and 74 years of age (Nystroem
et al., 1996). It was found that out of 1,000 women who did not
participate in mammography screening, 4 died of breast cancer,
while out of 1,000 women who did participate in mammography
screening, there were 3 who died of breast cancer. Screening thus
saved the life of 1 out of 4 women who would otherwise have died
from breast cancer, which is a reduction of 25%.3
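The arithmetic behind the two framings can be made explicit; this is our illustration, using the Swedish-trial numbers from the text:

```python
# Same benefit, expressed as relative and as absolute risk reduction.
deaths_without_screening = 4   # breast cancer deaths per 1,000 women not screened
deaths_with_screening = 3      # breast cancer deaths per 1,000 women screened
n = 1000

saved = deaths_without_screening - deaths_with_screening
relative_risk_reduction = saved / deaths_without_screening
absolute_risk_reduction = saved / n

print(relative_risk_reduction)   # 0.25, i.e., "25%"
print(absolute_risk_reduction)   # 0.001, i.e., 1 woman in 1,000
```

The identical benefit sounds large as "25%" and modest as "1 in 1,000," which is exactly why the choice of representation matters.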
Relative risk reduction is not the only way to represent the ben-
efits of mammography. Alternatively, the benefit can be framed in
absolute terms: out of 1,000 women screened, 1 fewer dies of breast
cancer, an absolute risk reduction of 1 in 1,000.
Hamm and Smith (1998) went one step further and also asked
patients to estimate the predictive values of diagnostic tests (though
not of mammography screening). They found that patients assumed
similar error rates and positive predictive values for five different diag-
nostic tests, independent of the actual numbers. The patients
expected rather low error rates (false negatives were perceived to be
more likely than false positives) and very high positive predictive
values. If women applied this rationale to the test efficiency of
mammography screening, then one could expect that they would
also overestimate the test’s positive predictive value.
But even if lack of time was not a problem, there are other obsta-
cles: Many physicians are not trained in the communication
skills required for discussing risks and benefits with their patients
(Gigerenzer, 2002; Towle, Godolphin, Grams, & Lamarre, 2006).
Between a quarter and a third of the American physicians in the
previously mentioned study (Dunn et al., 2001) said that the com-
plexity of the topic and a language barrier between themselves and
their patients would keep them from discussing the benefits and
risks of mammography screening with their patients (some even
indicated their own lack of knowledge as a reason).
The evidence on the facilitating effect of intuitive representa-
tions such as natural frequencies presented earlier in this chapter
can also be applied to training programs. Note that the evidence
came from studies in which the format had been experimen-
tally manipulated: The positive effect of natural frequencies was
established without having to provide more knowledge through
training or instruction—just by replacing probabilities and percent-
ages with natural frequencies. But people can also be explicitly
trained to translate conditional probabilities into this format and
thus gain insight even if the information is originally presented
in terms of probabilities. Doctors and other health professionals
in particular could benefit from such training, not only for improving
their risk communication skills but also for improving their own
diagnostic inferences (because they will frequently encounter
statistical information in terms of probabilities and normalized
frequencies in medical textbooks). In fact, teaching people
to change representations turns out to be much more effective in
improving diagnostic inferences than training them to apply math-
ematical formulas such as Bayes’s rule (Kurzenhäuser & Hoffrage,
2002; Sedlmeier & Gigerenzer, 2001; see also Gigerenzer, Mata,
et al., 2008).
Given the facilitating effect of natural frequency representations,
it is straightforward to teach risk communicators such an informa-
tion format and to provide them with the necessary data. Textbooks
and training programs, as discussed in the previous paragraph,
are one way to achieve this goal. Another way would be to let the
environment do the work. If physicians lived in an environment in
which they got accurate, timely, and complete feedback, they would
be able to construct natural frequency representations themselves,
based on their own experience. This, however, is often not the case.
Radiologists, for instance, who perform screening mammograms,
usually refer women with a positive result to a pathologist, who in
turn may order a biopsy. If the radiologist is not notified about the
result of the biopsy, he or she cannot build the experience required
to estimate the predictive value of a positive mammogram. An easy
5. For most women this is not the best estimate—particularly not for
those women who have not yet developed this disease. The cumulative life-
time risk is a fictitious probability that is attached to a newborn female, com-
puted based on the assumption that today’s probabilities of developing breast
cancer within a specific age group remain constant until this newborn dies
at the age of 85. But because the probability of getting breast cancer between
the ages of, say, 60 and 85 is necessarily smaller than the probability of get-
ting the disease between birth and the age of 85, any woman who has not
yet had breast cancer has a lower probability than 1-in-9 of getting it before
the age of 85.
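The footnote's logic can be illustrated with hypothetical age-band probabilities (ours, not the real incidence figures):

```python
# Hypothetical probabilities of developing the disease within each age band.
# The point: cumulative risk from birth to 85 necessarily exceeds the
# cumulative risk from age 60 to 85.
bands = {(0, 40): 0.005, (40, 50): 0.015, (50, 60): 0.025,
         (60, 70): 0.035, (70, 85): 0.040}  # hypothetical values

def cumulative_risk(from_age):
    """Probability of developing the disease between from_age and 85,
    assuming the band probabilities stay constant (as the 1-in-9
    lifetime figure does)."""
    p_free = 1.0
    for (lo, hi), p in bands.items():
        if lo >= from_age:
            p_free *= 1 - p   # survive this band disease-free
    return 1 - p_free

print(cumulative_risk(0) > cumulative_risk(60))  # True
```

Any woman who reaches, say, 60 without the disease has already "used up" the earlier bands, so her remaining risk is lower than the lifetime figure attached to a newborn.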
Conclusion
CAR PARKING AS A GAME BETWEEN SIMPLE HEURISTICS 455
its physical layout and some social factors such as the flux of
arriving cars.
Many real-life parking decisions are complicated by the intricate
topology of streets and parking lots and the idiosyncratic variation
in their likelihood of having vacancies. Indeed, most empirical
work on parking has focused on these higher level structures and
how drivers deal with them, for instance, how they decide which
streets to drive down or parking lots to check to find a good
spot (Salomon, 1986; Thompson & Richardson, 1998). To turn the
spotlight instead on how drivers decide between individual park-
ing places, we constructed our model around a very simple and
constant topology: a long dead-end street, with an approach lane
leading to the destination and a return lane leading away from it,
and a parking strip (central reservation) between the two lanes, one
car wide, where cars going in either direction can park (Figure 18-1).
All drivers have the same destination at the end of this street,
and all pass a common starting point that is far enough away to
be clear of parked cars. There are 150 parking places up to the des-
tination, which is sufficient, given the other conditions, for drivers
always to find somewhere to park. If cars fail to select a parking
space as they approach the destination, they turn around and take
the first vacancy they come to on their way out. Turning around
anywhere other than at the destination is not allowed. Once parked,
drivers walk to the destination, spend a variable but predeter-
mined time there, walk back, and then drive away in the return lane.
We explain later the various rules by which we allow drivers
to decide whether to park in a vacant parking place. All the rules
assume that drivers cannot see whether parking places in front of
them are occupied, with the consistent exception that on their
way to the destination drivers never take a space if the next place
beyond it is also empty. Just occasionally this catches drivers
[Figure 18-1: a row of numbered parking places (1, 2, 3, . . .) along the street leading to the Destination.]
out when a car in the return lane takes the space in front before
they get to it.3
We model time as discrete steps of 0.75 seconds, the time taken
to drive past one parking place (if it is 5 meters long, and speed is
22.5 kilometers per hour). Turning around at the destination is
instantaneous. We assume that walking is one-fifth the speed of
driving. The time a driver spends at the destination is randomly
drawn from a gamma distribution with a mean of 30 minutes, with
shape parameter 2 (i.e., a skewed bell shape with mode = 15 min-
utes), and an upper limit of 3 hours. Observed parking time
distributions are indeed skewed like this or even more so (Young,
1986). Each day, the parking strip starts empty and 1,080 cars
arrive at the end of the street over a period of 9 hours (averaging
two per minute). Arrival times within this period are randomly
drawn from a uniform distribution, except that if two cars draw the
same 0.75-second time step, one randomly draws another time.
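The simulation parameters just listed can be sketched as follows; the constant and function names are ours, and the authors' implementation details may differ:

```python
import random

STEP = 0.75                        # seconds per time step (one parking place)
DAY_STEPS = int(9 * 3600 / STEP)   # 9-hour arrival window = 43,200 steps
N_CARS = 1080                      # about two arrivals per minute

def arrival_steps(rng):
    """Uniformly random arrival steps; re-draw whenever two cars collide."""
    taken = set()
    for _ in range(N_CARS):
        step = rng.randrange(DAY_STEPS)
        while step in taken:
            step = rng.randrange(DAY_STEPS)
        taken.add(step)
    return sorted(taken)

def stay_minutes(rng):
    """Time at destination: gamma with shape 2 and mean 30 min (scale 15),
    capped at 3 hours."""
    return min(rng.gammavariate(2, 15), 180)

rng = random.Random(0)
arrivals = arrival_steps(rng)
stays = [stay_minutes(rng) for _ in arrivals]
print(len(arrivals), min(stays) >= 0, max(stays) <= 180)  # 1080 True True
```

With shape parameter 2 the gamma distribution is a skewed bell with mode at 15 minutes, matching the skew of observed parking time distributions.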
In our first investigations we make the simplifying assumption
that the population is composed of drivers all using the same heu-
ristic, and we assess the performance of a single “mutant” driver
using a modified heuristic in such a social environment. (Later we
relax this assumption and develop an evolutionary algorithm in
which there can be many coexisting strategies in the population
competing against each other.) To make comparisons between dif-
ferent strategies efficient, we compare what would happen to the
same car if it went back to its original starting position and tried
another strategy (cf. a repeated-measures design) in the following
way: Each day a car is selected at random from those arriving,
and the simulation proceeds until it is time for this car to enter the
street. The state of all cars is then stored, and the simulation
proceeds with the focal car’s driver using one particular strategy.
Once the driver selects a parking space and the strategy’s perfor-
mance has been assessed, the original state of the street at the car’s
arrival time is restored. Then the simulation restarts with the focal
car using another strategy, but with all other drivers arriving at the
same times, spending the same times at the destination and using
the same strategies as before. Our comparisons of strategies were
typically based on means of 100,000 focal cars.4
3. For each time step we work backward from the destination allowing
cars in the incoming lane to move toward the destination or park in an
adjacent space if empty, then we work back down the return lane, from the
exit toward the destination, moving each car one space forward or letting
it park, and then again in the same direction allowing parked cars to leave
if the owner has returned to the car and there is an adjacent empty gap in
the return lane.
4. A different procedure was used to compare situations in which every
individual in the population uses the same strategy. For each day we
recorded the performance of every car and took the average. We then aver-
aged this average over 100,000 days of independent simulations.
5. More precisely, the vertical axis measures the time from arriving at
a starting position 150 spaces from the destination until returning to this
starting position on the way back, but omitting the time spent at the des-
tination itself.
[Figure 18-2: Mean travel time in seconds (vertical axis, roughly 460–540) for a mutant driver as a function of Dm, the mutant's aspiration level in places from the destination (horizontal axis, 0–60), in populations using DP = 15, 31, or 45; a reference line marks Dm = DP.]
In fact, the next available space, say at position K, would also be the
one taken by mutants with values of Dm between DP and K, so those
mutant strategies will have similar levels of performance
to Dm = 44 (as shown by the flattening of the line of open circles). If
we change the population’s value of DP by a few places the position
of the kink in the graph shifts correspondingly.
The line of crosses in Figure 18-2 shows the outcome when the
population strategy shifts more dramatically to DP = 15. The kink
has disappeared and the mutant driver now does better to accept a
space farther from the destination than would the rest of the popu-
lation. This is because if it seeks only a closer space (Dm < 15), it
will probably not find one on the way to the destination and will
thus waste time driving there and back before taking one farther
than 15 parking places from the destination; this probably was
already available on the inward journey. In this social environment
the algorithm used is that it can find only pure Nash equilibria, in
which every individual adopts the same value of D. But it is also
possible for the population to reach mixed equilibria, in which dif-
ferent values of D would be used by different drivers according to a
particular probability distribution that results in an equal mean
payoff for all drivers. Later we describe an evolutionary algorithm
we used to search for such mixed equilibria.
In the search process just presented, the population’s overall
change in strategy toward the Nash equilibrium is driven by the
selfish behavior of individuals adopting the best-performing strat-
egy. But the mean performance of individuals in the population
need not improve as the population approaches this equilibrium
and may get worse (akin to the Tragedy of the Commons, in which
each individual's pursuit of maximal personal benefit makes things
worse for everyone; in real life, drivers being picky about parking
spaces further reduces overall performance because of the extra
traffic generated; Vanderbilt, 2008, pp. 149 ff.). Here, when DP = 62
we find the social optimum that minimizes mean total travel time,
to 462 seconds, which is 15 seconds less than the mean travel time
for everyone at the Nash equilibrium. Thus, the population as a
whole suffers at equilibrium from everyone’s attempts to find better
parking spots.
7. No strategy parks in an empty space if the next parking place closer
to the destination is also empty; instead, the driver moves one place for-
ward, reevaluates the available information, and decides again. All
strategies take the first free place after turning around at the destination.
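The fixed-distance rule, as we read it from the text and footnote 7, can be sketched as a single decision function; the function and parameter names are ours, not the authors':

```python
# Sketch of the fixed-distance (aspiration level D) parking rule.
def take_space(distance, next_closer_empty, aspiration_d, outbound):
    """Decide whether to park in the vacant place `distance` places
    from the destination."""
    if not outbound:
        return True                    # after turning around: take first free place
    if next_closer_empty:
        return False                   # never park if the next closer place is also empty
    return distance <= aspiration_d    # park only within the aspiration distance

# Inbound driver with D = 31: passes a vacancy at 40, takes one at 25
print(take_space(40, False, 31, True))   # False
print(take_space(25, False, 31, True))   # True
print(take_space(25, True, 31, True))    # False (next place also empty)
print(take_space(40, False, 31, False))  # True (already turned around)
```

The single parameter D is what the population search and the later evolutionary algorithm tune, and the kinks in Figure 18-2 arise from how this threshold interacts with the spaces other drivers leave free.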
[Figure: Proportion of the day each parking place is occupied (left axis, 0–.75) and car counts (right axis, 0–1,500) by distance from the destination (up to 100 places on either side), shown in separate panels for the car-count, space-count, block-count, x-out-of-y, and linear-operator heuristics, and for all heuristics together.]
Note. The space-count heuristic evolves to require sufficient spaces before parking so that cars
always turn at the destination; therefore parameter values above about 39 spaces are selectively
neutral.
[Figure: Parking place distance from the destination (0–100 places) across the time of day (0–11 hours), with the population aspiration level DP = 31 marked.]
Gerd Gigerenzer
Peter M. Todd
488 AFTERWORD
Simon’s Question
Three Illustrations
To repeat, ecological rationality is a normative discipline that
requires descriptive knowledge about the processes underlying
decision making. Normative statements about decision making
involve both psychological and environmental structures, and to
know what is best, we must know what structures go into the deci-
sion process. Despite long-standing admonitions to avoid the so-
called naturalistic fallacy—never derive ought from is—in this case
the ought, how people should make decisions, is not independent
of the is, how people are able to make decisions. To illustrate
the importance of understanding cognitive processes for determin-
ing what one ought to do, we revisit three problems in health care
discussed earlier in this book.
What organ donation policy should a government implement? If
we want to save some of the lives of those 5,000 Americans who die
every year waiting in vain for a donation, then we need to know
first how people make decisions (chapter 16). More specifically, we
need to know why the great majority of Americans do not sign up
to be a potential organ donor despite most saying they are in favor
of donation. If people do not sign up because they are not informed
about the problem, then country-wide information campaigns are
what we ought to do. Yet millions of dollars and euros have been
spent on such campaigns with little success, because they are
derived from the wrong psychology, based on the belief that more
information will always help. If the behavior of most people is
instead driven by using the default heuristic in the local legal envi-
ronment concerning organ donation, then we ought to do some-
thing different to save the lives of those waiting: Change the opt-in
default on donation to an opt-out default. To debate what is the
right thing to do without analyzing the interaction between mind
and environment may prove futile and cost lives.
Next, consider another key problem in health care: A majority of
physicians do not understand health statistics, such as how to esti-
mate the probability that a patient has cancer after a positive screen-
ing test (chapter 17). The normative recommendation made for
decades is that physicians should learn how to derive this proba-
bility using Bayes’s rule, given the sensitivity and specificity of
the test, and the prior probability of the disease. Yet, this proposal
has had as little success as the organ donor publicity campaigns.
An efficient solution to this problem starts once again with an anal-
ysis of the cognitive processes of physicians and the structure of
ECOLOGICAL RATIONALITY: THE NORMATIVE STUDY OF HEURISTICS 495
We began this chapter with the schism between “is” and “ought,”
institutionalized in the division of labor between disciplines. Until
recently, the study of cognitive heuristics has been seen as a solely
descriptive enterprise, explaining how people actually make deci-
sions. The study of logic and probability, in contrast, has been
seen as answering the normative question of how one should make
decisions. This split has traditionally elevated logic and probabil-
ity above heuristics—contrasting the pure and rational way people
should reason with the dirty and irrational way people in fact do
reason. Yet logic, statistics, and heuristics finally need to be treated
as equals, each suited to its particular kind of problem.
The study of ecological rationality widens the domain of the
analysis of rational behavior from situations with perfect knowl-
edge to those with imperfect knowledge. It is a more modest kind of
rationality that is not built on what is the best strategy overall, but
on what is best among the available alternatives. To strive for the abso-
lute best—optimization—is an appealing but often unrealistic goal,
a rational fiction possibly anchored in our Western religions.
According to many traditions, God or the Creator is omnipotent,
or almighty, with unlimited power to do anything. He (sometimes
she) is also omniscient, knowing everything about his creation.
Furthermore, some theologians proposed that God has created
every animal and plant so perfectly that it could not fit better into
its environment, a concept that we might call optimization today.
These three O’s, omnipotence, omniscience, and optimization, have
REFERENCES 499
Bröder, A. (2002). Take the best, Dawes’ rule, and compensatory deci-
sion strategies: A regression-based classification method. Quality
& Quantity, 36, 219–238.
Bröder, A. (2003). Decision making with the “adaptive toolbox”:
Influence of environmental structure, intelligence, and work-
ing memory load. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 29, 611–625.
Bröder, A. (2005). Entscheiden mit der “adaptiven Werkzeugkiste”:
Ein empirisches Forschungsprogramm. [Decision making with the
“adaptive toolbox”: An empirical research program]. Lengerich,
Germany: Pabst Science.
Bröder, A. & Eichler, A. (2001). Individuelle Unterschiede in bevor-
zugten Entscheidungsstrategien. [Individual differences in preferred
decision strategies]. Poster presented at the 43rd “Tagung experi-
mentell arbeitender Psychologen,” April 9–11, 2001, Regensburg,
Germany.
Bröder, A. & Eichler, A. (2006). The use of recognition information and
additional cues in inferences from memory. Acta Psychologica,
121, 275–284.
Bröder, A. & Gaissmaier, W. (2007). Sequential processing of cues in
memory-based multi-attribute decisions. Psychonomic Bulletin
and Review, 14, 895–900.
Bröder, A. & Newell, B. R. (2008). Challenging some common beliefs
about cognitive costs: Empirical work within the adaptive toolbox
metaphor. Judgment and Decision Making, 3, 195–204.
Bröder, A. & Schiffer, S. (2003a). Bayesian strategy assessment in multi-
attribute decision research. Journal of Behavioral Decision Making,
16, 193–213.
Bröder, A. & Schiffer, S. (2003b). “Take the best” versus simultaneous
feature matching: Probabilistic inferences from memory and effects
of representation format. Journal of Experimental Psychology:
General, 132, 277–293.
Bröder, A. & Schiffer, S. (2006a). Adaptive flexibility and maladaptive
routines in selecting fast and frugal decision strategies. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 32,
904–918.
Bröder, A. & Schiffer, S. (2006b). Stimulus format and working memory
in fast and frugal strategy selection. Journal of Behavioral Decision
Making, 19, 361–380.
Brown, N. R. (2002). Real-world estimation: Estimation modes and
seeding effects. Psychology of Learning and Motivation, 41,
321–359.
Bruner, J. S., Goodnow, J. J., & Austin, G. A. (1956). A study of thinking.
New York: Wiley.
Brunswik, E. (1943). Organismic achievement and environmental prob-
ability. Psychological Review, 50, 255–272.
Brunswik, E. (1955). Representative design and probabilistic theory in
a functional psychology. Psychological Review, 62, 193–217.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multi-
ple regression/correlation analysis for the behavioral sciences (3rd
ed.). Mahwah, NJ: Erlbaum.
Colinvaux, P. A. (1978). Why big fierce animals are rare: An ecologist’s
perspective. Princeton, NJ: Princeton University Press.
Collett, T. S. & Land, M. F. (1975). Visual control of flight behaviour in
the hoverfly, Syritta pipiens L. Journal of Comparative Physiology,
99, 1–66.
Condorcet, N. C. (1785). Essai sur l'application de l'analyse à la probabilité
des décisions rendues à la pluralité des voix. Paris: Imprimerie
Royale.
Cook, M. & Mineka, S. (1989). Observational conditioning of fear to
fear-relevant versus fear-irrelevant stimuli in rhesus monkeys.
Journal of Abnormal Psychology, 98, 448–459.
Cook, M. & Mineka, S. (1990). Selective associations in the observational
conditioning of fear in rhesus monkeys. Journal of Experimental
Psychology: Animal Behavior Processes, 16, 372–389.
Cooksey, R. W. (1996). Judgment analysis: Theory, methods, and appli-
cations. London: Academic Press.
Coombs, C. H. & Lehner, P. E. (1981). Evaluation of two alternative
models of a theory of risk: I. Are moments useful in assessing
risks? Journal of Experimental Psychology: Human Perception and
Performance 7, 1110–1123.
Cooper, G. F. (1990). The computational complexity of probabilistic
inference using Bayesian belief networks. Artificial Intelligence,
42, 393–405.
Cooper, R. (2000). Simple heuristics could make us smart; but which
heuristic do we apply when? Behavioral and Brain Sciences, 23,
746.
Corbin, R. M., Olson, C. L., & Abbondanza, M. (1975). Context effects in
optional stopping decisions. Organizational Behavior and Human
Performance, 14, 207–216.
Cosmides, L. & Tooby, J. (1996). Are humans good intuitive statisti-
cians after all? Rethinking some conclusions from the literature on
judgment under uncertainty. Cognition, 58, 1–73.
Costa, P. T. & McCrae, R. R. (1992). The NEO personality inventory
and NEO five factor inventory. Professional manual. Odessa, FL:
Psychological Assessment Resources.
Coulter, A. (1997). Partnerships with patients: The pros and cons
of shared clinical decision-making. Journal of Health Services
Research and Policy, 2, 112–121.
Cover, T. & Hart, P. (1967). Nearest neighbor pattern classification. IEEE
Transactions on Information Theory, 13, 21–27.
Cowan, N. (2001). The magical number 4 in short-term memory: A
reconsideration of mental storage capacity. Behavioral and Brain
Sciences, 24, 87–185.
Czerlinski, J., Gigerenzer, G., & Goldstein, D. G. (1999). How good are
simple heuristics? In G. Gigerenzer, P. M. Todd, & the ABC Research
Group, Simple heuristics that make us smart (pp. 3–34). New York:
Oxford University Press.
Gigerenzer, G., Todd, P. M., & the ABC Research Group. (1999). Simple
heuristics that make us smart. New York: Oxford University
Press.
Gigone, D. & Hastie, R. (1997). The impact of information on small
group choice. Journal of Personality and Social Psychology, 72,
132–140.
Gilbert, D. T. (1991). How mental systems believe. American Psychologist,
46, 107–119.
Gilbert, D. T., Krull, D. S., & Malone, P. S. (1990). Unbelieving the
unbelievable: Some problems in the rejection of false information.
Journal of Personality and Social Psychology, 59, 601–613.
Gilbert, D. T., Tafarodi, R. W., & Malone, P. S. (1993). You can’t not
believe everything you read. Journal of Personality and Social
Psychology, 65, 221–233.
Gimbel, R. W., Strosberg, M. A., Lehrman, S. E., Gefenas, E., & Taft, T.
(2003). Presumed consent and other predictors of cadaveric organ
donation in Europe. Progress in Transplantation, 13, 17–23.
Girotto, V. & Gonzalez, M. (2001). Solving probabilistic and statistical
problems: A matter of information structure and question form.
Cognition, 78, 247–276.
Gladwell, M. (2005). Blink: The power of thinking without thinking.
New York: Little, Brown.
Goldberg, L. R. (1970). Man versus model of man: A rationale, plus
some evidence of improving on clinical inferences. Psychological
Bulletin, 73, 422–432.
Goldberger, A. S. (1991). A course in econometrics. Cambridge, MA:
Harvard University Press.
Goldstein, D. G. & Gigerenzer, G. (1999). The recognition heuristic:
How ignorance makes us smart. In G. Gigerenzer, P. M. Todd, &
the ABC Research Group, Simple heuristics that make us smart
(pp. 37–58). New York: Oxford University Press.
Goldstein, D. G. & Gigerenzer, G. (2002). Models of ecological rationality:
The recognition heuristic. Psychological Review, 109, 75–90.
Goldstein, D. G., Johnson, E. J., Herrmann, A., & Heitmann, M. (2008).
Nudge your customers toward better choices. Harvard Business
Review, 86(12), 99–105.
Good, I. J. (1967). On the principle of total evidence. The British Journal
for the Philosophy of Science, 17, 319–321.
Good, I. J. (1983). Good thinking: The foundations of probability and
its applications. Minneapolis: University of Minnesota Press.
Gordon, K. (1924). Group judgments in the field of lifted weights.
Journal of Experimental Psychology, 3, 398–400.
Gøtzsche, P. C. & Nielsen, M. (2006). Screening for breast cancer with
mammography. Cochrane Database of Systematic Reviews 2006, 4,
Art. No. CD001877.
Hallowell, N., Statham, H., Murton, F., Green, J., & Richards, M. (1997).
“Talking about chance”: The presentation of risk information
during genetic counseling for breast and ovarian cancer. Journal of
Genetic Counseling, 6, 269–286.
Hamilton, D. L. & Gifford, R. K. (1976). Illusory correlation in inter-
personal perception: A cognitive basis of stereotypic judgments.
Journal of Experimental Social Psychology, 12, 392–407.
Hamilton, D. L. & Sherman, S. J. (1989). Illusory correlations: Implications
for stereotype theory and research. In D. Bar-Tal, C. F. Graumann,
A. W. Kruglanski, & W. Stroebe (Eds.), Stereotype and prejudice:
Changing conceptions (pp. 59–82). New York: Springer.
Hamm, R. M. & Smith, S. L. (1998). The accuracy of patients’ judge-
ments of disease probability and test sensitivity and specificity.
Journal of Family Practice, 47, 44–52.
Hammond, K. R. & Wascoe, N. E. (1980). Realizations of Brunswik’s
representative design. New Directions for Methodology of Social
and Behavioral Science, 3, 271–312.
Hann, A. (1999). Propaganda versus evidence based health promo-
tion: The case of breast screening. International Journal of Health
Planning and Management, 14, 329–334.
Hansell, M. (2005). Animal architecture. New York: Oxford University
Press.
Harrigan, K. A. (2007). Slot machine structural characteristics: Distorted
player views of payback percentages. Journal of Gambling Issues,
20, 215–234.
Harrigan, K. A. (2008). Slot machine structural characteristics: Creating
near misses using high award symbol ratios. International Journal
of Mental Health and Addiction, 6, 353–368.
Hasher, L., Goldstein, D., & Toppino, T. (1977). Frequency and the
conference of referential validity. Journal of Verbal Learning and
Verbal Behavior, 16, 107–112.
Hasher, L. & Zacks, R. T. (1984). Automatic processing of fundamen-
tal information: The case of frequency of occurrence. American
Psychologist, 39, 1372–1388.
Hasson, U., Simmons, J. P., & Todorov, A. (2005). Believe it or not:
On the possibility of suspending belief. Psychological Science, 16,
566–571.
Hastie, R. & Kameda, T. (2005). The robust beauty of majority rules in
group decisions. Psychological Review, 112, 494–508.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical
learning: Data mining, inference, and prediction. New York: Springer.
Hauser, M. D., Feigenson, L., Mastro, R. G., & Carey, S. (1999). Non-
linguistic number knowledge: Evidence of ordinal representations
in human infants and rhesus macaques. Poster presented at the
Society for Research in Child Development, Albuquerque, NM.
Hausmann, D. (2004). Informationssuche im Entscheidungsprozess
[Information search in the decision process]. Unpublished doc-
toral dissertation, University of Zürich, Switzerland.
Hausmann, D., Läge, D., Pohl, R., & Bröder, A. (2007). Testing the
QuickEst: No evidence for the quick-estimation heuristic. European
Journal of Cognitive Psychology, 19, 446–456.
Hayek, F. (1945). The use of knowledge in society. American Economic
Review, 35, 519–530.
Heilbrun, K., Philipson, J., Berman, L., & Warren, J. (1999). Risk communi-
cation: Clinicians’ reported approaches and perceived values. Journal
of the American Academy of Psychiatry and Law, 27, 397–406.
Heller, R. F., Sandars, J. E., Patterson, L., & McElduff, P. (2004). GPs’ and physicians’ interpretation of risks, benefits and diagnostic test results. Family Practice, 21, 155–159.
Helversen, B. von, & Rieskamp, J. (2008). The mapping model: A cog-
nitive theory of quantitative estimation. Journal of Experimental
Psychology: General, 137, 73–79.
Henrich, J., & Gil-White, F. J. (2001). The evolution of prestige: Freely conferred deference as a mechanism for enhancing the benefits of cultural transmission. Evolution and Human Behavior, 22, 165–196.
Hertel, G., Kerr, N. L., & Messé, L. A. (2000). Motivation gains in performance groups: Paradigmatic and theoretical developments on the Köhler effect. Journal of Personality and Social Psychology, 79, 580–601.
Hertwig, R., Davis, J. R., & Sulloway, F. J. (2002). Parental investment:
How an equity motive can produce inequality. Psychological
Bulletin, 128, 728–745.
Hertwig, R., & Gigerenzer, G. (1999). The “conjunction fallacy” revisited: How intelligent inferences look like reasoning errors. Journal of Behavioral Decision Making, 12, 275–305.
Hertwig, R., Gigerenzer, G., & Hoffrage, U. (1997). The reiteration effect
in hindsight bias. Psychological Review, 104, 194–202.
Hertwig, R., Herzog, S. M., Schooler, L. J., & Reimer, T. (2008). Fluency
heuristic: A model of how the mind exploits a by-product of infor-
mation retrieval. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 34, 1191–1206.
Hertwig, R., Hoffrage, U., & the ABC Research Group. (in press). Simple
heuristics in a social world. New York: Oxford University Press.
Hertwig, R., Hoffrage, U., & Martignon, L. (1999). Quick estimation:
Letting the environment do some of the work. In G. Gigerenzer,
P. M. Todd, & the ABC Research Group, Simple heuristics that
make us smart (pp. 209–234). New York: Oxford University Press.
Hertwig, R., Pachur, T., & Kurzenhäuser, S. (2005). Judgments of risk
frequencies: Tests of possible cognitive mechanisms. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 31,
621–642.
Hertwig, R., & Todd, P. M. (2003). More is not always better: The benefits of cognitive limits. In D. Hardman & L. Macchi (Eds.), Thinking: Psychological perspectives on reasoning, judgment and decision making (pp. 213–231). Chichester, UK: Wiley.
Herzog, S. M., & Hertwig, R. (in press). The ecological validity of flu-
ency. In C. Unkelbach & R. Greifeneder (Eds.), The experience of
thinking. London: Psychology Press.
Hey, J. D. (1982). Search for rules for search. Journal of Economic
Behavior and Organization, 3, 65–81.
Hibbard, J. H., & Peters, E. (2003). Supporting informed consumer health care decisions: Data presentation approaches that facilitate the use of information in choice. Annual Review of Public Health, 24, 413–433.
Hilgard, E. R., & Bower, G. H. (1975). Theories of learning (4th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Hinsz, V. B., Tindale, R. S., & Vollrath, D. A. (1997). The emerging con-
ceptualization of groups as information processors. Psychological
Bulletin, 121, 43–64.
Hintzman, D. L. (1990). Human learning and memory: Connections
and dissociations. Annual Review of Psychology, 41, 109–139.
Hintzman, D. L., & Curran, T. (1994). Retrieval dynamics of recognition and frequency judgments: Evidence for separate processes of familiarity and recall. Journal of Memory and Language, 33, 1–18.
Hoffrage, U. (2008). Skewed information structures. Working paper,
University of Lausanne.
Hoffrage, U. (2011). Recognition judgments and the performance of
the recognition heuristic depend on the size of the reference class.
Judgment and Decision Making, 6, 43–57.
Hoffrage, U., & Gigerenzer, G. (1998). Using natural frequencies to improve diagnostic inferences. Academic Medicine, 73, 538–540.
Hoffrage, U., Gigerenzer, G., Krauss, S., & Martignon, L. (2002).
Representation facilitates reasoning: What natural frequencies are
and what they are not. Cognition, 84, 343–352.
Hoffrage, U., & Hertwig, R. (2006). Which world should be represented in representative design? In K. Fiedler & P. Juslin (Eds.), Information sampling and adaptive cognition (pp. 381–408). New York: Cambridge University Press.
Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A
by-product of knowledge updating? Journal of Experimental
Psychology: Learning, Memory, and Cognition, 26, 566–581.
Hoffrage, U., Lindsey, S., Hertwig, R., & Gigerenzer, G. (2000).
Communicating statistical information. Science, 290, 2261–2262.
Hofstee, W. K. B. (1984). Methodological decision rules as research
policies: A betting reconstruction of empirical research. Acta
Psychologica, 56, 93–109.
Hogarth, R. M. (1974). Process tracing in clinical judgment. Behavioral
Science, 19, 298–313.
Hogarth, R. M. (1978). A note on aggregating opinions. Organizational
Behavior and Human Performance, 21, 40–46.
Hogarth, R. M. (1981). Beyond discrete biases: Functional and dysfunc-
tional aspects of judgmental heuristics. Psychological Bulletin, 90,
197–217.
Oz, M. C., Kherani, A. R., Rowe, A., Roels, L., Crandall, C., Tomatis,
L., et al. (2003). How to improve organ donation: Results of the
ISHLT/FACT Poll. Journal of Heart and Lung Transplantation, 22,
389–410.
Pachur, T. (2010). Recognition-based inference: When is less more in
the real world? Psychonomic Bulletin and Review, 17, 589–598.
Pachur, T. (2011). The limited value of precise tests of the recognition
heuristic. Judgment and Decision Making, 6, 413–422.
Pachur, T., & Biele, G. (2007). Forecasting from ignorance: The use and usefulness of recognition in lay predictions of sports events. Acta Psychologica, 125, 99–116.
Pachur, T., Bröder, A., & Marewski, J. N. (2008). The recognition heuris-
tic in memory-based inference: Is recognition a non-compensatory
cue? Journal of Behavioral Decision Making, 21, 183–210.
Pachur, T., & Hertwig, R. (2006). On the psychology of the recognition heuristic: Retrieval primacy as a key determinant of its use. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 983–1002.
Pachur, T., Hertwig, R., & Rieskamp, J. (in press). The mind as an intuitive pollster: Frugal search in social spaces. In R. Hertwig, U. Hoffrage, & the ABC Research Group, Simple heuristics in a social world. New York: Oxford University Press.
Pachur, T., Mata, R., & Schooler, L. J. (2009). Cognitive aging and the
use of recognition in decision making. Psychology and Aging, 24,
901–915.
Pachur, T., Todd, P. M., Gigerenzer, G., Schooler, L. J., & Goldstein, D. G. (2011). The recognition heuristic: A review of theory and tests. Frontiers in Psychology, 2, 147.
Paepke, S., Schwarz-Boeger, U., Minckwitz, G. von, Kaufmann, M.,
Schultz-Zehden, B., Beck, H., et al. (2001). Brustkrebsfrüherkennung—
Kenntnisstand und Akzeptanz in der weiblichen Bevölkerung. [Early
detection of breast cancer—Knowledge and acceptance in the female
population]. Deutsches Ärzteblatt, 98, 2178–2186.
Parducci, A. (1968). The relativism of absolute judgment. Scientific American, 219, 84–90.
Pareto, V. (1897). Cours d’économie politique. Lausanne: F. Rouge & Cie.
Parke, J., & Griffiths, M. D. (2006). The psychology of the fruit machine: The role of structural characteristics (revisited). International Journal of Mental Health and Addiction, 4, 151–179.
Paulhus, D. L. (1984). Two-component models of socially desirable
responding. Journal of Personality and Social Psychology, 46,
598–609.
Paulus, P. B., Dugosh, K. L., Dzindolet, M. T., Coskun, H., & Putman,
V. L. (2002). Social and cognitive influences in group brainstorm-
ing: Predicting production gains and losses. In W. Stroebe &
M. Hewstone (Eds.), European review of social psychology (Vol.
12, pp. 299–325). London: Wiley.
Reimer, T., Kuendig, S., Hoffrage, U., Park, E., & Hinsz, V. (2007). Effects
of the information environment on group discussions and decisions
in the hidden-profile paradigm. Communication Monographs, 74,
1–28.
Reimer, T., Reimer, A., & Hinsz, V. (2010). Naïve groups can solve the
hidden-profile problem. Human Communication Research, 36,
443–467.
Renner, B. (2004). Biased reasoning: Adaptive responses to health risk
feedback. Personality and Social Psychology Bulletin, 30, 384–
396.
Rice, J. A. (1995). Mathematical statistics and data analysis. Belmont,
CA: Duxbury Press.
Richter, T., & Späth, P. (2006). Recognition is used as one cue among others in judgment and decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 150–162.
Rieskamp, J. (1997). Die Verwendung von Entscheidungsstrategien
unter verschiedenen Bedingungen: Der Einfluß von Zeitdruck und
Rechtfertigung. [The use of decision strategies in different condi-
tions: Influence of time pressure and accountability]. Unpublished
diploma thesis, Technical University of Berlin.
Rieskamp, J. (2006). Perspectives of probabilistic inferences:
Reinforcement learning and an adaptive network compared. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 32,
1355–1370.
Rieskamp, J. (2008). The importance of learning when making infer-
ences. Judgment and Decision Making, 3, 261–277.
Rieskamp, J., & Hoffrage, U. (1999). When do people use simple heuristics, and how can we tell? In G. Gigerenzer, P. M. Todd, & the ABC Research Group, Simple heuristics that make us smart (pp. 141–167). New York: Oxford University Press.
Rieskamp, J., & Hoffrage, U. (2008). Inferences under time pressure: How opportunity costs affect strategy selection. Acta Psychologica, 127, 258–276.
Rieskamp, J., & Otto, P. E. (2006). SSL: A theory of how people learn to select strategies. Journal of Experimental Psychology: General, 135, 207–236.
Rilling, M., & McDiarmid, C. (1965). Signal detection in fixed-ratio schedules. Science, 148, 526–527.
Rimer, B. K., Halabi, S., Skinner, C. S., Lipkus, I., Strigo, T. S., Kaplan,
E. B., et al. (2002). Effects of mammography decision-making
intervention at 12 and 24 months. American Journal of Preventive
Medicine, 22, 247–257.
Rivest, R. (1976). On self-organizing sequential search heuristics.
Communications of the ACM, 19, 63–67.
Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory testing. Psychological Review, 107, 358–367.
Rodkin, D. (1995, February). 10 keys for creating top high schools.
Chicago, 78–85.
Roitberg, B. D., Reid, M. L., & Li, C. (1993). Choosing hosts and mates: The value of learning. In D. R. Papaj & A. C. Lewis (Eds.), Insect learning: Ecological and evolutionary perspectives (pp. 174–194). New York: Chapman & Hall.
Romano, N. C., Jr., & Nunamaker, J. F., Jr. (2001). Meeting analysis: Findings from research and practice. In R. H. Sprague (Ed.), Proceedings of the 34th Hawaii International Conference on System Sciences (Vol. 1, p. 1072). Los Alamitos, CA: IEEE Computer Society.
Rose, D. A. (2009, July 11). A better way to get a kidney. New York Times,
p. A19. (Also online at http://www.nytimes.com/2009/07/11/
opinion/11rose.html).
Rosenberg, R. D., Yankaskas, B. C., Abraham, L. A., Sickles, E. A., Lehman, C. D., Geller, B. M., et al. (2006). Performance benchmarks for screening mammography. Radiology, 241, 55–66.
Ross, L. (1977). The intuitive psychologist and his shortcomings:
Distortions in the attribution process. In L. Berkowitz (Ed.), Advances
in experimental social psychology (Vol. 10, pp. 173–220). New York:
Academic Press.
Rothman, A. J., Bartels, R. D., Wlaschin, J., & Salovey, P. (2006).
The strategic use of gain- and loss-framed messages to promote
healthy behavior: How theory can inform practice. Journal of
Communication, 56, S202–S220.
Rothman, A. J., & Salovey, P. (1997). Shaping perceptions to motivate healthy behavior: The role of message framing. Psychological Bulletin, 121, 3–19.
Rubinstein, A. (1980). Ranking the participants in a tournament. SIAM
Journal on Applied Mathematics, 38, 108–111.
Russo, J. E., & Dosher, B. A. (1983). Strategies for multiattribute binary choice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 676–696.
Ruxton, G. D., & Beauchamp, G. (2008). The application of genetic
algorithms in behavioural ecology, illustrated with a model of
anti-predator vigilance. Journal of Theoretical Biology, 250,
435–448.
Saad, G., Eba, A., & Sejean, R. (2009). Sex differences when searching for a mate: A process-tracing approach. Journal of Behavioral Decision Making, 22, 171–190.
Sackett, D. L. (1996). On some clinically useful measures of the effects
of treatment. Evidence-Based Medicine, 1, 37–38.
Salomon, I. (1986). Towards a behavioural approach to city centre park-
ing: The case of Jerusalem’s CBD. Cities, 3, 200–208.
Sarfati, D., Howden-Chapman, P., Woodward, A., & Salmond, C. (1998).
Does the frame affect the picture? A study into how attitudes to
screening for cancer are affected by the way benefits are expressed.
Journal of Medical Screening, 5, 137–140.
Sargent, T. J. (1993). Bounded rationality in macroeconomics. New
York: Oxford University Press.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
SUBJECT INDEX
1/N rule, 4–5, 15–16, 19, 25, 492. See also investment
    definition, 4, 10
1R rule, 264

abortion, 428
absolute risk reduction, 442, 444. See also relative risk reduction
accidents, 105–107, 204, 264, 388
accountability, 304
accuracy, 33. See also generalization; overfitting; robustness
    cumulative (online), 284
    fitting, 37, 43, 250, 376–377
    offline (batch learning), 284–285
    predictive, 43–45
achievement motive, 227–228
action orientation, 227–228
activation (in memory, ACT-R), 154ff
ACT-R. See Adaptive Control of Thought–Rational (ACT-R)
adaptive coin-flipping, 238
Adaptive Control of Thought–Rational (ACT-R), 151, 154–161, 259, 404. See also rational analysis of memory
adaptive decision maker, 21, 26, 33, 155, 226
adaptive function, 108, 141, 145–146, 165. See also evolution
adaptive toolbox, 11, 20, 27, 46, 50, 59, 118, 164, 217, 221–222, 226, 238, 240, 245, 357, 402, 415, 488, 489, 493. See also heuristics
adjustable power tool, 240
admissions (college), 65, 326–327
adversarial collaboration, 79
advertising, 263, 422
affect, 86, 142. See also emotions
wealth, 125. See also data sets, billionaires
    distribution, 379–381, 383
weather, 35–39, 309–310, 312–315, 439–440. See also climate change; forecasting; temperature
Weber’s law, 406
weighted additive linear model (WADD), 269, 342–343, 344, 347–352, 355–356
weighting and adding, 12–13, 23–24, 29, 193, 235, 415, 418. See also Franklin’s rule
word frequency, 382
working memory. See memory, working
World Cup, 417–420. See also soccer

x-out-of-y heuristic, 469–471, 473, 476, 481

zero-sum game, 423
Zig-Zag rule. See fast and frugal tree, Zig-Zag rule for constructing
Zig-zag tree, 371–372, 375–378. See also fast and frugal tree
Zipf’s law, 382, 388, 405. See also power law