
Ecological Rationality

Intelligence in the World


EVOLUTION AND COGNITION

General Editor: Stephen Stich, Rutgers University

Published in the Series


Simple Heuristics That Make Us Smart
Gerd Gigerenzer, Peter M. Todd, and the ABC Research Group

Natural Selection and Social Theory: Selected Papers of Robert Trivers
Robert Trivers

Adaptive Thinking: Rationality in the Real World
Gerd Gigerenzer

In Gods We Trust: The Evolutionary Landscape of Religion
Scott Atran

The Origin and Evolution of Cultures
Robert Boyd and Peter J. Richerson

The Innate Mind: Structure and Contents
Peter Carruthers, Stephen Laurence, and Stephen Stich, Eds.

The Innate Mind, Volume 2: Culture and Cognition
Peter Carruthers, Stephen Laurence, and Stephen Stich, Eds.

The Innate Mind, Volume 3: Foundations and the Future
Peter Carruthers, Stephen Laurence, and Stephen Stich, Eds.

Why Humans Cooperate: A Cultural and Evolutionary Explanation
Natalie Henrich and Joseph Henrich

Rationality for Mortals: How People Cope with Uncertainty
Gerd Gigerenzer

Ecological Rationality: Intelligence in the World
Peter M. Todd, Gerd Gigerenzer, and the ABC Research Group

Ecological Rationality

Intelligence in the World

Peter M. Todd
Gerd Gigerenzer
and the ABC Research Group

Oxford University Press, Inc., publishes works that further
Oxford University’s objective of excellence
in research, scholarship, and education.

Oxford New York


Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam

Copyright © 2012 by Peter M. Todd and Gerd Gigerenzer

Published by Oxford University Press, Inc.


198 Madison Avenue, New York, New York 10016
www.oup.com

Oxford is a registered trademark of Oxford University Press

All rights reserved. No part of this publication may be reproduced,


stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise,
without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data


Ecological rationality : intelligence in the world / edited by Peter M. Todd
and Gerd Gigerenzer.
p. cm. — (Evolution and cognition series)
Includes bibliographical references and index.
ISBN 978-0-19-531544-8
1. Environmental psychology. 2. Heuristic. 3. Reason. I. Todd, Peter M.
II. Gigerenzer, Gerd.
BF353.E28 2011
153—dc23
2011040733

9 8 7 6 5 4 3 2 1
Printed in USA
on acid-free paper

Dedicated to Herbert Simon and Reinhard Selten, who pioneered
the study of rationality in the real world.
Preface

Twelve years ago, we invited readers to participate in a journey
into largely unknown territory. With this call, we began our book,
Simple Heuristics That Make Us Smart. The invitation still stands,
but the territory is no longer quite so unknown, and some of the
formerly blank spaces on the map have been replaced by clear
contours. This progress is due to a large number of researchers from
many disciplines who followed our call and put their expertise to
work to explore the land of rationality occupied by real people
who have only limited time, knowledge, and computational capac-
ities. For instance, researchers on memory have discovered why
and when a beneficial degree of forgetting can lead to better infer-
ences about the world; researchers in business have found out that
managers rely on one-reason heuristics to predict consumer behav-
ior better than costly, complex statistical methods; and philoso-
phers have begun to debate what responsibility and morality mean
in an uncertain world where epistemic laziness—relying on lim-
ited information—can lead to better judgments.
Ecological Rationality focuses on a central and challenging
aspect of this exploration: understanding rationality as a match
between mind and environment. Before Simple Heuristics, a largely
unquestioned view was that humans and other animals rely on
heuristics, but that they would do better if they would process
information in a “rational” way, identified variously with proposi-
tional logic, Bayesian probability updating, or the maximization of
expected utility. In contrast, we argued in Simple Heuristics that
there is no single rational tool for all human tasks, based on some
logical principle, but an adaptive toolbox with specific tools imple-
menting bounded rationality, each tool based on mental core capac-
ities. As a consequence, the proper questions are which tools work
well in a given environment, and why. These are the questions of
ecological rationality that we explore in this book. The vision of
rationality is not logical, but ecological.
The environment is crucial for understanding the mind. Herbert
Simon drew attention to this with his metaphor of rationality
emerging from the interaction of two blades of a pair of scissors,
one representing the mental capacities of the actor and the other
the characteristics of the environment. We add “ecological” to
“rationality” to highlight the importance of that second blade,
which is all too often overlooked. This is also the reason for the
subtitle of this book: Intelligent behavior in the world comes about
by exploiting reliable structure in the world—and hence, some of
intelligence is in the world itself.
We set out on this journey as a group of individuals trained in a
number of disciplines, including psychology, economics, mathe-
matics, computer science, biology, business, philosophy, the law,
medicine, and engineering. That this interdisciplinary collabora-
tion has been working and thriving over more than a decade is a
tribute to the young researchers who were willing to take off their
disciplinary blinders and look around and build on what others
brought to the party. The exploration has also flourished under the
generous long-term funding provided by the unique vision of the
Max Planck Society. Much of the work reported in this volume was
carried out at the Max Planck Institute for Human Development in
Berlin, and also by colleagues who joined in the journey after
spending time with us talking, debating, and enjoying getting
together every afternoon at four o’clock for coffee and exploration.
The exploration done since the publication of Simple Heuristics
in fact takes several books to cover. Other volumes investigate
topics such as the role of emotions and culture in bounded rational-
ity (Gigerenzer & Selten, 2001), the role of heuristics in the making
of the law, in litigation, and in court (Gigerenzer & Engel, 2006), the
role of heuristics in intuition (Gigerenzer, 2007), and the founda-
tional work on fast and frugal heuristics (Gigerenzer, Hertwig, &
Pachur, 2011). The third volume in the triptych begun by Simple
Heuristics and this volume extends our exploration from bounded
rationality and ecological rationality to social rationality (Hertwig,
Hoffrage, & the ABC Research Group, in press).

There are many people who have helped us in producing this
book. Special thanks go to Peter Carruthers, Stephen Lea, Lauri
Saaksvuori, and the students of Peter Todd’s Structure of Informa-
tion Environments course, all of whom read and commented on
chapters, to Marshall Fey for the image of the Liberty Bell slot
machine used in chapter 16, to Anita Todd and Rona Unrau for
their work in editing everyone’s writing over and over again, to
Doris Gampig for her help with indexing, and to Jürgen Rossbach
and Marianne Hauser for their exemplary work in creating our fig-
ures and graphics. Thanks also to the ever-growing extended ABC
group spread around the globe, for all of your input, insight, and
ideas. And thanks as ever to our families, who create the environ-
mental structure within which we thrive.
Finally, this book is an interim report of an ongoing research
program; for future developments and results, we invite you to visit
our centers’ websites at:
http://www.mpib-berlin.mpg.de/en/research/adaptive-behavior-and-cognition
http://www.indiana.edu/~abcwest

Bloomington and Berlin          Peter M. Todd
October 2010                    Gerd Gigerenzer
Contents

The ABC Research Group xv

Part I The Research Agenda


1 What Is Ecological Rationality? 3
Peter M. Todd and Gerd Gigerenzer

Part II Uncertainty in the World


2 How Heuristics Handle Uncertainty 33
Henry Brighton and Gerd Gigerenzer
3 When Simple Is Hard to Accept 61
Robin M. Hogarth
4 Rethinking Cognitive Biases as Environmental
Consequences 80
Gerd Gigerenzer, Klaus Fiedler, and Henrik Olsson

Part III Correlations Between Recognition and the World


5 When Is the Recognition Heuristic an
Adaptive Tool? 113
Thorsten Pachur, Peter M. Todd, Gerd Gigerenzer, Lael J. Schooler,
and Daniel G. Goldstein
6 How Smart Forgetting Helps Heuristic Inference 144
Lael J. Schooler, Ralph Hertwig, and Stefan M. Herzog
7 How Groups Use Partial Ignorance to Make
Good Decisions 167
Konstantinos V. Katsikopoulos and Torsten Reimer

Part IV Redundancy and Variability in the World


8 Redundancy: Environment Structure That Simple
Heuristics Can Exploit 187
Jörg Rieskamp and Anja Dieckmann
9 The Quest for Take-the-Best: Insights and
Outlooks From Experimental Research 216
Arndt Bröder
10 Efficient Cognition Through Limited Search 241
Gerd Gigerenzer, Anja Dieckmann, and Wolfgang Gaissmaier
11 Simple Rules for Ordering Cues in One-Reason
Decision Making 274
Anja Dieckmann and Peter M. Todd

Part V Rarity and Skewness in the World


12 Why Rare Things Are Precious: How Rarity
Benefits Inference 309
Craig R. M. McKenzie and Valerie M. Chase
13 Ecological Rationality for Teams and Committees:
Heuristics in Group Decision Making 335
Torsten Reimer and Ulrich Hoffrage
14 Naïve, Fast, and Frugal Trees for Classification 360
Laura F. Martignon, Konstantinos V. Katsikopoulos,
and Jan K. Woike
15 How Estimation Can Benefit From an
Imbalanced World 379
Ralph Hertwig, Ulrich Hoffrage, and Rüdiger Sparr

Part VI Designing the World


16 Designed to Fit Minds: Institutions and
Ecological Rationality 409
Will M. Bennis, Konstantinos V. Katsikopoulos,
Daniel G. Goldstein, Anja Dieckmann, and Nathan Berg
17 Designing Risk Communication in Health 428
Stephanie Kurzenhäuser and Ulrich Hoffrage
18 Car Parking as a Game Between Simple Heuristics 454
John M. C. Hutchinson, Carola Fanselow, and Peter M. Todd

Part VII Afterword


19 Ecological Rationality: The Normative
Study of Heuristics 487
Gerd Gigerenzer and Peter M. Todd

References 498
Name Index 552
Subject Index 567
The ABC Research Group

The ABC Research Group is an interdisciplinary and international
collection of scientists studying the mechanisms of bounded
rationality and how good decisions can be made in an uncertain
world. Its home, the Center for Adaptive Behavior and Cognition,
founded in 1995, is at the Max Planck Institute for Human
Development in Berlin, Germany.

Will M. Bennis
University of New York in Prague
Legerova 72
120 00 Prague
Czech Republic
wbennis@faculty.unyp.cz

Nathan Berg
School of Economic, Political, and Policy Sciences
University of Texas at Dallas
800 W. Campbell Rd., GR31
Richardson, TX 75080-3021
USA
nberg@utdallas.edu

Henry Brighton
Center for Adaptive Behavior and Cognition
Max Planck Institute for Human Development
Lentzeallee 94
14195 Berlin
Germany
hbrighton@mpib-berlin.mpg.de

Arndt Bröder
Universität Mannheim
Lehrstuhl für Allgemeine Psychologie
Schloss, EO 265
68131 Mannheim
Germany
broeder@uni-mannheim.de

Valerie M. Chase
Breisacherstrasse 35
4057 Basel
Switzerland

Anja Dieckmann
GfK Group
Nordwestring 101
90419 Nürnberg
Germany
anja.dieckmann@gfk.com

Carola Fanselow
Universität Potsdam
Department Linguistik
Haus 14/35
Karl-Liebknecht-Straße 24-25
14476 Potsdam
Germany
cfanselow@yahoo.co.uk

Klaus Fiedler
Psychologisches Institut
Universität Heidelberg
Hauptsstrasse 47-51
69117 Heidelberg
Germany
klaus.fiedler@psychologie.uni-heidelberg.de

Wolfgang Gaissmaier
Harding Center for Risk Literacy
Max Planck Institute for Human Development
Lentzeallee 94
14195 Berlin
Germany
gaissmaier@mpib-berlin.mpg.de

Gerd Gigerenzer
Center for Adaptive Behavior and Cognition
Max Planck Institute for Human Development
Lentzeallee 94
14195 Berlin
gigerenzer@mpib-berlin.mpg.de

Daniel G. Goldstein
Yahoo Research
111 West 40th Street
New York, NY 10018
USA
dan@dangoldstein.com

Ralph Hertwig
Center for Cognitive and Decision Sciences
Department of Psychology
University of Basel
Missionsstrasse 64a
4055 Basel
Switzerland
ralph.hertwig@unibas.ch

Stefan Herzog
Center for Cognitive and Decision Sciences
Department of Psychology
University of Basel
Missionsstrasse 64a
4055 Basel
Switzerland
stefan.herzog@unibas.ch

Ulrich Hoffrage
Faculty of Business and Economics (HEC)
University of Lausanne
Quartier UNIL-Dorigny
Bâtiment Internef
1015 Lausanne
Switzerland
ulrich.hoffrage@unil.ch

Robin M. Hogarth
Department of Economics & Business
Universitat Pompeu Fabra
Ramon Trias Fargas, 25-27
08005 Barcelona
Spain
robin.hogarth@upf.edu

John M.C. Hutchinson
Senckenberg Museum für Naturkunde Görlitz
PF 300154
02806 Görlitz
Germany
majmch@googlemail.com

Konstantinos V. Katsikopoulos
Center for Adaptive Behavior and Cognition
Max Planck Institute for Human Development
Lentzeallee 94
14195 Berlin
Germany
katsikop@mpib-berlin.mpg.de

Stephanie Kurzenhäuser
Center for Cognitive and Decision Sciences
Department of Psychology
University of Basel
Missionsstrasse 64a
4055 Basel
Switzerland
s.kurzenhaeuser@gmx.net

Laura F. Martignon
Institute of Mathematics and Computing
Ludwigsburg University of Education
Reuteallee 46
71634 Ludwigsburg
Germany
martignon@ph-ludwigsburg.de

Craig R. M. McKenzie
Rady School of Management and Department of Psychology
UC San Diego
9500 Gilman Dr.
La Jolla, CA 92093-0553
USA
cmckenzie@ucsd.edu

Henrik Olsson
Center for Adaptive Behavior and Cognition
Max Planck Institute for Human Development
Lentzeallee 94
14195 Berlin
Germany
olsson@mpib-berlin.mpg.de

Thorsten Pachur
Center for Cognitive and Decision Sciences
Department of Psychology
University of Basel
Missionsstrasse 64a
4055 Basel
Switzerland
thorsten.pachur@unibas.ch

Torsten Reimer
Brian Lamb School of Communication and Department of Psychological Sciences
Purdue University
100 North University Street
West Lafayette, IN 47907-2098
USA
treimer@purdue.edu

Jörg Rieskamp
Department of Psychology
University of Basel
Missionsstrasse 62a
4055 Basel
Switzerland
joerg.rieskamp@unibas.ch

Lael J. Schooler
Center for Adaptive Behavior and Cognition
Max Planck Institute for Human Development
Lentzeallee 94
14195 Berlin
Germany
schooler@mpib-berlin.mpg.de

Rüdiger Sparr
Rohde & Schwarz SIT GmbH
Am Studio 3
12489 Berlin
Germany
ruediger.sparr@rohde-schwarz.com

Peter M. Todd
Cognitive Science Program and School of Informatics and Computing
Indiana University
1101 E. 10th Street
Bloomington, IN 47405
USA
peter.m.todd@gmail.com

Jan K. Woike
Faculty of Business and Economics (HEC)
University of Lausanne
Quartier UNIL-Dorigny
Bâtiment Internef
1015 Lausanne
Switzerland
jankristian.woike@unil.ch

Part I
THE RESEARCH AGENDA
1
What Is Ecological Rationality?
Peter M. Todd
Gerd Gigerenzer

Human rational behavior...is shaped by a scissors whose
two blades are the structure of task environments and the
computational capabilities of the actor.
Herbert A. Simon


“More information is always better, full information is best. More
computation is always better, optimization is best.” More-is-better
ideals such as these have long shaped our vision of rationality. The
philosopher Rudolf Carnap (1947), for instance, proposed the
“principle of total evidence,” which is the recommendation to use
all the available evidence when estimating a probability. The statis-
tician I. J. Good (1967) argued, similarly, that it is irrational to make
observations without using them. Going back further in time, the
Old Testament says that God created humans in his image (Genesis
1:26), and it might not be entirely accidental that some form of
omniscience (including knowledge of all relevant probabilities
and utilities) and omnipotence (including the ability to compute
complex functions in a blink) has sneaked into models of human
cognition. Many theories in the cognitive sciences and economics
have recreated humans in this heavenly image—from Bayesian
models to exemplar models to the maximization of expected utility.
Yet as far as we can tell, humans and other animals have always
relied on simple strategies or heuristics to solve adaptive problems,
ignoring most information and eschewing much computation
rather than aiming for as much as possible of both. In this book,
we argue that in an uncertain world, more information and com-
putation is not always better. Most important, we ask why and
when less can be more. The answers to this question constitute the
idea of ecological rationality, how we are able to achieve intelli-
gence in the world by using simple heuristics in appropriate con-
texts. Ecological rationality stems in part from the nature of those
heuristics, and in part from the structure of the environment: Our
intelligent, adaptive behavior emerges from the interaction of both
mind and world. Consider the examples of investment and sports.

Making Money
In 1990, Harry Markowitz received the Nobel Prize in Economics
for his path-breaking work on optimal asset allocation. He addressed
a vital investment problem that everyone faces in some form or
other, be it saving for retirement or earning money on the stock
market: How to invest your money in N available assets. It would
be risky to put everything in one basket; therefore, it makes sense
to diversify. But how? Markowitz (1952) derived the optimal rule
for allocating wealth across assets, known as the mean–variance
portfolio, because it maximizes the return (mean) and minimizes
the risk (variance). When considering his own retirement invest-
ments, we could be forgiven for imagining that Markowitz used his
award-winning optimization technique. But he did not. He relied
instead on a simple heuristic:

1/N rule: Invest equally in each of the N alternatives.

Markowitz was not alone in using this heuristic; empirical stud-
ies indicate that about 50% of ordinary people intuitively rely on it
(Huberman & Jiang, 2006). But isn’t this rule naive and silly? Isn’t
optimizing always better? To answer these questions, a study com-
pared the 1/N rule with the mean–variance portfolio and 13 other
optimal asset allocation policies in seven investment problems,
such as allocating one’s money among 10 American industry funds
(DeMiguel, Garlappi, & Uppal, 2009). The optimizing models
included sophisticated Bayesian and non-Bayesian models, which
got 10 years of stock data to estimate their parameters for each
month of portfolio prediction and investment choices. The 1/N
rule, in contrast, ignores all past information. The performance of
all 15 strategies was evaluated by three standard financial measures,
and the researchers found that 1/N came out near the top of the
pack for two of them (in first place on certainty equivalent returns,
second on turnover, and fifth on the Sharpe ratio). Despite complex
estimations and computations, none of the optimization methods
could consistently earn better returns than the simple heuristic.
How can a simple heuristic outperform optimizing strategies?
Note that in an ideal world where the mean–variance portfolio
could estimate its parameters perfectly, that is, without error, it
would do best. But in an uncertain world, even with 10 years’ worth
of data, optimization no longer necessarily leads to the best out-
come. In an uncertain world, one needs to ignore information to
make better decisions. Yet our point is not that simple heuristics
are better than optimization methods, nor the opposite, as is typi-
cally assumed. No heuristic or optimizing strategy is the best in all
worlds. Rather, we must always ask, in what environments does a
given heuristic perform better than a complex strategy, and when is
the opposite true? This is the question of the ecological rationality
of a heuristic. The answer requires analyzing the information-
processing mechanism of the heuristic, the information structures
of the environment, and the match between the two. For the choice
between 1/N and the mean–variance portfolio, the relevant envi-
ronmental features include (a) degree of uncertainty, (b) number
N of alternatives, and (c) size of the learning sample.
It is difficult to predict the future performance of funds because
uncertainty is high. The size of the learning sample is the estima-
tion window, with 5 to 10 years of data typically being used to cali-
brate portfolio models in investment practice. The 1/N rule tends to
outperform the mean–variance portfolio if uncertainty is high, the
number of alternatives is large, and the learning sample is small.
This qualitative insight allows us to ask a quantitative question: If
we have 50 alternatives, how large a learning sample do we need so
that the mean–variance portfolio eventually outperforms the simple
heuristic? The answer is: 500 years of stock data (DeMiguel et al.,
2009). Thus, if you started keeping track of your investments now,
in the 26th century optimization would finally pay off, assuming
that the same funds, and the stock market, are still around.
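
The logic of this result can be illustrated with a few lines of code. The following simulation is only a sketch with made-up return parameters (it is not the DeMiguel et al. analysis): a mean–variance optimizer estimates means and covariances from learning samples of different lengths, and its out-of-sample performance is compared with that of the 1/N rule, which estimates nothing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative simulation with hypothetical returns: why can 1/N beat
# mean-variance optimization? Because the optimizer must estimate means and
# covariances from a learning sample, and with little data that estimation
# error swamps the benefit of optimizing.
N = 10                                           # number of funds
true_mu = rng.normal(0.06, 0.02, N)              # true annual mean returns
A = rng.normal(0.0, 0.05, (N, N))
true_cov = A @ A.T + 0.02 * np.eye(N)            # true annual covariance

def sharpe(w, mu, cov):
    """Return per unit of risk, evaluated under the true distribution."""
    return (w @ mu) / np.sqrt(w @ cov @ w)

one_over_n = np.ones(N) / N

for months in (120, 1200, 6000):                 # 10, 100, and 500 years of data
    scores = []
    for _ in range(50):                          # average over many learning samples
        sample = rng.multivariate_normal(true_mu / 12, true_cov / 12, size=months)
        est_mu, est_cov = sample.mean(axis=0) * 12, np.cov(sample.T) * 12
        # Mean-variance (tangency) direction from the estimated parameters;
        # positive rescaling of the weights leaves the Sharpe ratio unchanged,
        # so no normalization is needed for this comparison.
        scores.append(sharpe(np.linalg.solve(est_cov, est_mu), true_mu, true_cov))
    print(f"{months:5d} months: mean-variance Sharpe = {np.mean(scores):5.2f}, "
          f"1/N Sharpe = {sharpe(one_over_n, true_mu, true_cov):5.2f}")
```

Typically the optimizer's estimated weights are dominated by estimation error when the sample is short, so 1/N comes out ahead; only with an unrealistically long learning sample does optimization pay off.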

Catching Balls
Now let us think about sports, where players are also faced with
challenging, often emotionally charged problems. How do players
catch a fly ball? If you ask professional players, they may well stare
at you blankly and respond that they had never thought about it—
they just run to the ball and catch it. But how do players know
where to run? A standard account is that minds solve such complex
problems with complex algorithms. An obvious candidate complex
algorithm is that players unconsciously estimate the ball’s trajec-
tory and run as fast as possible to the spot where the ball will hit
the ground. How else could it work? In The Selfish Gene, biologist
Richard Dawkins (1989, p. 96) discusses exactly this:

When a man throws a ball high in the air and catches it again, he
behaves as if he had solved a set of differential equations in pre-
dicting the trajectory of the ball. He may neither know nor care
what a differential equation is, but this does not affect his skill
with the ball. At some subconscious level, something function-
ally equivalent to the mathematical calculation is going on.

Computing the trajectory of a ball is not a simple feat. Theoretically,
balls have parabolic trajectories. To select the right parabola, play-
ers would have to estimate the ball’s initial distance, initial veloc-
ity, and projection angle. Yet in the real world, balls do not fly in
parabolas, due to air resistance, wind, and spin. Thus, players’
brains would further need to estimate, among other things, the
speed and direction of the wind at each point of the ball’s flight, in
order to compute the resulting path and the point where the ball
will land. All this would have to be completed within a few sec-
onds—the time a ball is in the air. Note that Dawkins carefully
inserts the term “as if,” realizing that the estimations and computa-
tions cannot really be done consciously but suggesting that the
unconscious somehow does something akin to solving the differen-
tial equations. Yet the evidence does not support this view: In
experiments, players performed poorly in estimating where the ball
would strike the ground (Babler & Dannemiller, 1993; Saxberg,
1987; Todd, 1981). After all, if professional baseball players were
able to estimate the trajectory of each hit and know when it would
land out of reach, we would not see them running into walls, dug-
outs, and over the stands trying to catch fly balls.
As in the investment problem, we can take a different approach
and instead ask: Is there a simple heuristic that players use to
catch balls? Experimental studies have shown that experienced
players in fact use various rules of thumb. One of these is the gaze
heuristic, which works in situations where a ball is already high up
in the air:

Gaze heuristic: Fixate your gaze on the ball, start running, and
adjust your running speed so that the angle of gaze remains
constant.

The angle of gaze is the angle between the eye and the ball, rela-
tive to the ground. Players who use this rule do not need to measure
wind, air resistance, spin, or the other causal variables. They can
get away with ignoring all these pieces of causal information. All
the relevant facts are contained in only one variable: the angle of
gaze. Note that players using the gaze heuristic are not able to com-
pute the point at which the ball will land, just as demonstrated by
the experimental results. But the heuristic nevertheless leads them
to the landing point in time to make the catch.
Like the 1/N rule, the gaze heuristic is successful in a particular
class of situations, not in all cases, and the study of its ecological
rationality aims at identifying that class. As many ball players say,
the hardest ball to catch is the one that heads straight at you, a situ-
ation in which the gaze heuristic is of no use. As mentioned before,
the gaze heuristic works in situations where the ball is already high
in the air, but it fails if applied right when the ball is at the begin-
ning of its flight. However, in this different environmental condi-
tion, players do not need a completely new heuristic—just a slightly
modified one, with a different final step (McBeath, Shaffer, & Kaiser,
1995; Shaffer, Krauchunas, Eddy, & McBeath, 2004):

Modified gaze heuristic: Fixate your gaze on the ball, start run-
ning, and adjust your running speed so that the image of the
ball rises at a constant rate.

The operation of this modified rule is intuitive: If players see the
ball appear to rise with accelerating gaze angle, they had better run
backward, because otherwise the ball will hit the ground behind
their present position. If, however, the ball rises with decreasing
apparent speed, they need to run toward it instead. Thus, different
but related rules apply in different situations—these are the kinds
of relationships that the study of ecological rationality aims to
reveal. As we will see, there is much work to be done—and many
approaches that can be applied—to reveal these relationships.
Unfortunately, we cannot simply ask the users of these rules: Most
fielders are blithely unaware of their reliance on the gaze heuristic,
despite its simplicity (McBeath et al., 1995; Shaffer & McBeath,
2005). Other heuristics such as the 1/N rule may be consciously
taught and applied, but without practitioners knowing why they
work, and when. We must explore to find out.
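
To see how little machinery the gaze heuristic needs, here is a minimal two-dimensional simulation sketch. The launch conditions, the fielder's maximum speed, and the control gain are hypothetical numbers chosen for illustration, and air resistance is omitted. The simulated fielder never computes a trajectory; it only adjusts its running speed to keep the gaze angle at its initial value, and ends up at the landing point.

```python
import numpy as np

dt = 0.01                      # time step in seconds
g = 9.81                       # gravity, m/s^2

# Ball already high in the air -- the condition under which the heuristic applies.
ball_x, ball_y = 30.0, 25.0    # position in meters
vx, vy = 8.0, 0.0              # horizontal and vertical velocity, m/s

fielder_x = 50.0               # fielder waits farther out
max_speed = 10.0               # m/s, roughly a sprint
target_angle = np.arctan2(ball_y, fielder_x - ball_x)   # initial angle of gaze

while ball_y > 0:
    ball_x += vx * dt
    ball_y += vy * dt
    vy -= g * dt
    # Keep the gaze angle constant: if the ball appears too high, back up;
    # if it appears too low, run in. No trajectory is ever estimated.
    angle = np.arctan2(ball_y, fielder_x - ball_x)
    speed = np.clip(80.0 * (angle - target_angle), -max_speed, max_speed)
    fielder_x += speed * dt

print(f"ball lands at x = {ball_x:.1f} m, fielder arrives at x = {fielder_x:.1f} m")
```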

What Is a Heuristic?

As these examples illustrate, a heuristic is a strategy that ignores
available information. It focuses on just a few key pieces of data to
make a decision. Yet ignoring some information is exactly what is
needed for better (and faster) judgments, and in this book we inves-
tigate how and when this can be so. Heuristics are where the rubber
meets the road, or where the mind meets the environment, by guid-
ing action in the world. They process the patterns of information
available from the environment, via their building blocks based on
evolved capacities (described below), to produce goal-directed
behavior.
Humans and other animals use many types of heuristics to meet
the adaptive challenges they face. But each new task does not nec-
essarily demand a new heuristic: One heuristic can be useful for a
broad range of problems. The gaze heuristic, for instance, did not
evolve for the benefit of baseball and cricket outfielders. Intercepting
moving objects is an important adaptive task in human and animal
history. From fish to birds to bats, many animals are able to track an
object moving through three-dimensional space, which is an
evolved capacity necessary for executing the gaze heuristic. Some
teleost fish catch their prey by keeping a constant angle between
their own line of motion and that of their target; male hoverflies
intercept females in the same way for mating (Collett & Land, 1975).
And we can readily generalize the gaze heuristic from its evolution-
ary origins, such as in hunting, to ball games and other modern
applications. Sailors use the heuristic in a related way: If another
boat approaches and a collision might occur, then fixate your
eye on the other boat; if the bearing remains constant, turn away,
because otherwise a collision will occur. Again, these methods
are faster and more reliable than estimating the courses of two
moving objects and calculating whether there is an intersection
point. As we will see, simple rules are less prone to estimation and
calculation error and hence often more reliable in appropriate
situations.
Similarly, the 1/N rule is not just for making money. It is an
instance of a class of rules known as equality heuristics, which are
used to solve problems beyond financial investment. If you have
two or more children, how do you allocate your time and resources
among them? Many parents try to distribute their attention equally
among their N children (Hertwig, Davis, & Sulloway, 2002). Children
themselves often divide money equally among players in experi-
mental games such as the ultimatum game, a behavior that is not
predicted by game theory but is consistent with the human sense of
fairness and justice (Takezawa, Gummerum, & Keller, 2006).

Building Blocks of Heuristics


Most heuristics are made up of multiple building blocks. There are
a limited number of kinds of building blocks, including search
rules, stopping rules, and decision rules; by combining different
sets of these, many different heuristics can be constructed. For
instance, to choose a mate, a peahen does not investigate all pea-
cocks posing and displaying to get her attention, nor does she
weight and add all male features to calculate the one with the high-
est expected utility. Rather, she investigates only three or four and
picks the one with the largest number of eyespots (Petrie & Halliday,
1994). This mate choice heuristic is a form of satisficing (Table 1-1)
that consists of the simple search rule “investigate males in your
proximity,” the stopping rule “stop search after a sample of four,”
and the decision rule “choose on the basis of one cue (number of
eyespots).” Given a particular heuristic, changing one or more of its
building blocks allows the creation of a related heuristic adapted to
different problems, as illustrated by the modifications of the gaze
heuristic above.
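
To make the notion of building blocks concrete, here is a minimal sketch (not the authors' implementation) of how take-the-best from Table 1-1 can be assembled from a search rule, a stopping rule, and a decision rule; the cue validities and city profiles are invented for illustration.

```python
from typing import Optional

# Cue values are 1 (positive), 0 (negative), or None (unknown).

def search_rule(validities: dict) -> list:
    """Search rule: look up cues in order of validity, best first."""
    return sorted(validities, key=validities.get, reverse=True)

def stopping_rule(a: Optional[int], b: Optional[int]) -> bool:
    """Stopping rule: stop as soon as a cue discriminates between the alternatives."""
    return a is not None and b is not None and a != b

def decision_rule(a: int, b: int) -> str:
    """One-reason decision rule: choose the alternative the discriminating cue favors."""
    return "A" if a > b else "B"

def take_the_best(cues_a: dict, cues_b: dict, validities: dict) -> str:
    for cue in search_rule(validities):
        a, b = cues_a.get(cue), cues_b.get(cue)
        if stopping_rule(a, b):
            return decision_rule(a, b)
    return "guess"             # no cue discriminates

# Which of two hypothetical cities is larger?
validities = {"is a capital": 0.9, "has an exposition site": 0.8, "has a university": 0.7}
city_a = {"is a capital": 0, "has an exposition site": 1, "has a university": 1}
city_b = {"is a capital": 0, "has an exposition site": 0, "has a university": 1}
print(take_the_best(city_a, city_b, validities))   # "A", decided by the second cue
```

Swapping in a different search, stopping, or decision rule yields a related heuristic, which is exactly the sense in which the building blocks are reusable.
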
Table 1-1: Twelve Well-Studied Heuristics With Evidence of Use in the Adaptive Toolbox of Humans

Recognition heuristic (Goldstein & Gigerenzer, 2002; chapter 5)
  Definition: If one of two alternatives is recognized, infer that it has the higher value on the criterion.
  Ecologically rational if: Recognition validity > .5
  Surprising findings: Less-is-more effect if α > β; systematic forgetting can be beneficial (chapter 6)

Fluency heuristic (Schooler & Hertwig, 2005; chapter 6)
  Definition: If both alternatives are recognized but one is recognized faster, infer that it has the higher value on the criterion.
  Ecologically rational if: Fluency validity > .5
  Surprising findings: Less-is-more effect; systematic forgetting can be beneficial

Take-the-best (Gigerenzer & Goldstein, 1996; chapter 2)
  Definition: To infer which of two alternatives has the higher value: (a) search through cues in order of validity; (b) stop search as soon as a cue discriminates; (c) choose the alternative this cue favors.
  Ecologically rational if: Cue validities vary, high redundancy
  Surprising findings: Often predicts more accurately than multiple regression (Czerlinski, Gigerenzer, & Goldstein, 1999), neural networks, exemplar models, and decision tree algorithms

Tallying (unit-weight linear model; Dawes, 1979)
  Definition: To estimate a criterion, do not estimate weights but simply count the number of positive cues.
  Ecologically rational if: Cue validities vary little, low redundancy (Hogarth & Karelaia, 2005a, 2006b)
  Surprising findings: Often predicts as accurately as or better than multiple regression (Czerlinski et al., 1999)

Satisficing (Simon, 1955a; Todd & Miller, 1999; chapter 18)
  Definition: Search through alternatives and choose the first one that exceeds your aspiration level.
  Ecologically rational if: Distributions of available options and other costs and benefits of search are unknown
  Surprising findings: Aspiration levels can lead to substantially better choice than chance, even if they are arbitrary (e.g., Bruss, 2000)

One-bounce rule (Hey, 1982)
  Definition: Continue searching (e.g., for prices) as long as options improve; at the first downturn, stop search and take the previous best option.
  Ecologically rational if: Improvements come in streaks
  Surprising findings: Taking search costs into consideration in this rule does not improve performance

Gaze heuristic (Gigerenzer, 2007; McBeath, Shaffer, & Kaiser, 1995)
  Definition: To catch a ball, fix your gaze on it, start running, and adjust your running speed so that the angle of gaze remains constant.
  Ecologically rational if: The ball is coming down from overhead
  Surprising findings: Balls will be caught while running, possibly on a curved path

1/N rule (DeMiguel, Garlappi, & Uppal, 2009)
  Definition: Allocate resources equally to each of N alternatives.
  Ecologically rational if: High unpredictability, small learning sample, large N
  Surprising findings: Can outperform optimal asset allocation portfolios

Default heuristic (Johnson & Goldstein, 2003; chapter 16)
  Definition: If there is a default, follow it.
  Ecologically rational if: Values of those who set defaults match those of the decision maker; consequences of a choice are hard to foresee
  Surprising findings: Explains why advertising has little effect on organ donor registration; predicts behavior when trait and preference theories fail

Tit-for-tat (Axelrod, 1984)
  Definition: Cooperate first and then imitate your partner’s last behavior.
  Ecologically rational if: The other players also play tit-for-tat
  Surprising findings: Can lead to a higher payoff than “rational” strategies (e.g., by backward induction)

Imitate the majority (Boyd & Richerson, 2005)
  Definition: Determine the behavior followed by the majority of people in your group and imitate it.
  Ecologically rational if: Environment is stable or only changes slowly; info search is costly or time consuming
  Surprising findings: A driving force in bonding, group identification, and moral behavior

Imitate the successful (Boyd & Richerson, 2005)
  Definition: Determine the most successful person and imitate his or her behavior.
  Ecologically rational if: Individual learning is slow; info search is costly or time consuming
  Surprising findings: A driving force in cultural evolution

Note. For formal definitions and conditions concerning ecological rationality and surprising findings, see references indicated and related chapters in this book.

Evolved Capacities
Building blocks of heuristics are generally based on evolved cap-
acities. For instance, in the gaze heuristic, to keep the gaze angle
constant an organism needs the capacity to track an object visually
against a noisy background—something that no modern robot or
computer vision system can do as well as organisms (e.g., humans)
that have evolved to follow targets. When we use the term evolved
capacity, we refer to a product of nature and nurture—a capacity
that is prepared by the genes of a species but usually needs experi-
ence to be fully expressed. For instance, 3-month-old babies spon-
taneously practice holding their gaze on moving targets, such as
mobiles hanging over their crib. Evolved capacities are one reason
why simple heuristics can perform so well: They enable solutions
to complex problems that are fundamentally different from the
mathematically inspired ideal of humans and animals somehow
optimizing their choices. Other capacities underlying heuristic
building blocks include recognition memory, which the recogni-
tion heuristic and fluency heuristics exploit, and counting and
recall, which take-the-best and similar heuristics can use to esti-
mate cue orders.

The Adaptive Toolbox


We refer to the repertoire of heuristics, their building blocks, and
the evolved capacities they exploit as the mind’s adaptive toolbox
(Gigerenzer & Selten, 2001; Gigerenzer & Todd 1999). Table 1-1
lists a dozen heuristics that are likely in the adaptive toolbox of
humans, and in some other animal species, although the last couple
are rare even in primates and the evidence is controversial. The
content of the adaptive toolbox depends not only on the species,
but also on the individual and its particular stage of ontogenetic
development and the culture in which it lives.
The degree to which species share heuristics will depend on
whether they face the same adaptive problems, inhabit environ-
ments with similar structures, and share the evolved capacities on
which the heuristics are built. For instance, while the absence of
language production from the adaptive toolbox of other animals
means they cannot use name recognition to make inferences about
their world, some animal species can use other capacities, such as
taste and smell recognition, as input for the recognition heuristic.
A shared capacity between two species makes it more likely that
they will rely on similar heuristics, even if they have to solve differ-
ent problems, such as intercepting prey as opposed to fly balls. If
two species face the same adaptive problem but their evolved capac-
ities differ, this will lead to different heuristics. Consider estimation
of area. Humans can visually estimate area by combining height
and width dimensions. Some species of ants, instead, can produce
pheromone trails, leading to a very different area-estimation heu-
ristic based on this capacity: To judge the area of a candidate nest
cavity (typically a narrow crack in a rock), run around on an irregu-
lar path for a fixed period of time, laying down a pheromone trail;
then leave; then return to the cavity, move around on a different
irregular path, and estimate the cavity’s size by the inverse of the
frequency of reencountering the old trail. This heuristic is remark-
ably precise—nests that are half the area of others yield reencoun-
ter frequencies about 1.96 times greater (Mugford, Mallon, & Franks,
2001). Many such evolved rules of thumb in animals (including
humans) are amazingly simple and efficient (see the overview by
Hutchinson & Gigerenzer, 2005).
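
A crude Monte Carlo sketch, with the trail and the return path idealized as sets of random points rather than continuous walks and with all parameters invented, shows why this works: the chance of coming close to the old trail falls off roughly in inverse proportion to the cavity's area.

```python
import numpy as np

rng = np.random.default_rng(2)

def encounter_rate(width, height, n_trail=100, n_revisit=5000, radius=0.1):
    trail = rng.uniform([0, 0], [width, height], size=(n_trail, 2))     # first visit
    revisit = rng.uniform([0, 0], [width, height], size=(n_revisit, 2)) # second visit
    # Fraction of revisit points that come within `radius` of any trail point.
    dists = np.linalg.norm(revisit[:, None, :] - trail[None, :, :], axis=2)
    return np.mean(dists.min(axis=1) < radius)

for w, h in [(10.0, 10.0), (10.0, 5.0)]:       # the second cavity has half the area
    print(f"area {w * h:5.0f}: re-encounter rate = {encounter_rate(w, h):.3f}")
```

Halving the area roughly doubles the re-encounter rate in this toy version, mirroring the proportionality the ants appear to exploit.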

What Is Not a Heuristic?


Not all of the cognitive mechanisms that humans use, or devise for
use by artificial systems, are heuristics. Strategies such as the mean–
variance portfolio and the trajectory prediction approach described
above are not heuristics, because they attempt to weight and add
all available information and make use of heavy computation to
reach “optimal” decisions. The origins of such optimization theo-
ries can be traced back to the classical theory of rationality that
emerged during the Enlightenment. The birth year of this view has
been dated 1654, when the French mathematicians Blaise Pascal and
Pierre Fermat defined rational behavior as the maximization of the
expected value of alternative courses of action (Daston, 1988;
Gigerenzer et al., 1989). This vision of rationality goes hand in hand
with the notion that complex problems need to be solved by complex
algorithms and that more information is always better. A century later,
Benjamin Franklin described the ideal of weighting and adding all
reasons in a letter to his nephew (Franklin, 1779/1907, pp. 281-282):

April 8, 1779
If you doubt, set down all the Reasons, pro and con, in
opposite Columns on a Sheet of Paper, and when you have
considered them two or three Days, perform an Operation
similar to that in some questions of Algebra; observe what
Reasons or Motives in each Column are equal in weight, one
to one, one to two, two to three, or the like, and when you
have struck out from both Sides all the Equalities, you will see
in which column remains the Balance.… This kind of Moral
Algebra I have often practiced in important and dubious
Concerns, and tho’ it cannot be mathematically exact, I have
found it extreamly [sic] useful. By the way, if you do not learn
it, I apprehend you will never be married.
I am ever your affectionate Uncle,
B. FRANKLIN

Modern versions of Franklin’s moral algebra include expected
utility maximization in economics, Bayesian inference theories in
the cognitive sciences, and various bookkeeping principles taught
in MBA courses and recommended by consulting firms. Markowitz’s
mean–variance optimization model and the calculation of a ball’s
trajectory are all variants of this form of calculative rationality.
Note that Franklin ends with the warning that learning his moral
algebra is necessary for marriage. We checked whether Franklin’s
admonition holds among a sample of economists who teach modern
versions of this optimizing view of rationality, asking them whether
they had chosen their partner using their favorite rational method.
Only one had. He explained that he had listed all the options he
had and all the important consequences that he could think of
for each woman, such as whether she would still be interesting to
talk to after the honeymoon excitement was over, would be good
at taking care of children, and would support him in his work
(cf. Darwin’s similar considerations—Gigerenzer & Todd, 1999). He
took several days to estimate the utilities of each of these conse-
quences and the probabilities for each woman that these conse-
quences would actually occur. Then he calculated the expected
utility for each candidate and proposed to the woman with the
highest value, without telling her how he had made his choice. She
accepted and they married. And now he is divorced.
The point of this story is emphatically not that Franklin’s
rational bookkeeping method is less successful in finding good
mates than simple heuristics, such as “try to get the woman that
your peers desire” (known as mate choice copying, which humans
and other animals follow—Place, Todd, Penke, & Asendorpf, 2010).
Rather, our point is that there is a discrepancy between theory and
practice: Despite the weight-and-add approach being advertised as
the rational method, even devoted proponents often instead rely on
heuristics in important decisions (Gigerenzer, 2007). Health is
another case in point. In a study, more than 100 male economists
were asked how they decided whether to have a prostate cancer
screening test (the PSA, or prostate specific antigen test—Berg,
Biele, & Gigerenzer, 2010). For this and other screening tests, virtually
all medical societies recommend that patients carefully weigh pros
and cons before deciding whether or not to have it; in this par-
ticular case, the benefit remains controversial (it is not proven that
screening saves lives) whereas its harms are clear (such as possible
incontinence and impotence from operations following positive
tests). Yet two thirds of the economists interviewed said that they
had not weighed any pros and cons regarding this test but just did
whatever their doctors (or wives) said they should do. These cham-
pions of rationality were using the social heuristic “trust your
doctor” rather than the traditionally rational approach to make this
important decision. Again, theory and practice are at odds.
But which is right? We cannot say, without further investigation:
A heuristic is neither good nor bad per se, nor is a rational approach
such as Franklin’s bookkeeping method. Rather, the study of
ecological rationality informs us that we must ask a further all-
important question: In what environments does a given decision
strategy or heuristic perform better than other approaches? For
instance, in a world where doctors practice defensive decision
making because of fear of lawyers and malpractice trials (leading to
overtreatment and overmedication of patients) and where most
doctors do not have the time to read the relevant medical studies, it
pays to weigh pros and cons oneself rather than rely on the trust-
your-doctor heuristic (Gigerenzer, Gaissmaier, Kurz-Milcke,
Schwartz, & Woloshin, 2007).

What Is Ecological Rationality?

The concept of ecological rationality—of specific decision-making
tools fit to particular environments—is intimately linked to that
of the adaptive toolbox. Traditional theories of rationality that
instead assume one single universal decision mechanism do not
even ask when this universal tool works better or worse than any
other, because it is the only one thought to exist. Yet the empirical
evidence looks clear: Humans and other animals rely on multiple
cognitive tools. And cognition in an uncertain world would be
inferior, inflexible, and inefficient with a general-purpose optimiz-
ing calculator, for reasons described in the next section (see also
chapter 2).
We use the term ecological rationality both for a general vision of
rationality and a specific research program. As a general vision, it
provides an alternative to views of rationality that focus on internal
consistency, coherence, or logic and leave out the external environ-
ment. Ecological rationality is about the success of cognitive strate-
gies in the world, as measured by currencies such as the accuracy,
frugality, or speed of decisions. In our previous book, Simple
Heuristics That Make Us Smart, we introduced this term to flesh
out Herbert Simon’s adaptive view of rational behavior (Gigerenzer,
Todd, & the ABC Research Group, 1999). As Simon put it, “Human
rational behavior...is shaped by a scissors whose two blades are
the structure of task environments and the computational capabili-
ties of the actor” (Simon, 1990, p. 7). We use the term logical ratio-
nality for theories that evaluate behavior against the laws of logic
or probability rather than success in the world, and that ask ques-
tions such as whether behavior is consistent, uses all information,
or corresponds to an optimization model. Logical rationality is
determined a priori—that is, what is good and bad is decided by
abstract principles—instead of by testing behavior in natural envi-
ronments. Shortly before his death, Simon assessed the ecological
rationality approach as a “revolution in cognitive science, striking
a great blow for sanity in the approach to human rationality”
(see Gigerenzer, 2004b), and Vernon Smith further promoted the
approach, using it in the title of his Nobel Laureate lecture (Smith,
2003). While it is being pursued by a growing number of such
leading researchers, the ecological approach is at present still a
small island compared to the wide empire of logical theories of
rationality.
As a research program, the study of ecological rationality inves-
tigates the fit between the two blades of Simon’s scissors. Fitting
well does not mean that the blades are mirror image reflections of
each other (cf. Shepard, 1994/2001; Todd & Gigerenzer, 2001)—in
manufacturing, the two blades of a good pair of scissors are made to
slightly twist or to curve with respect to one another so that they
touch at only two places: the joint and the spot along the blades
where the cutting is taking place. Furthermore, for cognition to be
successful, there is no need for a perfect mental image of the envi-
ronment—just as a useful mental model is not a veridical copy of
the world, but provides key abstractions while ignoring the rest. In
the finest scissors, the two blades that are made to fit each other are
coded with an identification mark to make sure that they are treated
as a pair. The study of ecological rationality is about finding out
which pairs of mental and environmental structures go together. As
we discuss in more detail in a section to come, it is based on envi-
ronment description, computer simulation, empirical test, and
analysis and proof, and it centers on three questions:

Given a heuristic, in what environments does it succeed?
Given an environment, what heuristics succeed?
How do heuristics and environments co-evolve to shape each
other?

The investment example answers the first and second questions,
which are intimately related. For instance, given the 1/N rule,
investment environments with many options—large N—and a
relatively small sample size of past data are the right match. Or
given an environment with N = 50 and 10 years of stock data, the
1/N rule is likely to perform better than the mean–variance port-
folio. Table 1-1 provides further such results, and so do the follow-
ing chapters. The third question addresses a larger issue, the
co-evolution of the adaptive toolbox and its environment. About
this, we know comparatively little—more research is needed to
study systematically the mutual adaptation of heuristics and envi-
ronments in ontogenetic or phylogenetic time (see chapter 18 for an
example).

The Structure of Environments


An environment is what an agent acts in and upon. The environ-
ment also influences the agent’s actions in multiple ways, by deter-
mining the goals that the agent aims to fulfill, shaping the tools
that the agent has for reaching those goals, and providing the
inputs processed by the agent to guide its decisions and behavior.
No thorough classification of environment structures exists at pres-
ent, but several important structures have been identified. Three of
these were revealed in the analysis of the investment problem
above: the degree of uncertainty, the number of alternatives, and
the size of the learning sample. Given their relevance for a wide
range of tasks, we consider them here in more detail.

Uncertainty The degree of uncertainty refers to how well the available
cues can predict a criterion. Uncertainty varies with the kind of cri-
terion and the prediction to be made. Next month’s performance of
stocks and funds is highly unpredictable, heart attacks are slightly
more predictable, and tomorrow’s weather is the most accurately
predictable among these three criteria. Furthermore, uncertainty is
higher when one has to make predictions about a different popula-
tion rather than just a different time period for the same population
(see chapter 2). Our investment example illustrates the important
principle that the greater the uncertainty, the greater can be the
advantage of simple heuristics over optimization methods, Bayesian
and otherwise.
There is an intuitive way to understand this result. In a world
without uncertainty, inhabited by gods and their secularized
version, Laplace’s demon, all relevant past information will aid
in predicting the future and so needs to be considered. In a fully
unpredictable world, such as a perfect roulette wheel, one can
ignore all information about the past performance of the wheel,
which is useless in saying what will come next. Most of the time,
though, humble humans live in the twilight of partial predictability
and partial uncertainty. In this challenging world, a principal way
to cope with the rampant uncertainty we face is to simplify, that is,
to ignore much of the available information and use fast and frugal
heuristics. And yet this approach is often resisted: When a forecast-
ing model does not predict a criterion, such as the performance of
funds, as well as hoped, the gut reaction of many people, experts
and novices alike, is to do the opposite and call for more informa-
tion and more computation. The possibility that the solution may
lie in eliminating information and fancy computation is still
unimaginable for many and hard to digest even after it has been
demonstrated again and again (see chapter 3).

Number of Alternatives In general, problems with a large number of
alternatives pose difficulties for optimization methods. The term
alternatives can refer to individual objects (such as funds) or actions
(such as moves in a game). Even in many cases where there is an
optimal (best) sequence of moves, such as in chess, no computer or
mind can determine it, because the number of alternative action
sequences is too large and the problem is computationally intrac-
table. The computer chess program Deep Blue and human chess
masters (as well as Tetris players—see Michalewicz & Fogel, 2000)
have to rely instead on nonoptimizing techniques, including heu-
ristics. And people in more mundane everyday settings char-
acterized by an abundance of choices—such as when browsing
supermarket shelves or comparing phone service plans—are indeed
generally able to employ decision strategies to deal effectively
with numerous alternatives (Scheibehenne, Greifeneder, & Todd,
2010).

Sample Size In general, the smaller the sample size of available data
in the environment, the larger the advantage for simple heuristics.
One of the reasons is that complex statistical models have to esti-
mate their parameter values from past data, and if the sample size is
small, then the resulting error due to “variance” can exceed the error
due to “bias” in competing heuristics (see chapter 2). What consti-
tutes a small sample size depends on the degree of uncertainty, as
can be seen in the investment problem, where uncertainty is high:
In this case, a sample size of hundreds of years of stock data is
needed for the mean–variance portfolio to surpass the accuracy of
the 1/N rule.
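
This bias-variance point can be illustrated with a small simulation sketch; the data-generating process below is hypothetical. With a small learning sample, tallying with unit weights tends to predict new cases better than multiple regression, whose estimated weights carry large variance error; with a large sample, regression catches up and overtakes it.

```python
import numpy as np

rng = np.random.default_rng(1)

n_cues = 5
true_w = np.array([0.7, 0.55, 0.4, 0.25, 0.1])   # unequal true cue weights

def make_data(n):
    x = rng.normal(size=(n, n_cues))
    return x, x @ true_w + rng.normal(scale=1.0, size=n)

def predictive_corr(pred, y):
    return np.corrcoef(pred, y)[0, 1]

x_test, y_test = make_data(20_000)
unit_weights = np.ones(n_cues)                   # tallying: cue directions assumed known

for n_train in (10, 100, 1000):
    reg = []
    for _ in range(200):                         # average regression over many samples
        x, y = make_data(n_train)
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)   # estimated regression weights
        reg.append(predictive_corr(x_test @ beta, y_test))
    print(f"n = {n_train:4d}: regression r = {np.mean(reg):.2f}, "
          f"tallying r = {predictive_corr(x_test @ unit_weights, y_test):.2f}")
```
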
There are many other important types of environment struc-
ture relevant for understanding ecological rationality. Two of
the major ones also considered in this book are redundancy and
variability.

Redundancy How highly correlated different cues are in the environ-
ment is an indication of that environment’s redundancy. This structure
can be exploited by heuristics such as take-the-best that rely on
the first good reason that allows a decision to be made and ignore
subsequent redundant cues (see chapters 8 and 9).

Variability The variability of importance of cues can be exploited by
several heuristics. For instance, when variability is high, heuristics
that rely on only the best cue perform better than when the vari-
ability is low (Hogarth & Karelaia, 2005a, 2006b; Martignon &
Hoffrage, 2002; see also chapter 13).
Note that our use of the term environment is not identical with
the physical or “objective” environment (Todd, 2001). For instance,
the first environment structure we discussed above, uncertainty,
comprises aspects of both the external environment (its inherent
unpredictability, or ontic uncertainty) and the mind’s limited under-
standing of that environment (epistemic uncertainty). Thus, the
degree of uncertainty is a property of the mind–environment system.
Similarly, the number of alternatives and the sample size depend
both on what is available in an environment and what an agent actu-
ally includes in its consideration set (such as the number N of funds
to be decided upon). Finally, redundancy and variability of cues
depend on what information is available in the physical environ-
ment, and also on what the decision makers actually perceive and
attend to, which can result in a more or less redundant and varying
set of cues to use. People in groups, for instance, tend to consider
redundant cues, but they could choose to explore further and dis-
cover more independent cues, and in this way partly create their
environment (see chapter 13). Thus, the environment considered in
ecological rationality is the subjective ecology of the organism that
emerges through the interaction of its mind, body, and sensory organs
with its physical environment (similar to von Uexküll’s, 1957, notion
of Umwelt).

Sources of Environment Structure


The patterns of information that decision mechanisms may (or may
not) be matched to can arise from a variety of environmental pro-
cesses, including physical, biological, social, and cultural sources.
Some of these patterns can be described in similar ways (e.g., in
terms of uncertainty or cue redundancy), but others are unique to
particular domains (e.g., the representation of medical informa-
tion). For humans and other social animals, the social and cultural
environment composed of other conspecifics can be just as impor-
tant as the physical or biological, and indeed all four interact
and overlap. For instance, an investment decision can be made
individually and purchased on the Internet without interacting
with anyone else, but the stock market itself is driven by both nature
(e.g., a disastrous hurricane) and other people (e.g., the public reac-
tion to a disaster). Each of the heuristics in Table 1-1 can be applied
to social objects (e.g., whom to hire, to trust, to marry) as well
as to physical objects (e.g., what goods to buy). As an example, the
recognition heuristic (see chapters 5 and 6) exploits environment
structures in which lack of recognition is valuable information and
aids inferences about, say, what microbrew to order and where to
invest, but also whom to talk to and whom to trust (“don’t ride with
a stranger”). Similarly, a satisficing heuristic can be used to select a
pair of jeans but also choose a mate (Todd & Miller, 1999), and the
1/N rule can help investors to diversify but also guide parents in
allocating their time and resources equally to their children.
Environment structures are also deliberately created by institu-
tions to influence behavior. Sometimes this is felicitous, as when
governments figure out how to get citizens to donate organs by
default, or design traffic laws for intersection right-of-way in a hier-
archical manner that matches people’s one-reason decision mecha-
nisms (chapter 16). In other cases, institutions create environments
that do not fit well with people’s cognitive processes and instead
cloud minds, accidentally or deliberately. For instance, informa-
tion about medical treatments is often represented in ways that
make benefits appear huge and harms inconsequential (chapter 17),
casinos set up gambling environments with cues that make gam-
blers believe the chance of winning is greater than it really is
(chapter 16), and store displays and shopping websites are crowded
with long lists of features of numerous products that can confuse cus-
tomers with information overload (Fasolo, McClelland, & Todd, 2007).
But there are ways to fix such problematic designs and make new
ones that people can readily find their way through, as we will see.
Finally, environment structure can emerge without design
through the social interactions of multiple decision makers. For
instance, people choosing a city to move to are often attracted by
large, vibrant metropolises, so that the “big get bigger,” which can
result in a J-shaped (or power-law) distribution of city populations
(a few teeming burgs, a number of medium-sized ones, and numer-
ous smaller towns). Such an emergent distribution, which is
seen in many domains ranging from book sales to website visits,
can in turn be exploited by heuristics for choice or estimation
(chapter 15). Similarly, drivers seeking a parking space using a par-
ticular heuristic create a pattern of available spots that serves as the
environment for future drivers to search through with their own
strategies, which may or may not fit that environment structure
(chapter 18). In these cases, individuals are, through the effects
of their own choices, shaping the environment in which they
and others must make further choices, creating the possibility of a
co-adapting loop between mind and world.

What We Already Know

To answer our questions about ecological rationality—when
and why different decision mechanisms in the mind's adaptive
toolbox fit to different environment structures—we must build on a
growing foundation of knowledge about bounded rationality
and the use of heuristics. This was largely unknown territory in
1999 when we published Simple Heuristics That Make Us Smart,
laying out the program on which the present book is based. Since
then, an increasing number of researchers have contributed to
the exploration of this territory, providing evidence that people rely
on heuristics in situations where it is ecologically rational and
demonstrating the power of appropriate heuristics in the wild,
including business, medical diagnosis, and the law. Here we briefly
review the progress made that supports the work reported in this
book.

What Is in the Adaptive Toolbox?


To study the ecological rationality of heuristics, we must first iden-
tify those being used. Table 1-1 provides an indication of the range
of heuristics that have been studied, but there are numerous others.
We know that many of the same heuristics are relied on by humans
and other animal species (Hutchinson & Gigerenzer, 2005). There is
now considerable evidence of the use of heuristics that make no
trade-offs between cues, such as take-the-best (chapter 9) and elim-
ination-by-aspects (Tversky, 1972). Recent studies have provided
further evidence for such so-called noncompensatory strategies in
consumer choice (Kohli & Jedidi, 2007; Yee, Hauser, Orlin, & Dahan,
2007). Related “one reason” decision heuristics have also been
proposed for another domain, choices between gambles, that has
traditionally been the realm of weighting-and-adding theories,
but the evidence for these mechanisms, such as the priority heuris-
tic (Brandstätter, Gigerenzer, & Hertwig, 2006; Katsikopoulos &
Gigerenzer, 2008), is under debate (e.g., Brandstätter, Gigerenzer, &
Hertwig, 2008; Johnson, Schulte-Mecklenbeck, & Willemsen, 2008).
Other recently investigated heuristics in the adaptive toolbox are
instead compensatory, combining more than one piece of informa-
tion while still ignoring much of what is available (e.g., tallying and
take-two—see chapters 3 and 10).
Among humans, an individual’s adaptive toolbox is not fixed—
its contents can grow as a consequence of development, individual
learning, and cultural experience. But little is known about how the
set of available tools changes over the life course, from birth to death
(Gigerenzer, 2003). Preliminary results suggest that age-related cog-
nitive decline leads to reliance on simpler strategies; nevertheless,
young and old adults seem to be equally adapted decision makers
in how they adjust their use of heuristics as a function of environ-
ment structure (Mata, Schooler, & Rieskamp, 2007). This result
leads to the next issue.

How Are Heuristics Selected?


Ecologically rational behavior arises from the fit between the
current task environment and the particular decision mechanism
that is applied to it—so to study such behavior, we must also
know what heuristics an individual has selected to use. In their
seminal work on the adaptive decision maker, Payne, Bettman, and
Johnson (1993) provided evidence that people tend to select heuris-
tics in an adaptive way. This evidence focused on preferential
choice, where there is no objectively correct answer. Subsequently,
similar evidence was obtained for the ecologically rational use of
heuristics in inductive inference, where decision accuracy can be
assessed (e.g., Bröder, 2003; Dieckmann & Rieskamp, 2007; Pohl,
2006; Rieskamp & Hoffrage, 2008; Rieskamp & Otto, 2006). The
observation that people tend to rely on specific heuristics in appro-
priate situations where they perform well raised a new question:
How does the mind select heuristics from the adaptive toolbox?
This mostly unconscious process is only partly understood, but
three selection principles have been explored.

Memory Constrains Selection First, consider making a selection among the
top three heuristics in Table 1-1: the recognition heuristic, the flu-
ency heuristic, and take-the-best. Say we are betting on a tennis
match between Andy Roddick and Tommy Robredo. What strategy
can we use to select a winner before the start of the match? If
we have heard of Roddick but not of Robredo, then this available
information in memory restricts the strategy choice set to the recog-
nition heuristic alone (which in this case may well lead to a correct
prediction—the two contestants have played each other many times,
with Roddick usually winning); if we have heard of both players but
know nothing except their names, this restricts the choice to the flu-
ency heuristic (see chapter 6); and if we have heard of both and
know some additional facts about them, then we can choose between
the fluency heuristic and take-the-best. If neither player’s name is in
our memory, then none of these three heuristics applies. This does
not mean that we have to guess—we can check the current odds and
then imitate the majority, betting on the player whom most others
also favor (Table 1-1). Thus, the information available in the deci-
sion maker’s memory constrains the choice set of heuristics
(Marewski & Schooler, 2011), creating a first heuristic selection
principle.
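
This selection logic can be written down directly. The following sketch (our illustration of the constraint described above; the function and argument names are hypothetical) returns the heuristics that remain applicable to a paired comparison, given what the decision maker's memory contains.

```python
def applicable_heuristics(recognize_a: bool, recognize_b: bool,
                          further_knowledge: bool) -> list[str]:
    """Memory constrains which heuristics can be applied to a paired comparison."""
    if recognize_a != recognize_b:
        # Exactly one name is recognized: only the recognition heuristic applies.
        return ["recognition heuristic"]
    if recognize_a and recognize_b:
        if further_knowledge:
            # Both recognized and further facts are known: fluency or take-the-best.
            return ["fluency heuristic", "take-the-best"]
        # Both recognized but nothing else is known about them.
        return ["fluency heuristic"]
    # Neither name is recognized: fall back on social heuristics such as
    # imitating the majority (e.g., betting with the published odds).
    return ["imitate the majority"]

print(applicable_heuristics(True, False, False))  # Roddick recognized, Robredo not
```
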

Learning by Feedback The available information in memory limits what
heuristics can be used. But if there are still multiple alternative
heuristics to choose from, feedback from past experience can
guide their selection. Strategy selection theory (Rieskamp & Otto,
2006) provides a quantitative model that can be understood in
terms of reinforcement learning, where the unit of reinforcement
is not a behavior, but a heuristic. This model makes predictions
about the probability that a person selects one strategy within
a defined set of strategies (e.g., the set that remains after memory
constraints).
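
A schematic rendering of this idea is sketched below. It is not the published strategy selection theory model, only an illustration with our own names and an arbitrary update rule: each heuristic in the remaining choice set carries an expectancy, selection probability is proportional to that expectancy, and the payoff obtained reinforces the heuristic that was used.

```python
import random

class StrategySelector:
    """Schematic reinforcement learning over a set of heuristics (illustration only)."""

    def __init__(self, heuristics, initial_expectancy=1.0):
        self.expectancy = {h: initial_expectancy for h in heuristics}

    def select(self):
        """Choose a heuristic with probability proportional to its expectancy."""
        names = list(self.expectancy)
        weights = [self.expectancy[h] for h in names]
        return random.choices(names, weights=weights, k=1)[0]

    def update(self, heuristic, payoff):
        """Reinforce the heuristic that was used; keep expectancies positive."""
        self.expectancy[heuristic] = max(1e-6, self.expectancy[heuristic] + payoff)

selector = StrategySelector(["fluency heuristic", "take-the-best"])
chosen = selector.select()
selector.update(chosen, payoff=1.0)  # e.g., the inference turned out to be correct
```
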

Ecological Rationality The third selection principle relies on the structure
of the environment, as described by the study of ecological
rationality. For instance, the recognition heuristic is likely to lead
to accurate (and fast) judgments if the validity of recognition infor-
mation is high; that is, if a strong correlation between recognition
and the criterion exists, as is the case for professional tennis play-
ers and the probability that they will win a match. There is experi-
mental evidence that people tend to rely on this heuristic if the
recognition validity is high, but less so if it is low or at chance level
(see chapter 5). For instance, Pohl (2006) reported that 89% of par-
ticipants relied on the recognition heuristic in judgments of the
population of Swiss cities, where their recognition validity was
high, but only 54% in judgments of distance of those cities to the
center of Switzerland, where recognition validity was near chance.
Thus, the participants changed their reliance on the recognition
heuristic in an ecologically rational way when judging the same
cities, depending on the correlation between recognition and the
criterion.
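
Recognition validity can be computed directly from a reference class of objects: it is the proportion of correct inferences the recognition heuristic would make among all pairs in which exactly one object is recognized. The sketch below is our illustration of that definition; the variable names and the toy data are ours.

```python
def recognition_validity(recognized: list[bool], criterion: list[float]) -> float:
    """alpha = R / (R + W): R counts pairs where the recognized object has the
    larger criterion value, W pairs where it does not, over all pairs in which
    exactly one of the two objects is recognized."""
    right = wrong = 0
    n = len(criterion)
    for i in range(n):
        for j in range(i + 1, n):
            if recognized[i] == recognized[j] or criterion[i] == criterion[j]:
                continue  # recognition does not discriminate, or the pair is tied
            picked = i if recognized[i] else j
            larger = i if criterion[i] > criterion[j] else j
            if picked == larger:
                right += 1
            else:
                wrong += 1
    return right / (right + wrong) if (right + wrong) else 0.5

# Toy example: three cities, two recognized, populations as the criterion.
print(recognition_validity([True, True, False], [1.3e6, 0.4e6, 0.2e6]))
```
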
This suggests that choosing to use the recognition heuristic
involves two processes: first, assessing recognition to see whether
the heuristic can be applied—the application of the memory con-
straints mentioned above; and second, evaluation to judge whether
it should be applied—the assessment of the ecological rationality
of the heuristic in the current situation. This is further supported
by fMRI results (Volz et al., 2006) indicating specific neural activity
corresponding to these two processes. Whether a similar combina-
tion of processes applies to the selection of other heuristics must
still be explored.

Are There Individual Differences in the Use of Heuristics?


If individuals all used the same heuristics when facing the
same situations, they would exhibit the same degree of ecological
rationality. But while a majority typically rely on the same particular
heuristic in experimental situations, others vary in the decision
mechanisms they employ, both between individuals and intra-
individually over time. Why would such individual variation exist?
Part of the answer lies in differences in experience that lead people
to have different strategies in their adaptive toolbox or to select
among the tools they have in different ways. But some researchers
have also sought personality traits and attitudes as the roots of these
differences in use of decision strategies that can lead some people
to be more rational (ecologically or logically) than others (e.g.,
Stanovich & West, 2000; see chapter 9).
Individual differences in heuristic use may not, however, indicate
differences in ecological rationality. There are at least two ecologi-
cally rational reasons for inter- and intra-individual strategy varia-
tion: exploratory behavior and flat performance maxima. Exploratory
behavior can be useful to learn about the choice alternatives and
cues available and their relative importance (or even about what
heuristics may be applicable in the current situation). It often takes
the form of trial-and-error learning and leads to individual differ-
ences and to what looks like intra-individual inconsistency in the
use of heuristics, but exploratory behavior can also often result in
better performance over the longer term. On the other hand, an envi-
ronment shows flat maxima when two or more heuristics lead to
roughly equal performance. In such a setting, different individuals
may settle on using one or another essentially equivalent strategy (or
even switch between them on different occasions) and show no dif-
ference in performance or hence, ecological rationality.
With sufficient appropriate experience, performance differences
can appear, coupled with differences in use of decision strategies.
In general, experts know where to look and tend to rely on limited
search more often than laypeople do (Camerer & Johnson, 1991).
This is illustrated by a study on burglary in which graduate stu-
dents were given pairs of residential properties described by eight
binary cues, such as apartment versus house, mailbox empty versus
stuffed with letters, and burglar alarm system present versus lack-
ing (Garcia-Retamero & Dhami, 2009). The students were asked
which property was more likely to be burgled. Two models of
cognitive processes were tested: weighting and adding of multi-
ple pieces of information and the take-the-best heuristic, which
bases its decision on only the most important discriminating
cue. The result was that 95% and 2.5% of the students were classi-
fied as relying on weighting-and-adding and take-the-best, respec-
tively. Normally, psychology experiments stop here. But the
authors then went on to study experts, in this case burglars from an
English prison who reported having committed burglary, on average,
57 times. The burglars' decisions were based on different cognitive
processes; 85% of the men were classified as relying on take-the-
best and only 7.5% on weighting-and-adding. A second expert
group, police officers who had investigated residential burglaries,
showed the same predominance of take-the-best. The weighting-
and-adding process among students may largely reflect exploratory
behavior. These findings are consistent with other studies conclud-
ing that experts tend to rely on simple heuristics, often on only one
cue, whereas novices sometimes combine more of the available
information (Dhami, 2003; Dhami & Ayton, 2001; Ettenson,
Shanteau, & Krogstad, 1987; Shanteau, 1992).

Why Not Use a General-Purpose Optimizing Strategy Instead of an Adaptive Toolbox?

Ecological rationality focuses on the fit between different decision
strategies applied by minds in different environmental circum-
stances. If there is only ever one decision mechanism to be applied,
then the question of ecological rationality does not even come up.
Thus, for those scientists who still yearn for Leibniz’s dream of a
universal calculus that could solve all problems or a single general-
purpose optimizing approach to make every decision, the fit
between the mind and the world is irrelevant. Logic, Bayesian sta-
tistics, and expected utility maximization are among the systems
that have been proposed as general-purpose problem-solving
machines. But they cannot do all that the mind can. Logic can solve
neither the investment problem nor the ball-catching task; Bayesian
statistics can solve the first but, as we have seen, not as well as a
simple heuristic, and the expected utility calculus has similar
limits. Still, why not strive for finding a better, more general opti-
mizing method?
In general, an optimization model works by defining a problem
in terms of a number of mathematically convenient assump-
tions that allow an optimal solution to be found and then proving
the existence of a strategy that optimizes the criterion of interest in
this simplified situation. For instance, the mean–variance portfolio
is an optimization model for the investment problem, given some
constraints. But it is important to remember, as the investment
case illustrates, that an optimization model for a tractable setting
does not imply optimal behavior in the unsimplified real world.
One of the main reasons why optimization methods can fall
behind simple heuristics in real-world applications is that they
often do not generalize well to new situations—that is, they are
not as robust as simpler mechanisms. In general, optimization
can only lead to optimal outcomes if it can estimate parameters
with no or minimal error, which requires environments with
low uncertainty and large sample size, among other factors. We
deal extensively with this foundational issue of robustness and
why simple heuristics can lead to more accurate inferences than
sophisticated statistical methods in the next chapter, covering two
important types of uncertainty in prediction. The first is out-of-
sample prediction, where one knows a sample of events in a popu-
lation and has to make predictions about another sample from the
same population. This corresponds to the investment problem,
where the performance of funds up to some time is known, and
predictions are made about the performance in the next month,
assuming the market is stable. As we saw with the investment prob-
lem, simple heuristics like the 1/N rule that avoid parameter esti-
mation can be more robust than optimization methods in the face of
this kind of uncertainty. The second type of uncertainty appears in
out-of-population prediction, where one has information about a
particular population and then predicts outcomes for another pop-
ulation that differs in unknown ways. For instance, when a diag-
nostic system for predicting heart attacks is validated on a sample
of patients in Boston and then applied to patients in Michigan, it
confronts out-of-population uncertainty. Here again, robustness is
vital, and it can be achieved by radically simplifying the decision
mechanism, such as by replacing a logistic regression diagnostic
system with a fast and frugal tree for predicting heart disease
(see chapter 14). (A third type of uncertainty can also occur, related
to novelty and surprise. In this case, whole new choice alternatives
or consequences can appear—for instance, new prey species moving
into a territory due to climate change. To be prepared for such sur-
prises, coarse behavior that appears rigid and inflexible may be
superior to behavior fine-tuned and optimized to a past environment
that was assumed to be stable—see Bookstaber & Langsam, 1985.)
To summarize, despite the widespread use of optimization in
theory (as opposed to actual practice in business or medicine), there
are several good reasons not to rely routinely on this technique as a
strategy for understanding human behavior (Gigerenzer, 2004b;
Selten, 2001). In contrast, the study of the ecological rationality of
a heuristic is more general and does not require replacing the prob-
lem in question with a mathematically convenient small-world
problem (Savage, 1972) that can be optimized. Because it asks in
what environments particular heuristics perform well (and better
than other strategies), ecological rationality focuses on what is good
enough or better, not necessarily what is best.

Why Not Use More Complex Decision Strategies?


Although optimization is unrealistic as a general method for making
decisions, people and other animals could still use strategies that
are more complex than simple heuristics. Why should decision
makers ever rely on simple mechanisms that ignore information
and forego sophisticated processing? The classical justification is
that people save effort with heuristics, but at the cost of accuracy
(Payne et al., 1993; Shah & Oppenheimer, 2008). This interpreta-
tion of the reason for heuristics is known as the effort–accuracy
trade-off:

Humans and other animals rely on heuristics because information
search and computation cost time and effort; thus,
they trade off some loss in accuracy for faster and more frugal
cognition.

This view starts from the dictum that more is always better, as
described at the beginning of this chapter—more information and
computation would result in greater accuracy. But since in the real
world, so the argument goes, information is not free and computa-
tion takes time that could be spent on other things (Todd, 2001),
there is a point where the costs of further search exceed the
benefits. This assumed trade-off underlies optimization-under-
constraints theories of decision making, in which information
search in the external world (e.g., Stigler, 1961) or in memory (e.g.,
Anderson, 1990) is terminated when the expected costs exceed its
benefits. Similarly, the seminal analysis of the adaptive decision
maker (Payne et al., 1993) is built around the assumption that heu-
ristics achieve a beneficial trade-off between accuracy and effort,
where effort is a function of the amount of information and compu-
tation consumed. And indeed, as has been shown by Payne et al.’s
research and much since, heuristics can save effort.
The major discovery, however, is that saving effort does not nec-
essarily lead to a loss in accuracy. The trade-off is unnecessary.
Heuristics can be faster and more accurate than strategies that use
more information and more computation, including optimization
techniques. Our analysis of the ecological rationality of heuristics
goes beyond the incorrect universal assumption of effort–accuracy
trade-offs to ask empirically where less information and computa-
tion leads to more accurate judgments—that is, where less effortful
heuristics are more accurate than more costly methods.
These less-is-more effects have been popping up in a variety of
domains for years, but have been routinely ignored, as documented
in chapter 3. Now, though, a critical mass of instances is being
assembled, as shown throughout this book. For instance, in an age
in which companies maintain databases of their customers, com-
plete with historical purchase data, a key question becomes pre-
dicting which customers are likely to purchase again in a given
timeframe and which will be inactive. Wübben and Wangenheim
(2008) found that managers in airline and apparel industries rely
on a simple hiatus heuristic: If a customer has not purchased within
the past 9 months (the “hiatus”), the customer is classified as inac-
tive; otherwise, the customer is considered active. The researchers
compared this hiatus heuristic with a more complex Pareto/nega-
tive binomial distribution (NBD) model, which assumes that pur-
chases follow a Poisson process with a purchase rate parameter λ,
customer lifetimes follow an exponential distribution with a drop-
out rate parameter μ, and across customers, purchase and dropout
rates follow a gamma distribution. For both industries, the simple,
less effortful heuristic correctly classified more customers than the
more computationally costly Pareto/NBD model. Similarly, in library
searches for appropriate references, a one-reason decision heuristic
produced better orders of titles than a Bayesian model and PsychInfo
(Lee, Loughlin, & Lundberg, 2002). Thus, for many decision prob-
lems in the real world, there is an inverse-U-shaped relation between
amount of information, computation, and time on the one hand and
predictive accuracy on the other. There is not always a trade-off
to be made between accuracy and effort—the mind can have it both
ways. The study of ecological rationality tells us when.
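
The hiatus heuristic mentioned above takes only a few lines to state. The sketch below (ours; the customer dates are made up, and the nine-month hiatus is the value mentioned above, approximated in days) classifies a customer from the date of the last purchase alone, ignoring the purchase rates, dropout rates, and distributional assumptions that the Pareto/NBD model estimates.

```python
from datetime import date, timedelta

HIATUS = timedelta(days=9 * 30)  # the nine-month hiatus, approximated in days

def is_active(last_purchase: date, today: date, hiatus: timedelta = HIATUS) -> bool:
    """Hiatus heuristic: active if and only if the most recent purchase falls
    within the hiatus window; all other customer information is ignored."""
    return (today - last_purchase) <= hiatus

# Hypothetical customers, evaluated on an arbitrary reference date.
reference_date = date(2011, 1, 1)
customers = {"A": date(2010, 11, 20), "B": date(2009, 12, 31)}
for name, last in customers.items():
    status = "active" if is_active(last, reference_date) else "inactive"
    print(f"customer {name}: {status}")
```
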

Methodology

Progress in science comes through finding good questions, not just
answers. Finding the right answer to the wrong question (some-
times known as a Type III error) is a waste of effort. We believe
the traditional perspective of logical rationality has been posing
the wrong questions—and with the study of ecological rationality,
researchers ask different ones. Consider the question, “Do people’s
intuitive judgments follow Bayes’s rule?” Before 1970, the answer
was yes, people are Bayesians, albeit conservative ones (Edwards,
1968). After 1970, the answer was changed to no: “In his evaluation
of evidence, man is apparently not a conservative Bayesian: he is
not Bayesian at all” (Kahneman & Tversky, 1982, p. 46). Recently,
the answer has swung back toward yes in research on the “Bayesian
brain” (Doya, Ishii, Pouget, & Rao, 2007) and new Bayesian models
of reasoning (Tenenbaum, Griffiths, & Kemp, 2006). This inconsis-
tency over time indicates that this yes/no question is probably the
wrong one, whatever the answers are. From the perspective of
the adaptive toolbox, the mind has several tools, not just Bayesian
probability updating, and the better question is, “In what envi-
ronments do people use particular strategies?” The follow-up ques-
tion is, “When (and why) are particular strategies ecologically
rational?”

There are three essential components in these better questions:
heuristics and other decision strategies (in the plural), environment
structures (in the plural), and statements about ecological rational-
ity. Thus, to answer these questions, we need a research program
consisting of several steps that get at each of these components,
including the following:

1. Design computational models of heuristics and specify
structures of environments.
2. Use analysis and computer simulation to study the ecologi-
cal rationality of each heuristic, given various environment
structures, using accuracy or some other criterion.
3. Empirically test whether people’s (a) behavior and (b) cog-
nitive processes can be predicted by particular heuristics
fit to a given environment.
4. Use the results to construct a systematic theory of the
adaptive toolbox and its ecological rationality.

All of these steps can be applied to understand the ecological rationality
of other species besides humans. For humans in particular,
we can also add another step to apply the research program to fur-
ther real-world problems (see chapters 16 and 17):

5. Use the results to design environments and expert systems
to improve decisions.

Note that computational models of heuristics are specific models
of cognitive processes (termed proximal mechanisms in biology),
including the building blocks for information search, stopping
search, and decision. As indicated in step 3, they can predict both
an individual’s (or group’s) decision process and the resulting
behavior, and they should be tested against both. For instance,
consider two competing hypotheses about how outfielders catch a
ball: relying on the gaze heuristic or computing the ball’s trajectory.
Each assumes different cognitive processes that lead to different
measurable behavior. Trajectory computation predicts that players
first estimate the point where the ball will come down, then run as
fast as they can to this point and wait for the ball. In contrast, the
gaze heuristic predicts that players catch the ball while running,
because they constantly have to adjust their angle of gaze. The heu-
ristic makes further predictions that the trajectory computation
theory does not make, including the pattern of players’ changes in
speed while running and that in certain situations players will
run in a slight arc rather than in a straight line. These predicted
behaviors have been observed and documented, supporting the use
of the gaze heuristic and its variants (Saxberg, 1987; Shaffer &
McBeath, 2005; Todd, 1981). Furthermore, the predicted process of
trajectory computation implies that players will calculate where a
ball will land, whereas the gaze heuristic makes no such predic-
tion. Comparing these process-level predictions can help explain
an apparent fallacy on the part of expert players—that they are
not able to say where a ball will come down (e.g., Saxberg, 1987).
When using the gaze heuristic, players would not have this ability,
because they would not need it to catch the ball. Such an analysis
of heuristics and their ecological rationality can thus help research-
ers to avoid misjudging adaptive behavior as fallacies (Gigerenzer,
2000).
There are a number of useful methodological considerations that
are prompted by the study of ecological rationality. First, research
should proceed by means of testing multiple models of heuristics
(or other strategies) comparatively, determining which perform best
in a particular environment and which best predict behavior
observed in that environment. This enables finding better models
than those that already exist, not just assessing only one model in
isolation and then proclaiming that it fits the data or not. Second,
given the evidence discussed earlier for individual differences in
the use of heuristics, the tests of predictive accuracy should be
done at the level of each individual’s behavior, not in terms of
sample averages that may represent few or none of the individuals.
Finally, because individuals may vary in their own use of heuristics
as they explore a new problem, experiments should leave individu-
als sufficient time to learn about the alternatives and cues, and
researchers should not confuse trial and error exploration at the
beginning of an experiment as evidence for weighting and adding
of all information.
Several studies of heuristics exemplify these methodological
criteria. For instance, Bergert and Nosofsky (2007) formulated a
stochastic version of take-the-best and tested it against an additive-
weighting model at the individual level. They concluded that the
“vast majority of subjects” (p. 107) adopted the take-the-best strat-
egy. Another study by Nosofsky and Bergert (2007) compared take-
the-best with both additive-weighting and exemplar models of
categorization and concluded that “most did not use an exemplar-
based strategy” but followed the response time predictions of take-
the-best. There are also examples where not following some of these
criteria has led to results that are difficult to interpret. For instance,
if a study on how people learn about and use cues does not provide
enough trials for subjects to explore and distinguish those cues,
then lack of learning cannot be used as evidence of inability to learn
or failure to use a particular heuristic (e.g., Gigerenzer, Hertwig, &
Pachur, 2011). This shows another benefit of comparative testing:
If there are such flaws in the experimental design, they will hurt all
models tested equally, not just one.
In sum, studying ecological rationality requires computational
models of heuristics (and other strategies) that are tested at the level
of individual behavior, in a range of appropriate environments,
and in a comparative way. Progress relies on analytical proof, com-
puter simulation, and empirical studies in the field and in the lab,
and on developing a conceptual language for the structures of heu-
ristics and environments and the fit between the two.

The Rational and the Psychological

The message of this book is to study mind and environment in
tandem. Intelligence is in the mind but also in the world, inherent
in the structures in our physical, biological, social, and cultural
surroundings. The traditional view of heuristics as lazy mental
shortcuts, falling short of some idealized vision of general rational-
ity, relegates the study of heuristics to merely a descriptive role. It
draws a strict line between how behavior is and how it should be,
with psychology answering the first question but silent on the
second, the territory of logic and probability theory. The study of
ecological rationality overthrows this separation of the psychologi-
cal and the rational and creates a descriptive and prescriptive role
for heuristics. In the right environment, a heuristic can be better
than optimization or other complex strategies. Rationality and psy-
chology both emerge from the meeting of the two blades of Herbert
Simon’s scissors, the mental and the environmental.
Part II
UNCERTAINTY IN THE WORLD
2
How Heuristics Handle Uncertainty
Henry Brighton
Gerd Gigerenzer

Prediction is very difficult, especially if it's about the future.
Niels Bohr

Why do people rely on simple heuristics instead of more sophisticated
processing strategies? The classical answer comes in the
form of the effort–accuracy trade-off hypothesis (Beach & Mitchell,
1978; Payne, Bettman, & Johnson, 1993), which provides a justifica-
tion for why it is rational for an organism to rely on simple heuris-
tics and ignore information. The argument is that cognitive effort,
manifest in activities such as attention or recall, is a scarce resource
(Simon, 1978); therefore, an adaptive decision maker will select
a decision mechanism that reduces this costly effort while decreas-
ing beneficial accuracy only a little. The effort–accuracy hypothesis
corresponds to the intuition that more effort is always better (or at
least, cannot hurt) but also has increasing costs, so there is an opti-
mal trade-off point at which it is no longer worth putting in more
effort. Mathematically, it is modeled by two conflicting curves:
accuracy increases monotonically with more effort, but so do the
costs of expending that effort. This is the optimization-
under-constraints framework that is frequently found underlying
the question of strategy selection in models of memory, reasoning,
and decision making (e.g., Anderson, 1990; Stigler, 1961).
But the hypothesis of an effort–accuracy trade-off has proven
wrong as a general rule. Studies comparing simple heuristics that
demand less effort to multiple regression and other statistical strat-
egies that demand more effort have found that heuristics can also
make more accurate predictions in particular settings (as discussed
in several chapters in this book; see also Czerlinski, Gigerenzer,
& Goldstein, 1999; Gigerenzer & Brighton, 2009; Gigerenzer
& Goldstein, 1999). In these situations there is no trade-off—the

decision maker gets more accuracy along with less effort. Thus,
there is a second answer to the question we started with: People
also rely on simple heuristics in situations where there is no effort–
accuracy trade-off. These results call for a different, more general
account of why it is rational to use simple heuristics, one that
includes both situations in which the effort–accuracy trade-off
holds and those where it does not. The surprising situation of
no trade-offs leads to another question, which we address in this
chapter: How can heuristics that ignore part of the available infor-
mation make more accurate inferences about the world than strate-
gies that do not ignore information?
To find answers to this new question, we first identify a useful
metaphor for the adaptive relationship between mind and environ-
ment. We then provide an analytical framework to understand how
cognition without trade-offs can work.

What Is the Nature of Adaptive Decision Making?

To begin with, we need a way to think about the relationship between
cognitive strategies and the environment. Metaphors guide our
thinking, often unconsciously, and they are responsible for provid-
ing the questions we ask, including the wrong ones. We focus on
three metaphors (Todd & Gigerenzer, 2001): Shepard’s mirrors,
Brunswik’s lenses, and Simon’s scissors. For Roger Shepard (e.g.,
1994/2001), much of cognition is done with mirrors: Key aspects of
the environment are internalized in the brain “by natural selection
specifically to provide a veridical representation of significant objects
and events in the world” (p. 582). One of Shepard’s proposals is that
the three-dimensional nature of the world is mirrored in our percep-
tual system, and this internalization helps us to make inferences
about the size or distance of objects. In this view, an adaptive strat-
egy is one that mirrors the relevant aspects of the environment. For
instance, when we argue that a linear model, such as regression, is
the best model if the world is also linear, then we are relying implic-
itly on the mirror metaphor. If we test this assumption and find that
a strategy that mirrors its environment performs worse than some
that do not (as we will see below), we must question the usefulness
of this metaphor. Egon Brunswik (1955) proposed a lens metaphor
to capture how accurately judgment models the outside world. In his
view, there are uncertain proximal cues that indicate but do not
mirror the outside world, and these are bundled into a judgment
like light rays are in a lens to produce our impression of the world.
For Brunswik, the mind infers the world rather than reflects it.
Neither the mirror nor the lens, however, can explain why there
would be situations where less effort—using less information—is
better. Herbert Simon (1990) proposed another tool to understand
the nature of adaptive decision making. Human behavior, he argued,
is shaped by a pair of scissors whose two blades are cognition and
the environment. In this view, a cognitive heuristic need not mirror
the environment, but the two must be closely complementary for
cognition to function adaptively. This chapter can be seen as an
exploration of the scissors metaphor of cognition from the perspec-
tive of a statistical problem known as the bias–variance dilemma
(Geman, Bienenstock, & Doursat, 1992). We first show that ignoring
large amounts of the available information can pay off, by consider-
ing an agent that needs to predict temperature. We then apply the
same concepts and insights to understanding when and why heu-
ristics like take-the-best are successful. Some of our results may be
disturbing for the reader who thinks in terms of the mirror meta-
phor (as they were for us). But such results are necessary for rethink-
ing the nature of adaptive decision making and understanding the
workings of ecological rationality.

Robust Models of Uncertain Environments

The temperature in London on a given day of the year is uncertain,
but nevertheless follows a seasonal pattern. Using the year 2000 as
an example, we have plotted London’s mean daily temperature in
Figure 2-1a. On top of these observations we have plotted two
polynomial models that attempt to capture what is systematic in
London’s temperatures. The first model is a degree-3 polynomial
(a cubic equation with 4 parameters), and the second is a degree-12
polynomial (which has 13 parameters). Comparing these two
models, we see that the degree-12 polynomial captures monthly
fluctuations in temperature while the degree-3 polynomial captures
a simpler pattern charting a rise in temperature that peaks in the
summer, followed by a slightly sharper fall. Which model is best?
It depends on what we mean by “best”—what kind of performance
we seek. One way of deciding between the models is to pick the
one that fits the data with the least error—in other words, the one
with the greatest goodness of fit—which in this case is the degree-12
polynomial. But why stop at a degree-12 polynomial when we can
achieve an even better fit with, say, a degree-50 polynomial?
If London’s daily temperatures for all subsequent years were
guaranteed to match precisely those measured in the year 2000,
then there is no reason to stop with a lower degree polynomial:
what we have observed in the past will continue to be observed
in the future, so by describing the past more accurately, as with
a higher degree function, we will also describe the future
more accurately. There is no uncertainty in this hypothetical world,
and the best model would be the best-fitting model we could find.
But the real world is different: Despite the widespread use of
[Figure 2-1 about here: panel (a) plots temperature (°F) against days since January 1, 2000, showing the observed data with degree-3 and degree-12 polynomial fits; panel (b) plots error against degree of polynomial, for fitting the sample and for predicting the sample.]

Figure 2-1: Model fits for temperature data. (a) Mean daily tem-
perature in London for the year 2000. Two polynomial models are
fitted to this data, one of degree 3 and one of degree 12. (b) Model
performance for London temperatures in 2000. For the same data,
mean error in fitting the observed samples decreases as a function
of polynomial degree. Mean error in predicting the whole popula-
tion of the entire year’s temperatures using the same polynomial
models is minimized by a degree-4 polynomial.

goodness of fit in evaluating models in many domains, including
psychology, education, and sociology, human behavior (and the
behavior of other natural systems) is not a clear, certain window
into the underlying processes producing that behavior. Models of
different complexity deal with uncertainty with varying degrees
of success. Using goodness of fit to judge this ability is a dangerous
practice that can easily lead to faulty conclusions (Pitt, Myung, &
Zhang, 2002; Roberts & Pashler, 2000).

Out-of-Sample Robustness
There can be negative consequences of using too many free param-
eters in our models. To see this, we can frame the task as one of
estimating model parameters using only a sample of the observa-
tions and then test how well such models predict the entire popula-
tion of instances. This allows us to get closer to estimating how
well different models can predict the future, based on the past, even
though here we are “predicting” past (but unseen) outcomes. If the
model performs well at this task, we can be more confident that
it captures systematic patterns in the data, rather than accidental
patterns. For example, if we observe the temperature on 50 ran-
domly selected days in the year 2000 and then fit a series of poly-
nomial models of varying degree to this sample, we can measure
how accurately each model goes on to predict the temperature on
every day of the year 2000, including those days we did not observe.
This is an indication of the generalization ability of a model. As a
function of the degree of the polynomial model, the mean error in
performing this prediction task is plotted in Figure 2-1b. The model
with the lowest mean error (with respect to many such samples of
size 50) is a degree-4 polynomial—more complexity is not better.
Contrast this generalization performance for predicting unseen
data with the objective of selecting the model with the lowest
error in fitting the observed sample, that is, producing the correct
temperature on days we have observed. For this task, Figure 2-1b
tells us that error decreases as a function of the degree of the poly-
nomial, which means that the best-predicting model would not
be found if we select models merely by checking how well they
fit the observations. Notice also that the best-predicting polynomi-
als in this example are close to a theoretically reasonable lower
bound of between degree 3 and degree 4. This lower bound on the
problem exists because we should expect temperatures at the end
of the year to continue smoothly over to the predictions for tem-
peratures at the beginning of the next year. A degree-2 polynomial
cannot readily accommodate this smooth transition from one year to
the next, but degree-3 or degree-4 polynomials can. This prediction
task considers the out-of-sample robustness of the models, which is
the degree to which they are accurate at predicting outcomes for the
entire population when estimated from the contents of samples of
that population. Here, the most predictive model is very close to the
lower bound of complexity, rather than at some intermediate or high
level. This example illustrates that simpler models can cope better
with the problem of generalizing from samples.
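
The protocol just described is straightforward to reproduce. The sketch below (ours; it uses a synthetic seasonal curve plus noise as a stand-in for the measured London series) fits polynomials of several degrees to 50 randomly observed days and reports both the error in fitting those days and the error in predicting all days of the year. Higher-degree polynomials typically achieve a better fit to the observed sample but a worse prediction of the unobserved days.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2000)

# Synthetic stand-in for a year of mean daily temperatures (°F).
days = np.arange(366)
seasonal = 50 + 18 * np.sin(2 * np.pi * (days - 105) / 366)
temperatures = seasonal + rng.normal(0, 4, size=days.size)

sample = rng.choice(days.size, size=50, replace=False)  # the 50 observed days

for degree in (1, 3, 4, 8, 12):
    # Polynomial.fit rescales the x-values internally, keeping high degrees stable.
    model = Polynomial.fit(days[sample], temperatures[sample], degree)
    predicted = model(days)
    fit_error = np.mean((predicted[sample] - temperatures[sample]) ** 2)
    prediction_error = np.mean((predicted - temperatures) ** 2)  # all 366 days
    print(f"degree {degree:2d}: fitting error {fit_error:7.2f}, "
          f"prediction error {prediction_error:7.2f}")
```
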

Out-of-Population Robustness
A more realistic test of the models estimated from a sample of mea-
surements is to consider how well they go on to predict events in
the future, such as in this example, the temperature on each day of,
for instance, the year 2001. What we are predicting now lies out-
side the population used to estimate the model parameters. The
two populations may differ because factors operating over longer
time scales come into play, such as climate change. The difference
between the two populations could range from negligible to severe.
For example, Figure 2-2 shows how well the models estimated

[Figure 2-2 about here: error against degree of polynomial, for London (2000, 2001, 2002) and Paris (2001, 2002).]

Figure 2-2: Out-of-population prediction. The models estimated
from 50 samples of London's daily temperatures in 2000 can be
used to predict the daily temperature for that entire year (thick
line). This plot also shows how well these models go on to pre-
dict the daily temperature in London for the years 2001 and 2002,
and in Paris for the years 2001 and 2002. Much the same pattern
is observed across applications of the model, although the error
increases due to greater uncertainty arising from changes over time
and space.
from samples of the temperatures in London in 2000 go on to pre-
dict the temperatures in 2001 and 2002. We have also plotted out-
of-sample error for 2000 as a point of comparison, and, as we should
expect, the error increases when we move from the out-of-sample
problem to the out-of-population problem. Although there is more
uncertainty in the out-of-population setting, much the same pattern
can be observed as before: A degree-4 polynomial yields the mini-
mum mean error. This tells us that what we learned from the out-
of-sample task also carries over to the out-of-population task, since
a degree-4 polynomial remains a good choice of model. An addi-
tional change to the population we are predicting can be introduced
by imagining that we want to use the temperatures in London to
predict those observed in Paris. Paris lies 212 miles southeast of
London, and Figure 2-2 shows how the prediction error suffers as
a result of this geographical shift, but the finding that degree-4 poly-
nomials predict with the least error remains.

Novelty Robustness and the Problem of Extended Uncertainty


When assessing the performance of a model, we put ourselves in
the position of an omniscient observer. Either we assume that
we know the truth against which our models are judged, or we
assume that our knowledge closely approximates the truth. For
example, when judging the above polynomial models, we had
access to the “true” future temperatures in London. For real-world
problems, our assessments of model performance are always
estimates because our assumptions about the environment are
nearly always wrong: Unmeasured, unobservable, and unforesee-
able environmental factors all contribute to the success or failure of
a model. This is why weather forecasters often err quite signifi-
cantly when forecasting beyond the short term: As time goes by, the
natural system they are attempting to predict is likely to deviate
more and more from the predictions of a model based on partial
and uncertain knowledge of this natural system. Although assump-
tions and idealizations are necessary in order to model at all, they
nevertheless come at a price: Some models will be more robust
against uncertainty than others. The same issue of robustness is
faced by organisms relying on biological machinery that is an
evolved response to events and pressures occurring in the past. The
same machinery must also control their behavior in the future.
What we will term novelty robustness considers the ability of
organisms, and models of organisms, to handle uncertainty arising
from unforeseen and significant events. For example, changes in
the environment such as war, political revolution, an overhaul of
the tax system, volcanoes, tsunamis, or a new predator can be dif-
ficult or even impossible to anticipate. Climate change, to take
another example, has led to a decline in Dutch populations of
pied flycatchers due to their inability to adjust their behavior
robustly. These birds suffer from a mismatch between the time
at which they reproduce, and the peak abundance of the caterpil-
lars they use to feed their chicks. These two times used to be syn-
chronized, but the early onset of spring caused by climate change
has resulted in an earlier peak abundance of caterpillars (Both,
Bouwhuis, Lessells, & Visser, 2006). The process used by pied
flycatchers to decide when to reproduce appears not to be robust
against changes in climate, which has resulted in a 90% population
decline in the space of two decades. Problems like these high-
light the need for novelty robustness and have been used to
explain events as serious as the collapse of whole societies (Weiss
& Bradley, 2001).
Novelty robustness is a response to uncertainty in its most
extreme form. Even after we carry out a thorough examination of
the environment, it still includes events that cannot be reliably
foreseen. Bookstaber and Langsam (1985) termed this form of uncer-
tainty extended uncertainty and argued that organisms can guard
against it by preferring coarse behavior rules, those that are less
sensitive to change than is considered optimal under conditions of
conventional uncertainty. The problem of novelty robustness high-
lights that uncertainty is inherent and occurs for many reasons.
Although organisms cannot hope to respond effectively to all forms
of uncertainty, coarse behavior rules and polynomial models with
few parameters point to how simplicity can contribute to robust-
ness. Next, we consider how these observations suggest a theory of
how less cognitive processing can result in greater robustness to
uncertainty.

The Robustness of Learning Algorithms

Given a series of observations, a learning algorithm specifies a
method for selecting a parameterized model from a space of given
possibilities and a method for using the selected model to make
predictions about future events. For example, the least squares
method (e.g., Fox, 1997) used to fit a given degree-p polynomial
model to London’s daily temperatures is an example of a learning
algorithm. This method first selects the parameters of the polyno-
mial model. To make a prediction, the method then evaluates this
polynomial when queried with a day for which the temperature
has not been observed. In our daily temperature example we, as
experimenters, considered several models and concluded that
p = 4 was often a good value. A learning algorithm might make this
decision itself and choose the value of p from a space of many
possible values. In general, the range of models considered by an
algorithm, the method for selecting among them, and how the
selected model is then used to make decisions all play a crucial role
in determining the robustness of the selected model. Whether or
not a learning algorithm induces a robust model for a given prob-
lem will depend on the interaction between the properties of
the problem and the processing assumptions of the algorithm.
Many simple heuristics involve the learning of some parameters
from a set of data, such as learning the order in which they will
use cues when making decisions (see chapter 11), which is a form
of model selection. In comparison to most other learning algo-
rithms, simple heuristics learn within a small space of parameter-
ized models and tend to consume fewer processing resources by
seeking a good enough model, rather than attempting to optimize
some criterion and find the best one. Take-the-best, for example,
is a heuristic that decides between two objects using the most valid
discriminating cue in its consideration set and ignoring all other
cues. When selecting a model by learning about the validity order
of the cues it can search through, take-the-best also ignores any
interactions that may exist between the cues, which reduces the
space of parameterized models it considers to just the possible cue
orders (and not, for instance, ways to combine cues). Despite these
simplifying assumptions, Czerlinski et al. (1999) and Gigerenzer
and Goldstein (1999) showed that take-the-best outperforms a linear
regression model over a diverse selection of 20 natural environ-
ments (see also chapter 8 for similar results). A linear regression
model selects a vector of cue weights that minimizes the residual
sum of squared error. In comparison to the space of cue orders,
the space of cue weights is vast. In comparison to the process of
minimizing the residual sum of squared error, sorting a list of cues
into the validity order used by take-the-best is a less resource-
intensive operation (see chapter 11 for a discussion of what com-
putations are involved). That take-the-best nevertheless often
outperformed a linear regression model suggests a less-is-more
effect in processing—and not, as we have seen, an effort–accuracy
trade-off. The degree of this effect, however, rests in large part on
the strength of linear regression as a competitor. Given this, should
we expect the less-is-more effect to disappear when take-the-best
is compared with more resource-intensive methods? Or will
take-the-best still outcompete more complex competitors in some
environments?
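
Before turning to these comparisons, it may help to have the mechanism in front of us. The sketch below is our own bare-bones rendering of take-the-best for binary cues, not code from the studies cited: cue validities are estimated from a set of training objects, cues are searched in order of validity, and the inference is made by the first cue that discriminates.

```python
def validity(cue: str, objects: list[dict], criterion: list[float]) -> float:
    """Proportion of correct inferences among the pairs this cue discriminates."""
    correct = incorrect = 0
    n = len(objects)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = objects[i][cue], objects[j][cue]
            if a == b or criterion[i] == criterion[j]:
                continue
            predicted = i if a > b else j
            actual = i if criterion[i] > criterion[j] else j
            correct += predicted == actual
            incorrect += predicted != actual
    return correct / (correct + incorrect) if (correct + incorrect) else 0.5

def take_the_best(obj_a: dict, obj_b: dict, cue_order: list[str]) -> str:
    """Search cues in validity order; stop and decide on the first one that
    discriminates; guess if no cue does."""
    for cue in cue_order:
        if obj_a[cue] != obj_b[cue]:
            return "a" if obj_a[cue] > obj_b[cue] else "b"
    return "guess"

# Hypothetical training objects with binary cues and a known criterion.
train = [{"soccer team": 1, "state capital": 1},
         {"soccer team": 1, "state capital": 0},
         {"soccer team": 0, "state capital": 0}]
populations = [3.5e6, 1.8e6, 0.6e6]
cues = ["soccer team", "state capital"]
cue_order = sorted(cues, key=lambda c: validity(c, train, populations), reverse=True)
print(cue_order, take_the_best(train[0], train[2], cue_order))
```
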
Since the initial studies highlighting the impressive performance
of take-the-best, Schmitt and Martignon (2006) proposed a greedy ver-
sion of take-the-best “that performs provably better” (p. 55) than the
original heuristic, while Chater, Oaksford, Nakisa, and Redington
(2003) argued that take-the-best “does not perform noticeably
"better" (p. 63) than a number of standard machine learning algo-
rithms. Both these studies point to a limit on the less-is-more effect
and suggest that take-the-best ultimately pays a price for its sim-
plicity that is revealed when it is compared with more sophisti-
cated and resource-intensive methods. Before examining this
possibility, it is worth considering these methods in more detail.
Schmitt and Martignon provided a formal demonstration of the
superiority of a greedy version of take-the-best that computes cue
orders using conditional validity. To order cues by conditional
validity, the most valid cue is chosen first. Then, before selecting
the next cue in the order, the validities of the remaining cues are
recomputed against all those paired comparisons not discriminated
by the first cue. This procedure is repeated, recursively, such that
the validities of the different cues are often calculated over differ-
ent reference classes. Finding the conditional validity order requires
significant amounts of extra processing. This extra processing
results in a cue order that takes into account the fact that the valid-
ity of a cue is, in practice, likely to change depending on what cues
are checked before it.
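
As a sketch of the extra processing this involves (again our own illustration, with hypothetical names), the greedy ordering can be written as a loop: choose the most valid cue, discard the pairs it discriminates, recompute the remaining cues' validities over the pairs that are left, and repeat.

```python
def conditional_validity_order(cues: list[str], objects: list[dict],
                               criterion: list[float]) -> list[str]:
    """Greedy cue ordering by conditional validity: validities are recomputed,
    at each step, over only those pairs not yet discriminated by earlier cues."""
    pairs = [(i, j) for i in range(len(objects)) for j in range(i + 1, len(objects))
             if criterion[i] != criterion[j]]
    remaining, order = list(cues), []

    def validity_on(cue, pair_set):
        correct = incorrect = 0
        for i, j in pair_set:
            a, b = objects[i][cue], objects[j][cue]
            if a == b:
                continue
            predicted = i if a > b else j
            actual = i if criterion[i] > criterion[j] else j
            correct += predicted == actual
            incorrect += predicted != actual
        return correct / (correct + incorrect) if (correct + incorrect) else 0.0

    while remaining and pairs:
        best = max(remaining, key=lambda c: validity_on(c, pairs))
        order.append(best)
        remaining.remove(best)
        # Keep only the pairs the chosen cue does not discriminate.
        pairs = [(i, j) for i, j in pairs if objects[i][best] == objects[j][best]]
    return order + remaining
```
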
Like the greedy version of take-the-best, Chater et al. (2003) con-
sidered a number of methods that conduct significant amounts of
extra processing. Unlike the greedy version of take-the-best, these
methods go a step further and consider a much richer class of
models that allows them to capture complex interactions between
cues. These models are drawn from three popular processing para-
digms in machine learning and cognitive modeling: rule-based
methods that induce decision trees, connectionist methods that
learn activation strengths within a neural network, and exemplar
methods that store observations to be retrieved later when a predic-
tion is required. Let us now revisit these studies by comparing three
complex decision mechanisms with take-the-best. An important
difference between the comparison we report here and the studies
of Schmitt and Martignon (2006) and Chater et al. (2003) is that
cross-validation, described below, will be used to assess the models.
First, we consider the greedy version of take-the-best that orders
cues by conditional validity. Second, we consider two classic deci-
sion-tree induction algorithms, CART and C4.5 (Breiman, Friedman,
Olshen, & Stone, 1993; Hastie, Tibshirani, & Friedman, 2001;
Quinlan, 1993). Third, we consider the nearest neighbor classifier
(Cover & Hart, 1967; Dasarathy, 1991).
Figure 2-3a compares the performance of take-the-best with
these four competitors for the often-studied task of predicting
which of two German cities has the larger population. Performance
is measured by cross-validation where a subset T of the objects in
the environment are used to estimate the parameters of each model,
and then the complement of this set, T ′, is used to assess the
predictive accuracy of the models. This is done for various sample
sizes that specify how many objects are used to construct the train-
ing set T. In contrast to the findings of Chater et al. (2003) and the
analysis of Schmitt and Martignon (2006), take-the-best clearly out-
performs all the competitors across the majority of sample sizes.
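The cross-validation scheme used here can be sketched roughly as follows; this is an illustrative outline, not the authors' code, and fit_model and accuracy are hypothetical placeholders for whichever strategy is being trained and scored.

```python
# An illustrative outline (not the authors' code) of the cross-validation
# procedure: train on a random subset T, test on its complement T', repeat.
import random

def cross_validate(objects, fit_model, accuracy, sample_sizes, trials=1000):
    results = {}
    for n in sample_sizes:
        scores = []
        for _ in range(trials):
            training = random.sample(objects, n)                 # the training set T
            test = [o for o in objects if o not in training]     # the complement T'
            model = fit_model(training)                          # e.g., estimate cue validities
            scores.append(accuracy(model, test))                 # percent correct on unseen pairs
        results[n] = sum(scores) / trials
    return results
```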
Figure 2-3b–d shows, much as the Czerlinski et al. (1999) study did, that take-the-best’s strong performance is by no means specific to the city size environment; the heuristic fares well in many others. The
three further environments shown in Figure 2-3b–d concern the
tasks of deciding which of two houses has the higher price, which
of two Galapagos islands has greater biodiversity, and which of two
mammals is likely to live longer. We found very similar compara-
tive results across 20 natural environments, which raises the fol-
lowing question: Why do these results suggest a different picture
from those reported by Schmitt and Martignon (2006) and Chater
et al. (2003), who found that models that conduct more processing
than take-the-best tend to perform better, thereby identifying a limit
on the less-is-more effect? The difference stems from looking at dif-
ferent types of performance, as we described in the previous sec-
tion. Our estimate of predictive accuracy is calculated using
cross-validation, which provides a more reliable measure of robust-
ness and is standard practice in machine learning and statistics.
In contrast, the findings of Schmitt and Martignon (2006) hold for
the case of data fitting, where cue validities are given rather than
estimated from samples. Similarly, the performance criterion used
by Chater et al. (2003), which considered a combination of predic-
tive accuracy and goodness of fit, differed from the standard mea-
sure of out-of-sample predictive accuracy used here.
The results shown in Figure 2-3 clearly demonstrate that relying
on one good reason can be more accurate than alternative linear
methods such as regression, and nonlinear methods such as neural
networks or exemplar models. This has an important implication
for cognitive science: Assuming that the mind is designed not to
waste cognitive effort, it should use simple heuristics rather than
complex computations whenever this is ecologically rational.
Supporting this idea, Nosofsky and Bergert (2007) concluded that
take-the-best predicts the cognitive processes of people systemati-
cally better than exemplar (and weighted linear) models—to
Nosofsky’s surprise as one of the originators of these models, and to
his credit as a researcher willing to test models against each other
and reevaluate conclusions in light of new evidence.
These results also point to another surprising conclusion. Take-
the-best employs two key simplifications: searching for cues in
order of validity, and stopping on the first discriminating cue. While
we and others have assumed that take-the-best’s stopping rule
underlies its robustness in particular environments, much of the
robustness may actually stem from its search rule.

Figure 2-3: The performance of take-the-best in comparison to three well-known learning algorithms (nearest neighbor classifier, C4.5, and CART) and the greedy version of take-the-best, which orders cues by conditional validity. Mean predictive accuracy (percent correct) in cross-validation is plotted as a function of the size of the training sample for the task of deciding (a) which of two German cities has the larger population; (b) which of two houses has the higher price; (c) which of two Galapagos islands has greater biodiversity; and (d) which of two mammals is likely to live longer. These environments are taken from the study by Czerlinski et al. (1999).

The greedy
version of take-the-best, which has the same stopping rule but a dif-
ferent search rule, differs considerably in robustness from take-the-
best but is indistinguishable from the other complex models, both
when they are inferior to take-the-best (Figure 2-3) and when they
are superior (Figure 2-6). This implicates the search rule itself as a
key factor influencing the robustness of heuristics. In particular,
these results show that it is a mistake to regard a person who ignores
conditional dependencies between cues as being irrational; such a
view assumes that the mirror metaphor described earlier always
holds true.
Simple heuristics are specialized tools. Most learning algorithms
attempt to be robust over as wide a range of environments and
problems as possible. Similarly, by subscribing to the intuitions
of the effort–accuracy trade-off, most theories of cognitive process-
ing view heuristics as low-cost stand-ins for superior methods that
are more like adjustable spanners than specialized tools (Newell,
2005). The idea is that a single tool offers a more parsimonious
approach to cognitive processing: One complex multipurpose tool
may cost more effort to apply, but the rewards are somehow worth
it. The picture we propose starts, in contrast, from the realization
that the effort–accuracy trade-off does not always hold—rather the
mind can draw on an adaptive toolbox of simple special-purpose
decision tools that can perform well even without much computa-
tional effort. Next, we develop this picture by working toward a
solid statistical explanation of when and why less can be more in
cognitive processing.

Uncertainty and the Bias–Variance Dilemma

Predicting the temperature in London one year from now to the
nearest degree is harder than predicting which month will be the
sunniest, or whether the sun will rise at all. Some tasks involve
more uncertainty than others and, from the perspective of the organ-
ism, error is almost always inevitable. Understanding how proper-
ties of a decision maker’s learning algorithm interact with properties
of its task environment is a crucial step toward understanding how
organisms deal with uncertainty and error. To understand this prob-
lem, we will adopt the perspective of an omniscient observer and
consider the bias–variance dilemma (Geman et al., 1992), a statisti-
cal perspective on induction that decomposes prediction error into
three components: a bias component, a variance component, and a
noise component. Total error is the sum of these three terms:

Error = (bias)² + variance + noise

This decomposition clarifies the different sources of error, which
can in turn be related to the properties of the learning algorithm.
Ultimately, this will allow us to draw a connection between the
properties of information-processing strategies and the robustness
to uncertainty of those strategies. To illustrate this connection, let
us revisit the daily temperature example but change the rules of the
game. Nobody knows the “true” underlying function behind
London’s mean daily temperatures, but we will now put ourselves
in the position of grand planner with full knowledge of the under-
lying function for the mean daily temperatures in some fictional
location. We denote this degree-3 polynomial function h(x) and
define it as

$$h(x) = 37 + \frac{15}{365}\,x + \frac{120}{365}\,x^{2} + \frac{130}{365}\,x^{3}, \quad \text{where } 0 \le x \le 364.$$

Figure 2-4a plots this underlying trend for each day of the
year. We will also assume that when h(x) is sampled, our observa-
tions suffer from normally distributed measurement error with
μ = 0, σ² = 4 (which corresponds to the noise component in the
bias–variance decomposition above). A random sample of 30 obser-
vations of h(x) with this added error is shown on top of the under-
lying trend in Figure 2-4a.
If we now go on to fit a degree-p polynomial to this sample of
observations, and measure its error in approximating the function
h(x), can we draw a conclusion about the ability of degree-p poly-
nomials to fit our “true” temperature function in general? Not really,
because the sample we drew may be unrepresentative: It could
result in a lucky application of our fitting procedure that identifies
the underlying polynomial h(x), or an unlucky one incurring high
error. Thus, this single sample may not reflect the true performance
of degree-p polynomials for the problem at hand. A more reliable
test of a model is to measure its accuracy for many different sam-
ples, by taking k random samples of size n, fitting a degree-p poly-
nomial model to each one, and then considering this ensemble of
models denoted by y1(x), y2(x), . . ., yk(x). Figure 2-4b shows five
polynomials of degree 2 resulting from k = 5 samples of n = 30
observations of h(x). From the perspective of the organism, these
samples can be likened to separate encounters with the environ-
ment, and the fitted polynomials likened to the responses of the
organism to these encounters.
The question now is how well a given type of model—here poly-
nomials of degree 2—captures the underlying function h(x), which
we can estimate by seeing how well the induced models perform
on average, given their individual encounters with data samples.
First, consider the function ȳ(x), which for each x gives the mean response of the ensemble of k polynomials:

$$\bar{y}(x) = \frac{1}{k}\sum_{i=1}^{k} y_i(x).$$
Figure 2-4: A fictional daily temperature function h(x) used to illustrate bias and variance. (a) Graph of h(x) and a sample of 30 points with added noise. (b) Five polynomials of degree 2, yi(x) for 1 ≤ i ≤ 5, fitted to five further samples. (c) Mean of these five functions, ȳ(x). Bias is the squared difference between h(x) and ȳ(x). Variance is the sum of the squared difference between each function yi(x) and ȳ(x), measuring how much the induced functions vary about their mean.

The bias of the model is the sum squared difference between this
mean function and the true underlying function. Our omniscience
is important now, because to measure the bias we need to know the
underlying function h(x). More precisely, bias is given by

$$(\mathrm{bias})^2 = \sum_n \{\bar{y}(x_n) - h(x_n)\}^2$$

where x_n here refers to the vector of x-values of the nth observation and the sum runs over all n such observations in the training sample.
Figure 2-4c shows the ȳ(x) arising from the five polynomials shown in Figure 2-4b. Assuming k = 5 is sufficient to provide us with a good estimate of ȳ(x), this plot tells us that the model is
biased, since it differs from h(x). Zero bias is achieved if our aver-
age function is precisely the true function. Bias usually occurs
when the model we use to explain the observations lacks the flexi-
bility to capture the true underlying function. In the absence of
knowledge about the underlying function, bias can be reduced
by making the space of models considered by the learning algo-
rithm sufficiently rich. But by doing this we can easily introduce
another problem. Although the mean function averaged over models
induced from many samples may capture the true underlying
function without error, the individual models that contribute to
this mean may each incur high error. That is, zero mean error can
potentially hide high error of the individual estimates. This source
of error, which arises from the sensitivity of the learning algorithm
to the contents of individual samples, is termed variance. Variance
is the mean squared difference between each induced model func-
tion and the mean function:

$$\mathrm{variance} = \sum_n \frac{1}{k}\sum_{i=1}^{k}\{y_i(x_n) - \bar{y}(x_n)\}^2.$$

Intuitively, this variance reflects how scattered around the mean our model estimates are. When variance increases as we consider more complex models, we say that these models are overfitting the
data, fitting not just the underlying function but also the noise
inherent in each particular data sample. The two properties of bias
and variance reveal that the inductive inference of models involves
a fundamental trade-off. We can try using a general purpose learn-
ing algorithm, such as a feed-forward neural network, that employs
a wide and rich space of potential models, which more or less guar-
antees low bias. But problems start when we have a limited number
of observations, because the richness of the model space can incur a
cost in high variance: The richer the model space, the greater the pos-
sibility that the learning algorithm will induce a model that captures
unsystematic variation. To combat high variance, we can place
restrictions on the model space and thereby limit the sensitivity of
the learning algorithm to the vagaries of samples. But these restric-
tions run counter to the objective of general purpose inference, since
they will necessarily cause an increase in bias for some problems.
This is the bias–variance dilemma. All cognitive systems face
this dilemma when confronted with an uncertain world. The bal-
ancing act required to achieve both low variance and low bias is
plain to see in Figure 2-5, which decomposes the error arising from
polynomials from degree 1 (a straight line) to degree 10 at predict-
ing our temperature function h(x) from samples of size 30. For each
polynomial degree we have plotted the bias (squared) of this type of
model, its variance, and their sum. The polynomial degree that
minimizes the total error is, not surprisingly, 3, because h(x) is a
degree-3 polynomial. Polynomial models of less than degree 3
suffer from bias, since they lack the ability to capture the underly-
ing pattern. Polynomials of degree 3 or more have zero bias, as we
would expect. But for polynomials of degree 4 or more, the problem
of overfitting kicks in and the variance begins to rise due to their
excess complexity. None of the models achieve zero error. This is
due to the observation error we added when sampling, which cor-
responds to the noise term in the bias–variance decomposition.
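The decomposition plotted in Figure 2-5 can be approximated numerically. The sketch below is our illustration under assumptions: the cubic trend h is a placeholder standing in for the chapter's h(x) (its exact coefficients are not ours to confirm), noise is drawn with σ² = 4 as in the text, and bias and variance are estimated over a fixed evaluation grid.

```python
# A numerical approximation (our illustration) of the bias-variance decomposition
# shown in Figure 2-5. The cubic trend h below is a placeholder for the chapter's h(x).
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    return 37 + 15 * (x / 365) + 120 * (x / 365) ** 2 + 130 * (x / 365) ** 3

def bias_variance(degree, n=30, k=200):
    """Fit k degree-p polynomials to k noisy samples of size n, then estimate
    (bias)^2 and variance of the fitted curves on a fixed evaluation grid."""
    grid = np.linspace(0, 364, 100)
    fits = []
    for _ in range(k):
        x = rng.uniform(0, 364, n)
        y = h(x) + rng.normal(0, 2, n)                  # noise with sigma^2 = 4
        coeffs = np.polyfit(x / 365, y, degree)         # rescale x for numerical stability
        fits.append(np.polyval(coeffs, grid / 365))
    fits = np.array(fits)
    mean_fit = fits.mean(axis=0)                        # the mean function y-bar(x)
    bias_sq = np.sum((mean_fit - h(grid)) ** 2)
    variance = np.sum(((fits - mean_fit) ** 2).mean(axis=0))
    return bias_sq, variance

for degree in range(1, 11):
    bias_sq, variance = bias_variance(degree)
    print(degree, round(bias_sq, 1), round(variance, 1))
```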

Take-the-Best: A Case Study in Ecological Rationality

The bias–variance dilemma tells us why learning algorithms work
well in some contexts but not in others and provides an analytic
framework for rethinking the nature of cognitive architecture:
If organisms had general-purpose mental algorithms, they would
not do well in an uncertain world, because they would pay too
much attention to unsystematic variation. To make good infer-
ences under uncertainty, an organism has to systematically ignore
information. An adaptive toolbox of specialized biased heuristics
achieves exactly that.
Figure 2-5: Decomposition of prediction error as a function of polynomial degree. For the underlying function h(x) (a degree-3 polynomial), polynomial models from degree 1 to degree 10 are fitted to samples from h(x) of size 30, along with added noise. This plot shows (bias)², variance, and their sum as a function of the degree of the fitted polynomial model. Polynomials of less than degree 3 suffer from bias. For polynomials of degree 3 or higher, the variance increases as a function of degree of the polynomial. The best model is the degree-3 polynomial.

Furthermore, the bias–variance dilemma proves essential to
understanding when and why simple heuristics in general, and
take-the-best in particular, are so successful in some environments
and not others. The theory of ecological rationality hinges on this
match between simple heuristics and natural environments. Our
starting point in the analysis of the ecological rationality of take-the-
best is Martignon and Hoffrage’s (1999, 2002) proof that in the class
of environments specified by noncompensatory cue weights, take-
the-best fits the data as well as any linear model, despite its frugality
(provided that the order of cues corresponds to the order of weights
in the linear model). An environment is defined as noncompensatory
if, with respect to a weighted linear model with m cue weights given
in decreasing order w1, . . ., wm, the weight of the ith cue is greater
than or equal to the sum of the weights of all subsequent cues:

$$w_i \ge \sum_{j>i} w_j, \quad \text{where } 1 \le i \le (m-1).$$
This means that in noncompensatory environments the weights
of the cues decay rapidly as a function of their rank. The idea is
that the inferences made by take-the-best will be indistinguishable
from those of the linear model with these weights in this environ-
ment because the influence of the most valid cue cannot be out-
weighed by the subsequent cues. This important result, however,
cannot explain why take-the-best can be even more accurate than
other linear (or nonlinear) models, as illustrated in Figure 2-3,
because it only applies to the situation where the validity order of
cues is known, as in fitting. We now build on these fitting results by
using the bias–variance perspective to help us understand when
and how such simple heuristics can actually achieve greater pre-
dictive accuracy in generalization.
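Stated as code, the noncompensatory condition is a one-line check (a small sketch of ours):

```python
def is_noncompensatory(weights):
    """True if each weight is at least the sum of all the weights that follow it
    (weights given in decreasing order w1 >= w2 >= ... >= wm)."""
    return all(weights[i] >= sum(weights[i + 1:]) for i in range(len(weights) - 1))

print(is_noncompensatory([4, 2, 1]))   # True: the weights decay fast enough
print(is_noncompensatory([1, 1, 1]))   # False: equal weights are compensatory
```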
We start by considering a subset of the class of noncompensa-
tory environments (i.e., those with noncompensatory weights),
specifically, what we term the binary environments. Table 2-1
shows an example binary environment with noncompensatory
weights for m = 3 cues. Given m binary cues, a binary environment
is composed of 2^m objects. For each of the values 0 through 2^m − 1, a
binary environment contains an object that has this value as its
criterion. The cue values for each of these objects are then set to
reflect the binary representation of the object’s criterion value,
coded using the binary cues (e.g., in Table 2-1 the object with crite-
rion = 6 is assigned cue values corresponding to the binary repre-
sentation [1, 1, 0]). Environments constructed in this way always
have noncompensatory weights, and no correlations exist between
the cues. Furthermore, in such binary environments, all cues have
a conditional validity of 1, despite having differing ecological
validities, which is an indication that strong conditional depen-
dencies exist between the cues. For example, cue 3 in Table 2-1 is
uncorrelated with the criterion and has ecological validity 0.5

Table 2-1: An Example of a Binary Environment With m = 3 Cues

Object  Cue 1  Cue 2  Cue 3  Criterion
A       0      0      0      0
B       0      0      1      1
C       0      1      0      2
D       0      1      1      3
E       1      0      0      4
F       1      0      1      5
G       1      1      0      6
H       1      1      1      7

Note. The cue values of each object (A–H) are used to code a binary representation of its integer-valued criterion. The cues are uncorrelated and have noncompensatory weights.
when considered by itself. However, if this cue is used condition-
ally, in those cases when the first two cues fail to discriminate
between objects, it has the maximum possible (conditional)
validity, 1.
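The construction is simple enough to write down directly; the following sketch (ours, with illustrative names) rebuilds Table 2-1's environment for m = 3 and generalizes to any m.

```python
def binary_environment(m):
    """Build the 2**m objects of a binary environment: object i has criterion i,
    and its cue values are the binary digits of i (most significant cue first)."""
    return [([int(bit) for bit in format(i, f"0{m}b")], i) for i in range(2 ** m)]

for cues, criterion in binary_environment(3):
    print(cues, criterion)      # reproduces Table 2-1, objects A (0) through H (7)
```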
How well does take-the-best perform in such a noncompensatory
environment, when compared with its greedy counterpart, C4.5,
CART, and the nearest neighbor classifier? For a binary environ-
ment with six cues, Figure 2-6a plots the predictive accuracy of
take-the-best and these alternative models as a function of the
sample size. For very small samples, take-the-best narrowly outper-
forms the other methods, but for larger sample sizes it performs
consistently worse. This result can easily be explained using the
concepts of bias and variance. Previous analyses focusing on
the class of noncompensatory environments can be viewed as iden-
tifying an environmental condition under which take-the-best is
unbiased (Martignon & Hoffrage, 1999, 2002). However, if take-the-
best is unbiased in these environments, then practically all linear
and nonlinear learning algorithms will be too, because all learning
algorithms capable of capturing a linear relationship can represent
the noncompensatory function underlying these environments
without error. This suggests that the error component that leads one
algorithm to outperform another in these environments will be
variance, not bias.
This example also highlights that the performance of a heuristic
in an environment is not reflected by a single number such as pre-
dictive accuracy, but by a learning curve revealing how bias and
variance change as more observations become available (Perlich,
Provost, & Simonoff, 2003). Because the learning curves of two
algorithms can cross (as they do in Figure 2-6a), the superiority of
one algorithm over another will depend on the size of the training
sample. Saying that a heuristic works because it avoids overfitting
the data is really only a shorthand explanation for what is often a
more complex interaction between the heuristic, the environment,
and the sample size. Figure 2-6b and c confirms this point by
decomposing the error of take-the-best and its greedy variant into
bias and variance. It shows that the ability to reduce variance is
what distinguishes the two methods, and take-the-best does a
poor job of this in this particular environment. This reasoning
also tells us that for those environments where take-the-best out-
performs the other algorithms, such as the examples given in Figure
2-3, it does so by reducing variance.
Figure 2-6: An illustration of the role played by variance in the performance of take-the-best in a binary environment with m = 6 cues. (a) Take-the-best is outperformed by the rival strategies across the majority of sample sizes. (b, c) Decomposition of the error of take-the-best and its greedy counterpart, respectively. The relative performance differences between the two are explained almost entirely by variance.

To illustrate these issues further, we will perform the same comparison but with a different class of environment—a compensatory one this time. Given m cues, what we term a Guttman environment has m+1 objects and a structure inspired by the Guttman Scale (Guttman, 1944).

Table 2-2: An Example of a Guttman Environment With m = 5 Cues

Object  Cue 1  Cue 2  Cue 3  Cue 4  Cue 5  Criterion
A       0      0      0      0      0      0
B       1      0      0      0      0      1
C       1      1      0      0      0      2
D       1      1      1      0      0      3
E       1      1      1      1      0      4
F       1      1      1      1      1      5

Note. The cue values of each object (A–F) are used to code the criterion value using the Guttman Scale. The cues are maximally correlated with the criterion, all with an ecological validity of 1.

The m+1 objects have the criterion values 0 through m. For an object with a criterion value of N, the first N cues
are set to 1, and all others set to 0. Table 2-2 provides an example
of a Guttman environment with m = 5 cues. In Guttman environ-
ments, all cues have equal weight and are maximally correlated
with the criterion. The correlations between cues are also high,
making the cues in Guttman environments highly redundant (see
chapter 8). In contrast to binary environments, no conditional
dependencies exist between the cues (because all of the ecological
validities and all the conditional validities are equal to 1).
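Analogously, a Guttman environment can be generated in one line per object (again a sketch of ours, mirroring Table 2-2):

```python
def guttman_environment(m):
    """Build the m+1 objects of a Guttman environment: the object with criterion N
    has its first N cues set to 1 and the remaining cues set to 0."""
    return [([1] * n + [0] * (m - n), n) for n in range(m + 1)]

for cues, criterion in guttman_environment(5):
    print(cues, criterion)      # reproduces Table 2-2, objects A (0) through F (5)
```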
For a Guttman environment with m = 31 cues, Figure 2-7a com-
pares the predictive accuracy of the same strategies as before for
different training sample sizes. In contrast to the comparison in the
binary environment, take-the-best now outperforms the other
models across the majority of sample sizes. Figure 2-7b and c
decomposes the error of take-the-best and its greedy counterpart
into bias and variance, plotting them as a function of the sample
size. Once again, the performance differences we see can be
explained almost entirely by variance, not bias. Furthermore, the
key difference between the two models, the property that leads
them to perform so differently, is not the flexibility in the class of
models they use: Take-the-best and its greedy counterpart search
through exactly the same space of models. Rather, the crucial dif-
ference is whether or not conditional dependencies between cues

are used to guide this search. In Guttman environments, and the
natural environments such as those considered in Figure 2-3, we
see a less-is-more effect: The simplicity of take-the-best leads to
superior performance. In binary environments, the simplicity of
take-the-best leads to inferior performance. This finding highlights
that environments with a noncompensatory structure (such as
binary environments) favor take-the-best when the cue validities
are known with certainty (Martignon & Hoffrage, 1999, 2002) but
not necessarily when cue validities are uncertain and need to be
estimated from a sample. Findings such as these help us refine our
understanding of when simplicity can improve performance. But
in what senses is take-the-best simple, and why does this simplic-
ity lead to improved performance?

Being Robust by Being Simple


There are three senses in which take-the-best is simple. First, the
model space of take-the-best—the space of cue orders—has lower
cardinality than the model spaces of the other strategies we have
considered. Second, the models themselves have a relatively simple
functional form. For example, the cue orders induced by take-
the-best are equivalent to a decision tree with a decision node at
each depth (see chapter 14). In contrast, the decision trees induced
by C4.5 and CART have unrestricted functional form. In both of
these respects, greedy take-the-best is also simple. But, third, the
process of selecting the model used by take-the-best is simpler than
that used by the other approaches, including greedy take-the-best,
in the sense of being less resource intensive. The principal differ-
ence is that, unlike the other methods considered, take-the-best
does not consider conditional dependencies between cues. In con-
trast to C4.5, CART, and greedy take-the-best, this policy of ignor-
ing conditional dependencies eliminates the need to measure the
predictive value of each cue relative to several alternative subsets
of observations.

Figure 2-7: An illustration of the role played by variance in the performance of take-the-best in a Guttman environment with m = 31 cues. (a) Take-the-best outperforms the rival strategies across the majority of sample sizes. (b, c) Decomposition of the error of take-the-best and its greedy counterpart, respectively. The relative performance differences between the two are explained almost entirely by variance.

What does it mean for a simple heuristic like take-the-best to
exploit the structure of the environment? We need to start by recog-
nizing that organisms do not experience the environment as a
whole—they experience a sample of observations that are taken
from, and are therefore governed by, the environment. Samples can
contain spurious correlations and accidental patterns, and when
generalizing from a sample, the learning algorithm must make a bet
on which of these patterns is systematic, rather than accidental.
What kind of bet does take-the-best place? First, it bets that the
size of the sample of observations will be small, because under
these conditions the variance component of the error will domi-
nate. The greater the number of observations, the less the variance
dominates the error, and in such situations, the simplicity of take-
the-best is unlikely to result in it outperforming more sophisticated
methods. Second, and complementing the first bet on sparse expo-
sure, take-the-best also bets that any identifiable conditional depen-
dencies between the cues will be unreliable and can safely be
ignored. As the model comparison in the binary environment
shows, this bet does not always pay off. But in many natural envi-
ronments, it does (see Figure 2-3).
It is important not to forget that both binary environments and
Guttman environments are unrepresentative of natural environ-
ments: They contain no noise, and they have a perfectly regular
structure not seen in any of the natural environments we have
examined. The binary environment combines low cue redundancy
with high conditional dependence between the cues, whereas the
Guttman environment has high cue redundancy combined with
low conditional dependence. Natural environments sit somewhere
between these two structural points and tend to have some inter-
mediate conditional dependency along with some degree of corre-
lation between cues. The fact that take-the-best can often outperform
methods that are capable of modeling both of these properties
highlights that the mind need not precisely reflect all aspects of the
environment. In environments where there are some nonzero
cue dependencies, take-the-best’s bet on ignoring these properties
can pay off because estimating them from small samples is likely
to incur high variance. Conditions such as these are where the scis-
sors metaphor comes into play: The task of generalizing from small
samples is an uncertain one, and it can pay to ignore information in
order to keep variance within acceptable limits.

The Importance of the Bias–Variance Dilemma in Cognition


Our cognitive systems are confronted with the bias–variance
dilemma whenever they attempt to make inferences about the
world. What can this tell us about the cognitive processes used to
make these inferences? First of all, cognitive science is increasingly
stressing the ways in which the cognitive system performs remark-
ably well when generalizing from few observations, so much so
that human performance in those situations has been characterized
as optimal (e.g., Griffiths & Tenenbaum, 2006; Oaksford & Chater,
1998). Such findings place considerable constraints on the range of
potential processing models capable of explaining human perfor-
mance. From the perspective of the bias–variance dilemma, the
ability of the cognitive system to make accurate predictions despite
sparse exposure to the environment provides a strong indication
that the variance component of error is successfully being kept
within acceptable limits. Although variance is likely to be the
dominant source of error when observations are sparse, it is never-
theless controllable. To control variance, one must abandon the
ideal of general-purpose inductive inference and instead consider,
to one degree or another, specialization (Geman et al., 1992). Put
simply, the bias–variance dilemma shows formally why a mind can
be better off with an adaptive toolbox of biased, specialized heuris-
tics. A single, general-purpose tool with many adjustable parame-
ters is likely to be unstable and incur greater prediction error as a
result of high variance.
Take-the-best points to how this problem can be solved with
simplicity, but could the success of take-the-best be a quirk, a one-
off exception to the purported rule that more processing means
better performance? Quite the opposite. The success of take-the-
best taps into something fundamental about statistical inference.
For example, for a given linear problem, the Gauss/Markov theo-
rem states that among the unbiased linear models, the least squares
estimate will have the lowest variance (e.g., Fox, 1997, p. 217). This
is a key result in statistics that, taken naïvely, would suggest that
the least squares estimate is always the best policy. But statisticians
have realized that biased methods may lead to lower total error
when their increase in bias can be outweighed by a greater decrease
in variance, especially when data are sparse. Ridge regression is
one example of a biased linear model that is often successful for
this reason (Hastie et al., 2001, p. 49). Related work in the 1970s
also found that equal (or random) weights can predict almost as
accurately as, and sometimes better than, multiple linear regression
(Dawes, 1979; Dawes & Corrigan, 1974; Einhorn & Hogarth, 1975;
Schmidt, 1971; see chapter 3). Another example is the naïve Bayes
classifier that, like take-the-best, ignores dependencies between
cues (Martignon & Laskey, 1999). This simplification often leads to
improved performance over more resource-intensive methods
when data are sparse, despite the naïve Bayes assumption explic-
itly violating known properties of the environment (Domingos &
Pazzani, 1997; Friedman, 1997).
As well as there being sound statistical reasons for why take-the-
best’s simplicity can result in robust inference, one can also make
the argument that biased methods, such as take-the-best, are likely
to be the norm in the natural world. To think otherwise requires a
commitment to the view that organisms have near-perfect models
of the processes governing environmental regularities. Theoretical
notions of unbiased models and infinitely large samples are useful
analytic constructs but have questionable value in practice.

Achieving Robustness Through Simplicity

Take-the-best is an example of how ignoring information and per-
forming less processing can result in more robust inferences.
Findings such as these raise significant issues. First, they tell us
that the effort–accuracy trade-off provides a potentially misleading
hypothesis when considering the range of possible processing strat-
egies available to an organism. In an uncertain world, less effort can
lead to greater accuracy. Second, they show how an organism can
adapt itself to the environment without necessarily reflecting its
properties directly but instead exploiting the fact that uncertainty
is often best dealt with by ignoring information, and being biased
(Gigerenzer & Brighton, 2009). In an entirely certain world that can
be observed fully, the best strategy is to represent the world as accu-
rately as possible and be unbiased. But the world is shot through
with uncertainty, observations are often limited and costly, and we
cannot hope to be unbiased in all situations. Given these con-
straints, the best approach that evolution can build into organisms
comes in the form of efficient mechanisms that ignore information,
using fewer processing resources and making more robust infer-
ences as a consequence.
3
When Simple Is Hard to Accept
Robin M. Hogarth*

In a world in which information is relatively scarce, and
where problems for decision are few and simple, infor-
mation is almost always a positive good. In a world where
attention is a major scarce resource, information may be
an expensive luxury, for it may turn our attention from
what is important to what is unimportant. We cannot
afford to attend to information simply because it is
there.
Herbert Simon

Although people make many decisions quite easily every day,
most think of making decisions as being a difficult, complex task—
possibly because active decision making is associated in people’s
minds with complex problems. This complexity can have several
sources: lack of familiarity with the type of problem and thus uncer-
tainty about how to proceed; lack of information or, alternatively,
so much information that it is difficult to know what is relevant;
and uncertainty about values and thus what trade-offs are involved,
to name a few.
Without denying the inherent complexity of many decisions,
the goal of this chapter is to explore why people resist the fact that
many complex decision problems can sometimes be satisfactorily
handled by quite simple methods. These methods have two key fea-
tures: One is the deliberate use of limited information; the other
involves simple ways of “processing” the information used. As evi-
dence, I provide four case studies from the decision-making litera-
ture that demonstrate these features. In all four cases, the simple
methods have not been easily accepted by the scientific community.

* I am grateful for comments on an earlier version of this work by Robyn
M. Dawes, Spyros Makridakis, Natalia Karelaia, and J. Scott Armstrong.
This research was financed in part by a grant from the Spanish Ministerio
de Ciencia e Innovación.

There are three main reasons for this: (a) Researchers believe that
complex systems or problems require complex solutions; (b) new
ideas and methods, which are often simpler, can be resisted just for
being new; and (c) it is sometimes difficult to know when simplic-
ity works. Figuring out when simple methods succeed or fail is
challenging and can itself be complex.
This chapter is organized as follows. I first point out that deci-
sion makers—and students of judgment and decision making—are
not unique in failing to adapt to conceptual innovations that imply
greater simplicity. Indeed, the history of science is replete with
many examples. I then discuss the four cases drawn from the
decision-making literature. These are, first, the findings that predic-
tions of “clinical” judgment are inferior to actuarial models; second,
how simple methods in times series forecasting have proven supe-
rior to more sophisticated and “theoretically correct” methods
advocated by statisticians; third, how in combining information for
prediction, equal weighting of variables is often more accurate than
trying to estimate differential weights; and fourth, the observation
that, on occasion, decisions can be improved when relevant infor-
mation is deliberately discarded. I follow this by examining the
rationale for the fourth case in greater depth.
In a fascinating review, Barber (1961) documented many cases
of failure to accept new concepts involving scientific giants operat-
ing in the physical sciences where, one might suppose, hard evi-
dence would be difficult to overcome. Among the various sources
of resistance to new ideas, Barber gives as examples difficulties
understanding substantive concepts, different methodological con-
ceptions, religious ideas, professional standing (e.g., failure to
accept discoveries by young scientists), professional specialization
(e.g., work by people outside a discipline), and the dysfunctional
role sometimes played by professional societies. He goes on to
quote Max Planck, who, frustrated by the fact that his own ideas
were not always accepted, stated that “a new scientific truth does
not triumph by convincing its opponents and making them see the
light, but rather because its opponents eventually die, and a new
generation grows up that is familiar with it” (Barber, 1961, p. 597).
In this chapter, I discuss this phenomenon with respect to the
field of judgment and decision making. There are two reasons why
this field provides an interesting setting for this issue. First, for sci-
entists concerned with how decisions are and should be made, one
might imagine that there would be little resistance to adopting
methods that improve decision making by increasing accuracy,
reducing effort, or both. Second, the studies in which these new
methods were discovered are empirical and often supported
by analytical rationales. A priori, it is not a question of dubious
evidence.
Clinical Versus Statistical Prediction

A book published by Paul Meehl in 1954 is the first case I consider.
In this book, Meehl asked the question whether—in predictions
made in clinical psychology—clinicians would be better off using
statistical aggregations of the limited data available on clients
or alternatively relying on their traditional method of supposedly
complex and holistic clinical judgments, that is, subjective
interpretations based on all data available to them. Meehl reviewed
some 20 studies and discovered, provocatively, that the statisti-
cal method of prediction was superior to what is known as the
“clinical” method.
At one level, one might have thought that this finding would
have been welcome. After all, the costs of clinical prediction are
high. If a method could be devised that was both cheaper and more
accurate, surely this would be in everyone’s interest. Nothing could
have been further from the case. Clinicians were outraged by the
implications of Meehl’s (1954) study. The use of statistical formulas
instead of trained professionals was seen as degrading. The study
also struck at the heart of an important debate in the philosophy
underlying clinical psychology, namely, the extent to which the
science should be nomothetic (concerned with general laws that
apply to groups of people) or idiographic (concerned with particu-
lar individuals). Many clinicians who found Meehl’s results dis-
tasteful were clearly in the latter group (Holt, 1962).
The most eloquent—and persistent—of Meehl’s critics has been
Holt (1958, 2004). It is therefore instructive to consider the kinds of
arguments that were brought to bear against Meehl’s (1954) find-
ings. In Holt (1958), we find several attempts to suggest that com-
paring clinical and statistical judgment in the manner done by
Meehl was just inappropriate. Holt stated that “clinicians do have
a kind of justified grievance against Meehl, growing out of his for-
mulation of the issues rather than his arguments, which are sound”
(p. 1). He went on to argue that the process of clinical prediction
involves various phases and that Meehl’s comparisons did not
match like with like and thus “in none of the 20 studies Meehl cites
were the comparisons pertinent to the point” (p. 4). In other words,
Holt rejected both the problem, as formulated by Meehl, as well as
the specific comparisons he made, as being irrelevant. He also went
on to suggest a conceptual framework for prediction that he claimed
was more “scientific” than the studies reviewed by Meehl.
Holt’s article contains many good points about aspects of the
clinical process where human judgment is essential. And yet, he
never wanted to accept that there are situations where the benefits
of clinical judgment might be replaced by the consistent use of
statistical decision rules (cf. Goldberg, 1970). Also, it is clear that
there are problems for which it is infeasible to build adequate sta-
tistical models and where clinical judgment is necessarily better
than actuarial formulas (see, e.g., Meehl’s 1954 discussion of “bro-
ken-leg” cues; also Yaniv & Hogarth, 1993). Indeed, Garb’s (1998)
comprehensive review shows that clinical judgments are far from
being universally ineffective in a relative sense.
In the half century that followed the publication of Meehl’s book,
many studies have reinforced the original findings (see, e.g., Dawes,
Faust, & Meehl, 1989; Kleinmuntz, 1990; Sawyer, 1966). In 2000, a
meta-analysis by Grove and colleagues summarized the results
of 136 studies comparing clinical and statistical judgments across
a wide range of task environments. Their findings did not show
that statistical methods were always better and, in fact, they identi-
fied a few studies in which clinical judgment was superior. On the
other hand, they summarized their results by stating:

We identified no systematic exceptions to the general superi-
ority (or at least material equivalence) of mechanical predic-
tion. It holds in general medicine, in mental health, in
personality, and in education and training settings. It holds for
medically trained judges and for psychologists. It holds for
inexperienced and seasoned judges. (Grove, Zald, Lebow,
Snitz, & Nelson, 2000, p. 25)

As evident from this meta-analysis, it is clear that the implica-
tions of Meehl’s original insights go beyond the clinical–statistical
debate in psychology and apply to any area of activity where data
need to be aggregated in a consistent manner. Computers are just
much better at this task than humans and yet, depending on the
kind of task that is considered, people have difficulty in accepting
this fact. Let me illustrate.
In 1972, Hillel Einhorn published a study of judgments made
by physicians who were experts on a certain form of cancer. The
physicians’ task was to view biopsy slides taken from patients and
to (a) define the level of presence/absence of different indicators of
disease in the slides and (b) estimate the overall severity of the dis-
ease as evidenced by the slides. Einhorn used the study to demon-
strate the combined effectiveness of humans and computers as
opposed to the use of humans or computers alone. He did this
by showing that a statistical model that aggregated the physicians’
judgments of levels of indicators of disease in the slides, that is,
(a) above, was a more effective predictor of outcomes (length of
patients’ survival) than the physicians’ severity judgments alone,
that is, (b). Einhorn’s point was that better outcomes could be
achieved by a system of “expert measurement and mechanical com-
bination” than by a system that only relied on the expert physicians.
In this particular case, the physicians’ judgments of (a) were
essential to the development of the model because there was no
other way of measuring these cues. Einhorn’s point was not to
denigrate the expertise shown by the physicians in their reading
of the biopsy slides. However, the physicians felt quite clearly
that the study was an unfair condemnation of their abilities and
became quite defensive about it.1 In fact, I subsequently used the
same dataset in my PhD thesis (Hogarth, 1974). When I attempted
to contact the physicians with questions, they were so upset over
the questioning of their judgment that their initial reaction was that
I should not be allowed to use the data.

1. Parenthetically, by a peculiar twist of fate, Einhorn in fact suffered from the same disease that the physicians were attempting to predict.
A further illustration arises from an experience involving a large
academic program. Here, the director of admissions spent an
enormous amount of time each year reading applications before
using “clinical” judgment to make decisions. A faculty committee
studied the admissions process and suggested using a statistical
model based on the information in the application files. The sug-
gestion was not well received even though it was stated that the
model should only be used to pick the top 10% for admission and
to reject the lowest 10% (thereby economizing some 20% of appli-
cation reading time). The director clearly felt that the model was an
intrusion into his domain of expertise (see also Dawes, 1979).
Moreover, it would no longer allow him to claim that he read all
files personally.
On the other hand, there are situations where the clinical–statis-
tical controversy is well understood and has huge economic
consequences. Consider, for example, the use of credit scoring by
banks and finance companies. For many kinds of accounts, these
corporations no longer rely on human judgment when granting
credit. Instead, they rely on simple models with a handful of vari-
ables (sometimes as few as one or two) to predict which potential
clients are or are not good credit risks. (For an interesting applica-
tion of when telephone companies should require deposits of new
customers, see Showers & Chakrin, 1981.) In these applications,
economic incentives certainly seem to make a difference in the
acceptance of “mechanical” decision making.
In summary, if—in several professional domains—human judg-
ments using all available information were replaced by statistical
models using only a few variables, the accuracy of predictions
could be increased significantly. Perhaps the major obstacle to
this occurring is the belief that complex problems require complex
professional assessment (such as holistic clinical judgment), which is always better than simple models based on a few vari-
ables. When economic incentives for making accurate predictions
are both large and visible, however, such resistance is more likely
to be overcome.

Simple Models in Time Series

A critical operational concern in economics and business (private
and public) is the forecasting of many different time series. Consider,
for example, data concerning imports and exports across time, the
supply and demand for specific products and classes of goods,
inventories, and various economic indicators. Forecasting these
variables with a reasonable level of accuracy is essential because,
without good forecasts, individuals and firms cannot plan and eco-
nomic activity suffers.
Since the 1950s and 1960s the availability of computers has
considerably increased the ability to forecast millions of time
series. At the same time, theoretical statisticians have spent con-
siderable effort developing increasingly sophisticated methods
for determining patterns in time series with the ostensible objective
of achieving better predictions.
However, it was not until the 1970s that statisticians first started
to question which particular methods might work best for predict-
ing actual series in practice. These first studies (e.g., Newbold &
Granger, 1974) compared relatively few methods (see below) and,
although their results were not unambiguous, they were generally
supportive of the complex status quo models in the theoretical
statistical literature (Box & Jenkins, 1976).
In 1979, Spyros Makridakis and Michèle Hibon (at the time
comparatively unknown researchers) broke with tradition by pre-
senting a paper at the prestigious Royal Statistical Society in which
they compared the out-of-sample forecasting performance of 22
forecasting methods on 111 time series they had obtained from var-
ious sources in business and economics. Their methodology was
conceptually simple: Separate each time series into a fitting phase
and predictive phase; fit all models on the fitting data; use the fitted
models to make predictions for the predictive phase; and compare
predictions with actual outcomes (i.e., similar to cross-validation
in using multiple regression).
The results surprised even the authors: “If a single user had to
forecast for all 111 series, he would have achieved the best results
by using exponential smoothing methods after adjusting the
data for seasonality” (Makridakis & Hibon, 1979, p. 101). In other
words, a very simple model (that essentially combines only the last
few observations) outpredicted many complex and statistically
sophisticated models that used many variables and provided closer
fits to the data in the fitting phase of the analyses. The essential
point made by Makridakis and Hibon was also conceptually simple:
The real processes underlying time series in business and econom-
ics do not fully conform with the assumptions of complex statis-
tical models, and thus extreme caution should be taken when
predicting out-of-sample. Moreover, assumptions made by simple
models are more robust against such violations and, on this basis,
should be preferred to complex models. Thus, even though the
complex models can fit past data well, their predictive ability
in future samples falls short of the performance of their simpler
counterparts.
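As a rough illustration of what such a simple method looks like, here is a sketch of single exponential smoothing evaluated out of sample, in the spirit of the fitting/predictive split described above; the smoothing constant, the flat multi-step forecast, and the error measure are our simplifying assumptions, not the M-competition protocol.

```python
# A rough sketch (ours, not the M-competition protocol) of single exponential
# smoothing evaluated out of sample: fit on the early part of a series,
# issue a flat forecast for the held-out part, and score the error.
def exponential_smoothing_forecast(series, alpha=0.3):
    """One-step-ahead forecast: a weighted average that discounts older
    observations geometrically (alpha is an assumed smoothing constant)."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def out_of_sample_error(series, horizon, forecaster=exponential_smoothing_forecast):
    """Split into fitting and predictive phases, then report mean absolute error."""
    fitting, held_out = series[:-horizon], series[-horizon:]
    forecast = forecaster(fitting)
    return sum(abs(actual - forecast) for actual in held_out) / horizon
```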
Comments made at the meeting, and afterward, were published by
the Journal of the Royal Statistical Society and make interesting
reading today. Between the compliments for conducting a demanding
empirical study and legitimate questions about methodology, there
were several published statements that were clearly intended to
dismiss the results. For example, one prominent commentator stated:

If the series conforms to an ARMA model, and the model has


been fitted correctly, then the forecast based on this ARMA
model must, by definition, be optimal. (Apart from the ARMA
model, all the other forecasting methods considered are of an
ad hoc nature. The ARMA method involves model fitting and
its performance depends to a large extent on the ability of the
user to identify correctly the underlying model.) (Italics and
parentheses in original; Priestley, 1979, p. 128)2

2. ARMA stands for auto-regressive moving average.
The commentator did not appear to be concerned by empirical
evidence and also hinted that the investigators had not followed
appropriate procedures (note the last sentence quoted). Other
commentators wondered whether there was something peculiar
about the particular time series the authors had assembled. One
went so far as to state that Makridakis’s competence to perform
appropriate time-series analyses should not be trusted.
Makridakis’s reactions since 1979 have been exemplary. In 1982,
he published results of the so-called M-competition (Makridakis
et al., 1982), in which experts in different forecasting methods were
invited to predict 1,001 series (thereby avoiding the criticism that
he had used the methods inappropriately). In 1993, results of the
M2-competition were made available (Makridakis et al., 1993). This
competition was similar to the M-competition in that experts were
invited to use their own methods. It differed, however, in that there
were fewer forecasts but these were conducted in real time (e.g.,
participants were asked to provide a forecast for next year).
Moreover, forecasters could obtain background and qualitative data
on the series they were asked to forecast (a criticism of the
M-competition was that experts lacked access to important contex-
tual information). Finally, in the M3-competition (Makridakis &
Hibon, 2000), forecasts were prepared for several models using
3,003 time series drawn from various areas of economic activity
and for different forecast horizons. All of these M-competitions
(along with similar studies by other scholars) essentially replicated
the earlier findings of Makridakis and Hibon, namely, that

(a) statistically sophisticated or complex methods do not neces-
sarily provide more accurate forecasts than simpler ones. (b) The
relative ranking of the performance of the various methods varies
according to the accuracy measure being used. (c) The accuracy
when various methods are being combined outperforms, on
average, the individual methods being combined and does very
well in comparison to the other methods. (d) The accuracy of
the various methods depends on the length of the forecasting
horizon involved. (Makridakis & Hibon, 2000, p. 452)

One might imagine that, with this weight of evidence, the aca-
demic forecasting community would have taken notice and devel-
oped models that could explain the interaction between model
performance and task characteristics. However, there seems to be
little evidence of this occurring. For example, Fildes and Makridakis
(1995) used citation analysis in statistical journals to assess the
impact of empirical forecasting studies on theoretical work in
time-series analysis. Basically, their question was whether the con-
sistent out-of-sample performance of simple forecasting models
had led to theoretical work on illuminating this phenomenon. The
answer was a resounding “no”:

Empirical validation, comparative modeling and the choice
between alternative models (and methods) seem to have been
regarded as unimportant by theoreticians in the field of statistical
forecasting.…the evidence is straightforward: those interested
in applying forecasting regard the empirical studies as directly
relevant to both their research and to applications…those inter-
ested in developing statistical models…pay little attention or
ignore such studies. (Fildes & Makridakis, 1995, p. 300)

Ten years after this study was published, I contacted Makridakis
to ask whether the situation had changed in the interim. The answer
was no (Spyros Makridakis, personal communication, January 2005).

Once again, it seems that whereas direct economic incentives
have an important impact on the applied practice of forecasting,
scientists working on the theoretical side are not quick to see the
implications of negative evidence. As the quote from Makridakis
and Hibon (2000) above states, a simple model that involves
averaging different forecasts can be very effective, outperforming
more complex models. We next consider an analogous situation in
modeling human behavior.

“Optimal” Versus Equal Weighting

During their studies, most social scientists learn the statistical tech-
nique of multiple regression. Given observations on a dependent
variable yi (i = 1, . . ., n) and k independent or predictor variables xij
(j = 1, . . ., k), the budding scientists learn that the “best” predictive
equation for y expressed as a linear function of the xs is obtained by
the well-known least-squares algorithm. The use of this technique
(and more complex adaptations of it) is probably most common in
hypothesis testing. Is the overall relationship statistically signifi-
cant (i.e., is population R² > 0?). What are the signs and relative
sizes of the different regression coefficients? Which are most impor-
tant? And so on.
In addition to fitting data, another important function of multiple
regression is to make predictions. Given a new so-called hold-out
sample of xs, what are the associated predicted y values? In using a
regression equation in this manner, most researchers appreciate
that the R² achieved on initial fit of the model will not be matched
in the predictive sample due to “shrinkage” (the smaller the ratio
n/k, the greater the shrinkage). However, they do not question that
the regression weights initially calculated on the “fitting sample”
are the best that could have been obtained and thus that this is still
the optimal method of prediction. They should.
In 1974, Dawes and Corrigan reported the following interesting
experiment: Instead of using weights in a linear model that have
been determined by the least-squares algorithm, use weights that
are chosen at random (between 0 and 1) but have the appropriate
sign. The results of this experiment were most surprising to scien-
tists brought up in the tradition of least-squares modeling. The
predictions of the quasi-random linear models were quite good
and, in fact, on all four datasets Dawes and Corrigan analyzed, they
were better than the predictions made by human judges who had
been provided with the same data (i.e., values of the predictor
variables). This result, however, did not impress referees at the
Psychological Review who rejected the paper. It was deemed “pre-
mature.” In addition, the authors were told that, despite their
results, differential regression coefficients are important for describ-
ing the strategies of judges. Subsequently, and before the paper
appeared in the Psychological Bulletin, Dawes presented the
results at a professional conference only to be told by distinguished
attendees that the results were “impossible.” On the other hand, it
should be added that some scientists who had heard one of Dawes’s
earlier talks on this subject tried the “method” on their own data-
sets and saw that it worked (Robyn Dawes, personal communica-
tion, December 2004).
Dawes and Corrigan (1974) outlined four reasons for the success
of their simple method: (a) in prediction, having the appropriate
variables in the equation may be more important than the precise
form of the function; (b) each predictor has a conditionally mono-
tone relationship with the criterion; (c) error may be present in
measurement; and (d) deviations from optimal weighting may not
make much practical difference. Subsequently, Einhorn and I exam-
ined the phenomenon analytically (Einhorn & Hogarth, 1975).
To do so, we first transformed the Dawes and Corrigan (1974) model
by assuming an equal-weight model (i.e., all regression coefficients
are given equal, rather than random, weight) subject only to knowing
the correct sign (zero-order correlation) of each variable. (This is the
same as Dawes and Corrigan’s model if one uses the expected values
of the random weights.) We then went on to show the rather general
conditions under which the predictions of such equal- or unit-weight-
ing (all weights equal to 1.0) models correlate highly with those of
models with so-called optimal weights calculated using least squares.
Furthermore, we indicated how predictions based on unit weights
are not subject to shrinkage on cross-validation and that there are
conditions under which such simpler models predict more accurately
than ordinary least squares. In fact, prior to the appearance of both
our paper and that of Dawes and Corrigan, several other papers had
hinted at these results (see, in particular, Claudy, 1972; Schmidt,
1971; Wilks, 1938). In addition, Wainer (1976) published an article in
the Psychological Bulletin with the catchy title “Estimating coeffi-
cients in linear models: It don’t make no nevermind” in which he also
showed that least-squares regression weights could often be replaced
by equal weights with little or no loss in accuracy.
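To see how little the precise weights can matter, here is a minimal simulation in the spirit of these studies (the data-generating process, sample sizes, and noise level are hypothetical choices for illustration, not the datasets analyzed by Dawes and Corrigan or Einhorn and Hogarth): least-squares weights are estimated on a small fitting sample and then compared, on a hold-out sample, with unit weights that use only the sign of each predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_run(n_fit=20, n_test=1000, k=5, noise=1.5):
    """Hold-out correlation of least-squares weights versus unit (sign-only) weights."""
    true_w = rng.uniform(0.5, 1.0, size=k)             # all-positive "true" weights
    X_fit = rng.normal(size=(n_fit, k))
    X_test = rng.normal(size=(n_test, k))
    y_fit = X_fit @ true_w + rng.normal(scale=noise, size=n_fit)
    y_test = X_test @ true_w + rng.normal(scale=noise, size=n_test)

    # "optimal" weights estimated by least squares on the small fitting sample
    ols_w, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)

    # unit weights: each predictor gets weight +1 or -1 according to its sign
    signs = np.array([np.sign(np.corrcoef(X_fit[:, j], y_fit)[0, 1]) for j in range(k)])

    r_ols = np.corrcoef(X_test @ ols_w, y_test)[0, 1]
    r_unit = np.corrcoef(X_test @ signs, y_test)[0, 1]
    return r_ols, r_unit

runs = np.array([one_run() for _ in range(200)])
print("mean hold-out correlation, least-squares weights:", runs[:, 0].mean().round(3))
print("mean hold-out correlation, unit weights         :", runs[:, 1].mean().round(3))
```

With a small fitting sample and noisy data, the unit-weight model typically predicts about as well as, and often better than, the cross-validated least-squares weights, which is the pattern analyzed by Einhorn and Hogarth (1975) and Wainer (1976).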
By this time, with both empirical and analytical results avail-
able, one might imagine that users of regression techniques would
now be cautious in believing that regression coefficients are truly
“optimal.” Moreover, to show real effects of differential sizes of
coefficients, one should put estimated models to predictive tests
where equal-weight models provide a baseline. However, it is hard
to find examples of this level of understanding in the literature. It is
not that the original papers have been ignored. Indeed, according to
the ISI Web of Knowledge, the Dawes and Corrigan paper was cited
more than 600 times in the 20 years following its publication.
Moreover, a number of studies in the decision-making literature
have exploited the results. However, the implications of this work
have had surprisingly little impact on the methods of scientists
who make great use of regression analysis.
Economists, for example, are among the most sophisticated
users of regression analysis. I therefore sampled five standard text-
books in econometrics to assess whether young economists are
taught about ambiguity in regression weights and the use of bench-
marks of equal or unit-weighting models for prediction. The spe-
cific textbooks were by Goldberger (1991), Greene (1991), Griffiths,
Hill, and Judge (1993), Johnston (1991), and Mittelhammer, Judge,
and Miller (2000). The answer was an overwhelming “no.” The
major concern of the texts seems to lie in justifying parameter esti-
mates through appropriate optimization procedures. The topic of
prediction is given little attention, and when it is, emphasis is
placed on justifying the “optimal” regression coefficients in the
prediction equations that have been estimated on the data avail-
able. None of the books gives any attention to equal- or unit-weight-
ing models. In addition, in a handbook whose contributors were
leading econometricians, I located a chapter entitled “Evaluating
the predictive accuracy of models” (Fair, 1986), but even this chap-
ter showed no awareness of the equal-weight findings.
In psychology, on the other hand, the statistical theory underly-
ing the development of tests draws the attention of students to
the properties and use of equally weighted composite variables
(cf. Ghiselli, Campbell, & Zedeck, 1981). Indeed, the third edition
of Nunnally and Bernstein’s Psychometric Theory (1994) explicitly
devotes a section of a chapter (p. 154) to equal weighting—citing,
among others, Dawes and Corrigan (1974) and Wainer (1976). It is
notable that they emphasize the use of equal weights when ques-
tions center on prediction in applied problems.
How does one explain the relative lack of interest in equal
weights in economics when the case against naively accepting esti-
mates of regression coefficients has been made on both empirical
and analytical grounds? Perhaps the reason is that there is a huge
“industry” propagating the use of regression analysis involving
textbooks, computer software, and willing consumers who accept
analytical results with little critical spirit, somewhat similar in
manner to the use of significance tests in reports of psychological
experiments (cf. Gigerenzer, 1998b, 2004a). Just because ideas are
“good” does not mean that they will be presented in textbooks
and handed down to succeeding generations of scientists (see, for
example, the discussion by Dhami, Hertwig, & Hoffrage, 2004,
concerning Brunswik’s concept of representative design of experi-
ments, which has been largely overlooked).
It is important to recognize that the equal-weight model is a form
of averaging in that it correlates perfectly with the arithmetic mean
of the x variables (assuming equal standard deviations). Moreover,
much literature demonstrates that, when estimating uncertain
quantities, people underestimate the power of averages. Indeed, at
the beginning of the 20th century, even sophisticated scientists
such as Francis Galton were surprised to discover that the average
of uneducated guesses of many people could be quite accurate (one
case involved estimating the weight of an ox—see Surowiecki,
2005). Similarly, some time ago, social psychologists found that to
guess a quantity (e.g., the number of jelly beans in a jar), one of
the best methods was simply to average the estimates of different
individuals (Gordon, 1924). In addition, as noted above (Makridakis
& Hibon, 2000), the average of several forecasts is typically one of
the more accurate of the forecasts averaged (see also Hogarth,
1978).
This surprising property of averages is counterintuitive to many
people and has been “rediscovered” on many occasions. For exam-
ple, Larrick and Soll (2006) have documented that if a person wants
to make a prediction and can also obtain the advice of an expert,
that person is often better off averaging his or her own and the
expert’s opinions as opposed to differentially weighting one or the
other. The underlying rationale for the power of averaging several
judgments, forecasts, or variables is simple. Basically, imagine that
a prediction by one of k forecasters can be expressed as

z_j = μ + δ_j + ε_j                                   (1)

where μ represents the overall average of all k forecasters; δ_j represents
any bias specific to forecaster j; and ε_j is an idiosyncratic error
term associated with forecaster j. Now, if one simply assumes that
δ_j and ε_j are uncorrelated and have means of zero across the
k forecasters, it follows that taking the arithmetic average is an opti-
mal strategy (since the expected value of the criterion is equal to μ).
Clearly such assumptions will not hold perfectly, but even if they
are only approximately true, the arithmetic average is a powerful
predictor.
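The logic of Equation 1 is easy to check numerically. The sketch below uses purely hypothetical values of μ, the biases δ_j, and the error terms ε_j (not data from any forecasting study) and compares the error of the averaged forecast with that of a typical individual forecast.

```python
import numpy as np

rng = np.random.default_rng(1)

mu = 100.0            # true quantity to be forecast
k = 10                # number of forecasters
trials = 10_000

# forecaster-specific biases (zero mean across forecasters, only approximately
# so in a finite sample) and idiosyncratic errors
delta = rng.normal(0, 5, size=k)
errors = rng.normal(0, 10, size=(trials, k))
forecasts = mu + delta + errors                     # z_j = mu + delta_j + eps_j

avg_error = np.abs(forecasts.mean(axis=1) - mu).mean()   # error of the averaged forecast
indiv_error = np.abs(forecasts - mu).mean()              # error of a typical individual forecast

print(f"mean absolute error of individual forecasts: {indiv_error:.2f}")
print(f"mean absolute error of the averaged forecast: {avg_error:.2f}")
```

Because the biases and errors tend to cancel, the averaged forecast ends up much closer to μ than the typical individual forecast, even though the zero-mean assumptions hold only approximately.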
It is puzzling why people have such trouble in appreciating the
power of the mean, but perhaps this also explains, in part, why
there is still such a common belief that it is important to find the
precise weights in regression analysis. In the next section, we will
see that good models can be simpler still, giving zero weight to—
and thus ignoring—some of the available information.

Discarding Relevant Information, or When “Less” Can Be “More”

In normative theories of choice, the values of alternatives are typi-
cally assessed by calculating a weighted sum of outcomes. Thus, in
expected utility theory, the utilities of outcomes are weighted by
their probabilities of occurrence. Similarly, in the additive form of
multiattribute utility theory, the utility of an alternative y_i = (x_i1, x_i2,
. . ., x_ik) is determined by the function

U(y_i) = ∑_{j=1}^{k} w_j u(x_ij)                      (2)

where U(·) and u(·) denote utility and the w_j are weighting parameters
subject to the constraint that ∑_{j=1}^{k} w_j = 1 (see, e.g., Keeney & Raiffa,
1993).
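Written out as code, the additive model in Equation 2 is simply a weighted sum; the attribute values, weights, and the identity utility function in the sketch below are arbitrary illustrative choices.

```python
import numpy as np

def additive_utility(x, weights, u=lambda v: v):
    """U(y) = sum of w_j * u(x_j), with the weights summing to 1 (Equation 2)."""
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "weights must sum to 1"
    return float(np.dot(weights, [u(v) for v in x]))

# two alternatives described on three attributes (illustrative values)
w = [0.5, 0.3, 0.2]
print(additive_utility([0.8, 0.3, 0.5], w))   # 0.59
print(additive_utility([0.4, 0.9, 0.7], w))   # 0.61
```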
Models such as Equation 2 (and the multiple regression models
considered in the previous section) have “gold standard” status in
decision making because they essentially define what is “optimal.”
Moreover, they seem to make good sense in that they consider all
the information and weight it appropriately. But do people need to
consider all the information when they make a decision? Could
they actually do “better” if they ignored some information?
One of the first researchers to examine this issue was Thorngate
(1980). Using simulations, Thorngate investigated how often vari-
ous heuristic decision strategies would select the highest expected
value alternatives from different choice sets. In short, the criterion
was a weighted sum (i.e., similar to Equation 2 above) and the heu-
ristic models only used part of the available information. For exam-
ple, the most successful strategy in the simulation was one that
assumed all probabilities were equal (akin to the equal-weight
models discussed earlier). Thorngate’s results were surprising in
that the most successful heuristics usually (75% or more of the time)
selected the best from two to four alternatives. Clearly, for models to
be effective, it is not necessary to use all the information.
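A simulation in this spirit can be sketched in a few lines (the numbers of alternatives and outcomes and the uniform payoff distribution below are illustrative assumptions, not Thorngate’s exact design): the “equiprobable” heuristic ignores the probabilities entirely and still picks the highest-expected-value alternative far more often than chance.

```python
import numpy as np

rng = np.random.default_rng(2)

def one_choice(n_alternatives=3, n_outcomes=4):
    """Does the equiprobable heuristic pick the highest expected-value alternative?"""
    payoffs = rng.uniform(0, 100, size=(n_alternatives, n_outcomes))
    probs = rng.dirichlet(np.ones(n_outcomes), size=n_alternatives)
    expected_value = (payoffs * probs).sum(axis=1)   # the weighted-sum criterion
    equiprobable = payoffs.mean(axis=1)              # heuristic: treat all probabilities as equal
    return equiprobable.argmax() == expected_value.argmax()

hit_rate = np.mean([one_choice() for _ in range(10_000)])
print(f"equiprobable heuristic picks the highest-EV alternative in {hit_rate:.0%} of choices")
```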
Payne, Bettman, and Johnson (1993) conducted more simula-
tions of the same type but also specifically considered the extent
to which different heuristics involved various levels of effort
(conceptualized by the number of mental operations used in imple-
menting them). These investigators also used the criterion of a
weighted sum (e.g., similar to Equation 2) and further investigated
how different heuristics were subject to different task factors or
environment structure (e.g., levels of intercorrelations between
variables and the relative presence/absence of dominated alterna-
tives in choice sets). Once again, several heuristics that did not use
all available information performed quite well. However, as in
Thorngate’s study, no heuristic could possibly perform better
than the weighted sum of all information that was used as the
criterion.
The conclusion from these studies was that heuristics could
perform quite effectively but could never be better than using all
information (because of how the studies were constructed). This
view is known as the effort–accuracy trade-off (see chapter 2).
However, would it be possible to remove this design constraint and
observe situations where “less” is “more”? Moreover, while one
could justify employing models that use less information by accept-
ing an effort–accuracy trade-off, are there situations where one does
not have to make this trade-off?
In a 1996 paper, Gigerenzer and Goldstein indicated two ways
in which “less” might be “more.” Significantly, both involve the
use of a heuristic decision rule that exploits an environmental
“niche” to which it is well adapted. The first example involves the
use of the recognition heuristic (see also Goldstein & Gigerenzer,
1999, 2002, and chapters 5 and 6).
Imagine two people who have to choose between two alterna-
tives. One person knows very little about the situation but does
recognize one of the alternatives. She therefore chooses it. The
second person, on the other hand, recognizes both alternatives
and is generally quite knowledgeable about them. Normally, one
would expect the second person to be more likely to make the cor-
rect choice. However, imagine that the first person’s recognition
knowledge is fairly highly correlated with the criterion. As the
second person cannot use recognition to discriminate between the
alternatives, he must use his additional knowledge. Now, if his
additional knowledge is less highly correlated with the criterion than
the first person’s “recognition knowledge,” his choice will be less
accurate. Paradoxically, although the first person has “less” knowl-
edge, her predictive ability is “more” than that of the second.
The second phenomenon illustrated by Gigerenzer and Goldstein
(1996, 1999) was the surprising predictive ability of the take-the-
best heuristic. This is a simple, lexicographic decision rule for
binary choices where the decision depends on the first piece of
information examined that discriminates between the two alterna-
tives (with the information or cues consulted in the order of their
validity). When deciding between alternatives characterized by
binary cues, take-the-best is remarkably accurate despite typically
using only a fraction of the cue information available. In the tests
conducted by Gigerenzer and Goldstein (1996, 1999), take-the-best
generally outperforms equal-weight models (which use all avail-
able cues as described above) and even regression models on
cross-validation.
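For readers who prefer to see the rule spelled out, here is a minimal sketch of take-the-best for a single binary choice; the cue profiles are made-up examples, cues are assumed to be listed in decreasing order of validity, and unresolved ties are simply left to a guess.

```python
def take_the_best(a_cues, b_cues):
    """Choose between two alternatives described by binary cues.

    Cues are assumed to be ordered by decreasing validity; the first cue
    that discriminates decides the choice, otherwise the rule guesses.
    """
    for cue_a, cue_b in zip(a_cues, b_cues):
        if cue_a != cue_b:
            return "A" if cue_a > cue_b else "B"
    return "guess"

# illustrative profiles: the second cue is the first one to discriminate
print(take_the_best([1, 1, 0], [1, 0, 1]))   # -> A
```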

The effectiveness of take-the-best-like models has also been
demonstrated in important applied areas, such as medical decision
making (Breiman, Friedman, Olshen, & Stone, 1993; Green & Mehr,
1997; see chapter 14). But it is not clear that the implications
have yet been realized to the advantage of both patients and physi-
cians (i.e., faster and more accurate diagnoses). In medicine, in
particular in a litigious environment such as the United States, pro-
fessionals appear to want to be seen examining all information,
even if it is unnecessary (see Gladwell, 2005, and chapter 17
for related issues in simplifying medical decision making).
It is interesting to note that these results also contradict the
intuitions of researchers who study decision making. For example,
in a poster session on this topic at a professional conference
attended by many leading researchers in decision analysis (the
Behavioral Decision Research in Management conference held at
Duke University in 2004), I created a competition by asking people
to predict the performance of decision rules, including equal
weighting and take-the-best, applied to simple environments. The
prize for the best set of estimates was $20. However, the estimates
made, even by experienced decision analysts, did not match
reality: The effectiveness of the simple models was significantly
underestimated.

The Complexity of Accepting the Simple: The Case of Take-the-Best

To understand why people might find it difficult to accept the effec-
tiveness of simple decision rules, it is illuminating to consider the
factors that people commonly use when assessing the validity of
causal theories (Einhorn & Hogarth, 1986). One such factor is simi-
larity of cause and effect, which is often based on the congruity that
exists between the two in terms of length and strength. That is, we
expect large and complicated problems—or effects—to have com-
plex causes (e.g., poverty in developing countries does not have a
simple remedy) and are surprised when small causes have large
effects. Consider, for example, how Pasteur’s germ theory must have
seemed incredible to his contemporaries in suggesting that deaths
and plagues were caused by minuscule (invisible)
creatures. Similarly, that complex decision problems can be resolved
satisfactorily by ignoring information or using simple aggregation
rules (or both) seems, a priori, an implausible proposition.
There is another reason why it may be difficult to accept simplicity.
To establish the validity of a simple solution, two conditions
seem necessary. One is repeated empirical verification. The other is
theoretical argument. The former requires time and opportunities;
the latter requires the development of a convincing explanation.

I now illustrate this by explicating some of the theoretical work that
explains why the simple take-the-best heuristic works so surpris-
ingly well.
Recall that take-the-best involves deciding between two alterna-
tives that are evaluated on binary (i.e., 0/1) cues. The key is that the
cues are ordered by their (unconditional) validity (see chapter 2)
and that the choice is made by the first cue that discriminates (i.e.,
the first cue for which the binary cue values for the two alternatives
differ). I first consider the performance of take-the-best in error-free
environments (i.e., where object criterion values are precise,
weighted combinations of cue values with no errors) and then in a
more general case that allows for error.
Analyzing the first situation, Martignon and Hoffrage (1999,
2002) discussed two classes of environments: noncompensatory
environments where the most important cues (in terms of their
impact on predicting criterion values) cannot be outweighed by any
combination of less important cues, and compensatory environments
where one or more cues can be outweighed—compensated for—by
a combination of less important cues. In noncompensatory envi-
ronments, they showed, take-the-best cannot be beaten by any
linear combination of cues in fitting the data. However, take-the-
best also performs well in compensatory environments. How does
this happen? Natalia Karelaia and I studied this by simply enumer-
ating the choices made by take-the-best—across wide ranges of
compensatory environments—for all possible patterns of between
three and five cues (Hogarth & Karelaia, 2006b). The questions we
asked were (a) when did take-the-best make mistakes, that is, for
which pairs of alternatives across all possible pairs, and (b) how
often did these mistakes occur? We also analyzed the performance
of another heuristic, a generalization of take-the-best called DEBA
(deterministic elimination-by-aspects; Hogarth & Karelaia, 2005b).
The results of these theoretical enumerations showed (surprisingly)
that, in error-free environments, take-the-best and DEBA were
remarkably accurate even in highly compensatory environments
(i.e., when more important cues are frequently outweighed by com-
binations of less important ones). The only way to make take-the-best
ineffective is to construct choice environments that are highly popu-
lated by precisely those few pairs of alternatives where take-the-best
makes incorrect choices. How such pairs are distributed in naturally
occurring situations is, of course, an open, empirical question.
Further insight into understanding the effectiveness of take-the-
best (and DEBA) in error-free environments was made by Baucells,
Carrasco, and Hogarth (2008), who exploited a concept known
as cumulative dominance. To illustrate, consider two alternatives,
A and B, with cue profiles of A = (1, 1, 0) and B = (1, 0, 1). Clearly,
A does not dominate B on a cue-by-cue basis (1 ≥ 1, 1 ≥ 0, but 0 < 1),
but it does in the cumulative sense (across the cues); that is, 1 ≥ 1;
1 + 1 > 1 + 0; and 1 + 1 + 0 ≥ 1 + 0 + 1. Baucells et al. showed that
cumulative dominance is quite pervasive in choice situations
involving binary cues and that any decision rule that makes
choices in accordance with cumulative dominance will perform
well. Because weights in take-the-best (and DEBA) are ordered from
large to small, it follows that take-the-best and DEBA both comply
with cumulative dominance and this explains, in part, their effect-
iveness.3
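The cumulative-dominance check itself takes only a few lines; the sketch below applies it to the two cue profiles from the text, with cues assumed to be listed in decreasing order of importance.

```python
import numpy as np

def cumulatively_dominates(a, b):
    """True if alternative a cumulatively dominates b on importance-ordered cues."""
    ca, cb = np.cumsum(a), np.cumsum(b)
    return bool(np.all(ca >= cb) and np.any(ca > cb))

A = [1, 1, 0]
B = [1, 0, 1]
print(cumulatively_dominates(A, B))   # True:  1 >= 1, 2 >= 1, 2 >= 2
print(cumulatively_dominates(B, A))   # False: 1 >= 1, but 1 < 2
```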
In short, there is now ample theoretical analysis showing that
take-the-best will make effective choices in error-free environments
where the importance of cues is unknown. What happens in the
presence of error and when there is uncertainty about the true
importance of cues? And to what extent does the success of take-
the-best depend on the fact that it uses binary cues as inputs?
To study these questions, Karelaia and I developed theoretical
models that can be used to compare and contrast the performance
of different heuristics across different environments using cues
that are both binary and continuous in nature (Hogarth & Karelaia,
2005a, 2006a, 2007). We examined environments where a criterion
was generated by a linear function of several cues and asked to
what extent different simple models could be expected to choose
correctly (i.e., choose the highest criterion alternative) between
two or more alternatives. That is, by characterizing the statistical
properties of environments (see below), one should be able to pre-
dict when particular simple rules would and would not work
well.
The outcome of this work is that, given the statistical description
of an environment in terms of correlations between normally
distributed cues and the criterion, as well as correlations among the
cues themselves, precise theoretical predictions can be made as to
how well different heuristics will perform.4 What we find is that, in
general, when the characteristics of heuristics match those of the
environment, they tend to predict better. Indeed, our own summary
noting differences in performance between TTB (take-the-best), SV

3. The arguments provided by Baucells et al. (2008) are, in fact, more
sophisticated than the key idea presented here. In addition, they require
some auxiliary assumptions concerning weighting functions to reach their
conclusions. On the other hand, they are able to provide results for DEBA
with up to 10 cues and 10 alternatives.
4. The technical details involve properties of normal distributions, the
creation of binary data by median splits of continuous variables, and—in
the case of DEBA—extensive use of probability theory and partial correla-
tions to define the probability that the steps taken by DEBA will result in
appropriate eliminations of alternatives (Karelaia & Hogarth, 2006).

(a lexicographic model based on a single variable), EW (equal
weighting), and CONF (the confirmation model, see below) states:

First, the models all perform better as the environment
becomes more predictable. At the same time, differences in
model performance grow larger.
Second, relative model performance depends on both how
the environment weights cues (noncompensatory, compensa-
tory, or equal weighting) and redundancy. We find that when
cues are ordered correctly, (a) TTB performs best in non-
compensatory environments when redundancy is low; (b) SV
performs best in noncompensatory environments when redun-
dancy is high; (c) irrespective of redundancy, EW performs
best in equal-weighting environments in which CONF also
performs well; (d) EW (and sometimes TTB) performs best in
compensatory environments when redundancy is low; and
(e) TTB (and sometimes SV) performs best in compensatory
environments when redundancy is high. (Hogarth & Karelaia,
2007, p. 746)

Karelaia and I summarized our work by saying that people do not
need much computational ability to make good decisions (i.e., they
can use simple models), but they do need task-specific knowledge
or “maps” to know when a strategy is appropriate (Hogarth &
Karelaia, 2006a, p. 141). This, we believe, is what lies at the heart
of expertise in making decisions in specific domains. How people
acquire such expert knowledge is a very important issue.
Finally, noting that people often may not know precisely what to
do in a particular domain—they may not be experts—Karelaia
(2006) has suggested the use of strategies that hedge against one’s
lack of knowledge. Using both simulation and theoretical analyses,
she has shown that one such strategy that searches for two dis-
criminating cues in agreement performs quite well relative to
other rules such as take-the-best or equal-weighting across several
task environments (Hogarth & Karelaia, 2007; Karelaia, 2006). This
is the CONF rule, so called because it seeks confirmation after the
first discriminating cue (it is referred to as take-two in chapter 10).

The Road to Enlightenment

As the cases reviewed in this chapter indicate, people—both in
science and everyday life—are slow to accept evidence that chal-
lenges their beliefs, particularly when they have a stake in those
beliefs. Surprisingly, this resistance occurs even in situations
where the new beliefs would be simpler than the previously held
ones. At one level, I see this as the inevitable consequence of a
dilemma that has to be managed continuously by all living systems,
that is, the simultaneous need to adapt to change and yet maintain
continuity and stability across time. Moreover, adapting to per-
ceived change can involve two kinds of errors (i.e., adapting when
one should not, and not adapting when one should) and the costs
of error are not necessarily symmetric. Thus, without trying to
rationalize what might seem to be dysfunctional behavior, it is
legitimate to ask what conditions favor the adoption of new ideas
that challenge the status quo and what, if anything, can be done to
improve present practice.
Economic incentives may play an important role. For example, it
is clear from the forecasting case study that practitioners in indus-
try accept the implications of the time-series competitions even
though theoretical statisticians might not share their enthusiasm.
Perhaps other incentives could be used?
Two related proposals have been made. Some 25 years ago,
Hofstee (1984) suggested that scientists engage in a system of repu-
tational bets. That is, scientists with contradictory theories can
jointly define how different outcomes of a future experiment should
be interpreted (i.e., which theory is supported by the evidence). In
this scheme, the scientists assess probability distributions over the
outcomes (thereby indicating “how much” of their reputational cap-
ital they are prepared to bet) and a third, independent scientist runs
the experiment. The outcomes of the experiment then impact on the
scientists’ reputational capitals or “ratings.” However, I know of no
cases where this system has actually been implemented.
A similar scheme involves a proposal labeled “adversarial col-
laboration.” Here again, the disagreeing parties agree on what
experiments should be run. An independent third party then runs
the experiment, which all three publish jointly. Unfortunately, it is
not clear that this procedure resolves disputes. The protagonists
may still disagree about the results (see, e.g., Mellers, Hertwig, &
Kahneman, 2001).
One way to think about our topic is to use the analogy of the
marketplace for ideas where, when the market is efficient, ideas
that are currently “best” are adopted quickly. However, like real
markets, in the conduct of science people still find ways to circum-
vent regulations. In the final analysis, the market for scientific ideas
can only be efficient in a long-run sense. Unfortunately, as implied
in a famous statement by Lord Keynes, our lives do not extend that
far. This is not to suggest adopting a pessimistic cynicism. Each
generation does see scientific progress and the accessibility of infor-
mation has increased exponentially in recent years. The road to
enlightenment, and simplicity, however, is bumpy.
4
Rethinking Cognitive Biases as
Environmental Consequences
Gerd Gigerenzer
Klaus Fiedler
Henrik Olsson

The discovery of a general psychological law takes us
only halfway. We must now ask what general property of
the world is reflected in this general law.
Roger N. Shepard

Illusions have played a major role in shaping our understanding
of human perception. Consider the dots on the left-hand side
of Figure 4-1. They appear concave and recede into the surface. The
dots on the right side, however, appear convex and extend toward
the observer. If you turn the page upside down, the concave dots
will turn into convex and vice versa. What can we learn from this
illusion? The most important lesson is that the world, from the per-
spective of our mind, is fundamentally uncertain. Our brain does
not know for certain what is out there, but it makes a good bet,
based on the structure of its environment or what it assumes is
its structure. The brain assumes a three-dimensional world and
uses the shaded parts of the dots to guess in what direction into
the third dimension they extend. The two relevant ecological
structures are that light comes from above and that there is only one
source of that light (Kleffner & Ramachandran, 1992). This was true
in human (and more generally terrestrial) history, where the sun
and the moon were the only sources of light, and only one of
them shone at a time. The first regularity also holds approximately
for artificial lights today, which are typically placed above us.
This is only one example of many demonstrating that perceptual
illusions are consequences of a perceptual system that is adapted to
the structure of an uncertain world (Howe & Purves, 2005). The
illusion in Figure 4-1 is not a fallacy, or a sign of a deficient system,


Figure 4-1: The cognitive system infers that the dots in the left pic-
ture are curved inward (concave), away from the viewer, while the
dots in the right picture are curved outward (convex), toward the
viewer. If you turn the book upside down, the inward dots will
pop out and vice versa. The right picture is identical to the left but
rotated 180 degrees.

but rather the outcome of a highly intelligent system that goes
beyond the information given. Every intelligent system has to make
bets, and thus sometimes also mistakes (Gigerenzer, 2005).
Cognitive illusions (or cognitive biases) have also played a major
role in shaping current research in cognition, especially in judg-
ment and decision making (e.g., Kahneman, Slovic, & Tversky,
1982; Kahneman & Tversky, 1996). Research on cognitive biases
has the potential to illuminate the processes that underlie judg-
ment, just as perceptual illusions can inform us about perceptual
processes. But if we study biases without analyzing the structure of
their environment, we can end up proposing processes that gener-
ate cognitive fallacies where none actually exist.
In this chapter we argue that cognitive processes and their adap-
tive functions can hardly be understood if we look exclusively
inside the mind, searching for rules of global rationality or irratio-
nality. Rather, it is essential to analyze the adaptive match between
cognitive and ecological structures. Ecological structures have
shaped cognitive evolution in the past and impose constraints on
cognitive functions in the present (see chapter 1). At the same time,
these structures can enable cognition to make quick and smart
inferences, such as when perceptual mechanisms use long-term
stable facts about the world in which we have evolved (e.g., it is
three-dimensional and light comes from above) to make an infer-
ence (e.g., the dots are concave or convex—see Barlow, 2001;
Shepard, 1994/2001). At issue are the questions posed in cognitive
research, not only the answers found. Such questions as “Do
people overestimate low risks and underestimate high risks?” or
“Do people have prejudices against minorities?” which we will
consider in this chapter, are posed in an internalistic way, and so
are most answers given in the scientific literature. The fact that
there are often contradictory answers proposed for a given behav-
ioral question may well have to do with the neglect of the structure
of the environment and the key constraints it provides for under-
standing behavior. Finding the right answer to the wrong question
is known as a Type III error.
This chapter provides a review not of a particular phenomenon
or content area, but of a theoretical issue that covers various areas,
thereby linking apparently unrelated topics in psychology. It can be
read in two ways, as an ecological perspective on cognition, and as
a critique of the research paradigm cognition-without-environment,
which may repeatedly have misled us to ask the wrong questions.
We include both cases where an ecological analysis makes it evi-
dent that previous, purely internal cognitive explanations are
unsupported and cases where an ecological analysis provides
an interesting alternative explanation to be tested against purely
cognitive accounts.

An Ecological Perspective

The study of the mind as an entity embedded in its environment
has led to several research programs that differ in crucial respects
(e.g., Brunswik, 1955; Gibson, 1979; Shepard, 1987a, 1994/2001;
Simon, 1955a, 1956; see Todd & Gigerenzer, 2001). It is not our
intention to favor one of these ecological approaches. Rather, we
take up the common thread in all of them, which is that the mind
needs to be understood as embedded in its environment. The work
of most ecologically minded psychologists, such as Barlow, Gibson,
and Shepard, has focused on perception. We instead will address
so-called higher order cognition. Our aim is to show that ecologi-
cal structures contribute greatly to the explanation of phenomena
for which cognitive and, sometimes, motivational and emotional
causes have been traditionally proposed and widely accepted on
plausibility grounds. A complete ecological analysis would involve
not only the structure of the environment, but also an appreciation
of the structure of the mind, that is, the cognitive representations
and processes responsible for any observed behavior. In this chap-
ter, however, we restrict the discussion to the general argument that
an unbiased mind plus environment structure is sufficient to pro-
duce phenomena previously associated with internal factors
(but see chapter 2 for the advantages of a biased mind). Judgment
and behavior are not good or bad, rational or irrational, per se, but
can only be evaluated in relation to an environment––just as all
adaptation is, in principle, context bound. We will show that
the analysis of ecological structures can provide an alternative
description of behavior in various areas of psychology and a reeval-
uation of norms of rationality.
The study of ecological structures covers the real-world environ-
ments in which people and other animals live, the artificial task
environments of experiments, and the relation between the two
(e.g., Anderson & Schooler, 1991; Dhami, Hertwig, & Hoffrage, 2004;
Hammond & Wascoe, 1980; Juslin, Olsson, & Björkman, 1997;
McKenzie, 1994; Oaksford & Chater, 1994; Payne, Bettman, &
Johnson, 1993; Todd, 2001). An ecological analysis takes into
account the distributions of environmental properties, the amount
of experience an organism has with a certain environment, to what
degree the stimuli in a specific task are representative of the envi-
ronment, and the translation from internal representations to
observable overt behavior. Failure to appreciate these factors may
lead to erroneous and contradictory conclusions about cognitive
processes and representations. For example, in research on realism
of confidence judgments, which we will return to later in this
chapter, it has been shown that unbiased cognitive processes are
perfectly compatible with both over- and underestimation of one’s
own knowledge and abilities.
The ecological framework sketched here can be used as a general
guide in an ecological analysis, but it is not meant as a full descrip-
tion of the environment. We still lack a terminology with which we
can fully conceptualize structures of environments with respect to
higher order cognition. We limit our analysis here to three struc-
tures of environments: the basic statistical characteristics of envi-
ronmental distributions.

Three Moments of Statistical Distributions


We focus on a characterization of environment structure in terms
of the first three moments of statistical distributions of variables.
The environmental information in a distribution of values of
something––whether a simple frequency distribution or a sampling
distribution––can be represented by three statistical moments
(Figure 4-2): the mean or central tendency (first moment), the vari-
ance or variability (second moment), and the “skewness” or preva-
lent trend (third moment). Our working hypothesis is that patterns
of judgment that reflect the moments of environmental distribu-
tions have often been misattributed solely to internal factors, such
as motivational and cognitive deficits, because of lack of attention
to the person–environment system. Just as in Figure 4-1, where we
see a concave or convex shape although there is none, the logic of
human judgment is likely to be misunderstood as infested with a

Figure 4-2: The three moments of a statistical distribution: the
mean or central tendency (first moment), the variance or variability
(second moment), and the skewness (third moment).

strange error unless one analyzes the structure of the environment,
or more precisely, the ecological structure that our brains expect.
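For concreteness, the three moments can be computed directly from any sample of environmental values; the right-skewed lognormal sample in the sketch below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
values = rng.lognormal(mean=3.0, sigma=1.0, size=100_000)   # a right-skewed "environment"

m = values.mean()                                       # first moment: central tendency
variance = values.var()                                 # second moment: variability
skewness = np.mean(((values - m) / values.std()) ** 3)  # third moment: standardized skewness

print(f"mean = {m:.1f}, variance = {variance:.1f}, skewness = {skewness:.1f}")
```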

First Moment: Mean


Let us illustrate the three moments of a distribution with a classic
study of risk perception (Lichtenstein, Slovic, Fischhoff, Layman,
& Combs, 1978; Slovic, Fischhoff, & Lichtenstein, 1982). Here, we
are interested in the distribution of the prevalence of causes of
death, and of people’s estimates of these values. In one experiment,
college students were asked to estimate the frequency of 41 causes
of death in the United States, such as botulism, tornado, and stroke.
Figure 4-3 shows the result, plotting the mean estimated frequen-
cies against actual frequencies from public health statistics. The
overall accuracy of judgments seems quite poor, revealing two
apparently systematic biases in people’s minds. The primary bias
was the tendency that the mean estimated frequencies for rare
causes of death were higher than the actual frequencies, and the
mean estimated frequencies for common causes were lower. For
instance, the average estimate for botulism was higher and that for
heart disease was lower than the actual numbers. This phenome-
non was interpreted as a systematic fallacy of overestimation and
underestimation. The secondary bias consisted in the over- or
underestimation of specific causes relative to the best-fitting (qua-
dratic) curve (see Figure 4-3). Lichtenstein et al. (1978) concluded

[Figure 4-3 plots the estimated number of deaths per year against the actual number of deaths per year, both on logarithmic axes from 1 to 1,000,000, with labeled points such as botulism, tornado, electrocution, homicide, stroke, heart disease, motor vehicle accidents, all cancer, all accidents, and all disease.]

Figure 4-3: Relationship between estimated and actual number of
deaths per year for 41 causes of death in the United States. Each
point is the mean estimate (geometric mean) of 39 students; verti-
cal bars show the variability (25th and 75th percentile) around the
mean estimates for botulism, diabetes, and all accidents. For low-
frequency causes of death, the mean estimated number is higher
than the actual frequency; for high-frequency causes this number is
lower. This pattern has been called the “primary bias.” The curved
line is the best-fitting quadratic regression line. (Adapted with per-
mission from Slovic, Fischhoff, & Lichtenstein, 1982.)

that “improved public education is needed before we can expect
the citizenry to make reasonable public-policy decisions about
societal risks” (p. 577). These biases became widely cited in the
debate over the public’s competence to participate in political
decision making with respect to nuclear power and other modern
technologies with low probabilities of high damages.
The two biases have been attributed to various cognitive and
motivational causes, with hypotheses ranging from availability to
affect to people’s pessimism (Shanteau, 1978). The “availability”
heuristic was invoked to account for the primary bias: “The best-fit
curve is too flat, relative to the perfect-prediction identity line.
That would occur if respondents used the heuristic briefly (as the
study required), allowing little opportunity to appreciate fully the
differences between very large and very small risks” (Fischhoff,
2002, p. 737). Availability was also invoked to account for the
secondary bias regarding particular risks: “Overestimated causes
were dramatic and sensational, whereas underestimated causes
tended to be unspectacular events, which claim one victim at a
time and are common in nonfatal form” (Slovic et al., 1982, p. 467).
It was also suggested that the phenomenon was due, at least in part,
to affect rather than cognition: “The highly publicized causes
appear to be more affectively charged, that is, more sensational,
and this may account for both their prominence in the media and
their relatively overestimated frequencies” (Slovic, Finucane,
Peters, & MacGregor, 2002, p. 414). The overestimation of negative
but statistically infrequent events has also been discussed as evi-
dence for people’s “genuine, psychologically meaningful pessi-
mism” (Armor & Taylor, 2002, p. 335), but at the same time as
evidence for unrealistic optimism, supposedly accounting for the
underestimation of negative high-frequency events.
Two additional, apparently unrelated factors have been called
“subjective” factors because they “shape lay definitions of risk”
(Fischhoff, 2002, p. 739), as opposed to the “objective” risk, defined
as the mean actual number of deaths (or other consequences). The
“unknown risk” factor refers to the attention people pay to the
uncertainty or unfamiliarity surrounding a technology’s risk, such
as when one is unfamiliar with the potential harms of a new tech-
nology. The “dread risk” factor refers to the catastrophic potential
of an event that has a low probability of occurrence but highly
severe consequences (Slovic, 1987), such as plane crashes. An eco-
logical analysis can provide a unified understanding of most of
these phenomena (albeit not of the secondary bias), relying on the
second and the third moment of the distribution.

Second Moment: Variance


The reported number of people killed by each of the 41 causes
varies over time or location, such as year or state, and also as a
result of measurement error. But the estimated frequencies actually
vary even more: The three vertical bars in Figure 4-3 indicate that
the estimates (25th to 75th percentiles) could differ by a factor of 10
or more. This variability is the conditional variance of the estimated
number Y given an actual number X of deaths. When the condi-
tional variances are larger than zero, a phenomenon occurs that is
known as regression toward the mean. Mathematically, the regres-
sion phenomenon can be derived in several different, essentially
equivalent ways (see Furby, 1973; Stigler, 1999). Informally, regres-
sion toward the mean is a property of any scatterplot where the
linear relationship between X and Y values is less than perfect, that
is, with a correlation less than 1. Under such circumstances the
standardized regression of Y on X will have a slope that is less than
45 degrees. As a result, the mean of the values of Y for a given value
of X will be closer to the mean of all values of Y than that value of
X is to the mean of all the values of X.
Variability in the environment alone is sufficient to produce the
primary bias, which may merely be due to regression toward
the mean. This regression reflects unsystematic error variance in
the environment rather than a systematic bias in the minds of
people. Figure 4-3 shows examples of nonzero conditional vari-
ance, illustrated by the three vertical error bars, which causes
imperfect correlations between actual and estimated frequencies of
death. These correlations ranged from 0.28 to 0.90 (median 0.66)
when calculated for each participant individually. When calculated
between the actual and mean estimated frequencies, the correlation
was 0.89 (Lichtenstein et al., 1978). Imperfect correlations in
turn cause regression toward the mean, so that the mean estimated
frequencies of death in Figure 4-3 regress to their overall mean.
Thus, this regression that has been interpreted as the participants’
primary bias can instead be deduced from the existence of unsys-
tematic (conditional) error variance without any systematic psy-
chological bias.
We can also demonstrate this argument empirically, adopting a
method of Erev, Wallsten, and Budescu (1994). If we estimate the
actual frequencies from the subjective frequencies rather than vice
versa, then we should get the mirror result: a pattern that looks like
the opposite of the primary bias, as if people underestimate low-
frequency causes and overestimate high-frequency causes. We do
not have the original data of the Lichtenstein et al. study, but there
exists a replication study with the same 41 causes of death (Hertwig,
Pachur, & Kurzenhäuser, 2005). In this replication, the correlations
between estimated and actual frequencies were imperfect, just as in
the original study, ranging from 0.12 to 0.99 (median 0.85) when
calculated for each participant individually. The correlation was
0.92 when calculated between the actual and mean estimated fre-
quencies (geometric means).
As Figure 4-4a shows, the result of the replication was quite
similar to the original depicted in Figure 4-3. Figure 4-4b shows
the regressions calculated in both directions. Let us first consider
the low-frequency causes on the left side of Figure 4-4b. When one
predicts the mean estimated number of deaths for each actual num-
ber––the U-shaped curve, as in Figures 4-3 and 4-4a––one finds that
the mean subjective estimates are higher than the actual values: the
primary bias. But now consider the data from the other direction:
For example, look at all the causes that participants said were low
frequency, at just 10 estimated deaths per year, and see how many
actual deaths were associated with each of those estimates. In this
contrasting case, when one looks at the mean actual numbers for
[Figure 4-4, panels (a) and (b): estimated number of deaths per year plotted against actual number of deaths per year, both on logarithmic axes from 1 to 1,000,000; panel (b) shows two quadratic regression curves, one labeled “Estimated frequency predicted by actual frequency” and one labeled “Actual frequency predicted by estimated frequency.”]

Figure 4-4: Replication of Lichtenstein et al.’s (1978) causes of
death study showing both the primary bias and its reverse. This
replication used the same 41 causes (7 are not shown because
their frequencies were zero in 1996–2000), 45 participants, and no
anchor. (a) When the data are displayed as in Figure 4-3, the results
show basically the same pattern. (b) The data and both best-fitting
quadratic regression lines, predicting mean estimates from actual
values and vice versa. One regression produces a pattern that looks
like the primary bias, whereas the other regression produces a pat-
tern that looks like the opposite bias. (Adapted with permission
from Hertwig, Pachur, & Kurzenhäuser, 2005.)


each estimated number––the second regression curve––one finds
that the subjective estimates are lower than the actual low-frequency
causes (e.g., for all causes estimated at 10 deaths per year, the
number of actual deaths was closer to 50). This is the opposite of
the primary bias—participants now seem to underestimate
low-frequency causes. A similar inversion can be shown for the
high-frequency causes. The first regression line would seem to sug-
gest the primary bias, and the second that people underestimate
low-frequency causes and overestimate high-frequency causes.
Both phenomena cannot be true at the same time. In fact, neither
of the two conclusions is justified, nor are any of the specula-
tions about possible explanations in the human mind that disre-
gard the ecological structure. To sum up, the present analysis shows
that the primary bias is largely a matter of regression stemming
from variance.1
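The same point can be reproduced with purely synthetic data in which no over- or underestimation is built in at all; the correlation and cutoffs below are arbitrary illustrative choices, not the Lichtenstein et al. or Hertwig et al. data.

```python
import numpy as np

rng = np.random.default_rng(4)

# Standardized log-frequencies: "estimated" and "actual" have equal variances and
# correlate r < 1, with no systematic bias in either direction.
n, r = 100_000, 0.7
z1, z2 = rng.normal(size=(2, n))
actual = z1
estimated = r * z1 + np.sqrt(1 - r**2) * z2

rare = actual < -1              # rare causes (low actual values)
print("rare causes:  mean actual =", round(actual[rare].mean(), 2),
      " mean estimate =", round(estimated[rare].mean(), 2))    # estimates lie above the actual values

judged_rare = estimated < -1    # causes judged rare (low estimates)
print("judged rare:  mean estimate =", round(estimated[judged_rare].mean(), 2),
      " mean actual =", round(actual[judged_rare].mean(), 2))  # now the actual values lie above the estimates
```

Both apparent “biases” emerge from the same unbiased data, which is exactly the two-direction regression pattern shown in Figure 4-4b.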
This finding mirrors an observation by the British polymath
Sir Francis Galton, who discovered the regression phenomenon in
the 1880s and called it reversion toward the mean. The sons of
small fathers were on average taller than their fathers, and the sons
of tall fathers were on average smaller than their fathers. However,
when Galton plotted that data the other way around, it appeared
that the fathers of small sons were on average taller than their sons,
and those of tall sons on average smaller. The first pattern seems to
suggest that the variability of the sons is smaller than that of the
fathers, the second that the variability of the fathers is smaller than
that of the sons. None of this can be concluded from the data.
The second moment can also account for the first of the two
“subjective” factors mentioned above, the observation that people
thinking about risks pay attention not only to the mean, providing
an estimate of the “objective risk,” but also to the “uncertainty”
or “ambiguity” of the risk, which corresponds to the variance
around the mean. For instance, the expected risks of new technolo-
gies tend to be given wide confidence intervals (i.e., people are
not sure just how high or low the risk is), whereas technologies
involving years of empirical studies are given smaller intervals or
variability. We must be careful not to make the common assump-
tion that people’s attention to variance is a subjective “aversion to

1. A close inspection of Figure 4-4b shows that the variance of the esti-
mated frequencies in the Hertwig et al. (2005) study is smaller than that of
the actual frequencies, unlike in the statistical model. This indicates that
regression accounts for most but not all of the primary bias. Stephen M.
Stigler (personal communication) suggested that the smaller variance of
subjective estimates could indicate that the participants were quite prop-
erly estimating the actual rates by a form of shrinkage estimation, which
has a firm Bayesian justification (Stigler, 1990).

uncertainty” or argue that “ambiguity and vagueness about proba-
bilities . . . are formally irrelevant in decision analysis” (Lichtenstein,
Gregory, Slovic, & Wagenaar, 1990, p. 96). This assumption is rooted
in a decision theory that pays attention only to the first moment,
but there are other possible theoretical stances. In portfolio theory
and in other areas of economics, risk refers to variance rather than
mean, as when the variability of the value of a stock is defined as its
risk. In foraging theory, too, mean total food gain is too crude a mea-
sure for fitness; the variance of the total food gain as well as its
mean can together predict the behavior of animals (Real & Caraco,
1986). Variance, like the mean, is a statistical property of the
environment. Thus, being sensitive to uncertainty need not be
seen as a psychological bias that interferes with people’s attention
to the “objective” risk defined by the expected mean, but may be
adaptive.

Third Moment: Skewness


Let us now consider the second “subjective” factor, dread risk, as
demonstrated in Slovic’s (1987) seminal work. The fear of cata-
strophic events has been evaluated as dissociated from rational
thinking, representing a “visceral response” (Fischhoff, Watson, &
Hope, 1984, p. 129). Again, an ecological analysis provides an alter-
native view. Catastrophe avoidance need not be seen as a socially
expensive “subjective” whim, but instead as adaptively appropri-
ate attention to the third moment of the frequency distribution
(Lopes, 1992). As Figure 4-2 illustrates, dread risk corresponds to
the skewness of the distribution. For high skew and high dread,
there is a small but appreciable probability of the death of a very
large number of people.
When and why should people attend to skewness? For insurers
and reinsurers, for example, the skewness of a distribution is as
important as variance and mean. Insurers work with a definition of
“catastrophic loss” as a loss of $100 million or more (Keykhah,
2002). For instance, the 10 years following 1997 represented a high-
water mark in catastrophic losses in the United States, given 35
natural events causing insured losses of over $239 billion, with a
$60 billion loss owing to Hurricane Katrina in 2005 at the top of
the list. For events with the potential of catastrophic losses like
these, insurers cannot rely only on the expected mean and hope
that the law of large numbers will take care of the variability of
losses. Rather, catastrophic risk is typically so infrequent in a
given area that there is little reliable data, making the expected
value difficult to compute in the first place (Taleb, 2007). Catastrophic
natural events tend to spread damage and affect a majority of prop-
erties insured in a region, which can make insurance companies
insolvent. Similarly, biologists argue that single deaths spread
over time have little damaging effect on a species or group of
individuals, whereas catastrophic losses may lead to the extinction
of a species or group if the population falls below a critical mass
(Wang, 1996). Thus, highly skewed distributions can demand
attention, and people’s attention may be perfectly reasonable. In
fact, skewness has been defined as an appropriate measure of
risk in a number of theories (Coombs & Lehner, 1981; Lopes, 1984;
Luce, 1980).
The three moments of statistical distributions do not cover all
ecological structures, nor are they all that are needed to account
for judgments of risk. For instance, it is hard to see how they
could explain why people tend to judge a risk lower if they believe
they are in control (Langer, 1982), or higher if strangers may be
behind it (Douglas, 1992). What we are claiming is that the three
moments provide a baseline account that is already sufficient to
explain many phenomena without postulating additional intrapsy-
chic influences. For cognitive and motivational accounts to be of
substantial importance, it has to be shown that they go beyond this
baseline produced by purely ecological forces. Thus, the ecological
analysis is a remedy against solely attributing behavior to internal
factors—such as cognition, motivation, or affect––an explanatory
strategy that is so prevalent that psychologists have labeled it the
fundamental attribution error in their participants (Ross, 1977),
while at the same time they often overlook it in their own theories.
In the rest of this chapter, we will present a set of examples
organized by the three distribution moments and how they can
explain phenomena previously attributed to cognitive biases. (In
some cases, factors associated with more than one of the moments
might help to explain a phenomenon.) These examples are by no
means exhaustive, but they illustrate the potential of the ecological
framework to explain other phenomena previously accounted for
by internal factors.

Explaining Biases With First Moments: Mean

Base-Rate Fallacy
Imagine there are two kinds of people in the world—say, engineers
and lawyers. When we encounter someone new, how can we decide
whether that person is an engineer or a lawyer? We can gather
and use some cues about the object that are associated with each
category, for example, style of dress, or we can use the mean of
the distribution of objects—here equivalent to the more common of
the two types, or the one with the higher base rate—or we can
combine both pieces of information. The typical rational bench-
mark for combining cue and base rate information is Bayes’s rule.
What determines whether people will use mean information—base
rates—with Bayes’s rule to make such inferences?
One factor that has been found to be crucial for whether people’s
reasoning follows or circumvents Bayes’s rule is how stimuli are
sampled from an environmental distribution. Some studies have
reported that people’s reasoning is largely consistent with Bayes’s
rule, while others say that people violate Bayes’s rule, specifically
by neglecting base rates, committing what is known as the base-rate
fallacy (Hoffrage, Lindsey, Hertwig, & Gigerenzer, 2000; Koehler,
1996a). There is no way to review the huge literature on the base-
rate fallacy here; the small point we want to make is that some of
the many inconsistent results can be accounted for by differences
in sampling stimuli from the environmental distribution.
Bayes’s rule shows how to calculate a posterior probability from
new evidence and a prior probability. For instance, in the classic
engineer–lawyer problem (Kahneman & Tversky, 1973), partici-
pants had to estimate the posterior probability that a person is an
engineer rather than a lawyer, given a written description (i.e., cues)
about a person that was said to be randomly chosen from 100
available descriptions, of which 30 (or 70) described engineers and
the rest lawyers. The ratio of 30 or 70 out of 100 is the base rate of
engineers. The sampling process is essential to deciding whether
this base rate is relevant for calculating the posterior probability.
For instance, a necessary condition for the normative relevance
of the base rate is that the descriptions be randomly sampled from
the population to which the base rates refer; otherwise the base
rates might be rightly ignored.
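To make the benchmark concrete, here is a minimal sketch of how Bayes's rule combines the base rate with the evidence; the likelihood ratio of 2:1 for the description is an assumed value for illustration, not a number from the original study.

```python
def posterior_engineer(base_rate, likelihood_ratio):
    """Posterior probability of 'engineer' given a description.

    base_rate: prior probability that a randomly drawn description
        is of an engineer (e.g., 0.30 or 0.70).
    likelihood_ratio: p(description | engineer) / p(description | lawyer);
        the value 2.0 used below is hypothetical, chosen for illustration.
    """
    prior_odds = base_rate / (1.0 - base_rate)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

for base_rate in (0.30, 0.70):
    print(base_rate, round(posterior_engineer(base_rate, 2.0), 2))
# The same description yields a posterior of about .46 under a 30% base
# rate but about .82 under a 70% base rate, so ignoring a relevant base
# rate discards normatively important information.
```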
In the engineer–lawyer problem, the descriptions were made up
and were not randomly sampled from a population having the
specified base rates—although the participants were told the con-
trary. (Various other studies similar to this problem also did not
even mention to the participants whether the stimuli were ran-
domly drawn from a distribution—see Gigerenzer, 2000, p. 254.)
The mean responses in both base rate groups (30% and 70% engi-
neers) were the same for the most part, so Kahneman and Tversky
(1973) concluded that the base rates were largely ignored, even
though they would have helped judgment accuracy. The proposed
explanation of this apparent fallacy was a cognitive one: People
judge the probability that the described person is an engineer by
the similarity between the description and their stereotype of an
engineer, a strategy that Kahneman and Tversky called the repre-
sentativeness heuristic.
Would participants pay attention to base rates if their relevance
was experienced rather than (falsely) asserted by the experimenter?
To check this, Gigerenzer, Hell, and Blank (1988) let participants
actually draw the descriptions randomly from an urn. As a conse-
quence of doing the sampling themselves, the participants’ base
rate use increased. This result was replicated by Baratgin and Noveck
(2000), who additionally showed that real sampling increased the
complementarity of probability judgments, that is, that the two
judged probabilities add up to 1. To summarize: Whether or not
stimuli are randomly sampled from a population and whether one
can believe and witness the sampling process make a difference
both normatively and for the judgments of ordinary people. This
example shows the importance of understanding the sampling
processes by which people estimate the first moment of a distribu-
tion, in this case the base rate. People tend to use the base rate if
they think it represents the mean of the population from which the
individual was drawn, or rightly ignore it if that is not the case.

Overconfidence
Confidence in one’s knowledge is typically studied using questions
of the following kind:

Which city has more inhabitants?
(a) Hyderabad (b) Islamabad
How confident are you that your answer is correct?
50% 60% 70% 80% 90% 100%

People choose what they believe to be the correct answer and then
rate their confidence that the answer is correct. The participants in
studies of such judgments are called “realistic” if the difference
between their mean confidence and their proportion of correct
answers is zero. The typical finding, however, is that mean confi-
dence tends to exceed the proportion of correct answers. For exam-
ple, if the mean of the confidence ratings assigned to the correctness
of all selected answers is 70%, but the mean proportion correct is
60%, the confidence judgments are higher than the proportion cor-
rect and the participants are said to be overconfident (the over/under-
confidence measure would in this case be 70% − 60% = 10 percentage
points). This systematic discrepancy between confidence judgments
and the proportion of correct answers has been termed the overcon-
fidence bias (e.g., Lichtenstein, Fischhoff, & Phillips, 1982).
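As a minimal illustration of this measure (the responses below are made up, not data from any study), the over/underconfidence score is simply mean confidence minus the proportion of correct answers:

```python
# Hypothetical responses: each tuple is (confidence rating, answer correct?).
responses = [(0.9, True), (0.8, False), (0.7, True), (1.0, True),
             (0.6, False), (0.9, False), (0.5, True), (0.8, True)]

mean_confidence = sum(conf for conf, _ in responses) / len(responses)
proportion_correct = sum(correct for _, correct in responses) / len(responses)
over_underconfidence = mean_confidence - proportion_correct

print(f"mean confidence      = {mean_confidence:.2f}")
print(f"proportion correct   = {proportion_correct:.2f}")
print(f"over/underconfidence = {over_underconfidence:+.2f}")
# A positive score is conventionally read as overconfidence, a negative
# score as underconfidence.
```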
Early explanations of this phenomenon were sought in deficient
cognitive processing, such as a confirmation bias in memory search
(Koriat, Lichtenstein, & Fischhoff, 1980). That is, after an alterna-
tive is chosen, the mind searches for information that confirms the
choice, but not for information that could falsify it. Despite the
plausibility of this account, Koriat et al.’s clever experiments with
disconfirming reasons showed only small and nonsignificant
effects, which totally disappeared in Fischhoff and MacGregor’s
later studies (1982). Other cognitivist accounts were that people are
victims of insufficient cognitive processing (Sniezek, Paese, &
Switzer, 1990) or of their overreliance on the strength of evidence
rather than on its weight (Griffin & Tversky, 1992). Alternatively, the
explanation was sought in motivational deficits, such as self-
serving motivational biases making people think highly of their
own abilities (Taylor & Brown, 1988), or in combinations of
motivational and cognitive biases (Mayseless & Kruglanski, 1987).
In a popular social psychology textbook, the student is told,
“Overconfidence is an accepted fact of psychology. The issue is
what produces it. Why does experience not lead us to a more real-
istic self-appraisal?” (Myers, 1993, p. 50). Various kinds of eco-
nomic disasters, from the large proportion of start-ups that quickly
go out of business to the exaggerated confidence of financial inves-
tors, have been attributed to this alleged cognitive illusion. As
Griffin and Tversky emphasized, “the significance of overconfi-
dence to the conduct of human affairs can hardly be overstated”
(p. 432). Finally, in a Nobel laureate’s words, “some basic tendency
toward overconfidence appears to be a robust human character
trait” (Shiller, 2000, p. 142).
Instead of these many internal cognitive explanations, we pro-
pose a pair of environmentally informed explanations for how the
phenomenon interpreted as overconfidence bias emerges: (a) from
nonrepresentative stimulus sampling, and (b) from limited experi-
ence-based sampling. A crucial distinction lies between the distri-
bution of the stimuli in an environment and the distribution of the
stimuli used in the experimental task. Studies on cognitive pro-
cesses invariably involve sampling of stimuli from some class, and
how this sampling is done—and how participants believe it is
done—is crucial to normative claims. For instance, a person who is
asked whether New York or Rome is further south might use tem-
perature as a cue to make the inference. Since Rome has a higher
average temperature, she might infer that Rome is also further
south. But temperature is not a perfect cue, and Rome and New
York were selected precisely because they are among the relatively
few pairs of northern hemisphere metropolises for which the cue
leads to a wrong decision. When experimenters construct samples
that selectively increase the preponderance of such pairs, partici-
pants will make a disproportionately large number of mistakes,
resulting in dismal performance that seems to indicate poor compe-
tence. Again, judgments about cognitive competence can be mis-
leading when they are made without considering the kind of
sampling used.
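A toy simulation, our own sketch with arbitrary numbers, shows how item selection alone can manufacture the appearance of overconfidence: the simulated judge follows a cue of 80% ecological validity and reports that validity as confidence, which is well calibrated under random sampling but not when misleading pairs are over-represented.

```python
import random

random.seed(4)
CUE_VALIDITY = 0.80   # share of pairs in the ecology where the cue points to
                      # the correct answer (an assumed, illustrative value)

def run_condition(share_of_misleading_items, n_items=10_000):
    correct = 0
    for _ in range(n_items):
        cue_misleads = random.random() < share_of_misleading_items
        correct += not cue_misleads       # the judge always follows the cue
    confidence = CUE_VALIDITY             # and reports the cue's ecological validity
    return confidence, correct / n_items

for label, share in [("random sampling", 1 - CUE_VALIDITY), ("selected items", 0.50)]:
    conf, acc = run_condition(share)
    print(f"{label:15s}: mean confidence = {conf:.2f}, proportion correct = {acc:.2f}")
# Under random sampling, confidence and accuracy coincide; when the item set
# over-represents misleading pairs (like Rome vs. New York), the same judge
# looks overconfident although nothing about the judgment process changed.
```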
Consider again research on confidence in general knowledge.
Inspired by Brunswik’s emphasis on the importance of properly
sampling stimuli (and not only participants), Gigerenzer, Hoffrage,
and Kleinbölting (1991) drew attention to the sampling process in
confidence studies (see also Juslin, 1994). Before then there was, to
the best of our knowledge, not a single study on confidence that used
random sampling from a defined class of problems. Rather, questions
were selected without a specified sampling procedure. According
to legend, for instance, a famous decision theorist constructed an
experimental sample of questions by reading through an almanac
and selecting all facts he found surprising—like the Rome versus
New York example. Testing predictions of their probabilistic mental
models (PMM) theory, Gigerenzer and colleagues showed that
“overconfidence bias” was obtained when selected questions were
used but disappeared when the questions were randomly sampled
from a defined class (such as comparisons of the sizes of pairs of all
cities in a country). Furthermore, depending on the sampling pro-
cess, frequency judgments of number of correct answers showed
either zero overconfidence or systematic underconfidence. The
general point is that combining a cognitive process model (here,
PMM theory) with an understanding of environment structure
(here, the kind of sampling process used) could enable predictions
of how to make overconfidence disappear, appear, or even invert.
This demonstrates that what is driving behavior is not a cognitive
or motivational deficit, but a cognitive system that is sensitive to
environment structure.
The ecological account of this supposed cognitive flaw, however,
was not received with enthusiasm, and many researchers went
on for years assuming that sampling played no role in confidence
judgments. For instance, the Journal of Behavioral Decision Making
in 1997 (vol. 10, no. 3), and the journal Organizational Behavior
and Human Decision Processes in 1996 (vol. 65, no. 5) devoted
entire issues to overconfidence, in which the contributors persisted
in claiming that random sampling would not affect overconfidence
bias and that the reason for it is some mental flaw. These authors
typically relied on the results of one study by Griffin and Tversky
(1992), who did not find an effect of random sampling. However,
Juslin, Winman, and Olsson (2000) analyzed 130 studies with and
without random sampling to see what the evidence really says.
They showed that overconfidence bias indeed disappeared across
all 35 studies with random sampling, with the difference between
mean confidence and mean proportion correct being indistinguish-
able from zero. Furthermore, they showed that this result cannot be
explained away by another cognitive bias, the so-called hard-easy
effect, as Griffin and Tversky had suggested. (We return to this in
the next section.)
A systematic positive difference between mean confidence and
proportion correct is, as we have seen, not the same as an overcon-
fidence bias and should not be labeled as such. In addition to selec-
tive sampling, this systematic difference can also result from
participants’ own experience-based sampling process. For exam-
ple, a physician may estimate the probability of a disease given a
pattern of symptoms to be 70% because the physician has treated
10 patients with similar symptoms, 7 of whom were correctly diag-
nosed with the disease. Although 70% might seem like a reasonable
guess about the true probability, the estimate may be more or less in
error due to the limited sample size. Juslin et al. (1997) demon-
strated that if we assume random sampling of stimuli, but the
samples are limited numbers of observations from real-world dis-
tributions (e.g., using a binomial sampling process), we will observe
what looks like an overconfidence bias even though no biased pro-
cessing has occurred (see also previous demonstrations by Pfeifer,
1994; Soll, 1996). It might be argued that people should be able to
correct for the effects of small sample sizes and thereby eliminate the
overconfidence bias. The correction factor is, however, difficult to
calculate, as the effect of sample size interacts with the inherent
unpredictability of the environment, which is itself difficult or even
impossible for people to know (Juslin et al., 1997).
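The following rough simulation (our own sketch, not the Juslin et al. model itself) makes the argument concrete: every judge reports as confidence the success rate observed in a small personal sample, with no biased processing anywhere, yet the judges who report the highest confidence are less accurate than their confidence suggests.

```python
import random

random.seed(1)
TRUE_P = 0.70        # true probability of the judged event (assumed value)
SAMPLE_SIZE = 10     # limited personal experience per judge (assumed value)

outcomes_given_high_confidence = []
for _ in range(100_000):
    # Each judge's confidence is the success rate in a small personal sample.
    successes = sum(random.random() < TRUE_P for _ in range(SAMPLE_SIZE))
    confidence = successes / SAMPLE_SIZE
    if confidence >= 0.9:
        # Check whether the event actually occurs on a fresh occasion.
        outcomes_given_high_confidence.append(random.random() < TRUE_P)

accuracy = sum(outcomes_given_high_confidence) / len(outcomes_given_high_confidence)
print(f"judges reporting >= 90% confidence are correct {accuracy:.0%} of the time")
# Accuracy stays near the true 70%, so the gap looks like overconfidence even
# though every judge reported an unbiased sample proportion.
```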
To summarize, the overconfidence bias defined as a systematic
discrepancy between mean confidence and mean proportion cor-
rect can be explained by processes of sampling from a distribution,
both researchers’ stimulus sampling and participants’ experience-
based sampling. Overconfidence tends to disappear when ques-
tions are randomly sampled from the relevant statistical distribution
or when the experienced sample from that distribution is large
enough. These two ecological conditions are sufficient to account
for the observed phenomena. The base-rate fallacy and overconfi-
dence bias appear to be unrelated phenomena. Yet our ecological
analysis shows that both are a consequence of the same systematic
sampling processes. No mental flaw needs to be invoked to explain
these phenomena.

Explaining Biases With Second Moments: Variance

Miscalibration and the Hard–Easy Effect


Several different phenomena have been labeled overconfidence, not
only the positive difference between mean confidence and actual
accuracy discussed in the previous section. This lumping of dis-
tinct behaviors under one label is itself a problem; but furthermore,
the label misleadingly attributes the phenomena to an internal
source and gives it a negative connotation. We now consider another
phenomenon that has been labeled overconfidence, namely, miscal-
ibration, and a related phenomenon, the hard–easy effect. Informally,
miscalibration refers to the deviation between the proportion of
correct answers and the confidence level in each of the confidence
categories (for a decomposition of the formal calibration score,
see, e.g., Björkman, 1994). This is illustrated in Figure 4-5a as the
discrepancy between the diagonal (the identity line, x=y) and
the calibration curve. Note that miscalibration does not imply the
overconfidence bias discussed above, because even if the mean
confidence equals proportion correct (i.e., no overconfidence bias),
the two curves can still diverge. The hard–easy effect, also called
the difficulty effect, refers to a covariation between over/under-
confidence and task difficulty. Overconfidence is more common
when judgment problems are hard, whereas underconfidence is
more common when judgment problems are easy (for a review,
see Juslin et al., 2000). The hard–easy effect was again seen as a
major and stable reflection of the human mind: “The two major
substantive and pervasive findings are overconfidence and the
interaction between the amount of overconfidence and difficulty of
the task, the so-called hard–easy effect” (Keren, 1997, p. 269). One
proposed explanation of this stable phenomenon is that “people’s
confidence is determined by the balance of arguments for and
against the competing hypotheses, with insufficient regard for the
weight of the evidence” (Griffin & Tversky, 1992, p. 411). Several
other cognitive explanations have been suggested (e.g., Baranski &
Petrusic, 1994; Suantak, Bolger, & Ferrell, 1996; see also Juslin,
Olsson, & Winman, 1998). Both the hard–easy effect and miscali-
bration are, however, necessary consequences of the error variance
of distributions.
Several decades of attributing miscalibration (and overconfi-
dence bias) to people’s cognitive deficits passed before it was finally
pointed out that this phenomenon might be a direct reflection of
error variance in a regressive environment (Budescu, Wallsten, &
Au, 1997; Erev et al., 1994; Juslin et al., 1997, 2000; Pfeifer, 1994).
This result can be derived in the same way as with the primary bias
in judgments of causes of death. Confidence judgments tend to gen-
erate noisy data—that is, conditional variance is larger than zero,
which is equivalent to assuming that the correlation between confi-
dence and proportion correct is imperfect. Thus, an imperfect cor-
relation implies that when the reported confidence ratings are high,
the corresponding proportions correct will be smaller, looking like
miscalibration and overconfidence. For instance, when one looks at
all cases where people said that they were “90% confident that the
[Figure 4-5 appears here: two calibration plots of accuracy (y-axis)
against confidence (x-axis).]

Figure 4-5: Discrepancy between errorless confidence ratings and
particular patterns of error. (a) Miscalibration as the systematic
discrepancy between the dotted identity line (x=y) and the
calibration curve (black squares). (b) Examples of overconfidence
(black squares) and underconfidence (black circles). The
overconfidence line has been interpreted as due to systematic error,
but it arises from unsystematic error alone, via what is called
regression to the mean. This can be seen, just as in Figure 4-4b, by
calculating the reverse regression, which results in a line that
looks like underconfidence but is again a consequence of
unsystematic error alone.
answer is correct,” the mean proportion of correct answers will be
lower, such as 80%, depending on the exact correlation between
confidence and proportion correct (see Figure 4-5a). Typically, for
general knowledge questions sampled randomly from a large
domain, the regression line is symmetrical around the midpoint of
the reported confidence scale (e.g., 50% when the confidence scale
is from 0 to 100% and 75% when the confidence scale is from 50 to
100%, Juslin et al., 1997, 2000—see Figure 4-5a).2 This result can be
deduced from the presence of conditional variance in the absence
of any bias in the data—just as sons of tall fathers are likely to be
smaller in height, and average judgments of high-frequency dan-
gers will be smaller than the actual frequencies. This is a normal
consequence of regression, not a cognitive bias. In these environ-
ments any intelligent system, human or computer, will produce
patterns that mimic what has been called miscalibration or over-
confidence.
If one estimates the confidence judgments from proportion
correct (rather than vice versa), then one should get the mirror
result: a pattern that looks as if there were underconfidence bias.
So, for instance, when one looks at all items that the participants
got 100% correct, one will find that the average confidence was
lower, such as 80%. This appears to be underconfidence. In con-
trast, when one looks at all items for which participants were 100%
confident, one finds that the average proportion correct was lower,
such as 80%. This appears to be overconfidence. Erev et al. (1994)
showed for three empirical data sets that regression toward the
mean accounted for practically all the effects that would other-
wise have been attributed to overconfidence or underconfidence,
depending on how one plotted that data. Dawes and Mulford (1996,
p. 210) reached the same conclusion for another empirical data
set. In general, one can determine whether there is under/overcon-
fidence beyond regression by plotting the data both ways, as in
Figure 4-4b. This is illustrated in Figure 4-5b (e.g., where 95%
accuracy is paired with 90% confidence on the underconfidence
line, but 100% confidence goes with 92% accuracy on the overcon-
fidence line). If the two resulting regression lines are symmet-
rical around the identity line, then the phenomena can be totally
accounted for by regression toward the mean; otherwise, there is
something else left to explain. This something else can reflect
a genuine cognitive bias, but it need not. It could reflect another

2. Note that regression has different effects on the overconfidence score
for the two scales. By using a scale from 0 to 100% the regression around the
midpoint of the scale induces more overconfidence than using a scale from
50 to 100%; see simulations in Juslin et al. (1997).
environmental property, such as the sampling process, as we saw in
the previous section.
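A minimal simulation (ours, with arbitrary noise parameters) illustrates the two-way regression argument: confidence and accuracy are generated from the same underlying knowledge plus independent, unsystematic error, and conditioning on either variable makes the other look biased in the corresponding direction.

```python
import random
from statistics import mean

random.seed(2)

items = []
for _ in range(50_000):
    knowledge = random.uniform(0.5, 1.0)                                # underlying skill for this item
    confidence = min(1.0, max(0.5, knowledge + random.gauss(0, 0.1)))   # noisy report of that skill
    correct = random.random() < knowledge                               # accuracy depends on knowledge only
    items.append((confidence, correct))

# Condition on high confidence: accuracy regresses toward the mean.
accuracy_high_conf = mean(correct for conf, correct in items if conf >= 0.95)
print(f"accuracy when confidence >= .95: {accuracy_high_conf:.2f}")

# Condition on correctness: mean confidence falls short of 100%, which would
# look like underconfidence if the data were plotted the other way around.
conf_given_correct = mean(conf for conf, correct in items if correct)
print(f"mean confidence on correct items: {conf_given_correct:.2f}")
```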
When subjects were asked to do the opposite of what was done
in overconfidence research, estimating the subjective probabilities
(confidence) as a function of objective probabilities (proportion
correct), the other regression line—the underconfidence line, as
shown in Figure 4-5b—was the focus of research and the results
were also interpreted as a cognitive error, labeled conservatism.
Just as for miscalibration, the locus of the conservatism phenome-
non was never determined, although various cognitive explana-
tions have been proposed, including the miscalculation hypothesis
that the mind systematically miscalculates likelihood ratios but
combines them properly with prior probabilities, as prescribed by
Bayes’s rule (Peterson & Beach, 1967), as well as the misaggregation
hypothesis, that mental calculations of likelihoods are correct but
that likelihoods and priors are not properly combined (Edwards,
1968). The conservatism phenomenon was slowly abandoned,
whereas miscalibration is still in the headlines of psychology text-
books. Yet the two phenomena may be little more than different
ways of looking at the same data.
The hard–easy effect is also a direct consequence of conditional
variance that produces regression toward the mean. In demonstra-
tions of miscalibration and of the hard–easy effect, proportion cor-
rect is used as the dependent variable Y and confidence level X is
the independent variable. A systematic difference between mean
Y and X is interpreted as miscalibration, and a positive difference
between X and Y as overconfidence bias. In the absence of any
bias, regression toward the mean implies that the largest positive
difference will be found for easy items, that is, when proportion
correct is high. Regression also implies that this difference will
become smaller, and eventually turn into a negative difference,
when items become more and more difficult. In other words, regres-
sion toward the mean alone produces the pattern that has been
interpreted as a cognitive hard–easy effect (Juslin et al., 2000). In
addition, there are several other methodological problems associ-
ated with the hard–easy effect (Juslin et al., 2000).
An analysis of the information environment reveals that the phe-
nomena that have been labeled overconfidence bias, miscalibration,
and hard–easy effect are necessary consequences of two variables
being imperfectly correlated, resulting in a regression toward the
mean (see also Furby, 1973; Krueger & Mueller, 2002; Nesselroade,
Stigler, & Baltes, 1980). In such an uncertain environment, any sys-
tem––human or computer––will exhibit the consequences of this
regression, which should not be confused with cognitive process-
ing biases. Milton Friedman (1992) suspected that “the regression
fallacy is the most common fallacy in the statistical analysis of eco-
nomic data” (p. 2131). It would be a missed opportunity if the over-
confidence bias, miscalibration, and hard–easy effect were to be
simply taken off the hit list of cognitive illusions in a few years
without much comment and replaced by the new cognitive illu-
sions of the day, as was the case with conservatism, preventing the
next generation of researchers from learning the important lesson
of looking to environment structure before assuming that phenom-
ena lie solely in the mind.

Contingency Illusions
Contingencies quantify the degree to which an outcome is more
likely, given one condition rather than another. One frequently used
definition is the Δ rule, which states that the relative impact of a
cause (e.g., therapy) on an effect (e.g., healing) can be described by
the contingency p(healing | therapy)–p(healing | no therapy), that
is, the difference between the likelihoods of healing given therapy
and healing given no therapy. More generally, in hypothesis
tests, the degree of evidence in favor of a focal hypothesis can be
described by the contingency Δ = p(confirmation | focal hypothesis)–
p(confirmation | alternative hypothesis) (Fiedler, Walther, & Nickel,
1999). A contingency assessment may be distorted or misleading
when the samples used to estimate the two probabilities differ in
size and reliability. Thus, the confirmation rate for two hypotheses,
H1 and H2, may be equally high, but one researcher is mainly con-
cerned with H1 and is therefore exposed to larger samples of infor-
mation on H1, whereas another researcher is concerned with H2 and
is therefore exposed to denser information about H2. As a conse-
quence, the two researchers could end up with different estimates
of the overall contingency. Sample size is a crucial environmental
determinant of the variability of sampling distributions, which can
impact subsequent probability judgments.
The impact on contingency assessment of the number of obser-
vations or sample size was investigated by Fiedler et al. (1999). In
an active information search paradigm, participants were asked to
test the hypothesis that male aggression tends to be overt, whereas
female aggression tends to be covert. Participants could check, in a
computer database, whether a variety of behaviors representing
overt or covert aggression had been observed in a male (Peter) and
female (Heike) target person. The computer was programmed to
confirm all questions about overt and covert aggression in Peter
and Heike at the same constant rate of 75%. However, participants
typically asked more questions that matched the hypotheses to be
tested (male overt aggression/female covert aggression) than the
alternative hypotheses (male covert/female overt)––an information
search strategy called positive testing (Klayman & Ha, 1987).
Thus, given 75% confirmation of all queries, a participant might
hypothetically come up with the stimulus frequencies shown in
Table 4-1.
As a consequence of positive testing (i.e., enhancing the sample
size of information about the focal hypotheses), subsequent judg-
ments tended to verify the focal hypotheses because of the rich
evidence on Peter’s overt and Heike’s covert aggression. In contrast,
the impoverished samples on Peter’s covert and Heike’s overt
aggression led to less pronounced judgments. Participants’ conclu-
sions, which look like illusions, are, however, consistent with the
result from a binomial test, which would give a significant p-value
(.038) for the larger sample and a nonsignificant value (.34) for the
smaller.
In general, information search strategies that concentrate on par-
ticular hypotheses (e.g., positive testing) create skewed information
environments characterized by differential evidence densities.
Given two equally valid hypotheses, H1 and H2, but an environment
that provides more (or cheaper) information about H1 than H2, the
validity of H1 is more likely to be verified than the validity of H2.
This is because increasing sample size, or number of observations,
increases reliability and thereby statistical significance. A whole
variety of so-called confirmation biases may thus reflect an attri-
bute of the information ecology, namely, the differential sample
size of observations pertaining to different hypotheses, rather than
a cognitive processing bias within the individual’s mind. This
important determinant of hypothesis testing goes unnoticed when
environmental factors are ignored.
Further empirical evidence from the above experiment cor-
roborates the assumption that the crucial factor driving the “auto-
verification” of focal hypotheses is the sample size of information
input, rather than the participants’ biased expectancies or stereo-
typical beliefs. When the task focus was reversed––testing the

Table 4-1: Hypothetical Stimulus Frequencies From Positive Testing
of Overt Male Aggression/Covert Female Aggression

Queried behavior               Confirmed   Disconfirmed   Sample size
Overt aggression in Peter          12            4            Large
Covert aggression in Peter          6            2            Small
Overt aggression in Heike           6            2            Small
Covert aggression in Heike         12            4            Large
hypotheses that male aggression is covert and female aggression is
overt––sample sizes switched and resulting judgments became
counter-stereotypical (Fiedler et al., 1999). Likewise, when the task
focus was on the stereotype (male overt; female covert) but the
information input was manipulated to simulate negative testing
(i.e., larger sample sizes for male covert/female overt), the ecologi-
cal factor (sample size) overrode the internal cognitive expectation
(the task focus stereotype) and the resulting judgments tended to
disconfirm the stereotype. Mediational analyses supported that the
stimulus ecology (relative sample size) was the major determinant
of contingency judgments; judges’ cognitive expectancies contrib-
uted little. A long tradition of prior research on self-fulfilling proph-
ecies (Jussim, 1991; Kukla, 1993) and confirmation biases in
hypothesis testing (Snyder, 1984) that did not take the sample size
into account never discovered the important role played by this
simple and obvious environmental factor.
One might argue that judges themselves, rather than the environ-
ment, are to blame for the reported findings. If they refrained from
positive testing and gathered an equal number of observations about
all hypotheses, the whole problem could be avoided. However, this
argument also needs to be evaluated by an ecological analysis, for
at least two reasons. First, information cost and information gain
considerations show that positive testing is often ecologically ratio-
nal (McKenzie & Mikkelsen, 2000; Oaksford & Chater, 1994).
Concentrating on positive instances is economical and informative,
especially when the predicted event is rare (see chapter 12). Second,
regardless of whether judges engage in positive testing or not, the
environment will produce unequal samples, simply because
hypothesis targets differ in accessibility, visibility, and distribution
across time and space. For example, instances of overt aggression
may be intrinsically more visible and amenable to observation than
hard-to-perceive covert aggression.

Moral Judgments About Minorities


One particularly prominent phenomenon that reflects the genuine
influence of an environment with different information densities
is the devaluation of minority groups compared to majority groups
exhibiting the same desirable behavior. It is an ecological truism
that minorities are smaller than majorities, and a recurrent property
of social environments that the rate of positive, norm-conforming
behaviors is higher than the rate of negative, norm-violating behav-
iors (Fiedler, 1991; Parducci, 1968; Taylor, 1991). When these two
ecological assumptions are built into the stimulus distribution
presented in a social psychological experiment, participants may
be exposed to the following numbers of positive and negative
behavior descriptions of members from two different groups:

Group A (Majority): 18 positive and 8 negative behaviors
Group B (Minority): 9 positive and 4 negative behaviors

Note that the same ratio of positive to negative behaviors
(18:8 = 9:4) holds for both groups. Note also that the majority and
minority groups are assigned neutral labels, A and B, to rule out
any influence of prior knowledge. Nevertheless, although the
minority is uncontaminated with any preexisting stereotype or
stigma, it systematically receives less-positive impression ratings
than the majority (Hamilton & Gifford, 1976; Hamilton & Sherman,
1989). Given these differential sample sizes, and consistent with
participants’ judgments, a binomial test would find that there are
significantly more positive than negative behaviors in the majority
group (p = .037) but not in the minority group (.13). The same ten-
dency to associate the minority with fewer positive behaviors
is evident in participants’ subjective frequency estimates of the
negative versus positive behaviors they observed in both groups,
and in their cued-recall responses biased toward recalling too few
associations of positive behaviors with the minority group B.
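The significance levels quoted above follow from a simple one-tailed binomial test of whether positive behaviors outnumber negative ones beyond chance; the short calculation below (our sketch) approximately reproduces them.

```python
from math import comb

def p_positive_above_chance(positives, n, p=0.5):
    """One-tailed exact binomial p-value: P(X >= positives) under Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(positives, n + 1))

# Majority group A: 18 positive of 26 behaviors; minority group B: 9 of 13.
print("majority:", round(p_positive_above_chance(18, 26), 3))   # about .04
print("minority:", round(p_positive_above_chance(9, 13), 3))    # about .13
# The identical 18:8 = 9:4 ratio is statistically reliable only in the larger
# sample, roughly matching the values quoted in the text.
```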
This phenomenon of minority devaluation can be reconstructed
under experimental conditions that rule out any prior knowledge
or prejudice, simply as a consequence of differential sample size.
Due to smaller sample size, the actually existing prevalence of pos-
itive behavior is more likely to be seen as nonsignificant in the
minority than in the majority. Previous theoretical accounts that
have not considered sample size and environmental constraints
have explained this illusion in terms of an alleged memory advan-
tage of the absolutely rarest event class, that is, negative behaviors
of the minority (Hamilton & Gifford, 1976; Hamilton & Sherman,
1989). However, it is by now well established from numerous
experiments (Fiedler, 1991) using signal detection analyses (Fiedler,
Russer, & Gramm, 1993) and multinomial modeling (Klauer &
Meiser, 2000), as well as from computer simulations (Fiedler, 1996;
Smith, 1991), that a sample size difference between minorities and
majorities is sufficient to induce the different judgments. It is not
necessary to assume a memory bias, such as enhanced recall of neg-
ative behaviors in minorities.
These phenomena––miscalibration, the hard–easy effect, contin-
gency illusions, and moral judgments about minorities––are spread
across different subdisciplines of psychology and seem to be quite
unrelated on the surface. However, an ecological analysis reveals
that they have a common environmental determinant that helps to
explain them all: the degree of variability—the second moment—of
the distribution they are concerned with. We now consider the
third moment of statistical distributions in environments for
explaining purported cognitive illusions.

Explaining Biases With Third Moments: Skewness

Most Drivers Say They Are Safer Than Average


Garrison Keillor, whose humor has enchanted many radio listeners,
always ends his “News from Lake Wobegon” segments by referring
to the fictional town as a place where “all the women are strong, all
the men are good looking, and all the children are above average.”
Humor need not be realistic. However, when real people are asked
how safe their driving is, the majority respond that they, too, are
above average. As renowned researchers on risk perception com-
mented, “it is no more possible for most people to be safer than
average than it is for most to have above average intelligence”
(Svenson, Fischhoff, & MacGregor, 1985, p. 119), and “it is logically
impossible for most people to be better than the average person”
(Taylor & Brown, 1988, p. 195). It seems to follow that something
must be wrong with drivers’ self-perception. The fact that most
drivers say they are better than average is a favorite story in under-
graduate lectures and is attributed to some of the usual cognitive
suspects: people’s overconfidence, unrealistic optimism, or illusion
of control.
Let us have a second, ecologically informed look at this phenom-
enon. Could it be that most people have above average intelligence?
Understanding “average” as “mean,” the answer is no, because the
distribution of IQ points is, by definition, symmetrical, that is, the
number of people above the mean IQ is the same as the number
below. Is it possible that most people drive more safely than aver-
age? Yes, because safety in driving is not symmetrically distributed
around the mean (Lopes, 1992; Schwing & Kamerud, 1988).
To illustrate, take the number of accidents per person in a given
number of years as a measure of safe driving. For instance, in a
study of 440 drivers in Germany, the distribution of accidents was
so skewed that 57% of the drivers had fewer than the mean number
of accidents (Echterhoff, 1987). In a study of 7,842 American driv-
ers, 80% of the drivers had fewer accidents than the mean number
(Finkelstein & Levin, 2001, p. 144). Similarly, accidents are not
symmetrically distributed over time, and when one looks at all
hours in a week, one finds that “85% of all travel is safer than aver-
age” (Schwing & Kamerud, 1988, p. 133). Figure 4-6 illustrates a
hypothetical symmetrical distribution of driving accidents and a
skewed one with the same medians for 100 drivers each. If the
[Figure 4-6 appears here: two histograms of number of drivers
(y-axis) against number of accidents (x-axis), panel (a) symmetrical
and panel (b) right-skewed, each with median and mean marked.]

Figure 4-6: Can it be that most drivers are better than average? When
the distribution of number of accidents is symmetrical as shown in
(a), where drivers below average (below the mean) are indicated by
gray bars, this cannot happen. However, when the distribution is
skewed as in (b), it is possible. As a result, most drivers (63 out of
100) have fewer accidents than the mean (Gigerenzer, 2002).

number of accidents were symmetrically distributed across drivers,
which it is not in reality, it would look similar to Figure 4-6a. The
mean and median number of accidents would be identical. The
“safer” 50% of drivers are shaded. For a symmetrical distribution, it
would be true that 50% of the drivers, and not more, are better than
the mean.
However, as the data above indicate, safety in driving is not sym-
metrically distributed, but skewed. Figure 4-6b shows the more
realistic situation of a few quite unsafe drivers on the right side of
the distribution, and many safe drivers with zero or one accident
on the left side. The median is still three accidents, but the mean
has shifted to the right, because of the presence of bad drivers. The
mean number of accidents is now 4.5. Here one can see that more
than 50% of the drivers are better than the mean—in fact, 63% have
fewer accidents than the mean.
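The arithmetic can be checked with any strongly right-skewed distribution; the counts below are a hypothetical example in the spirit of Figure 4-6b, not the exact data plotted there.

```python
from statistics import mean, median

# Hypothetical accident counts for 100 drivers: many safe drivers, a few very
# unsafe ones (a right-skewed distribution, not the data of Figure 4-6b).
accidents = ([0] * 30 + [1] * 20 + [2] * 15 + [3] * 10 + [4] * 8 + [5] * 6
             + [6] * 4 + [8] * 3 + [12] * 2 + [15] + [20])

m = mean(accidents)
print(f"mean = {m:.2f}, median = {median(accidents)}")
print("drivers with fewer accidents than the mean:",
      sum(a < m for a in accidents), "out of", len(accidents))
# With this skew, well over half of the drivers have fewer accidents than the
# mean, so "most drivers are safer than average" can be literally true.
```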
The argument is that a skewed distribution with a long tail of
high values in itself implies the phenomenon that most drivers are
better than average. This is not to say that the skewness is the only
cause for the observed phenomenon; some drivers may exaggerate
their abilities when interviewed while others may understand
“better driving” in different ways, such as being more elegant,
faster, or showing greater adherence to traffic laws. Similarly, when
94% of college professors, for example, rate themselves as doing
“above average work,” this may in part reflect the ambiguity of the
very question asked. “Above average work” can refer to teaching,
research, committee work, and whatever a particular professor
excels at. The present analysis also does not account for cultural
differences; for instance, it does not explain why drivers in Ann
Arbor, Michigan assessed themselves as much safer, both in abso-
lute and relative terms, than did drivers in Valencia, Spain and in
Münster, Germany (Sivak, Soler, & Tränkle, 1989). An analysis of
the degree of skewness in each culture, however, could provide a
precise prediction of the expected proportion of drivers who are
actually above average and show what else needs to be explained in
drivers’ self-appraisals beyond this environmental factor.
Norms that apply to symmetrical distributions do not neces-
sarily apply to phenomena that are asymmetrically distributed.
Consequently, normative theories that are based on means, such as
expected utility theory, can generate conflicts with human intuition
when the outcomes are asymmetrically distributed, that is, when
means diverge from the medians and other relevant measures of
central tendency. The classical example is the St. Petersburg
paradox (Jorland, 1987): How much would you pay to play a game
with a 1/2 chance of winning $2.00, 1/4 chance of winning $4.00,
1/8 chance of winning $8.00, and so on? Here, the outcome distri-
bution is exponentially skewed and therefore the expected value
becomes infinite. Reasonable people, however, are not willing to
pay high amounts to play the game, which was called a paradox for
the expected value theory. The paradox can be resolved by focusing
on the expected median rather than mean of the gamble (Lopes,
1981). The skewness of the distribution determines what behavior
is adaptive in this environment.
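A small computation (our sketch) contrasts the two summaries of the gamble: the expected value grows without bound as more possible rounds are included, while the median payoff stays at $2.

```python
# Payoff is 2**k dollars with probability (1/2)**k, for k = 1, 2, 3, ...
def truncated_expected_value(max_rounds):
    """Expected payoff if the game is cut off after max_rounds possible rounds."""
    return sum((0.5 ** k) * (2 ** k) for k in range(1, max_rounds + 1))

for max_rounds in (10, 20, 40):
    print(f"expected value with {max_rounds} rounds allowed: "
          f"${truncated_expected_value(max_rounds):.0f}")
# Every additional possible round adds exactly $1, so the expectation grows
# without bound. The median payoff, by contrast, is only $2: half of all
# plays end on the very first toss.
```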

Biased Correlations and the Magic Number 7 ± 2


Early detection and accurate assessment of correlations between
cues and outcomes in the environment are of great importance for
an organism because such correlations are fundamental for learn-
ing significant environmental relationships. This learning, though,
will be influenced by the fact that the distribution of correlations
calculated from environmental samples is highly skewed. When an
environmental correlation ρXY between two variables, X and Y, is
high, drawing repeated samples of the variables from the popula-
tion will result in a distribution of sample correlations rXY that is
skewed in such a way that most sample correlations are higher
than ρXY. This could lead a majority of organisms (including par-
ticipants in laboratory studies) to make correlation estimates that
are above the true value. But such “illusory correlations” (Chapman
& Chapman, 1967) are the direct result of environment structures
and sampling and should not be mistaken for cognitive failures.
Moreover, such exaggerated judgments may even be adaptive, for
the following reason.
Miller (1956) argued that the capacity of human working mem-
ory is limited to about seven (plus or minus two) chunks. This
“magic” number has since figured prominently in information-
processing theories. Rarely, however, have researchers asked why
we have this cognitive limitation in the first place. There is no
reason why humans could not have evolved a much larger capacity,
a possibility that is concretely illustrated by a few brilliant memory
artists and in the results of the testing-to-the-limits paradigm
(Staudinger & Lindenberger, 2003). Is there an adaptive function of
the 7 ± 2 memory limit? The asymmetrical distribution of environ-
mental correlations, discussed above, can provide a potential
answer. The degree of skewness of the sampling distribution is a
function not only of ρXY but also of the sample size N. Across a wide
range of N, the skew increases with decreasing sample size N. From
this premise, Kareev (2000) argued that small samples may afford
an adaptive advantage over larger samples precisely because the
small samples can exaggerate observed correlations: “a biased esti-
mate may better serve the functioning of the organism than an unbi-
ased one. By providing such a biased picture, capacity limitations
may have evolved so as to protect organisms from missing strong
correlations and to help them handle the daunting task of induc-
tion” (p. 401). Kareev also showed that the proportion of samples
that overstate the correlation in the population reaches a maximum
for sample sizes of about 5–9, or 7 ± 2, corresponding to the esti-
mated capacity of human short-term memory (see Cowan, 2001,
for a lower estimate of 4). Note that Kareev only considered hits
and misses and not the probability of false alarms, so that his
argument applies to environments in which false alarms and cor-
rect rejections have little adaptive consequence (Juslin & Olsson,
2005).
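The skew of the sampling distribution of correlations can be checked directly. The sketch below (our own, using a bivariate normal population with an arbitrary correlation of .6 and Pearson's r, a simplification of Kareev's setup) reports the skewness of sample correlations for several sample sizes.

```python
import numpy as np

rng = np.random.default_rng(3)
RHO = 0.6                        # population correlation (illustrative value)
cov = [[1.0, RHO], [RHO, 1.0]]

for n in (5, 7, 9, 15, 30, 100):
    rs = []
    for _ in range(10_000):
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        rs.append(np.corrcoef(x, y)[0, 1])
    rs = np.array(rs)
    skew = np.mean((rs - rs.mean()) ** 3) / rs.std() ** 3
    print(f"n = {n:3d}: skewness of sample r = {skew:+.2f}, "
          f"share of samples with r > rho = {np.mean(rs > RHO):.2f}")
# The sampling distribution of r grows markedly more skewed as n shrinks,
# which is what allows small samples to overstate a strong correlation more
# often than large samples do.
```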
Did the limited capacity of short-term memory actually evolve
to make the organism maximally sensitive to correlations in the
environment? This is difficult to answer. But at the least, small
samples may sometimes be more informative and more useful for
adaptive behavior in social and physical environments than large
samples, quite in line with other less-is-more effects (Hertwig &
Todd, 2003).

Ecological Cognition

In his essay on human understanding, John Locke (1690/1959)
remarked that “God. . .has afforded us only the twilight, as I may so
say, of probability; suitable, I presume, to that state of mediocrity
and probationership he has been pleased to place us in here.” In
this chapter, we have distinguished three aspects of this twilight:
the means, variances, and skewness of statistical distributions. We
have argued that understanding behavior requires attention to these
three aspects of the information structure of the environment. That
is, when one studies how people solve a task, it is imperative to
first analyze what patterns of behavior the three moments of statis-
tical distributions in the task environment imply.
We demonstrated that phenomena from various areas of psychol-
ogy can partly or fully be accounted for by people’s sensitivity to
mean, variance, and skew, and also showed the implications of
these moments in terms of the effect of regression toward the mean,
the role of sample size, and the process of sampling. The behavioral
phenomena we discussed have usually been attributed to purely
cognitive or motivational causes without regard for the impact of
ecological factors. Thus, we are not aiming to critique specific
research, but rather the overarching way of accounting for psy-
chological data: the environmental poverty of purely cognitive
explanations.
An ecologically motivated cognitive psychology can avoid this
mental encapsulation by modeling (a) cognitive processes, (b) envi-
ronment structures, and (c) the match or mismatch between the
two, as the chapters in this book seek to demonstrate. These three
tasks have rarely been approached together. Cognitive encapsula-
tion has promoted theories that focus solely on the constraints of
the human mind, such as limited memory, with little analysis of
environment structure. But there is also the danger of environmen-
tal encapsulation, exemplified by behaviorist theories that focus
solely on the constraints in the environment, such as reinforcement
schedules, and treat the mind as a black box. The future of an eco-
logically motivated cognitive psychology lies, in our view, in under-
standing how these two sets of constraints work together to jointly
produce ecologically rational behavior.
Part III
CORRELATIONS BETWEEN RECOGNITION
AND THE WORLD
5
When Is the Recognition Heuristic
an Adaptive Tool?
Thorsten Pachur
Peter M. Todd
Gerd Gigerenzer
Lael J. Schooler
Daniel G. Goldstein

They’d seen his face before,
Nobody was really sure if he was from the House of Lords.
John Lennon and Paul McCartney

The opportunity to be interviewed on live television in your
area of expertise may seem like the chance of a lifetime, at least
professionally speaking. It could establish your authority on the
topic and greatly increase your recognition among a broad audi-
ence. Australian psychologist Donald Thompson seized such an
opportunity many years ago, but the short-term effect in his case
was strictly negative: He was soon thereafter accused of rape.
However, Thompson was innocent and had a perfect alibi—he was
on live television when the crime occurred. Sifting through the
details of the case, investigators were later able to piece together
what happened. The victim had seen Thompson’s interview just
prior to being attacked and subsequently confused him with the
rapist (Schacter, 1999).
This case illustrates the impressive ability of the human cogni-
tive system to judge accurately whether we have experienced par-
ticular people or objects before. We refer to this ability to distinguish
previously encountered objects from novel ones as recognition.
Thompson’s case also indicates that the distinct process of recall—
retrieving further facts about a recognized person or object, such as
where one had the encounter—is not accomplished as readily and
reliably as recognition. The victim accurately judged that she had
seen Thompson before. She failed, however, to attribute the source
of this recognition correctly.
The apparent proficiency and robustness of human recognition
led Goldstein and Gigerenzer (2002) to view it as “a primordial psy-
chological mechanism” (p. 77). Recognition not only helps us keep
track of our previous encounters; it can also tell us something more
about the objects in question. Specifically, if we have heard of one
object but not another, this can be an indication that the objects
may differ in other respects, as well. Recognition would then allow
us to make inferences about these other characteristics. To illus-
trate, imagine a culturally interested American tourist who, when
planning her visit to Germany, needs to make a quick guess whether
Heidelberg or Erlangen has more museums. Having heard of
Heidelberg but not Erlangen, she could exploit her partial ignorance
to make the (correct) inference that because she has heard of
Heidelberg, it is probably more famous and hence is the city with
the higher number of museums. In other words, one can exploit
the patterns of recognition information in memory arising from
encounters with natural environments to make adaptive decisions.
One strategy that uses recognition to make inferences from
memory about the environment is what Goldstein and Gigerenzer
(1999, 2002) called the recognition heuristic. For two-alternative
choice tasks, where one has to decide which of two objects scores
higher on a criterion, the heuristic can be stated as follows:

Recognition heuristic: If one object is recognized, but not the
other, then infer that the recognized object has a higher value
on the criterion.
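Stated as a procedure, the heuristic is strikingly simple. The sketch below is our illustration of the decision rule for a two-alternative choice; the guessing and knowledge-based branches for the cases where recognition does not discriminate are our own placeholders, not part of the heuristic as stated above.

```python
import random

def recognition_heuristic_choice(object_a, object_b, recognized):
    """Infer which of two objects has the higher criterion value.

    recognized: the set of object names the decision maker recognizes.
    The guessing and knowledge branches below are placeholders for the
    cases the recognition heuristic itself does not cover.
    """
    a_known, b_known = object_a in recognized, object_b in recognized
    if a_known and not b_known:
        return object_a                               # only A recognized
    if b_known and not a_known:
        return object_b                               # only B recognized
    if not a_known and not b_known:
        return random.choice([object_a, object_b])    # neither recognized: guess
    return None   # both recognized: recognition cannot discriminate here,
                  # so other knowledge or strategies must decide

print(recognition_heuristic_choice("Heidelberg", "Erlangen", {"Heidelberg"}))
```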

The starkly minimalist recognition heuristic has led to some pro-
test in the psychology literature, with the argument that such a model
would be too simple to capture human decision making. However,
many of the controversies can be resolved by understanding the
domain of the heuristic, which we will cover at length in the coming
sections: The recognition heuristic is used to make inferences under
uncertainty, rather than when there is certain knowledge (Gigerenzer
& Goldstein, 1996; Gigerenzer, Hoffrage, & Kleinbölting, 1991); it is
used for inference from memory, not from givens (Gigerenzer, Todd,
& the ABC Research Group, 1999; see also chapter 9 on this distinc-
tion); and it is likely to be used in situations where recognition
validity is substantial, not small. Feeding on recognition memory,
this heuristic thus piggybacks on a highly efficient cognitive ability
that lets it exploit the presence of a particular information structure,
namely, that recognition knowledge about natural environments is
often systematic rather than random. In environments with this
structure, the recognition heuristic is ecologically rational, exempli-
fying Herbert Simon’s vision of rationality as resulting from the
close fit between two components, the mind and the environment
(Simon, 1990; see also chapter 1). One condition that should govern
whether this strategy will be used is whether the environment is
appropriately structured (meaning, as we will define later, that there
is a high recognition validity). When the environment is not appro-
priate for using the recognition heuristic, decision makers may
ignore recognition, oppose recognition, or factor in sources of infor-
mation beyond recognition, as we will see later in this chapter.
The exploitable relation between subjective recognition and
some other (not directly accessible) criterion results from a process
by which the criterion influences object recognition through
mediators, such as mentions in newspapers, on the Internet, on
radio or television, by word of mouth, and so on. This process
applies primarily to the proper names of objects, and consequently
most studies of the recognition heuristic have involved name rec-
ognition; however, it could also apply to visual or aural images of
individual objects, locations, or people. To illustrate, the size of a
city (the criterion) is typically correlated with recognition of the
city because large cities are mentioned more often in the media.
Frequent mentions increase the likelihood that a city name will be
recognized, and as a result, recognition becomes correlated with
the size of a city. In line with these assumed connections, Goldstein
and Gigerenzer (2002) found a high correlation between the number
of inhabitants of particular German cities and how often each city
was mentioned in the American media. This, in turn, was highly
correlated with the probability that the city would be recognized
by Americans. This two-step chain can thus explain how and why
American recognition rates of German cities were highly correlated
with city size. Pachur and Hertwig (2006) and Pachur and Biele
(2007), looking at domains of diseases and sports teams, provided
further support for the assumption that the correlation between a
criterion and recognition is mediated through the quantity of men-
tions in the media.
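To see how such a mediation chain can make recognition informative, the following toy simulation (our own illustration, not a model from this chapter) generates a criterion, lets the criterion drive media mentions, and lets mentions drive recognition; recognition then ends up correlated with the criterion even though the decision maker never observes the criterion directly. The distributions and parameters are invented, and statistics.correlation requires Python 3.10 or later.

```python
import random
import statistics

random.seed(1)

# Invented toy model of the chain: criterion -> media mentions -> recognition
n_objects = 500
criterion = [random.lognormvariate(0, 1.0) for _ in range(n_objects)]   # e.g., city size
mentions = [c * random.lognormvariate(0, 0.7) for c in criterion]       # media coverage
top = max(mentions)
recognized = [1 if random.random() < (m / top) ** 0.25 else 0 for m in mentions]

print("criterion vs. mentions:    r =", round(statistics.correlation(criterion, mentions), 2))
print("mentions vs. recognition:  r =", round(statistics.correlation(mentions, recognized), 2))
print("criterion vs. recognition: r =", round(statistics.correlation(criterion, recognized), 2))
```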
Our goal in this chapter is to give an overview of empirical
research on the recognition heuristic since Goldstein and Gigerenzer
(1999, 2002) first specified it (see also Gigerenzer & Goldstein, 2011;
Pachur, Todd, Gigerenzer, Schooler, & Goldstein, 2011). We start by
describing and clarifying the basic characteristics and assumptions
of the heuristic. For this purpose, we trace how the notion
of the heuristic developed, and we locate recognition knowledge
in relation to other knowledge about previous encounters with
an object, such as the context of previous encounters, their
frequency, and their ease of retrieval from memory—that is, their
fluency. Next, we provide an overview of empirical evidence sup-
porting answers to two important questions: In what environments
is the recognition heuristic ecologically rational? And do people
rely on the recognition heuristic in these environments? We then
review evidence for a bold prediction of the recognition heuristic,
namely, that when recognition knowledge discriminates between
two objects, further cues are ignored and only recognition is used to
make the decision. We close with a discussion of findings that
appear problematic for the mechanism, as well as possible ways it
can be extended, and relations to other judgment phenomena influ-
enced by a previous encounter with an object.

The Foundations and Implications of the Recognition Heuristic

The Noncompensatory Use of Recognition


The recognition heuristic makes a strong claim. It assumes that if
people recognize one object but not the other, all other cue knowl-
edge is ignored and an inference is based exclusively on recogni-
tion. In other words, recognition is used in a noncompensatory
fashion: No other cues can reverse the judgment indicated by recog-
nition (as elaborated below, the heuristic does not apply to situa-
tions in which people already have definite criterion knowledge
about the objects). To appreciate this claim, let us trace the develop-
ment of the notion of the recognition heuristic. In an early article
that can be considered the basis for the fast-and-frugal heuristics
program, Gigerenzer et al. (1991) discussed the potential role of
recognition in making bets about unknown properties of the envi-
ronment. When facing a task in which one has to decide which of
two objects scores higher on some criterion (e.g., which of two soccer
coaches has been more successful in the past), Gigerenzer et al. pro-
posed that people first try to solve the problem by building and
using a local mental model. A local mental model can be success-
fully constructed if (a) precise criterion values can be retrieved
from memory for both objects, (b) intervals of possible criterion
values for the two objects can be retrieved that do not overlap, or
(c) elementary logical operations can compensate for missing knowl-
edge. If no such local mental model can be constructed, people acti-
vate from declarative knowledge a probabilistic mental model. Such
a model consists of probabilistic cues, that is, facts about an object
that are correlated with the criterion for a clearly defined set of
objects. Subjective recognition of an object (which Gigerenzer et al.
referred to as the “familiarity cue”) was held to be one such cue.
While Gigerenzer et al. (1991) assumed that recognition func-
tions similarly to objective cues (e.g., that a city has an international
airport), this view was later revised. Gigerenzer and Goldstein
(1996) put forth the thesis that recognition holds a special status,
because if an object is not recognized, it is not possible to recall cue
values for that object from memory, and in this sense recognition
precedes cue recall. Recognition therefore serves as an initial
screening step (if it correlates with the criterion, as used in the
take-the-best heuristic and others) that precedes the search for fur-
ther cue information; further cues are searched for only if both
objects are recognized. If only one of two objects is recognized,
the inference is based solely on recognition. The thesis that recog-
nition gives rise to noncompensatory processing was given promi-
nence when the recognition heuristic was proposed (Goldstein &
Gigerenzer, 2002): “The recognition heuristic is a noncompensa-
tory strategy: If one object is recognized and the other is not, then
the inference is determined; no other information can reverse the
choice determined by recognition” (p. 82). “Information” here
means cue values, not criterion values; in contrast, when a solution
can be derived from criterion knowledge, local mental models can
be applied, and the recognition heuristic does not come into play.
For this reason, Goldstein and Gigerenzer did not even discuss
local mental models, because their focus was on uncertain infer-
ences as made by the recognition heuristic. This issue led to some
misunderstandings: Oppenheimer (2003), for instance, argued that
because people seem to make judgments against recognition when
they have criterion knowledge contradicting it, the recognition
heuristic is not descriptive of how people make decisions. But as
mentioned before, this would not be a situation in which the recog-
nition heuristic or any other inductive strategy would be used.
How could such a mechanism that bases a decision solely on
recognition and ignores other cue knowledge be beneficial? First,
recognition seems to have a retrieval primacy compared to other
cue knowledge (Pachur & Hertwig, 2006). Recognition information
is available to make an inference earlier than other information and
enables one to make a quick and effortless decision, which is clearly
beneficial when time is of the essence. Second, in some situations,
information beyond recognition does not allow one to discriminate
between options. For instance, customers are often unable to dis-
tinguish the taste of different beers or other products once the labels
have been removed (e.g., Allison & Uhl, 1964), so information
beyond name recognition, which would take more time and effort
to gather and process, may sometimes simply be useless. Third, it
has been shown that the noncompensatory use of recognition can
lead to more accurate inferences than mechanisms that integrate
recognition with further cues (Gigerenzer & Goldstein, 1996). One
reason for this is that in situations where the recognition heuristic
can be applied there is an information asymmetry, in that addi-
tional information is usually known about recognized objects, but
not about unrecognized ones. As a consequence, if what is known about a recognized object is a set of negative cue values, a mechanism that integrates this knowledge can reject the recognized object even though no comparable negative evidence is available for the unrecognized one, leading to an unjustified rejection.
Fourth, in important decision tasks during our evolutionary past,
searching for information beyond recognition, even if it could be
useful, may often have been dangerous. Take, for instance, foraging
for food. The cost of being poisoned by sampling from unrecog-
nized mushrooms was probably considerably higher than the cost
of rejecting an unrecognized but harmless mushroom. As a conse-
quence, an avoidance of searching for information beyond recogni-
tion could have evolved in some domains. And some animals
indeed often seem to choose food based on recognition and ignore
other, potentially relevant information. For instance, Galef,
McQuoid, and Whiskin (1990) observed that Norway rats preferred
food they recognized from smelling other rats’ breath over food
they did not recognize, irrespective of whether the other rat was ill
(see Noble, Todd, & Tuci, 2001, for a model of how this ignoring of
further information may have evolved).

Adaptive Use of the Recognition Heuristic


Gigerenzer et al. (1999) assumed that the recognition heuristic is
one of a set of strategies—the adaptive toolbox—that decision
makers have at their disposal. To solve a decision problem, a strat-
egy is selected from the adaptive toolbox that fits the current task
environment, thus allowing flexible and ecologically rational
strategy use. One of the conditions in which the recognition heuris-
tic should be applied is when it is likely to be successful—which is
when recognition is (strongly) correlated with the criterion. To
quantify the accuracy achievable by using the recognition heuristic
to make criterion comparisons among a class of objects (e.g., com-
paring the populations of Swedish cities), Goldstein and Gigerenzer
(2002) proposed the recognition validity α. It is calculated as

α = R / (R + W),

where R and W equal the number of correct (right) and incorrect
(wrong) inferences, respectively, that are made on all object pairs
when one object is recognized and the other is not and the recog-
nized object is judged to have the higher criterion value. If α = .5,
recognition is not correlated with the criterion, and if α = 1, recog-
nition is perfectly correlated with the criterion and always leads
to a correct inference in the particular environment. We can
also assess the validity of object knowledge beyond recognition,
which can be used to make a decision when both objects are
recognized. This knowledge validity β is defined as the proportion
of correct inferences among the cases where both objects are
recognized.
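As a concrete illustration of these definitions, the sketch below computes α and β for a small reference class by enumerating all object pairs. The data structures, the invented population counts, and the stand-in knowledge rule are our own illustrative assumptions; ties on the criterion are ignored for simplicity.

```python
from itertools import combinations

def validities(criterion, recognized, knowledge_choice):
    """Recognition validity (alpha) and knowledge validity (beta).

    criterion        -- dict: object -> criterion value
    recognized       -- set of objects the person recognizes
    knowledge_choice -- function(a, b) giving the object picked from
                        knowledge beyond recognition (both recognized)
    """
    rec_right = rec_wrong = know_right = know_wrong = 0
    for a, b in combinations(criterion, 2):
        larger = a if criterion[a] > criterion[b] else b
        a_rec, b_rec = a in recognized, b in recognized
        if a_rec != b_rec:                    # recognition discriminates
            picked = a if a_rec else b
            if picked == larger:
                rec_right += 1
            else:
                rec_wrong += 1
        elif a_rec and b_rec:                 # knowledge must decide
            if knowledge_choice(a, b) == larger:
                know_right += 1
            else:
                know_wrong += 1
    alpha = rec_right / (rec_right + rec_wrong) if rec_right + rec_wrong else None
    beta = know_right / (know_right + know_wrong) if know_right + know_wrong else None
    return alpha, beta

# Toy reference class of "cities" with invented population counts:
cities = {"A": 3_400_000, "B": 1_700_000, "C": 600_000, "D": 210_000, "E": 95_000}
alpha, beta = validities(cities, recognized={"A", "B", "C"},
                         knowledge_choice=lambda a, b: min(a, b))  # arbitrary stand-in rule
print(alpha, beta)
```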
The recognition and knowledge validities are defined relative to a
reference class (Brunswik, 1943; Gigerenzer et al., 1991), which
clearly specifies the population of objects that are to be judged
(e.g., predicting the outcome of tennis matches at a Grand Slam tour-
nament in 2003, or comparing the population sizes of the 50 largest
British cities). To be able to make a reasonable prediction of whether
people will use recognition in a particular judgment task, it is neces-
sary to know the reference class from which participants think the
objects are drawn. One way to achieve this in an experimental setting
is to use objects drawn from a clearly specified reference class.

The Less-Is-More Effect


The recognition heuristic can lead to a surprising phenomenon in
which less knowledge can lead to more accurate decisions. Let us
illustrate this phenomenon by going back to how the recognition
heuristic was serendipitously discovered. Testing a completely different theory, Hoffrage, Gigerenzer, and colleagues stumbled upon a puzzle (see Hoffrage, 2011) when they presented German students with two sets
of geographic knowledge questions—one comparing the popula-
tion sizes of German cities, the other comparing American cities.
They expected that the first set would be relatively easy for the stu-
dents, whereas the second would be relatively hard. After all, the
students knew much more about German cities than about American
ones. As it turned out, however, the students performed slightly
better on the American set. How could such a less-is-more effect
arise, where the overall accuracy is greater when only about half of
the objects are even recognized (as for the American cities) than
when almost all of the objects are known (as for the German cities)?
At first this result seemed inexplicable. After some pondering,
however, the researchers proposed that in the set of American cities
the participants apparently followed a simple rule, which became
known as the recognition heuristic. For the German cities, the stu-
dents could not apply this rule because, ironically, they knew too
much: When they had heard of both cities to be compared they had
to fall back on knowledge beyond recognition to discriminate
between the two. Moreover, the recognition heuristic seemed to be
rather powerful, as it often yielded the correct answer, whereas
deciding between the cities on the basis of further knowledge
(which the German students often had to do for German cities) was
less accurate. Examining the recognition heuristic analytically,
Goldstein and Gigerenzer (2002) later showed that a less-is-more
effect will emerge in a comparison task whenever the recognition
validity (α) is higher than the knowledge validity (β). In addition,
Pachur (2010) showed that an important condition for the effect
seems to be that the recognition and knowledge validities do not
vary systematically across different numbers of objects in the refer-
ence class that are recognized (although the validities do not seem
to have to be constant; Goldstein & Gigerenzer, 2002, p. 80).
To illustrate the less-is-more effect, when no objects are recog-
nized (and no other information can be gleaned from the name or
image), a decision maker comparing all possible pairs of the objects
can only guess which object has the greater criterion value. With an
increasing number of recognized objects, there will be more and
more pairs in which only one object is recognized, but also more
cases in which both objects are recognized. The proportion of pairs
with only one recognized object is highest when half of the objects
are recognized and decreases again thereafter as a majority of
objects are recognized. Now, if the recognition validity is higher
than the knowledge validity, the expected accuracy of the resulting
decisions reaches a maximum when at least half, but fewer than all,
objects are recognized (see Figure 7-1 in chapter 7). When all objects
are recognized, all choices have to be made based on knowledge
beyond recognition, if available (because in this case the recogni-
tion heuristic is no longer applicable). As a consequence, the accu-
racy of choices is lower than when at least some objects are not
recognized and decision makers can benefit from the recognition
heuristic’s greater accuracy in this environment.
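The curve behind this argument can be written down directly. Under the assumptions of Goldstein and Gigerenzer's (2002) analysis, with guessing on pairs of unrecognized objects, recognition validity α on pairs where exactly one object is recognized, knowledge validity β on pairs where both are recognized, and α and β constant across n, the expected accuracy is simply a mixture of the three pair types. The following sketch (our own, with invented example values) reproduces the rise-and-fall pattern:

```python
from math import comb

def expected_accuracy(n, N, alpha, beta):
    """Expected proportion of correct pairwise inferences when n of N
    objects are recognized, assuming alpha and beta do not vary with n."""
    guess_pairs = comb(N - n, 2)      # neither object recognized -> guess (p = .5)
    rec_pairs = n * (N - n)           # exactly one recognized -> recognition heuristic
    know_pairs = comb(n, 2)           # both recognized -> knowledge beyond recognition
    total = comb(N, 2)
    return (0.5 * guess_pairs + alpha * rec_pairs + beta * know_pairs) / total

# Invented example: alpha > beta, so accuracy peaks before everything is recognized
N, alpha, beta = 100, 0.8, 0.6
for n in (0, 25, 50, 75, 100):
    print(n, round(expected_accuracy(n, N, alpha, beta), 3))
# accuracy rises, peaks at an intermediate n, and falls back toward beta
```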

Information About Previous Encounters: What Recognition Is and What It Is Not


The recognition heuristic uses information about previous encoun-
ters with an object. There are multiple dimensions of information
about such encounters that can be stored (e.g., frequency, context
knowledge), and even characteristics of the process of retrieving
this information can be exploited for an inference (e.g., the time
required to recognize an object—see chapter 6). The recognition
heuristic uses only one of these various types of information:
whether or not an encounter occurred. But the term “recognition”
has been applied in the literature to conceptually rather different
things. Therefore, it is useful to clearly distinguish the information
that the recognition heuristic employs from other forms of informa-
tion about object encounters, and our intended meaning of the term
recognition from other meanings.
First, “recognition” as Goldstein and Gigerenzer (2002) used it
refers to the distinction “between the truly novel and the previ-
ously experienced” (p. 77). It thus differs from episodic recogni-
tion, which is commonly studied in research on recognition memory
(though both might arise through some of the same underlying
processes). In a typical recognition memory experiment, partici-
pants first study a list of items (usually existing words such as
chair) and are later asked to go through a new list composed of
previously studied plus unstudied items and pick out the ones that
were on the original list. In other words, in these experiments
typically none of the items are actually novel, because they are
commonly used words. Therefore, the “mere” (or semantic) recog-
nition that the recognition heuristic employs is insufficient to
identify the correct items in this task, and knowledge about the
context (i.e., episodic knowledge) in which the previously studied
items were originally presented is required. The recognition heu-
ristic does not require such episodic knowledge, because semantic
recognition alone differentiates novel from previously encountered
objects. Moreover, recognition in Goldstein and Gigerenzer’s sense
is not independent of a reference class. A German participant may know that she has heard of Paris, France, but not of Paris, Tennessee (population ca. 10,000), and may therefore not treat Paris as recognized on a test
of U.S. cities. In addition to recognition being sensitive to a per-
son’s conception of the reference class, recognition validity
and even the decision to apply the recognition heuristic hinge on it,
as well.
A second important distinction is between (semantic) recogni-
tion and frequency information, that is, knowledge about the
number of times an object has been encountered in the past (e.g.,
Hintzman & Curran, 1994). The recognition heuristic does not dis-
tinguish between objects one has encountered 10 times and those
encountered 60 times (as long as both are recognized or unrecog-
nized). This is one element that makes the recognition heuristic
different from the availability heuristic (Tversky & Kahneman,
1973), which makes use of ease of retrieval, quality of recalled
items, or frequency judgments (for a discussion of the different
notions of availability see Hertwig, Pachur, & Kurzenhäuser, 2005).
To make an inference, one version of the availability heuristic
retrieves instances of the target events, such as the number of people
one knows who have cancer compared to the number of people
who have suffered from a stroke. The recognition heuristic, by con-
trast, bases an inference simply on the ability (or lack thereof) to
recognize the names of the event categories (cf. Pachur & Hertwig,
2006). In addition, the recognition heuristic is formally specified as
an algorithm and so can make precise predictions (such as the less-
is-more effect), while the availability heuristic in its original form
was too loosely defined for such predictions (for formal approaches
to different forms of the availability heuristic, see Dougherty, Gettys,
& Ogden, 1999; Hertwig et al., 2005; Pachur, Hertwig, & Rieskamp,
in press; Sedlmeier, Hertwig, & Gigerenzer, 1998).
A recognition assessment, which feeds into the recognition heu-
ristic, unfolds over time. The speed with which this recognition
assessment is made—fluency—can itself be informative and can be
used to infer other facts, for instance, how frequently an object has
been encountered in the past.1 The recognition heuristic does not
draw on fluency information and only considers whether an object
is recognized or not. The notion of inferences based on recognition
speed, however, has been elaborated in the fluency heuristic
(Schooler & Hertwig, 2005; see also chapter 6), which uses recogni-
tion speed to distinguish between two recognized objects (i.e.,
where the recognition heuristic does not apply). In other words,
fluency is one of the types of information that can be recruited
when recognition does not discriminate between two objects.
Finally, collective recognition—the proportion of people in some
population who recognize an object—has been used to examine the
ecological rationality of the recognition heuristic. Collective recogni-
tion has been found to be correlated with environmental quantities
such as stock profitability (Borges, Goldstein, Ortmann, & Gigerenzer,
1999; Goldstein & Gigerenzer, 2002) and sports success (Pachur &
Biele, 2007; Serwe & Frings, 2006). Nevertheless, these tests are not
direct implementations of the recognition heuristic, which is restricted
to the use of individual recognition. Of course, an individual could
use collective recognition information (assuming he or she knows it)
to make inferences about the world. However, the cognitive processes
involved would be different from the recognition heuristic (e.g.,
including recall of the collective recognition rates or their estima-
tion in other ways, such as by the number of people observed to have
chosen some option—see Todd & Heuvelink, 2007).
To summarize, the recognition heuristic models a strategy for
carrying out memory-based inferences. It is a precisely defined
algorithm that gives rise to a number of specific predictions: First,
recognition is correlated with some objective quantities in the
world. Second, people are likely to apply the recognition heuristic
only in those environments where recognition is strongly corre-
lated with the criterion. Third, it can produce a less-is-more effect
where less knowledge can lead to higher accuracy. And fourth, it
predicts that recognition knowledge determines choices even when
further probabilistic cues contradict it (i.e., noncompensatory rec-
ognition use). We describe empirical tests of these predictions in
the next two sections.

Ecological Analyses of Recognition

The claim that the recognition heuristic is a potentially ecologi-
cally rational tool in our mental adaptive toolbox hinges on a

1. Fluency could thus function as a proxy for frequency information,
but there is also evidence that people use both types of information inde-
pendently (e.g., Schwarz & Vaughn, 2002).
crucial assumption: that subjective recognition is correlated with
objective quantities in at least some environments. In what domains
does this assumption hold? Before answering this question, we
need a means of measuring the correlation. The degree to which
recognition predicts a criterion in a given domain can be assessed
in two ways. The first is to determine for a group of people their
individual recognition validities α (based on their individual rates
of recognizing the objects in a reference class) and then take the
average recognition validities as an estimate of recognition’s pre-
dictive value. A second possibility is to use the recognition
responses of the group to calculate the correlation between the
objects’ collective recognition rates (defined as the proportion of
people recognizing each object) and their criterion values, yielding
the recognition correlation (Goldstein & Gigerenzer, 2002). When
deviations from a perfect association between recognition rates and
the criterion are due to unsystematic error (i.e., when objects with
higher criterion values are as likely to be unrecognized as objects
with lower criterion values are likely to be recognized), the two
measures are related as follows (Pachur, 2010):
α = (1 + rs) / 2,

where rs is the recognition correlation expressed as a Spearman
rank correlation.
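As a quick numeric illustration of this relation, the sketch below computes a Spearman rank correlation between invented collective recognition rates and criterion values and converts it into the implied α via the formula above. The rank-correlation helper is a plain implementation that assumes no ties, and the data are made up for illustration.

```python
def spearman(x, y):
    """Spearman rank correlation (simple version, assumes no ties)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Made-up data: criterion values and collective recognition rates for 8 objects
criterion = [970, 640, 480, 350, 260, 180, 120, 90]             # e.g., city sizes (in 1,000s)
recognition_rate = [0.95, 0.90, 0.60, 0.70, 0.40, 0.30, 0.35, 0.10]

rs = spearman(recognition_rate, criterion)
print("recognition correlation rs =", round(rs, 2))
print("implied recognition validity alpha =", round((1 + rs) / 2, 2))
```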

When Is Recognition a Good Predictor?


Goldstein and Gigerenzer (2002) gave an initial overview of domains
where recognition is a good predictor of particular criteria. Across
a broad set of geographic domains, such as deserts, cities, lakes,
and rivers, with criterion values corresponding to size or length,
they found average recognition validities ranging between .64 and
.95. Since then, high recognition validities in geographic domains
have been replicated repeatedly and across a number of different
countries (e.g., Pachur, Bröder, & Marewski, 2008; Pohl, 2006). For
instance, in an analysis of the 50 largest cities of four European
countries (Italy, France, England, and Spain), Pachur et al. (2008)
found recognition validities between .72 and .78.
Geographic domains are relatively stable, as the criterion values
of the objects do not change much over time2 or only very slowly—

2. An ironic exception to this statement is the fact that in the pair San
Diego and San Antonio, a commonly used example (Goldstein & Gigerenzer,
1999), San Diego now has fewer inhabitants than San Antonio within their
respective city limits, though by metropolitan area, San Diego remains
much larger.
at least aside from desertification or increasing landscape changes
wrought by global warming. Also, new objects are rarely added to
these domains. But other domains are more dynamic. For instance,
consider sports, where previously very successful athletes remain
well known and publicly visible (and recognizable) long after
their sports performance has passed its peak, or even after their
retirement (e.g., Boris Becker). At the same time, new stars can rise
quickly and dominate the field. As it takes some time for a new
player to become widely known, recognition might often be mis-
leading when one tries to decide, for example, which of two con-
tenders will win a match. Is recognition thus doomed to failure in
dynamic domains?
Surprisingly, the answer seems to be no. Trying to disprove rec-
ognition’s ability to stand the test in this environment, Serwe and
Frings (2006) assessed how well the recognition heuristic was able
to forecast the winners of the tennis matches at the 2003 Wimbledon
tournament. This is a difficult problem: The two Association of
Tennis Professionals (ATP) rankings, which consider detailed
accounts of the players’ past performance, predicted only 66% and
68% of the matches correctly, and the seedings of the Wimbledon
experts only predicted 69%. Serwe and Frings asked German tennis
amateurs to indicate which of the tournament players they recog-
nized. Although some of the players that the amateurs recognized
were no longer very successful or were highly recognized primarily
because they were also German, the recognition heuristic, using
the individual recognition of players by the tennis amateurs, none-
theless correctly predicted 73% of the matches in which it could be
applied and collective recognition similarly predicted 72% (for a
replication see Scheibehenne & Bröder, 2007). The knowledge of
Wimbledon experts thus produced fewer correct forecasts than the
systematic ignorance of tennis amateurs.
Further analyses have confirmed the accuracy of recognition in
the sports domain. In a study on forecasts of the matches of the
European Soccer Championship, Pachur and Biele (2007) asked
laypeople which of the participating national teams they had heard
of before. Using this collective recognition, they then found that
strictly following the recognition heuristic would have led, on aver-
age, to 71% correct forecasts. However, while this was significantly
better than chance performance, the authors could not replicate the
finding by Serwe and Frings (2006) that recognition enables better
forecasts than expert information: Fédération Internationale de
Football Association (FIFA) rankings and rankings based on the
previous performance of the teams achieved 85% and 89% correct
forecasts, respectively. Finally, Snook and Cullen (2006) found in a
study with Newfoundland students that their recognition led to an
average of 85% correct judgments for the task of determining which
of two National Hockey League (NHL) players had more career
points.
In addition to sports, recognition has been shown to be useful in
other competitive domains, such as political elections (Marewski,
Gaissmaier, Dieckmann, Schooler, & Gigerenzer, 2005), quality of
U.S. colleges (Hertwig & Todd, 2003), wealth of individual persons
(Frosch, Beaman, & McCloy, 2007), and performance of stocks
(Borges et al., 1999; Ortmann, Gigerenzer, Borges, & Goldstein,
2008; but see Boyd, 2001, for a possible restriction of that domain
to rising stock markets). Thus, even in some environments
where objects can change their values on the criterion dimension
rather quickly, recognition can prove to be a powerful predictor.
Furthermore, forgetting can play a crucial role in maintaining an
effective level of ignorance in such environments (see chapter 6;
Schooler & Hertwig, 2005).

When Is Recognition Not a Good Predictor?


Despite the apparent breadth of domains in which recognition can
be exploited to infer a criterion, recognition, of course, does not
predict everything. Where does it fail? First, recognition will not be
correlated with criteria where people or the media talk about every-
thing along the criterion dimension equally often (or equally rarely)
or talk primarily about both ends of the dimension (e.g., very large
and very small countries, or tiny and giant animals). In such cases,
more mentions of an object (and hence greater recognition) do not
uniquely imply a high criterion value. To illustrate, Pohl (2006)
found that recognition of Swiss cities among Germans was unre-
lated to the criterion of their distance from the city Interlaken, a
quantity that should have little impact on how prominently a city
features in the German media.
Second (and relatedly), item recognition does not seem to be a
good predictor for criteria where the frequency of item mentions in
the media is driven by two (or more) factors that are themselves
negatively correlated. Figure 5-1 illustrates this situation. For
instance, frequent diseases are often discussed and written about
because they can affect many people. At the same time, deadly or
otherwise severe diseases are also often talked about—but severe
diseases tend to be rather rare (Ewald, 1994). Mentions in the media
and recognition of diseases are thus driven by factors that are nega-
tively correlated (i.e., frequency of occurrence and severity). As a
result, recognition is a relatively weak predictor of the frequency
of occurrence of diseases: A recognized disease is more common
than an unrecognized one only about 60% of the time (Pachur &
Hertwig, 2006).

[Figure 5-1 here: schematic plot; y-axis, Mentions in the Media (from low to high); x-axis, Disease Frequency (from rare/severe to frequent/mild).]

Figure 5-1: Hypothetical plot for a task environment in which the recognition heuristic is not ecologically rational: predicting the frequency of diseases. Here, the number of mentions of a disease in the media (and thus its recognition) increases toward both extremes of the criterion dimension, for negatively correlated reasons (frequency vs. severity). As a consequence, recognition is uncorrelated with the criterion, and α is around .5.

Similarly, Richter and Späth (2006) examined the recognition heuristic without determining recognition validity in
an environment where they admit the validity may be low: infer-
ring the relative population sizes of animal species. Recognition
does at first seem as though it would be a useful cue in this environ-
ment because animal species with a large population (e.g., pigeons)
are often well known. At the same time, however, endangered—
and thus rare—species are also often well known, either because
they once used to be more frequent (e.g., wolves), or because they
have come to public awareness through a media campaign high-
lighting their imminent extinction (e.g., panda bears), or both (e.g.,
buffalo).
In sum, there is evidence that recognition is highly informative
in particular domains and thus exploitable by mechanisms that use
recognition to make inferences in these domains. Importantly, this
seems to hold also for other information extracted from previous
encounters with objects in real-world domains, such as fluency
(Hertwig, Herzog, Schooler, & Reimer, 2008) and availability
(Hertwig et al., 2005; see chapter 6 for more on both of these pos-
sibilities). Environmental analyses are a first step in understanding
the ecological rationality of all of these decision mechanisms.

When Do People Base Decisions on Recognition?

In the previous section we reviewed findings showing that recogni-
tion can indeed be a useful guide for making inferences about some
environments, thus supporting the notion of the recognition heu-
ristic as an ecologically rational inference tool in those envi-
ronments. But do people actually use recognition knowledge in
decision making—and in the way described by the recognition heu-
ristic? Moreover, is there evidence that people adjust their use of
the recognition heuristic appropriately in different environments?
In this section we give an overview of studies that have investi-
gated how well the predictions of the recognition heuristic accord
with actual human decision behavior. The recognition heuristic
has been tested in a wide variety of domains, making it possible to
begin to map more systematically the conditions under which the
heuristic is used and when it is not used. We will start with evi-
dence showing that, as predicted by the recognition heuristic, many
decisions align with recognition. This will be followed by a discus-
sion of conditions under which people seem to systematically avoid
basing their decisions on recognition. In the third part of this sec-
tion, we turn to tests of the recognition heuristic’s bold prediction
of noncompensatory processing, that is, that all other cues beyond
recognition are ignored.

When Do People’s Decisions Follow Recognition?


The Recognition Heuristic in Inference Tasks In general, in domains where
recognition is a good predictor (i.e., when the recognition validity
α is high), a large proportion of people’s judgments in laboratory
experiments are in line with the recognition heuristic (typically
around 90%). Goldstein and Gigerenzer (2002) observed that when
American students were asked which of two German cities is larger
(a domain for which Gigerenzer & Goldstein, 1996, reported a rec-
ognition validity of .80) and they recognized one city but not the
other, they picked the recognized one in 89% of the cases (and were
consequently correct 71% of the time). Similarly high rates of rec-
ognition use were found for Swiss, Belgian, Italian (Pohl, 2006),
and British cities (Pachur et al., 2008), all of which are domains
where the recognition validity is high. Pohl (2006; Experiment 4)
found evidence for a frequent use of the recognition heuristic for
other geographic materials, such as mountains, rivers, and islands.
In addition, Reimer and Katsikopoulos (2004; see chapter 7)
reported that when people make inferences about the city size
domain in groups, lack of recognition knowledge by even a minor-
ity of group members can guide the group decisions and thereby
increase their overall accuracy.
In their application of the recognition heuristic to the sports
domain, Snook and Cullen (2006) asked their participants to judge
the relative number of career points achieved by different NHL
players. As mentioned above, recognition is a highly useful piece of
information for this task, and accordingly, a recognized player was
chosen over an unrecognized one 95% of the time, even when par-
ticipants had no further knowledge about the recognized player.
This also led them to correct inferences 87% of the time.

The Recognition Heuristic in Forecasting Tasks One objection to early tests of
the recognition heuristic was the unknown extent to which recog-
nition knowledge was confounded with criterion knowledge in
inference tasks (Oppenheimer, 2003). In forecasting, by contrast,
where the task is to judge a criterion that lies in the future, one cannot
know the criterion for sure, making it possible to test the heuristic
against this objection. Subsequently, it has been shown for predicting
tennis matches (Scheibehenne & Bröder, 2007; Serwe & Frings, 2006),
soccer games (Ayton & Önkal, 2004; Pachur & Biele, 2007), and polit-
ical elections (Marewski et al., 2005) that people choose a recognized
object over an unrecognized one even when making comparative
forecasts (around 80–90% of the time). Similarly, though not a direct
test of the recognition heuristic, Weber, Siebenmorgen, and Weber
(2005) found that name recognition of a stock was associated with
less perceived future riskiness, which, in turn, led to a higher ten-
dency to decide to invest in the stock.

The Less-Is-More Effect What about the less-is-more effect predicted
(under specific conditions) by the recognition heuristic? Goldstein
and Gigerenzer (1999, 2002) found a between-participants less-is-
more effect when they tested American and German students on
German cities. They also observed a within-participant less-is-more
effect as German participants exhibited lower accuracy after four
experiment sessions in which they came to recognize more and
more U.S. cities. Snook and Cullen (2006) analyzed participants’
judgment accuracy as a function of the number of hockey players
they recognized. Among those who did not recognize the majority
of the players, an increase in the number of recognized players was
associated with increased accuracy, reaching up to 86% when
around half of the 200 players were recognized. As recognition
increased beyond that, however, accuracy leveled off and fell again,
down to 76% when more than 140 players were recognized—an
instance of the less-is-more effect. A less-is-more effect has also been
found between groups of individuals making decisions with recog-
nition knowledge, as described in chapter 7: Groups who collec-
tively know more can be less accurate than those who know less (see
also Reimer & Katsikopoulos, 2004).
Better forecasting accuracy with less knowledge has also been
observed (e.g., Andersson, Edman, & Ekman, 2005; see also chapter
3 for a discussion of the advantages of simplicity in forecasting).
But manifestations of such a less-is-more effect for forecasts may
not be so common. In a study of the European Soccer Championships
for 2004, experts made better forecasts than laypeople, and also
within the group of laypeople, participants who had heard of all
teams made better forecasts than participants who had heard of
fewer teams (Pachur & Biele, 2007). The latter finding was some-
what unexpected, as the first condition stipulated by Goldstein and
Gigerenzer (2002) for a less-is-more effect to occur was fulfilled
(i.e., the recognition validity was higher than the validity of other
knowledge, α > β). Pachur and Biele speculated that any less-is-more
effect may have been cancelled out because both the recognition
and the knowledge validities were positively correlated with the
number of recognized teams (violating Goldstein and Gigerenzer’s
second condition, that α and β are independent of the number of
recognized objects; for a systematic analysis, see Pachur, 2010). In
other words, people who had heard of more teams tended to recog-
nize teams that were more successful and also had additional
knowledge that allowed them to make more correct forecasts, a pat-
tern of associations counteracting a less-is-more effect. (See chapter
7 for a definition and example of weak less-is-more effects, which
are probably more common.)

When Do People Not Follow Recognition?


The evidence just reviewed shows that in particular environments
people exploit the fact that they have heard of one object but not
another to infer further differences between the objects. There is
also evidence from other environments that people do not always
follow recognition. Both sets of results fit with the hypothesis that
the recognition heuristic is part of the mind’s adaptive toolbox and
is preferentially selected as a tool in appropriate task environments.
We now consider characteristics of task environments that make
them inappropriate for the application of the recognition heuristic
and examine whether people consequently turn to other decision
strategies.

Conclusive Criterion Knowledge As pointed out earlier, the recognition
heuristic has been proposed as a mental tool for situations when
judgments have to be made by inductive inference. In other words,
it is meant to describe possible decision processes when no direct
solution (i.e., a local mental model) can be found. A study by
Oppenheimer (2003, Experiment 1) suggests that, indeed, people do
not use recognition information when they can construct a local
mental model. He presented Stanford students with decisions com-
paring the population sizes of nearby cities that were highly recog-
nized but rather small (e.g., Sausalito) with fictitious cities (a diverse
set of fictitious names: Al Ahbahib, Gohaiza, Heingjing, Las Besas,
Papayito, Rhavadran, Rio Del Sol, Schretzburg, Svatlanov, and
Weingshe). In deciding which city was larger, participants chose
the recognized city in only 37% of the cases. Participants presum-
ably often deduced, given their knowledge that the nearby cities
were small (Sausalito has around 7,000 inhabitants) and that
Chinese cities (for instance) are usually rather large, that the unrec-
ognized Chinese-sounding city must be larger. That is, although
participants knew the size of only the recognized city, this knowl-
edge along with knowledge about the class of the other city allowed
them to use a local mental model to deduce that the recognized city
cannot be larger.
Another example of the suspension of the recognition heuristic
when a local mental model can be constructed comes from an
experiment on judging the relative frequencies of pairs of infec-
tious diseases (Pachur & Hertwig, 2006). Participants systematically
chose the unrecognized disease when they knew that the recog-
nized disease was practically eradicated—in other words, when
they had criterion knowledge that allowed them to locate the recog-
nized object at the extreme low end of the criterion dimension. To
illustrate, most participants recognized leprosy, but they also indi-
cated that they knew that leprosy is nearly eradicated. As a conse-
quence, when leprosy was compared with an unrecognized disease,
participants judged that the unrecognized disease was more fre-
quent in 85% of the cases.

Unknown Reference Class Mixing real objects with artificial ones in an
experiment or using objects from an amalgam of reference classes
makes it impossible to calculate the recognition validity and diffi-
cult to predict what participants base their use of recognition on.
For instance, Pohl (2006, Experiment 2) used a mixed set consisting
of the 20 largest Swiss cities and 20 well-known but small ski resort
towns. Whereas recognition is usually highly correlated with city
size, the recognition of ski resorts is mainly driven by factors other
than the size of the city (e.g., skiing conditions), so recognition will
be useful for the set of large cities, but not for the ski resorts (and
consequently, decisions in this mixed set followed recognition in
only 75% of the possible cases, compared to 89% in Pohl’s
Experiment 1 using a consistent set of large cities).
Similarly, people may adopt strategies based on whether they
believe that they are dealing with a representative or a biased sample
of items. For instance, in addition to Oppenheimer’s (2003) tests of
fictional cities being compared to recognized towns near Palo Alto,
other tests compared the fictional cities to places known for specific
reasons, such as Nantucket (limerick), Chernobyl (nuclear disaster)
or Timbuktu (expression). Since a reference class was not provided,
and because it is hard to think of a natural reference class from which
places like these would constitute a representative sample, partici-
pants may correctly infer that they are in an artificial environment. In
a clearly manipulated environment, such as that of trick questions,
recognition validity may be unknown, unknowable, or inestimable.
Unable to assess the ecological validity of the recognition heuristic,
people may elect alternative response strategies.

Low Recognition Validity Another condition for the adaptive use of the recognition heuristic is that recognition accurately predicts the
criterion in a given environment. Consistent with this notion of
adaptive use, the heuristic seems to be applied considerably less
in domains where the recognition validity is very low or nil (Pachur
& Hertwig, 2006; Pohl, 2006)—see Figure 5-2. For instance, Pohl
directly contrasted the use of the recognition heuristic in two natu-
ral environments, one in which the recognition validity was high
(α = .86 for size of Swiss cities) and the other in which it was low
(α = .51 for distance of Swiss cities to Interlaken). The proportion of
choices in line with the recognition heuristic was dramatically
lower in the domain with low recognition validity (89% for high validity vs. 54% for low).

[Figure 5-2 here: scatter plot with recognition validity (.4 to 1.0) on the x-axis and proportion of choices in line with the recognition heuristic (.4 to 1.0) on the y-axis; points A through K mark the 11 studies listed in the caption.]

Figure 5-2: Association between recognition validity in the environments of 11 different studies and the observed proportion of inferences following the recognition heuristic. A: Hertwig et al. (2007, music artists); B: Serwe & Frings (2006, amateurs); C: Snook & Cullen (2006); D: Serwe & Frings (2006, laypeople); E: Pachur & Biele (2007); F: Hertwig et al. (2007, companies); G: Goldstein & Gigerenzer (2002); H: Pohl (2006, Exp. 1); I: Pohl (2006, Exp. 2); J: Pachur & Hertwig (2006, Study 1); K: Pohl (2006, Exp. 1).

These results suggest that the overall rec-
ognition validity in a particular domain is an important factor for
whether the heuristic is applied or not.3 However, both Pohl
(Experiments 1 and 4, but see Experiment 2) and Pachur and
Hertwig (2006) found that, looking across participants in the same
domain, participants did not seem to match their recognition heu-
ristic use directly to their individual recognition validity for that
domain (specifically, the individual proportions of choices in line
with the heuristic were not correlated with the individual α). This
interesting result suggests that people know about validity differ-
ences between environments, but not about the exact validity of
their own recognition knowledge in particular environments.
Supporting this conclusion, Pachur et al. (2008) found that although
the mean of participants’ estimates of the validity of their own rec-
ognition knowledge (to predict the size of British cities) matched
the mean of their actual recognition validities perfectly (.71 for
both), the individual estimates and recognition validities were
uncorrelated (r = −.03).

Discrediting Source Knowledge According to Goldstein and Gigerenzer
(2002), the rationale for the recognition heuristic’s performance is
the natural mediation process through which a distal criterion vari-
able (e.g., the size of a city) increases the likelihood that the object
is talked about, which, in turn, increases the likelihood that the
object is recognized. Under these conditions, one can “exploit the
structure of the information in natural environments” (Goldstein &
Gigerenzer, 2002, p. 76). When recognition is due to an experimen-
tal manipulation, that is, when people recognize an object from the
experiment they are in, this natural mediation process is disrupted
and people might use recognition differently, or not use it at all. To
be sure, such induced recognition can be phenomenologically sim-
ilar to natural recognition. Nevertheless, the additional knowledge
of the specific context in which an object has been encountered
(i.e., source knowledge) might lead people not to depend on recog-
nition to make inferences. This has already been shown for other
assessments of memory. When people believe that their memory
has been manipulated experimentally, they rely considerably less
on ease of retrieval or the number of recalled instances to infer
quantities than when they do not suspect such manipulations (e.g.,
Jacoby, Kelley, Brown, & Jasechko, 1989; Schwarz et al., 1991).

3. Some results, however, suggest that people only decide not to follow
recognition in domains with low recognition validity when they have
alternative knowledge available that has a higher validity than recognition
(Hertwig et al., 2008; Pachur & Biele, 2007).
Similarly, when people know their memory is affected by other fac-
tors that are completely unrelated to the criterion dimension, they
discount ease of retrieval. For instance, people do not rely on how
easily they can retrieve instances of displaying assertive behavior
to infer their own assertiveness when they are told to attribute ease
of retrieval to ambient music (Schwarz et al., 1991).
There is some indication that this is also the case for recogni-
tion. Specifically, reliance on recognition is considerably weaker
when participants can attribute their sense of recognition to the
experimental procedure (Bröder & Eichler, 2006; Newell & Shanks,
2004; see discussion by Pachur et al., 2008). Furthermore, it has
been shown that knowledge that an object is recognized for a reason
that has nothing to do with the object’s criterion value can also
reduce the reliance on recognition. For instance, only around 30%
of participants’ choices in a city-size comparison task followed
the recognition heuristic when the recognized city was known
because of a nuclear disaster or a popular limerick unrelated to
the city size criterion (Oppenheimer, 2003, Experiment 2). In sum,
knowledge about the source of one’s recognition that indicates its
validity for a given decision seems to be considered—if available—
when people decide whether to follow the recognition heuristic
or not.
Assessing the validity of recognition based on whether one has
specific source knowledge might itself be done heuristically
(cf. Johnson, Hashtroudi, & Lindsay, 1993). Specifically, one might
infer simply from one’s ability to retrieve specific knowledge about
the source of an object’s recognition—for instance, that a city is
recognized from a limerick, or from an earlier experiment—that
recognition is an unreliable cue in this case. Why? One indication
that recognition is a potentially valid predictor is when an object is
recognized after encountering it multiple times in many different
contexts (e.g., hearing a name in several conversations with differ-
ent people, or across various media), rather than through one par-
ticular, possibly biased source. Now, if an object has appeared in
many different contexts, retrieving information about any specific
context is associated with longer response times than when an
object has appeared in only one particular context (known as the
“fan effect”—Anderson, 1974). In other words, the fluency of
retrieving a specific source might indicate whether recognition is
based on a (single) biased source or not. Correspondingly, difficulty
in retrieving detailed information concerning a particular context
in which an object was encountered could also be informative, as it
might indicate that recognition has been produced by multiple
sources and is therefore an ecologically valid cue.
Given the evidence that people systematically employ the recog-
nition heuristic in some classes of environments and not others, its
use seems to involve (at least) two distinct processes. The first is an
assessment of whether recognition is a useful indicator in the given
judgment task, and the second is judging whether an object is rec-
ognized or not. A brain imaging study by Volz and colleagues (2006)
obtained evidence for the neural basis of these two processes. When
a decision could be made based on recognition, there was activa-
tion in the medial parietal cortex, attributed to contributions of rec-
ognition memory. In addition, there were independent changes in
activation in the anterior frontomedial cortex (aFMC), a brain area
involved in evaluating internal states, including self-referential
processes and social-cognitive judgments (e.g., relating an aspect of
the external world to oneself). The processes underlying this latter
activation may be associated with evaluating whether recognition is
a useful cue in the current judgment situation. Moreover, the aFMC
activity deviated more from the baseline (i.e., reflected more cogni-
tive effort) when a decision was made against recognition, suggest-
ing that making a decision in line with recognition is the default.

Does Recognition Give Rise to Noncompensatory Processing?


So far, we have reviewed evidence concerning when people
follow one central prerequisite of the recognition heuristic: making
decisions in line with recognition. The finding that people often
choose a recognized over an unrecognized object is, however, only
a necessary but not a sufficient condition indicating the use of the
recognition heuristic, as there are alternative mechanisms such as
adding up multiple cues that would also predict choice of a recog-
nized object. In this section, we review studies that have specifi-
cally tested the unique further prediction of the recognition
heuristic—that recognition is used noncompensatorily, that is, that
all other cues are ignored. Here, we focus on studies that have
examined inferences from memory, the context for which fast and
frugal heuristics were originally proposed. Other experiments in
which recognition “knowledge” was given to people along with
other cues on a computer screen in an inferences-from-givens setup
were not appropriate tests of this prediction (e.g., Newell & Shanks’s
2004 study, in which participants were told that they recognized an
imaginary company).
We find it curious that many critics have objected to the fact that the recognition heuristic is a noncompensatory model.
Noncompensatory choices are commonly observed. As the authors
of one classic review of 45 process studies put it, “the results firmly
demonstrate that noncompensatory strategies were the dominant
mode used by decision makers” (Ford, Schmitt, Schechtman, Hults,
& Doherty, 1989, p. 75). Perhaps more striking is that the predic-
tions of another memory-based heuristic, availability, are also
noncompensatory (based on just a single variable, e.g., speed of
recall), but this seems to have bothered no one.
The paradigm used by most of the following studies is that for
some objects that an individual participant already recognizes prior
to the experiment, he or she is trained on additional cue knowledge
that indicates that those objects have a small criterion value. This
new knowledge beyond recognition should not affect inferences
made with the recognition heuristic. That is, the recognized object
should be chosen irrespective of whether the additional knowledge
indicates that the object has a high or a low criterion value. But do
people confirm this prediction?
An experiment by Goldstein and Gigerenzer (2002) suggests that
they do. The authors informed their U.S. participants that in about
78% of cases, German cities that have a soccer team in the premier
league are larger than cities that do not. In addition, participants
learned whether certain recognized cities had a soccer team. When
later asked to pick the larger of two German cities, participants
chose a recognized city over an unrecognized city in 92% of all
cases even when they had learned that the recognized city had no
soccer team and the additional cue information thus contradicted
recognition.
Richter and Späth (2006), Newell and Fernandez (2006; Exper-
iment 1), and Pachur et al. (2008) conducted experiments that are
direct extensions of Goldstein and Gigerenzer’s (2002) original
study. Participants learned new information about objects that con-
tradicted recognition (e.g., the additional cue indicated that the rec-
ognized city was small). Richter and Späth (Experiment 3) asked
their participants to judge the relative size of American cities in
190 pairs, replacing the soccer cue used in Goldstein and Gigerenzer’s
study with whether the city has an international airport. Without
the contradictory airport cue, 17 of 28 participants followed the
recognition heuristic with zero or one exception in the 32 relevant
decisions, and 95% (median 97%) of the judgments across all par-
ticipants followed the recognition heuristic—see Figure 5-3. When
the airport cue contradicted recognition, still exactly 17 of 28 par-
ticipants made the inferences predicted by the recognition heuris-
tic: 9 exclusively and 8 all but once (31 out of 32 times). The median
percentage of judgments in line with the recognition heuristic
remained unchanged at 97%. The mean dropped to 82%, but as
Figure 5-3 shows, this does not mean that all individuals decreased
in recognition heuristic adherence. Group means mask individual
strategy selection (for similar results, see Figure 5 in Pachur et al.,
2008). If we define a change as increasing or decreasing adherence
by more than 1 in 32 questions, then even when facing contradic-
tory information 43% of participants did not change, 39% con-
formed to the recognition heuristic less often, and 18% conformed

Figure 5-3: Reanalysis of Richter and Späth’s (2006) Experiment 3
based on individual data on use of recognition heuristic. The task
was to infer which of two U.S. cities has the larger population.
(a) Percentage of inferences made in accordance with the recognition
heuristic when no contradicting cues were provided for the
recognized city (with participants ordered left to right by amount
of use). (b) Percentage of inferences made in accordance with the
recognition heuristic when participants learned one contradicting
cue (that the recognized city does not have an international
airport). Even when participants learned a valid cue that
contradicted the recognition heuristic, a majority (17 of 28) made
inferences consistent with the recognition heuristic with zero or
one exceptions out of 32 decisions. (We are grateful to Richter and
Späth for providing their individual data.)

more often. Again, individual differences can be clearly seen; only
4 of 28 participants did not follow the recognition heuristic in the
majority of judgments, and no participant adopted an anti-recogni-
tion strategy.
Newell and Fernandez (2006) taught participants that German
cities they recognized either did or did not have a soccer team and
subsequently asked them to judge the relative size of these and
other, unrecognized cities. Although no cue knowledge about
unrecognized cities was taught directly, the authors manipulated
knowledge of the probability that an unrecognized city had a soccer
team (which would indicate that the city is large). If recognition
were used in a noncompensatory manner, participants’ additional
knowledge about whether a city has a soccer team should not affect
their judgments. On the aggregate level, however, it did. The mean
percentage of judgments where participants picked the recognized
city was, overall, smaller when participants had learned the addi-
tional soccer team cue for that city that contradicted recognition
(than when the cue supported recognition: 64% vs. 98%), and also
smaller when the probability that an unrecognized city had a soccer
team was high (than when the probability was low: 77% vs. 86%).
However, as in Richter and Späth’s (2006) Experiment 3, the group
means mask individual differences: Overall, 23% of participants
always chose the recognized city, irrespective of contradicting cue
information (see Pachur et al., 2008).
In the studies of Richter and Späth (2006; Experiment 2) and
Pachur et al. (2008), recognition was contradicted by not just one,
but by up to three cues.4 Would people still follow recognition in
this situation, as predicted by the recognition heuristic? In Richter
and Späth’s experiment, the task was to decide which of two air-
lines was less safe (in terms of fatality rates). It should be noted,
however, that in this domain recognition was a rather poor predic-
tor. Here, recognition was assumed to point in the direction of safer
airlines. Before making these inferences, participants were taught
between zero and three additional cues about recognized airlines
that indicated that the airline had a high or low fatality rate. The
median participant judged a recognized airline as safer than an
unrecognized one in 100% of the cases when there were no nega-
tive cues conveyed about the recognized airline in a training ses-
sion, 94% when there was one negative cue, 91% with two, and
75% with three. The corresponding means are 98%, 88%, 80%, and
67%, with the difference from the medians again illustrating strong

4. The experiment by Bröder and Eichler (2006) followed a similar
methodology but involved experimentally induced rather than natural
recognition and so is not discussed here.
individual differences as in Richter and Späth’s Experiment 3. In
addition, 6 of 32 people picked the recognized airline as safer with
zero or one exception, irrespective of the number of contradicting
cues. As Richter and Späth observed, the finding that most people
think an unrecognized airline is less safe than a recognized airline
with three strikes against it speaks to the surprising strength of
brand recognition. Pachur et al. (2008), whose participants were
taught up to three additional cues about British cities and subse-
quently asked to judge the cities’ relative sizes, observed even
higher proportions of participants ignoring further cue knowledge
than using it: Between 48% and 60% of their participants picked
the recognized city with zero exceptions. That is, a very large pro-
portion of participants followed the choice indicated by recogni-
tion even when it was contradicted by three additional cues.
In summary, a number of studies have shown that a large propor-
tion of people make decisions consistent with a noncompensatory
use of recognition, as predicted by the recognition heuristic. That
is, even when participants have cue knowledge available that con-
tradicts recognition, this knowledge often does not reverse the judg-
ment determined by recognition. Some people, however, do seem
to switch to different strategies, and these can be either compensa-
tory or noncompensatory.
What are those other strategies? The conclusion that many
researchers draw, that not using the recognition heuristic implies
compensatory cue use, is incorrect. One cannot conclude the
presence of a compensatory strategy from the observation that some
individuals do not follow a particular noncompensatory strategy.
Nor can one conclude the opposite. The reason is that there are
many noncompensatory strategies—such as conjunctive, disjunc-
tive, and lexicographic rules, elimination-by-aspects, take-the-best,
and others—just as there are numerous compensatory ones, includ-
ing equal weight models (Dawes’s rule), naïvely weighted linear
models (Franklin’s rule), and multiple regression (Ford et al., 1989;
Gigerenzer & Goldstein, 1996; Hogarth, 1987; Hogarth & Karelaia,
2005b; Tversky, 1972). Few studies have tested the recognition
heuristic against alternative models that integrate recognition with,
or rely exclusively on, additional knowledge (but see Marewski,
Gaissmaier, Schooler, Goldstein, & Gigerenzer, 2010; Marewski &
Schooler, 2011; Pachur & Biele, 2007). This is what is needed to
uncover which strategies people select from their adaptive toolbox
when they are not using the recognition heuristic.
One possible approach to identifying the different strategies
people might use when they recognize one object but not the other
is to also examine process data (e.g., response times). Bergert and
Nosofsky (2007) and Bröder and Gaissmaier (2007) have shown the
potential of supplementing analyses of outcomes with process data
to discriminate between take-the-best and weighted additive strate-
gies. Process data would also allow one to test the possibility that
some people who decide in line with the recognition heuristic do
not completely ignore additional information but rather retrieve it
without considering it in their judgment. This can be tested by
comparing the response times of choices where additional knowl-
edge beyond recognition is available with those of choices where
no such additional knowledge is available. If, ceteris paribus, addi-
tional knowledge is indeed not retrieved, the response times should
not differ (for a critical discussion, see Pachur, 2011).
Another direction for better understanding individual differ-
ences in how recognition is used in decision making is to compare
decision making by younger and older adults, whose cognitive sys-
tems usually differ in ways potentially relevant for the use of recog-
nition. Such age-comparative studies on fast and frugal heuristics
have begun to provide intriguing results concerning the adaptive
use of these heuristics and their role in older adults’ decision
making (e.g., Mata, Schooler, & Rieskamp, 2007; Pachur, Mata, &
Schooler, 2009). For instance, Pachur et al. (2009) found that
although younger and older adults equally reduce their reliance on
recognition in environments with low (compared to high) recogni-
tion validity, older adults show deficits in their ability to selectively
suspend their use of the recognition heuristic for particular deci-
sions. This reduced adaptivity was mediated by age differences in
cognitive speed, supporting Pachur and Hertwig’s (2006) results
that mental resources are necessary to make decisions contradict-
ing recognition information.

Other Judgment Phenomena Based on Memory of Previous Encounters

Several different lines of research have explored how memory of
past exposure to objects can be exploited to make inferences
about unknown aspects of the environment. As we describe in this
section, recognition-like phenomena underlie a number of classical
findings in judgment research, such as the reiteration effect and
the mere exposure effect. What is different about the recognition
heuristic, however, is its precise account of the process involved
in making an inference. Further research on the recognition heuris-
tic may be inspired by considering these other research traditions,
and vice versa.

Inferences About the Truth of a Statement


An important aspect of the world that we often are unable to assess
for certain and therefore have to infer is whether a statement we
encounter is true or false. What is the role of recognition, or more
generally, memory traces created by previous encounters with a
statement, in making such inferences? Hasher, Goldstein, and
Toppino (1977) presented participants, over a period of 3 weeks,
with statements that were either true or false (e.g., “the People’s
Republic of China was founded in 1947”) and after each presenta-
tion participants indicated their confidence that the statement was
true. Most of the statements appeared only once, but some were
presented repeatedly across the three sessions. Hasher and col-
leagues found that for repeated statements, as repetition frequency
went up participants expressed an increasing confidence in the
veracity of the statement. This reiteration effect (or frequency-
validity effect) can be taken to indicate that participants used the
strength of the memory traces of the statements as an indication of
how likely the statement was to be true. As in the recognition heu-
ristic, people here apparently exploited memory of previous
encounters with a stimulus as a cue to make an inference about an
inaccessible criterion (i.e., the truth of a statement).
The reiteration effect is closely related to findings by Gilbert
and colleagues, who presented their participants with a series of
statements followed by information as to whether each statement
was true or false (Gilbert, 1991; Gilbert, Krull, & Malone, 1990;
Gilbert, Tafarodi, & Malone, 1993; but see Hasson, Simmons, &
Todorov, 2005). In one experiment, the processing of the additional
information was interrupted (e.g., by a tone to which the partici-
pants had to respond) and as a consequence, participants had an
uncertain basis for assessing the statement’s veracity (Gilbert et al.,
1990). Unsurprisingly, when later asked to categorize studied and
unstudied statements as true, false, or novel, participants misclas-
sified some of the statements. Compared to uninterrupted par-
ticipants, those who were interrupted showed a much stronger
tendency to classify false statements as true than to classify true
statements as false. In other words, one presentation of a statement
seemed to suffice to increase the tendency to believe that the state-
ment is true. In contrast, participants tended to classify previously
unseen statements as false. So even single previous encounters may
be used by people to infer something about a statement, namely,
that it is true. Although this default to believe a previously seen
statement can be overturned, making such a switch appears to
require additional cognitive resources: When under time pressure
or cognitive load, participants tended to treat even statements they
were previously informed were false as true (Gilbert et al., 1993).
This parallels the recognition heuristic finding that under time
pressure people tend to ascribe recognized objects a higher crite-
rion value than unrecognized objects even when recognition is a
poor cue (Pachur & Hertwig, 2006).
Interestingly, Gilbert et al. (1990) also mentioned that the initial
belief in the truth of statements that one encounters “may be eco-
nomical and…adaptive” (p. 612), thus offering a potential link to
the concept of ecological rationality. Specifically, they proposed
that the propositional system of representation that underlies
cognition (i.e., assessments of whether a statement is true or false)
might be an evolutionary outgrowth of the representational system
that underlies perception. What we see in the world is usually also
what exists, and we also tend to believe that statements we encoun-
ter are true. However, such an evolutionary argument may not be
necessary, as lifelong experience might equally teach us that most
statements we hear are true. This intuition could be tested by eco-
logical analyses examining the proportion of true and false state-
ments in the world (or in specific environments) that people
encounter.
Gilbert’s work could also inspire further research on the recogni-
tion heuristic. Gilbert (1991) argued that “acceptance is psycho-
logically prior” to rejection of the truth of a statement (p. 116).
Given that recognition also has (temporal) priority as an inferential
cue, do people have a general tendency to accept a recognized
object, even when their task is to reject the recognized object? In
one unique study, McCloy, Beaman, Frosch, and Goddard (2010)
compared the use of recognition in different framings of a judgment
task (i.e., “Which object is larger?” vs. “Which object is smaller?”).
They found that recognition was used somewhat more heteroge-
neously when the task was to pick the object with the lower crite-
rion value compared to when it was to pick the one with the higher
criterion value, with some participants systematically picking the
recognized (but wrong) object even in the former situation. But a
general tendency to choose recognized objects does not mean that
the recognition heuristic is maladaptive. Rather, our tasks may usu-
ally be to find an object with a high criterion value. Under such
circumstances, it would make sense to have a mechanism with a
default to “accept” a recognized object. This hypothesis could again
be tested via ecological analyses of common task environments.

Estimation
The decisions considered so far involved simple categorical judg-
ments about the environment, such as, Which is larger: A or B? Is
the statement X true or false? But often we have to make an absolute
estimate regarding some aspect of an object and come up with a
numerical value (e.g., the number of inhabitants of a city). Is infor-
mation about whether one has heard of an object also used for esti-
mation? This possibility has been discussed by Brown (2002), who
observed in studies on estimation of dates of events and country
sizes that participants estimated unrecognized events as occurring
in the middle past5 and unfamiliar countries as having rather small
populations. Brown’s results suggest that, as in the recognition heu-
ristic, people take their ignorance as useful information for where
to locate an object on a quantitative dimension even in absolute
estimation. He also points to the ecological rationality of this
strategy: “As it turns out, this assumption is a reasonable one, and
as a consequence, guessed estimates tended to be fairly accurate”
(pp. 339–340). Compared to the recognition heuristic, the processes
involved in estimation are probably more complex, using metric
and distribution knowledge to convert ignorance into a quantita-
tive estimate (but see chapter 15 for a simple heuristic approach to
estimation). Lee and Brown (2004) proposed a model describing
how people make date estimates of unknown events by combining
the fact that they are not recognized with other information pro-
vided by the task.

Preference and Ascription of Positive Meaning


So far we have looked at recognition-based inferences about objec-
tive characteristics of the environment. What about the effects of
previous encounters on preferences, for which there is no objective
criterion? As shown in the mere exposure effect (Zajonc, 1968),
repeatedly encountering an object results in an increased liking or
preference for the object. In addition, objects such as symbols are
generally ascribed a more positive meaning the more often they
have been encountered. This indicates that memory traces of previ-
ous encounters are also used for constructing one’s affective
responses to the environment. However, it is important to stress
that in contrast to the recognition heuristic, these effects do not
require that the object is recognized as having been seen before.
Hence, the recognition heuristic cannot account for the mere expo-
sure effect. Zajonc (1980) postulated that in the construction of
preferences, cognitive and affective processes are relatively inde-
pendent, which might explain how an object can be preferred with-
out a cognitive basis (e.g., without being recognized). The fluency
heuristic (see chapter 6) is one possible mechanism by which (con-
sciously) unrecognized objects may gain preference through
repeated exposure (and the same process may also apply to infer-
ences between unrecognized objects).

5. A similar observation was made by Pachur and Hertwig (2006): In an
estimation task, people assigned unrecognized diseases to intermediate,
rather than extremely low, frequency categories.

Conclusion

The recognition heuristic was proposed as a model of how the
experience of recognition, indicating a particular statistical struc-
ture in the environment, can be exploited by a smart and simple
mechanism to make inferences about the environment. By virtue of
its precise formulation that allows clear-cut predictions, the recog-
nition heuristic has been the focus of a large number of studies in a
relatively short time. The studies indicate that a majority of people
consistently rely on the recognition heuristic when it is ecologi-
cally rational. Furthermore, the higher recognition validity is, the
more people rely on it, signaling its adaptive use. It thus offers per-
haps the simplest realization of Herbert Simon’s notion that bound-
edly rational decision making can arise from simple mental tools
that are matched to the structure of the environment.
6
How Smart Forgetting Helps
Heuristic Inference
Lael J. Schooler
Ralph Hertwig
Stefan M. Herzog

“You see,” he [Sherlock Holmes] explained, “I consider
that a man’s brain originally is like a little empty attic,
and you have to stock it with such furniture as you
choose. A fool takes in all the lumber of every sort that he
comes across, so that the knowledge which might be
useful to him gets crowded out, or at best is jumbled up
with a lot of other things so that he has a difficulty in
laying his hands upon it. Now the skilful workman is
very careful indeed as to what he takes into his brain-
attic. He will have nothing but the tools, which may help
him in doing his work, but of these he has a large assort-
ment, and all in the most perfect order. It is a mistake to
think that that little room has elastic walls and can dis-
tend to any extent. Depend upon it—there comes a time
when for every addition of knowledge you forget some-
thing that you knew before. It is of the highest impor-
tance, therefore, not to have useless facts elbowing out
the useful ones.”*
Arthur Conan Doyle

In The Mind of a Mnemonist, Luria (1968) examined one of the
most virtuoso memories ever documented. The possessor of this
memory—S. V. Shereshevskii, to whom Luria referred as S.—reacted
to the discovery of his extraordinary powers by quitting his job as a
reporter and becoming a professional mnemonist. S.’s nearly per-
fect memory appeared to have “no distinct limits” (p. 11). Once, for

Portions of this chapter are adapted from Schooler & Hertwig (2005)
and Hertwig, Herzog, Schooler, & Reimer (2008), with permission from the
American Psychological Association.

instance, he memorized a long series of nonsense syllables that
began “ma, va, na, sa, na, va, na, sa, na, ma, va” (Luria, 1968, p. 51).
Eight years later, he recalled the whole series without making a
single error or omission. This apparently infallible memory did not
come without costs. S. complained, for example, that he had a poor
memory for faces: “People’s faces are constantly changing; it is the
different shades of expression that confuse me and make it so hard
to remember faces” (p. 64). “Unlike others, who tend to single out
certain features by which to remember faces,” Luria wrote, “S. saw
faces as changing patterns. . ., much the same kind of impression a
person would get, if he were sitting by a window watching the ebb
and flow of the sea’s waves” (p. 64). One way to interpret these
observations is that cognitive processes such as generalizing,
abstracting, and classifying different images of, for example, the
same face require forgetting the differences between them. In other
words, crossing the “‘accursed’ threshold to a higher level of
thought” (Luria, 1968, p. 133), which in Luria’s view S. never did,
may require the ability to forget.
Is forgetting a nuisance and a handicap or is it essential to the
proper functioning of memory and higher cognition? Much of the
experimental research on memory has been dominated by ques-
tions of quantity, such as how much information is remembered
and for how long (see Koriat, Goldsmith, & Pansky, 2000). From this
perspective, forgetting is usually viewed as a regrettable loss of
information. Some researchers have suggested, however, that for-
getting may be functional. One of the first to explore this possibility
was James (1890), who wrote, “In the practical use of our intellect,
forgetting is as important a function as recollecting” (p. 679). In his
view, forgetting is the mental mechanism behind the selectivity of
information processing, which in turn is “the very keel on which
our mental ship is built” (p. 680).
A century later, Bjork and Bjork (1988) argued that forgetting pre-
vents out-of-date information—say, old phone numbers or where
one parked the car yesterday—from interfering with the recall of
currently relevant information. Altmann and Gray (2002) make a
similar point for the short-term goals that govern our behavior; for-
getting helps us to keep from retrieving the speed limit that was
appropriate in town when we return to the freeway. From this per-
spective, forgetting prevents the retrieval of information that is
likely obsolete. In fact, this is a function of forgetting that S. para-
doxically had to do consciously. As a professional mnemonist, he
committed thousands of words to memory. Learning to erase the
images he associated with those words that he no longer needed to
recall was an effortful, difficult process (Luria, 1968).

How and why forgetting might be functional has also been the focus
of an extensive analysis conducted by Anderson and colleagues
(Anderson & Milson, 1989; Anderson & Schooler, 1991, 2000; Schooler
& Anderson, 1997). On the basis of their rational analysis of memory,
they argued that much of memory performance, including forgetting,
might be understood in terms of adaptation to the structure of the envi-
ronment. The rational analysis of memory assumes that the memory
system acts on the expectation that environmental stimuli tend to
reoccur in predictable ways. For instance, the more recently a stimu-
lus has been encountered, the higher the expectation that it will be
encountered again and information about that stimulus will be needed.
Conversely, the longer it has been since the stimulus was encountered,
the less likely it is to be needed soon, and so it can be forgotten.
A simple time-saving feature found in many word processors can
help illustrate how recency can be used to predict the need for infor-
mation. When a user prepares to open a document file, some pro-
grams present a “file buffer,” a list of recently opened files from which
the user can select. Whenever the desired file is included on the list,
the user is spared the effort of either remembering in which folder
the file is located or searching through folder after folder. For this
mechanism to work efficiently, however, the word processor must
provide users with the files they actually want. It does so by “forget-
ting” files that are considered unlikely to be needed on the basis of
the assumption that the time since a file was last opened is negatively
correlated with its likelihood of being needed now. The word proces-
sor uses the heuristic that the more recently a file has been opened,
the more likely it is to be needed again now. In the rest of this chap-
ter, we show how human memory bets on the same environmental
regularity, and how this bet can enable simple heuristics, including
the recognition and fluency heuristics, to operate effectively.
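
As a concrete illustration, the word processor’s bet on recency can
be captured in a few lines of code. The sketch below is ours, not
the code of any actual word processor; the file names and buffer
size are hypothetical.

from datetime import datetime, timedelta

def file_buffer(last_opened, buffer_size=4):
    """Return the files most likely to be needed now, betting that the more
    recently a file was opened, the more likely it is to be needed again."""
    ranked = sorted(last_opened, key=last_opened.get, reverse=True)
    return ranked[:buffer_size]  # everything else is "forgotten" by the buffer

# Hypothetical example: three documents last opened at different times.
now = datetime(2011, 6, 1, 12, 0)
last_opened = {
    "report.doc": now - timedelta(days=1),
    "notes.doc": now - timedelta(days=30),
    "draft.doc": now - timedelta(hours=2),
}
print(file_buffer(last_opened, buffer_size=2))  # ['draft.doc', 'report.doc']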

Forgetting: The Retention Curve

The rational analysis of memory rests on the assumption that envi-
ronmental stimuli make informational demands on the cognitive
system that are met by retrieving memory traces associated with the
stimuli. Consequently, memory performance should reflect the pat-
terns with which environmental stimuli appear and reappear in the
environment. An implication is that statistical regularities in the
environment can be used to make predictions about behavior, say,
performance in memory experiments. Conversely, performance on
memory tasks can provide predictions about the environment. One
such prediction follows from the retention function, an iconic
manifestation of the regularity behind forgetting in human memory.
This function is studied by exposing people to an item and then
testing performance at various lags, known as retention intervals.
Squire (1989), for example, presented people with the names of real
and made-up TV shows. They had to decide whether the names
were of real shows. Figure 6-1 plots people’s recognition perfor-
mance as a function of the number of years since the show’s cancel-
lation. The more time has passed since a TV show was cancelled,
the lower the memory for that show.
From the perspective of the rational analysis of memory, perfor-
mance falls as a function of retention interval because memory
performance reflects the probability of encountering a particular
environmental stimulus (e.g., a name), which in turn falls as a
power function of how long it has been since the stimulus was last
encountered. For instance, the probability that you will encounter
the TV show name “The Mary Tyler Moore Show,” a hit in the
1970s, should currently be much lower than the probability that
you will encounter the name “Grey’s Anatomy,” a top-rated show
as we write this chapter. Anderson and Schooler (1991) tested the
link between memory performance and environmental regularities
in environments that place informational demands on people (see
also Anderson & Schooler, 2000; Schooler & Anderson, 1997). One
such environment involves the daily distribution of people who
sent electronic mail messages, capturing aspects of a social envi-
ronment. Another environment, linguistic in nature, involves word
usage in speech to children. A third environment is that of New
York Times headlines. Figure 6-2 shows the probability of a word

Figure 6-1: Mean recognition rates of television shows as a function
of years since the show was canceled (data from Squire, 1989).

Figure 6-2: Probability of a word being used in New York Times
headlines as a function of number of days since it was last used
(data from Anderson & Schooler, 1991).

occurring in the headlines as a function of the number of days since
that word had previously occurred.1 Just as memory performance
falls as a function of retention interval, so too does the probability
of a word appearing—that is, it falls as a function of the time since
it was last mentioned. Consistent with Anderson and Schooler’s
predictions, the memory retention function reflects statistical regu-
larities in the world, and vice versa. The rational analysis of memory
framework accounts for a variety of memory phenomena (see
Anderson & Schooler, 2000, for a review), including spacing effects,
to which we turn now.

Spacing Effects in Memory


Nearly all laboratory memory experiments involve the presentation
of material to participants that must be retrieved later. When mate-
rial is presented multiple times, the lag between these presenta-
tions is known as spacing, and the lag between the final presentation
and test is again called the retention interval. The spacing effect

1. The actual predictions of the rational analyses are in terms of odds,
where odds equal p/(1 − p). However, when probability p is very small,
odds and p are quite similar. For example, a p of 0.05 corresponds to odds
of 0.0526. As most people are more comfortable thinking in terms of prob-
abilities, we use them here.
involves the interaction of the spacing between presentations and
the retention interval. For verbal material, one tends to observe that
at short retention intervals performance is better for tightly massed
presentations (i.e., separated by short intervals), but at longer reten-
tion intervals performance is better for widely spaced presenta-
tions. Consider two groups of students preparing for a foreign
language vocabulary test. What is the most efficient use of the lim-
ited time they have? The cramming students would do all of their
studying on the Wednesday and Thursday before the exam on
Friday. The conscientious students would study a little each week,
say, the Thursday in the week preceding the exam and again on the
Thursday before the Friday exam. The stylized result is that the
cramming students, whose study spacing matched the one-day
retention interval, would do better on the Friday exam than the
conscientious ones. This would seem to vindicate all those procras-
tinators in college who put off studying for their exams until the
last minute.
But there is a catch. If the material were tested again later, say, in
a pop quiz on the following Friday, the conscientious students
would outperform the crammers. That is, the forgetting rate for
material learned in a massed way is faster than for material learned
in a more distributed fashion. Plotting the performance of the
two groups of students on the two Fridays would be expected to
reveal the crossover interaction typically found in experiments
that manipulate study spacing and retention lag. The results from
one such experiment are graphed in Figure 6-3, illustrating this
interaction at timescales of days. Participants in Keppel (1967)
studied pairs of words a total of eight times. People in the massed
condition studied the material eight times in 1 day, while those
in the distributed condition studied the material twice on each
of 4 days. Immediately after studying the material, people in the
massed condition performed best, but after 8 days those exposed to
distributed presentations performed best.

Spacing Effects in the Environment


What pattern in the environment would correspond to spacing
effects in memory performance? Figure 6-4 shows the spacing
analysis from Anderson and Schooler (1991), which was restricted
to those words in New York Times headlines that occurred exactly
twice in a 100-day window. For purposes of illustration, consider
the uppermost point that corresponds to a word that, say, was
mentioned on January 26 and then again on January 31. The y-axis
plots the chances (probability) that it would be mentioned yet again
on, say, February 5. The other labeled point represents words that
were mentioned on, say, October 1 and not again until December 1,

Figure 6-3: Memory performance as a function of whether learning
followed a massed or distributed practice regimen (data from
Keppel, 1967).


Figure 6-4: Probability of a word being used in the New York Times
headlines as a function of number of days since it was last used,
given that the word was used just twice in the previous 100 days.
The steeper curve shows words whose two uses in the headlines
were massed near in time to each other, and the shallower curve
shows words whose occurrences were distributed farther apart
(data from Anderson & Schooler, 1991).


but with the interval from the last mention to February 5 now being
66 days. One way to characterize the results in Figure 6-4 is that
when words are encountered in a massed way there is an immedi-
ate burst in the likelihood of encountering them again, but that this
likelihood drops precipitously. In contrast, words encountered in a
more distributed fashion do not show this burst, but their likeli-
hood of being encountered in the future remains relatively con-
stant. The difference is akin to that between the patterns with which
one needs a PIN (personal identification number) for the safe in a
hotel room and the PIN for one’s bank account. While on vacation,
one will frequently need the safe’s PIN, but over an extended period
one is more likely to need the PIN for the bank account. The idea is
that the memory system figures the relative values of the codes over
the short and long run, based on the pattern with which they are
retrieved. So one can think about cramming for an exam as an
attempt to signal to the memory system that the exam material will
likely be highly relevant in the short term, but not so useful further
in the future.
These isomorphisms between regularities in memory and in the
statistical structure of environmental events exemplify the thesis
that human memory uses the recency, frequency, and spacing with
which information has been needed in the past to estimate how
likely that information is to be needed now. Because processing
unnecessary information is cognitively costly, a memory system
able to prune away little-needed information by forgetting it is
better off. In what follows, we extend the analysis of the effects of
forgetting on memory performance to its effects on the performance
of simple inference heuristics. To this end, we draw on the research
program on fast and frugal heuristics (Gigerenzer, Todd, & the ABC
Research Group, 1999) and the ACT-R research program (Adaptive
Control of Thought–Rational—see Anderson & Lebiere, 1998). The
two programs share a strong ecological emphasis.
The research program on fast and frugal heuristics examines
simple strategies that exploit informational structures in the envi-
ronment, enabling the mind to make surprisingly accurate deci-
sions without much information or computation. The ACT-R
research program also strives to develop a coherent theory of cog-
nition, specified to such a degree that phenomena from perceptual
search to the learning of algebra might be modeled within the
same framework. In particular, ACT-R offers a plausible model of
memory that is tuned, according to the prescriptions of the rational
analysis of memory, to the statistical structure of environmental
events. This model of memory will be central to our implementa-
tion of the recognition heuristic (Goldstein & Gigerenzer, 2002)
and the fluency heuristic (Hertwig, Herzog, Schooler, & Reimer,
2008), both of which depend on phenomenological assessments of
memory retrieval. The former operates on knowledge about whether
a stimulus can be recognized, whereas the latter relies on an assess-
ment of the fluency, or speed, with which a stimulus is processed.
By housing these memory-based heuristics in a common cognitive
architecture, we aim to provide models that allow us to analyze
whether and how loss of information—that is, forgetting—fosters
the performance of these heuristics. We begin by first describing the
recognition heuristic, the fluency heuristic, and the ACT-R archi-
tecture; then we turn to the question of whether the recognition and
the fluency heuristic benefit from smart forgetting.

How Recognition Enables Heuristic Inference: The Recognition Heuristic

The recognition heuristic illustrates the interplay between the
structure of the environment and core capacities of the human
mind (Goldstein & Gigerenzer, 2002; see chapter 5 for a detailed
discussion). In short, the recognition heuristic uses the information
about whether objects are recognized or not to make inferences
about their values on some quantitative criterion dimension. Its
policy goes like this:

Recognition heuristic: If one of two objects is recognized and
the other is not, then infer that the recognized object has the
higher value with respect to the criterion of interest. (Goldstein
& Gigerenzer, 2002, p. 76)
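
Stated as a procedure, this policy can be sketched in a few lines of
Python. The sketch is ours, purely for illustration; the object names
are hypothetical, and the fallback to guessing when recognition does
not discriminate follows the simulation setup described later in this
chapter.

import random

def recognition_heuristic(obj_a, obj_b, recognized):
    """Infer which of two objects has the higher criterion value,
    using recognition as the only cue."""
    a_rec = obj_a in recognized
    b_rec = obj_b in recognized
    if a_rec and not b_rec:
        return obj_a                      # recognized object is inferred to be larger
    if b_rec and not a_rec:
        return obj_b
    return random.choice([obj_a, obj_b])  # both or neither recognized: guess

# Hypothetical example: a participant who has heard of Munich but not Solingen.
print(recognition_heuristic("Munich", "Solingen", recognized={"Munich", "Berlin"}))  # Munich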

To successfully apply the recognition heuristic, the probability
of recognizing objects needs to be correlated with the criterion to
be inferred. This is the case, for example, in many geographical
domains such as city or mountain size (Goldstein & Gigerenzer,
2002) and in many competitive domains such as predicting the suc-
cess of tennis players (Serwe & Frings, 2006). One reason why
objects with larger criterion values are more often recognized is that
they are more often mentioned in the environment (see chapter 5).
To be applied, the recognition heuristic requires that a person
does not recognize too much or too little: One of the alternatives
needs to be recognized, but not the other. If a person recognizes
too few or too many objects, then recognition will be uninformative
because it will rarely discriminate between the objects. Consider a
die-hard fan of the National Basketball Association who will not
be able to use the recognition heuristic to predict the outcome of
any game, simply because she recognizes all of the teams. In con-
trast, an occasional observer of basketball games may recognize
some but not all teams, and thus can more often use the recognition
heuristic. The fact that the recognition heuristic feeds on partial
ignorance implies the possibility that forgetting may boost this
heuristic’s performance. Before we investigate this odd possibility,
let us consider what a person does who recognizes all the teams.
In this case, more knowledge-intensive strategies, such as the
take-the-best heuristic, can be recruited (Gigerenzer et al., 1999).
Take-the-best sequentially searches for cues that are correlated with
the criterion in the order of their predictive accuracy and chooses
between the objects on the basis of the first cue found that discrim-
inates between them (Gigerenzer & Goldstein, 1996). But there is a
potentially faster alternative to this knowledge-based strategy—
namely, the fluency heuristic.

How Retrieval Fluency Enables Heuristic Inference: The Fluency Heuristic

When two objects to be decided between are both recognized, the flu-
ency heuristic (see, e.g., Jacoby & Brooks, 1984; Toth & Daniels, 2002;
Whittlesea, 1993) can be applied. It can be expressed as follows:

Fluency heuristic: If one of two objects is more fluently pro-
cessed, then infer that this object has the higher value with
respect to the criterion of interest.
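
In the same spirit, a minimal sketch of the fluency heuristic’s policy
(again ours, for illustration only): the retrieval times would come
from the memory system described below, and the 100-ms threshold
anticipates the just noticeable difference in retrieval time discussed
later in the chapter.

import random

def fluency_heuristic(obj_a, obj_b, retrieval_time_ms, jnd_ms=100):
    """Infer that the more fluently (faster) retrieved object has the higher
    criterion value; guess if the difference in fluency is not detectable."""
    diff = retrieval_time_ms[obj_a] - retrieval_time_ms[obj_b]
    if abs(diff) < jnd_ms:                   # difference too small to notice
        return random.choice([obj_a, obj_b])
    return obj_a if diff < 0 else obj_b      # faster retrieval wins

# Hypothetical retrieval times (in ms) for two recognized cities.
times = {"Berlin": 180, "Heidelberg": 320}
print(fluency_heuristic("Berlin", "Heidelberg", times))  # Berlin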

Like the recognition heuristic, the fluency heuristic considers
only a single feature of the objects: the fluency with which the
objects are processed when encountered. In numerous studies, this
processing fluency, mediated by prior experience with a stimulus,
has been shown to function as a cue in a range of judgments. For
example, more fluent processing due to previous exposure can
increase the perceived fame of nonfamous names (the false fame
effect; Jacoby, Kelley, Brown, & Jasechko, 1989) and the perceived
truth of repeated assertions (the reiteration effect; Begg, Anas, &
Farinacci, 1992; Hertwig, Gigerenzer, & Hoffrage, 1997).
In the literature, one can find many different variants of fluency,
including absolute, relative, conceptual, and perceptual fluency, to
name a few. Fluency has also been invoked in explaining a wide
range of judgments, including evaluative and aesthetic judgments
(e.g., Winkielman & Cacioppo, 2001; see Reber, Schwarz, &
Winkielman, 2004, and Winkielman, Schwarz, Fazendeiro, & Reber,
2003, for reviews), and confidence and metacognitive judgments
(e.g., Kelley & Lindsay, 1993; Koriat & Ma’ayan, 2005). One can
also, although less frequently, come across the notion of a fluency
heuristic, prominently in the work of Kelley and Jacoby (1998),
Whittlesea (1993), and Whittlesea and Leboe (2003). Abstracting
from the different meanings of the term fluency heuristic across
articles, the gist appears to be that people attribute the fluent
processing of stimuli to having experienced the stimuli before. The
ACT-R fluency heuristic, as proposed by Schooler and Hertwig
(2005; see also Hertwig et al., 2008; Marewski & Schooler, 2011),
aims to exploit the subjective sense of fluency in the process of
making inferences about objective properties of the world.
The fluency heuristic, in contrast to the recognition heuristic,
does not exploit partial ignorance but rather graded recognition.
Nevertheless, it may also benefit from forgetting because fluency is
more easily applicable if there are large detectable differences in
fluency between objects—and forgetting could create such differ-
ences. To investigate the role of forgetting in memory-based heuris-
tics and to model the relation between environmental exposure and
the information in memory on which heuristics such as recognition
and fluency feed, we implement them within the ACT-R architec-
ture, which we now describe.

A Brief Overview of ACT-R

ACT-R is a theory of cognition constrained by having to account for
a broad swath of human thought. The core of ACT-R is constituted
by a declarative memory system for facts (knowing that) and a pro-
cedural system for rules (knowing how). The declarative memory
system consists of records that represent information (e.g., facts
about the outside world, about oneself, about possible actions).
These records take on activations that determine their accessibility,
that is, whether and how quickly they can be retrieved. A record’s
activation A_i is determined by a combination of the base-level
strength of the record, B_i, and the S_ji units of activation it
receives from each of the j elements of the current context:
A_i = B_i + \sum_j S_{ji}

A record’s base-level strength is rooted in its environmental pat-
tern of occurrence. The activation of a record is higher the more
frequently and the more recently it has been used; activation
strengthens with use and decays with time. Specifically, B_i is deter-
mined by how frequently and recently the record has been encoun-
tered in the past (e.g., studied) and can be stated as follows:

B_i = \ln \left( \sum_{k=1}^{n} t_k^{d} \right),

where the record has been encountered n times in the past at lags of
t_1, t_2, . . ., t_n. Finally, d is a decay parameter that captures the amount
of forgetting in declarative memory and thus determines how much
information about an item’s environmental frequency is retained
in memory over time, as reflected in the corresponding record’s
activation. Typically, d is set to –0.5, which has been used to fit a
wide range of behavioral data (Anderson & Lebiere, 1998).
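
To make these two equations concrete, the following sketch (ours)
computes a record’s base-level strength and total activation; the
variable names are illustrative, and the context activations S_ji are
simply passed in as numbers.

import math

def base_level_strength(lags, d=-0.5):
    """B_i = ln(sum_k t_k^d): strength grows with how often a record has been
    encountered and decays with the time since each encounter."""
    return math.log(sum(t ** d for t in lags))

def activation(lags, context_strengths=(), d=-0.5):
    """A_i = B_i + sum_j S_ji."""
    return base_level_strength(lags, d) + sum(context_strengths)

# A record encountered 1, 10, and 100 time units ago, at the default decay rate.
print(round(base_level_strength([1, 10, 100]), 2))        # about 0.35
# With d = 0 (no forgetting), every encounter counts fully, regardless of lag.
print(round(base_level_strength([1, 10, 100], d=0), 2))   # ln(3), about 1.10
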
The procedural system consists of if–then rules that guide the
course of action an individual takes when performing a specific
task. The if side of a production rule specifies various condi-
tions, which can include the state of working memory, changes in
perceptual information such as detecting that a new object has
appeared, and many other inputs. If all the conditions of a pro-
duction rule are met, then the rule fires, and the actions specified in
the then side of the rule are carried out. These actions can include
updating records, creating new records, setting goals, and initiating
motor responses. This combination of components makes ACT-R
a good framework within which to implement decision-making
strategies, in cognitively plausible ways (Todd & Schooler, 2007).

Do the Recognition and Fluency Heuristics Benefit From Smart Forgetting?

Bettman, Johnson, and Payne (1990) explored the relative cognitive
complexity and effort that various decision strategies require by
representing them in production rules consisting of simple cogni-
tive steps, such as read, add, and compare. They termed them ele-
mentary information processes. Building on this work, we show
how implementing the recognition and fluency heuristics in ACT-R
enables us to explore how properties of the cognitive system,
such as forgetting, affect the heuristics’ performance in specific
environments. According to Goldstein and Gigerenzer (2002), the
recognition heuristic works because there is a chain of correlations
linking the criterion (e.g., the strength of an NBA basketball team),
via environmental frequencies (e.g., how often the team is men-
tioned in the media), to recognition. ACT-R’s activation tracks just
such environmental regularities, so that activation differences
reflect, in part, frequency differences. Thus, it would be possible in
principle that inferences—such as deciding which of two players is
better or which of two cities is larger—could be based directly on
the activation of associated records in memory (e.g., player or city
representations). However, this possibility is inconsistent with
the ACT-R framework for reasons of psychological plausibility:
Subsymbolic quantities, such as activation, are assumed not to be
directly accessible, just as people presumably cannot make deci-
sions by directly observing differences in their own neural firing
rates. Instead, though, the system could capitalize on activation
differences associated with various objects by gauging how it
responds to them. The simplest measure of the system’s response is
whether a record associated with a specific object can be retrieved
at all, and we use this to implement the recognition heuristic in
ACT-R.
First, our model learned about large German cities based on arti-
ficial environments that reflected how frequently the cities were
mentioned in an American newspaper (see Schooler & Hertwig,
2005, for details). Second, recognition rates for the model were
calibrated against the empirical recognition rates that Goldstein
and Gigerenzer (2002) observed. In accordance with previous
models of recognition in ACT-R (Anderson, Bothell, Lebiere, &
Matessa, 1998), recognizing a city was considered to be equivalent
to retrieving the record associated with it. Third, the model was
tested on pairs of German cities. The model’s recognition rates from
the second step defined the probability that it would successfully
recognize a city. The production rules for the recognition heuristic
dictated that whenever one city was recognized and the other
was not, the recognized one was selected as being larger. Such a
decision rule closely matched the observed human responses. In all
other cases (both cities recognized or unrecognized), the model
made a guess. With this model in hand, we can ask whether forget-
ting can boost the accuracy of the memory-based inferences made
by the recognition heuristic.
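
The logic of this test phase can be summarized in a short sketch. It
is a simplified stand-in for the full ACT-R simulation, not the
simulation itself; the cities, populations, and recognition
probabilities below are hypothetical placeholders rather than the
calibrated model values.

import random

def simulate_recognition_heuristic(cities, population, p_recognize, n_trials=10000):
    """Estimate the proportion of correct size inferences over random city pairs,
    guessing whenever recognition does not discriminate between the two cities."""
    correct = 0
    for _ in range(n_trials):
        a, b = random.sample(cities, 2)
        rec_a = random.random() < p_recognize[a]   # probabilistic recognition
        rec_b = random.random() < p_recognize[b]
        if rec_a != rec_b:                         # heuristic applies
            choice = a if rec_a else b
        else:                                      # both or neither recognized: guess
            choice = random.choice([a, b])
        larger = a if population[a] > population[b] else b
        correct += (choice == larger)
    return correct / n_trials

# Hypothetical illustration with three cities.
cities = ["Berlin", "Munich", "Heidelberg"]
population = {"Berlin": 3400000, "Munich": 1300000, "Heidelberg": 140000}
p_recognize = {"Berlin": 0.95, "Munich": 0.80, "Heidelberg": 0.30}
print(simulate_recognition_heuristic(cities, population, p_recognize))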

Does Forgetting Benefit the Recognition Heuristic?


To address this question, we varied the decay rate d and observed
how the resulting changes in recognition affect inferences in the
city population task. The upper bound of the decay rate, 0, means
no forgetting, so that the strength of a memory record is strictly a
function of its frequency. Negative values of d imply forgetting, and
more negative values imply more rapid forgetting. Using a step
size of 0.01, we tested d values ranging from 0 to −1, the latter being
twice ACT-R’s default decay rate. In Figure 6-5, the solid line shows
the recognition heuristic’s average level of accuracy on pairwise
comparisons of all German cities it knew, including pairs in which
it had to guess because both cities were recognized or unrecognized.
Three aspects of this function are noteworthy. First, the recognition
heuristic’s performance assuming no forgetting (56% correct) is
substantially worse than its performance assuming the “optimal”
amount of forgetting (63.3% correct). Second, ACT-R’s default
decay value of –0.5 yields 61.3% correct, only slightly below the
peak performance level, which is reached at a decay rate of –0.34.
Third, the accuracy curve has a flat maximum, with all decay
values from –0.13 to –0.56 yielding performance in excess of
60% correct. These results demonstrate that forgetting enhances
the performance of the recognition heuristic, and the amount of

Figure 6-5: Performance of the recognition and fluency heuristics
as a function of memory decay rate, d. Maxima are marked with
dots. (Adapted from Schooler & Hertwig, 2005.)

forgetting can vary over a substantial range without compromising
the heuristic’s good performance. However, as d approaches −1
and there is too much forgetting (resulting in a situation in which
most cities are unrecognized), the performance of the recognition
heuristic eventually approaches chance level.

How Does Forgetting Help the Recognition Heuristic’s Performance?


Two quantities shed more light on the link between forgetting
and the recognition heuristic. The first is the proportion of com-
parisons in which the recognition heuristic can be used as the basis
for making a choice, that is, the proportion of comparisons in which
only one of the cities is recognized. In Figure 6-6, the solid line
shows that for the recognition heuristic this application rate
peaks when d equals –0.28, an intermediate level of forgetting. The
second quantity is the proportion of correct inferences made by the
recognition heuristic in those choices to which it is applicable. As
shown in Figure 6-7, this recognition validity generally increases
with the amount of forgetting, peaking when d equals −1. The per-
formance (Figure 6-5) and application rate (Figure 6-6) peak at
nearly the same forgetting rates of −0.34 and −0.28, compared to
the peak of −1 for the validity curve (Figure 6-7). So, the decay rate
of −0.34 can be thought of as the best trade-off between the effects

Figure 6-6: The application rate of the recognition heuristic (the
proportion of all comparisons in which one city is recognized but
the other is not) and of the fluency heuristic (the proportion of
all comparisons in which both cities are recognized), as a function
of memory decay rate, d. Maxima are marked with dots. (Adapted from
Schooler & Hertwig, 2005.)


Figure 6-7: The validity of the recognition heuristic and of the flu-
ency heuristic (the proportion of correct inferences that each heu-
ristic makes when it can be applied) as a function of memory decay
rate, d. Maxima are marked with dots. (Adapted from Schooler &
Hertwig, 2005.)
of forgetting on application rate and validity, with the application
rate having the greater sway over performance. Thus, intermediate
amounts of forgetting increase the performance of the recognition
heuristic mostly by sharply increasing its applicability and, to a
lesser extent, by increasing its validity.
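
The trade-off can be summarized in a simple decomposition (our
illustration): assuming, as in the model above, that the heuristic
guesses with 50% accuracy whenever recognition does not discriminate,
overall accuracy is the application rate times the validity plus half
of the remaining comparisons.

def overall_accuracy(application_rate, validity, guess_accuracy=0.5):
    """Expected proportion correct: the heuristic decides a fraction
    application_rate of all pairs with accuracy validity and guesses on the rest."""
    return application_rate * validity + (1 - application_rate) * guess_accuracy

# Hypothetical values: the same validity yields higher overall accuracy
# when the heuristic can be applied more often.
print(round(overall_accuracy(0.3, 0.8), 2))  # 0.59
print(round(overall_accuracy(0.5, 0.8), 2))  # 0.65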

Does Forgetting Help the Fluency Heuristic?


Loss of some information—a loss that is not random but a function
of a record’s environmental history—fosters the performance of
the recognition heuristic. But is this benefit of forgetting limited
to the recognition heuristic? To find out whether an inference
strategy that makes finer distinctions than that between recognition
and nonrecognition can benefit from forgetting, we now turn to the
fluency heuristic. The recognition heuristic (and accordingly its
ACT-R implementation) relies on a binary representation of recog-
nition: An object is simply either recognized (and retrieved by
ACT-R) or unrecognized (and not retrieved). But this heuristic
essentially passes up information (for better or worse) whenever
two objects are both recognized but the record associated with one
has a higher activation than the other. The recognition heuristic
ignores this difference in activation. But could this activation dif-
ference be used to decide between the two objects? Within ACT-R,
recognition could also be assessed in a continuous fashion, namely,
in terms of how quickly an object’s record can be retrieved.
Differences in retrieval time are a proxy of differences in the sub-
symbolic quantity of activation. The fluency heuristic exploits dif-
ferences in retrieval time by inferring that if one of two objects is
more swiftly retrieved, this object has the higher value with respect
to the criterion.
The predictive accuracy of the fluency heuristic turns out to be
influenced by forgetting in much the same way as the recognition
heuristic, as shown by the upper (dashed) line in Figure 6-5. At the
same time, the fluency heuristic provides an overall additional gain
in performance above the recognition heuristic. Figure 6-6 (dashed
line) shows that the applicability of the fluency heuristic does
not benefit from forgetting but rather decreases as forgetting
increases. Part of the explanation for how the fluency heuristic does
benefit from forgetting is illustrated in Figure 6-8, which shows the
exponential function that relates a record’s activation to its retrieval
time. To appreciate the explanation, let us first point out that
neither our ACT-R model of the fluency heuristic nor actual people
can reliably discriminate between any minute difference in two
retrieval times. In fact, the difference in retrieval times needs to be
at least 100 ms for people to be able to reliably discriminate between
them (Hertwig et al., 2008). The beneficial impact of forgetting on

Figure 6-8: The relationship between a memory record’s activation
and its retrieval time. (Adapted from Schooler & Hertwig, 2005.)

the fluency heuristic is related to this just noticeable difference (JND). Specifically, forgetting lowers the range of activations to
levels that correspond to retrieval times that can be more easily
discriminated. For illustration, consider retrieval times of 200 and
300 ms, which correspond to activations of 1.99 and 1.59, respec-
tively. For these relatively low activations, only a small difference
of 0.4 units of activation suffices to yield the 100 ms JND in retrieval
time. In contrast, the same 100 ms difference in retrieval time
between 50 and 150 ms corresponds to a difference of 1.1 units of
activation. Thus, by shifting the activation range downward, forget-
ting helps the system settle on activation levels corresponding to
retrieval times that can be more easily discriminated. In other
words, a given difference in activation at a lower range results in a
larger, more easily detected difference in retrieval time than the
same difference at a higher range. In the case of the fluency heuris-
tic, memory decay prevents the activation of (retrievable) records
from becoming saturated.
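To make this argument concrete, here is a minimal numerical sketch. It assumes the standard ACT-R latency equation, retrieval time = F · e^(−activation); the latency factor F used below is back-fitted from the 200-ms and 300-ms example above and is our illustrative choice, not a value reported by Schooler and Hertwig (2005).

```python
import math

# Assumed ACT-R-style latency equation: retrieval time (ms) = F * exp(-activation).
# F is back-fitted so that activations of 1.99 and 1.59 map to roughly 200 and
# 300 ms, as in the example in the text; it is an illustrative assumption.
F = 1470.0
JND_MS = 100  # smallest retrieval-time difference people reliably discriminate

def retrieval_time(activation):
    """Retrieval time in milliseconds for a record with the given activation."""
    return F * math.exp(-activation)

# The text's example: activations of 1.59 and 1.99 give roughly 300 and 200 ms.
print(round(retrieval_time(1.59)), round(retrieval_time(1.99)))

# The same 0.4-unit activation difference at a low versus a high activation range:
for low in (1.5, 3.0):
    diff = retrieval_time(low) - retrieval_time(low + 0.4)
    print(f"activation {low} vs {low + 0.4}: "
          f"time difference {diff:.0f} ms, detectable: {diff >= JND_MS}")
```

Run as written, the sketch shows that the same 0.4-unit activation difference yields a retrieval-time difference above the 100-ms JND at low activations but only about 25 ms at high activations.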
Both the recognition and the fluency heuristic can be under-
stood as means to indirectly tap the environmental frequency infor-
mation locked in the activations of records in ACT-R. These
heuristics will be effective to the extent that the chain of correla-
tions—linking the criterion values, environmental frequencies,
activations and responses—is strong. By exploring the sensitivity
of the recognition and fluency heuristics to changes in the rate
of memory decay within ACT-R, we demonstrated that forgetting actually serves to improve the performance of these heuristics
by strengthening the middle links of the chain of correlations on
which they rely.

Do People Use the Fluency Heuristic?

Up to this point, our analysis of the fluency heuristic has been mostly theoretical in nature. Is there empirical evidence that retrieval fluency is a valid indicator of environmental quantities,
and that fluency guides people’s inferences about those quantities?
To find out, we performed ecological and empirical analyses of
fluency. We first analyzed the validity of the fluency heuristic in
five real-world environments by measuring actual retrieval fluency
(as recognition speeds) and using a quantitative criterion (Hertwig
et al., 2008, Study 1): (a) the 118 U.S. cities with more than 100,000
inhabitants in 2002; (b) the 100 German companies with the high-
est revenue in 2003; (c) the 106 most successful music artists in the
United States, in terms of the cumulative U.S. sales of recordings
from 1958 to 2003; (d) the 50 richest athletes in the world in 2004;
and (e) the 100 wealthiest people in the world in 2004. The validity
of retrieval fluency in each environment was defined as the mean
proportion of pairs where the object with the smaller mean retrieval
time scored higher on the respective criterion (averaged across
40 participants, excluding pairs where the difference in mean
retrieval times was below the JND of 100 ms). In all five environ-
ments, fluency validity exceeded chance level (.50), ranging from .58 in the companies and music artists environments to .66 in the cities environment. In addition, fluency validity was related to
the size of the differences in mean retrieval time. Figure 6-9 shows
that there is a clear tendency, manifest across all five environments,
for larger differences to be associated with higher fluency validity.
This tendency can also be explained within the ACT-R framework:
Objects with larger criterion values tend to occur more frequently
in the environment, and thus their memory records tend to have
higher activations and be more quickly retrieved. Consequently,
large differences in retrieval times are likely to correspond to pairs
of objects in which one object has a large criterion value and the
other has a small value. For such pairs, fluency can be expected to
be quite valid. In an extensive ecological analysis of fluency, we
replicated and extended these results across more than 20 diverse
domains (Herzog & Hertwig, in press).
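For readers who want to see this measure spelled out, the sketch below estimates fluency validity from hypothetical mean retrieval times and criterion values, following the procedure described above (pairs whose retrieval times differ by less than the 100-ms JND are excluded); the object names and numbers are invented for illustration.

```python
from itertools import combinations

# Hypothetical data: mean retrieval time in milliseconds (lower = more fluent)
# and a criterion value (e.g., population) for a handful of objects.
# All numbers are invented for illustration.
objects = {
    "A": {"rt": 650, "criterion": 8_000_000},
    "B": {"rt": 820, "criterion": 3_900_000},
    "C": {"rt": 790, "criterion": 2_700_000},
    "D": {"rt": 1240, "criterion": 3_000_000},
}

JND_MS = 100  # pairs with retrieval-time differences below this are excluded

def fluency_validity(objs, jnd=JND_MS):
    """Proportion of usable pairs in which the more fluent (faster-retrieved)
    object also has the larger criterion value."""
    correct = usable = 0
    for a, b in combinations(objs.values(), 2):
        if abs(a["rt"] - b["rt"]) < jnd:
            continue  # difference below the JND: fluency heuristic not applicable
        usable += 1
        faster, slower = (a, b) if a["rt"] < b["rt"] else (b, a)
        correct += faster["criterion"] > slower["criterion"]
    return correct / usable if usable else float("nan")

print(fluency_validity(objects))  # 0.8 for these invented data
```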
Thus, using fluency could lead to valid decisions—but to
what extent do people’s inferences actually agree with its use in the
fluency heuristic? Across three of the five environments listed

[Figure 6-9: proportion of correct inferences (.45–.75) in the cities, athletes, companies, music artists, and billionaires environments, plotted against differences in recognition latencies (0–99, 100–399, 400–699, ≥700 ms).]

Figure 6-9: The validity of the fluency heuristic (the proportion of correct inferences that the rule makes when it can be applied) as a function of increasing differences in recognition latencies. (Adapted from Hertwig et al., 2008.)

above, cities, companies, and music artists, we asked participants to infer which of two objects scored higher on a quantitative dimen-
sion (Hertwig et al., 2008, Study 3). In addition, participants’
retrieval times for objects in these environments were measured.
Then, for each participant, the percentage of inferences that were in
line with the fluency heuristic (among all pairs in which both
objects were recognized) was determined. The mean accordance
with the fluency heuristic was .74, .63, and .68 in the cities, com-
panies, and music artists environments, respectively. The extent to
which people’s inferences conformed to the fluency heuristic was a
function of differences in recognition speeds, as shown in Figure
6-10, even rising to around .8 accordance when these differences
exceeded 700 ms in the cities and music artists environments. This
appears to be ecologically rational use of the fluency heuristic,
insofar as retrieval fluency is more likely to yield accurate infer-
ences with larger differences in retrieval times (Figure 6-9).
To summarize, retrieval fluency can be a valid predictor of
objective properties of the world, and to different degrees in differ-
ent environments. Moreover, we found that a large proportion of
people’s inferences conformed to the decisions made by the fluency
heuristic using this predictor. In a related analysis, Marewski and
Schooler (2011) showed that the use of the fluency heuristic appears

[Figure 6-10: accordance to the fluency heuristic (.45–.90) in the cities, music artists, and companies environments, plotted against differences in recognition latencies (0–99, 100–399, 400–699, ≥700 ms).]

Figure 6-10: Proportion of decisions made in accordance with the fluency heuristic as a function of increasing differences in recognition latencies (bars show 95% confidence intervals of proportions aggregated across subjects). (Adapted from Hertwig et al., 2008.)

particularly pronounced when people recognize both objects but cannot retrieve any additional cue knowledge about them.

The Importance of Forgetting

Some theorists have argued that forgetting is indispensable to the proper working of memory. Building on the notion of bene-
ficial forgetting, we demonstrated that ecologically smart loss of
information—loss that is not random but reflects the environ-
mental history of the memory record—may not only foster memory
retrieval processes but may also boost the performance of inferen-
tial heuristics that exploit mnemonic information such as recogni-
tion and retrieval fluency. If human recognition memory were so
lossless and exquisitely sensitive to novelty that it treated as
unrecognized only those objects and events that one has truly never
seen before (and not also those that were experienced long ago and
since forgotten), then extensive experience could eventually render
the recognition heuristic inapplicable (see Todd & Kirby, 2001).
By implementing inferential heuristics within an existing cog-
nitive architecture, we were able to analyze in detail how para-
meters of memory such as information decay affect inferential
accuracy.

This analysis also revealed two distinct reasons for why forget-
ting and heuristics can work in tandem. In the case of the recogni-
tion heuristic, intermediate amounts of forgetting maintain the
systematic partial ignorance on which the heuristic relies, increas-
ing the probability that it correctly picks the higher criterion
object. In the case of the fluency heuristic, intermediate amounts of
forgetting boost the heuristic’s performance by maintaining activa-
tion levels corresponding to retrieval latencies that can be more
easily discriminated. In what follows, we discuss how the
fluency heuristic relates to the availability heuristic and whether it
is worthwhile to maintain the distinction between the fluency and
recognition heuristics, and we conclude by examining whether for-
getting plausibly could have evolved to serve heuristic inference.

The Fluency and Availability Heuristics: Old Wine in a New Bottle?


The fluency heuristic feeds on environmental frequencies of occur-
rences that are related to criterion variables such as population
size. It thus can be seen as another ecologically rational cognitive
strategy belonging to the adaptive toolbox of fast and frugal heuris-
tics (Gigerenzer et al., 1999). The fluency heuristic also shares an
important property with one of the three major heuristics investi-
gated in the heuristics-and-biases research program, namely, avail-
ability (Kahneman, Slovic, & Tversky, 1982): Both the availability
heuristic and the fluency heuristic capitalize on a subjective sense
of memory fluency. Tversky and Kahneman (1973) suggested that
people using the availability heuristic assess the probability and
the frequency of events on the basis of the ease or the frequency
with which relevant instances of those events can be retrieved from
memory. Thus, they proposed two notions of availability (Tversky
& Kahneman, 1973, pp. 208, 210), one that depends on the actual
frequencies of instances retrieved and one that depends on the ease
with which the operation of retrieval can be performed (for more on
the distinction between these two notions of availability, see
Hertwig, Pachur, & Kurzenhäuser, 2005, and Sedlmeier, Hertwig, &
Gigerenzer, 1998).
If one understands availability to mean ease of retrieval, then the
question arises of how ease should be measured. Sedlmeier et al.
(1998), for example, proposed measuring ease in terms of speed of
retrieval of an instance (e.g., words with a letter “r” in the third
position). Interpreted in this way, availability becomes nearly
interchangeable with fluency as we use it, although the fluency
heuristic retrieves the event itself (e.g., the name of a disease),
whereas the availability heuristic retrieves instances from the class
of events (e.g., people who died of a heart attack vs. people who
died of lung cancer to estimate which of the two diseases has a
higher mortality rate). We have no objection to the idea that the flu-
ency heuristic falls under the broad rubric of availability. In fact,
we believe that our implementation of the fluency heuristic offers a
definition of availability that interprets the heuristic as an ecologi-
cally rational strategy by rooting fluency in the informational struc-
ture of the environment. This precise formulation transcends the
criticism that availability has been only vaguely sketched (e.g.,
Fiedler, 1983; Gigerenzer & Goldstein, 1996; Lopes & Oden, 1991).
In the end, how one labels the heuristic that we have called fluency
is immaterial because, as Hintzman (1990) observed, “the explana-
tory burden is carried by the nature of the proposed mechanisms
and their interactions, not by what they are called” (p. 121).

What Came First: The Forgetting or the Heuristics?


One interpretation of the beneficial effect of forgetting as identified
here is that the memory system loses information at the rate that it
does in order to boost the performance of the recognition and flu-
ency heuristics and perhaps other heuristics. One could even
hypothesize that a beneficial amount of forgetting has evolved in
the cognitive architecture in the service of memory-based inference
heuristics. Though such a causal link may be possible in theory, we
doubt that evolving inferential heuristics gave rise to a degree of
forgetting that optimized their performance, because memory has
evolved in the service of multiple goals. It is therefore problematic
to argue that specific properties of human memory—for instance,
forgetting and limited short-term memory capacity—have opti-
mally evolved in the service of a single function. Although such
arguments are appealing—for an example, see Kareev’s (2000) con-
jecture that limits on working memory capacity have evolved “so as
to protect organisms from missing strong correlations and to help
them handle the daunting tasks of induction” (p. 401)—they often
lack a rationale for assuming that the function in question has pri-
ority over others. We find it more plausible that the recognition
heuristic, the fluency heuristic, and perhaps other heuristics have
arisen over phylogenetic or ontogenetic time to exploit the existing
forgetting dynamics of memory. If this were true, a different set
of properties of memory (e.g., different forgetting functions) could
have given rise to a different suite of heuristics.

Conclusion

Analyses of cognitive limits, a well-studied topic in psychology, are usually underpinned by the assumption that these limits, such
as forgetting, pose a serious liability. In contrast, we demonstrated
that forgetting might facilitate human inference by strengthening the chain of correlations that link the decision criteria, environmental frequencies, memory record activations, and the speed and accuracy of fundamental memory retrieval processes with the
decision that is ultimately made. The recognition and fluency
heuristics, we argued, use the characteristics of basic retrieval pro-
cesses as a means to indirectly tap the environmental frequency
information locked in memory activations. In light of the growing
collection of beneficial effects ascribed to cognitive limits (see
Hertwig & Todd, 2003), we believe it timely to reconsider their often
exclusively negative status and to investigate which limits may
have evolved to foster which cognitive processes and which pro-
cesses may have evolved to exploit specific limits—as we propose
in the case of heuristic inference and forgetting.
7
How Groups Use Partial Ignorance to
Make Good Decisions
Konstantinos V. Katsikopoulos
Torsten Reimer

The most significant fact about this [market] system is the economy of knowledge with which it operates, or how little the individual participants need to know in order to be able to take the right action.
Friedrich von Hayek

Imagine a three-member search committee that has to decide which of two candidates to invite for a faculty interview. The com-
mittee operates as follows: First, each member individually selects
a favored candidate. Then, all three members attempt to reach con-
sensus. The two candidates are Ms. Unknown and Ms. Known,
and there are funds to invite just one of them. Two committee mem-
bers are familiar with both candidates, and each proposes that
Ms. Unknown be invited. The third committee member, however,
has never heard of Ms. Unknown. Despite being slightly embar-
rassed, he admits his ignorance and is relieved when his colleagues
point out that it might mean something: If he has never heard of Ms.
Unknown, she might well not be as good for the job as Ms. Known,
of whom he has heard. How do they settle on whom to pick? The
majority rule specifies Ms. Unknown. But could it be possible that
a committee would take the potential wisdom of ignorance into
account and put the votes together differently? And if so, would
that also be wise?
To find out, we developed a mathematical model of group deci-
sion making applicable to situations such as the search committee
example and tested it with groups of people interacting in an
experimental setting. The model consists of two components. The
first, for individual inference, is the recognition heuristic (Goldstein
& Gigerenzer, 1999, 2002). The second component combines the

individual inferences and captures the impact of different group members. For this component, we test several group decision rules,
starting with the majority rule studied in group psychology
(Sorkin, West, & Robinson, 1998) and proceeding to new lexico-
graphic rules that take name recognition into account.
We first use analysis to investigate the ecological rationality
of different group rules with respect to two aspects of environmen-
tal structure: individual knowledge and group make-up. We study
how the validity of recognition and of further knowledge affect
the accuracy of the rules. In addition, we study how the composi-
tion of the group—in terms of the number of members who use
recognition versus the number who use other cues—affects rule
accuracy. We then test experimentally whether real groups pay
attention to lack of recognition when it is ecologically rational to
do so (Reimer & Katsikopoulos, 2004). The exercise allows us a
glimpse of how it can be that groups succeed in reasoning simply
and smartly, by considering who knows—or does not know—what.

How Individuals Can Exploit Recognition

Consider the task of an individual wanting to find out which of two objects has a larger value on a quantitative dimension of inter-
est, or criterion. The prototypical example comes from the domain
of geography: How can an individual decide which of two cities
has a larger population? Different cues can be used to infer this,
such as the presence of a university or a soccer team. Even more
simply, mere recognition can be used in conjunction with a simple
heuristic.
The recognition heuristic for making inferences from memory
follows this rule: If one of the two objects is recognized and the
other is not, then infer that the recognized object has the higher
criterion value. This heuristic is likely to be selected from the
adaptive toolbox when it is ecologically rational, that is, when the
recognition validity is substantial (see chapter 5). For example,
someone who has not heard of Modena but has heard of Milan
would infer that Milan has more inhabitants, which happens to be
true. Goldstein and Gigerenzer (1999) found that people appear to
use the recognition heuristic in this task. In 90% of the inferences
in which an individual recognized only one city, the individual
inferred that the recognized city was more populous. People even
seem to stick to the recognition heuristic when they receive addi-
tional information on a high-validity cue that is in conflict with the
recognition heuristic: Participants in Goldstein and Gigerenzer’s
study were first provided with examples of large cities with a soccer
team and of small cities with no soccer team. Thereafter, in 92% of
the comparisons, they still inferred that a recognized city with no soccer team was larger than an unrecognized city with a soccer
team. Thus the recognition information was not compensated
(or decreased in impact) by the high-validity soccer-team cue (see
chapter 5).
How frequently can an individual use the recognition heuristic
in a given situation? Assume that an individual recognizes
n—called the recognition frequency—out of a total of N objects. The
recognition heuristic can only be applied to those pairs of objects
where only one object is recognized—an event that occurs with
probability r(n) = 2n(N–n)/[N(N–1)]. How do people make infer-
ences when neither or both objects are recognized? Goldstein and
Gigerenzer (1999) proposed that individuals guess when they do
not recognize either object—this event occurs with probability g(n)
= (N–n)(N–n–1)/[N(N–1)]—and use their general knowledge when
they recognize both objects—this event occurs with probability
k(n) = n(n–1)/[N(N–1)]. Note that it is assumed that individuals do
not use any further cue knowledge (about the recognized object)
when the recognition cue discriminates between the objects. As
Goldstein and Gigerenzer showed, under certain circumstances
this pattern of applicability can produce a surprising effect in which
individuals with less knowledge can be more accurate than indi-
viduals knowing more. We present an example of this less-is-more
effect for individuals, and we will see it reappear later among
groups.
Goldstein and Gigerenzer (2002) introduced three studious sis-
ters from Paris who, as part of their rather intense geography train-
ing, perform all pair-wise population comparisons between the
100 largest German cities. Each girl has accuracy α—called the
recognition validity—when using the recognition heuristic, accu-
racy β—called the knowledge validity—when using other knowl-
edge, and accuracy ½ when guessing. We assume that α, β > ½. A
sister who recognizes n cities has an accuracy of r(n)α + k(n)β +
g(n)(½). The only variable in which the girls differ is the number of
German cities, n, they recognize—the youngest sister recognizes
none, the middle sister 50, and the eldest sister all 100 cities.
We set α = .8 and β = .6 and predict the accuracy of each girl.
The youngest sister guesses for every pair of German cities and is correct 50% of the time. The eldest sister uses her extra knowledge in all pairs and has an accuracy of .6. The middle sister is the
only one who can use the recognition heuristic and her accuracy
can be computed as r(50)α + k(50)β + g(50)(½) which is .68. That
is, a less-is-more effect is predicted: The recognition frequency
n1 = 50 leads to greater accuracy than a larger recognition frequency,
n2 = 100. The above prediction refers to the special case of n2 = N,
where having full recognition leads to less accuracy than some
range of partial recognition. We call this surprising condition the strong less-is-more effect, in contrast to the weak less-is-more effect,
for any n2 > n1, where someone who recognizes, say, two-thirds of
the alternatives does worse than someone who recognizes half.
We also define the prevalence, p, of less-is-more effects as the
proportion of pairs (n1, n2) with n1 ≠ n2 for which a less-is-more
effect occurs. When the recognition and knowledge validities are
known, the prevalence of the effect across the whole range of recog-
nition frequency can be determined by simple enumeration. For
example, if α = .8 and β = .6, then p = 33%. To see how this is
determined, imagine a class of 101 Parisian girls where the first girl
recognizes zero cities, the second girl one city, and so on, with the
last girl recognizing all 100 cities. They all take the monstrous quiz
consisting of all 4,950 city population comparisons. Think of all
possible pairs of girls and ask if the girl recognizing more cities in
the pair will get the higher grade. The prevalence value of p = 33%
indicates that in one-third of the pairs the girl who gets a higher
grade is the one who recognizes fewer cities! Of course, different
amounts of ignorance give different amounts of benefit. For exam-
ple, the girl who recognizes half of the cities is much better off than
the girl who recognizes none.
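These numbers can be reproduced with a few lines of code. The sketch below implements the applicability probabilities r(n), k(n), and g(n) and the accuracy formula r(n)α + k(n)β + g(n)(½) given above, and then counts less-is-more pairs by enumeration; with α = .8 and β = .6 it recovers the sisters' accuracies of .50, .68, and .60 and a prevalence of roughly one-third.

```python
N = 100          # number of objects (the 100 largest German cities)
ALPHA = 0.8      # recognition validity
BETA = 0.6       # knowledge validity

def accuracy(n, alpha=ALPHA, beta=BETA, N=N):
    """Expected proportion correct for someone who recognizes n of N objects."""
    pairs = N * (N - 1)
    r = 2 * n * (N - n) / pairs          # exactly one object recognized
    k = n * (n - 1) / pairs              # both objects recognized
    g = (N - n) * (N - n - 1) / pairs    # neither object recognized
    return r * alpha + k * beta + g * 0.5

# The three sisters: n = 0, 50, and 100 recognized cities.
print([round(accuracy(n), 2) for n in (0, 50, 100)])   # [0.5, 0.68, 0.6]

# Prevalence of the less-is-more effect: the proportion of pairs (n1, n2),
# n1 < n2, for which the person recognizing fewer objects is more accurate.
pairs = [(n1, n2) for n1 in range(N + 1) for n2 in range(n1 + 1, N + 1)]
prevalence = sum(accuracy(n1) > accuracy(n2) for n1, n2 in pairs) / len(pairs)
print(round(prevalence, 2))   # roughly one-third
```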
In Figure 7-1 we show the whole individual accuracy curve for
n ranging from 0 to 100 and for α = .8 and β = .6 (see Goldstein &

[Figure 7-1: individual accuracy (% correct, 50–100) plotted against the number of objects recognized, n (0–100), with one curve for each knowledge validity β = .5, .6, .7, .8, .9, and 1.0.]

Figure 7-1: Predicted individual accuracy as a function of the number of objects recognized for different levels of knowledge validity β and recognition validity α = .8. The curve is concave for α > β and increasing for α ≤ β.

Gigerenzer, 2002). There are five more curves in this figure corresponding to different values of β. As β increases from .5 to
1.0, the shape of the accuracy curve changes in an orderly way,
from concave to monotonically increasing. The bottom three curves
show a strong less-is-more effect, because the accuracy in the
full (100%) recognition situation of each is below the accuracy for
some lower levels of recognition.
We can be more precise about just when the less-is-more effect
occurs. In particular, we can prove the following result given that
α and β are both independent of n (Goldstein & Gigerenzer, 2002;
Reimer & Katsikopoulos, 2004):

Result 1: The prevalence of the less-is-more effect is larger than zero if and only if the recognition validity is larger than
the knowledge validity (α > β). The prevalence increases as
either the recognition validity increases or as the knowledge
validity decreases.

The less-is-more effect has also been empirically observed, in tasks from general knowledge quizzes to Wimbledon tennis match
predictions (chapter 5). For example, Goldstein and Gigerenzer
(1999) asked American students to compare the populations of all
pairs of the 22 largest U.S. cities and all pairs of the 22 largest
German cities. The students recognized all U.S. cities and only
about half of the German cities. But accuracy was larger on the
German cities (73%) than on the U.S. cities (71%). The size of this
less-is-more effect can be precisely predicted (e.g., as shown in
Figure 7-1) given observed values of α and β. Note that this is a
point prediction that could be disconfirmed by an observed less-is-
more effect that is too large as well as by one that is too small.
At first glance, less-is-more effects appear unbelievable. One
might argue that whatever reasoning can be done with less data
can also be done with more data. Logically this sounds true, but
psychologically it may not be. Different amounts of information
might allow, or even promote, different cognitive processing. This
is what happens when partial ignorance fosters the use of recogni-
tion (and the recognition heuristic), which in the real world can be
more accurate than other kinds of knowledge.

Rules for Modeling the Impact of Group Members

The recognition heuristic models how individuals can exploit their partial lack of knowledge. But much decision making is done
by groups rather than individuals. Can a number of people inter-
acting to make an inference also capitalize on their systematic
ignorance to improve their accuracy? To develop the second building block of our group model, we focus on inference tasks with
two alternatives, such as the search committee example or the
city population comparison. Such tasks have been studied a great
deal by group psychologists (Hinsz, Tindale, & Vollrath, 1997). It
has been found that the rules that groups use for combining
individual inferences depend on task characteristics (Davis, 1992).
If a task has a correct solution that can be demonstrated in a dis-
cussion, as in an arithmetic problem, group behavior often follows
a truth-wins scheme in which the group adopts the answer of one
member who is correct and can demonstrate or prove it. In contrast,
in tasks with solutions for which correctness cannot be demon-
strated through discussion, group behavior can often be better
described by some type of a majority rule that adopts the most
common answer (Gigone & Hastie, 1997; Laughlin & Ellis, 1986;
Sorkin et al., 1998). Because it cannot really be “proven” (without
looking up the answer) in a group discussion which of two cities is
more populous, we assume that in our task, groups would combine
individual decisions through some kind of majority rule.
We can construct a number of majority combination rules that
model in different ways the impact of those group members who
use the recognition heuristic and those who use other knowledge.
Here we introduce those rules, and in the next section, we will test
their performance and analytically check if and when each rule
predicts less-is-more effects.
For simplicity, we state the rules without guessing. That is, mem-
bers who guess, according to our model of individuals, are not
considered in these rules. Furthermore, the rules do not make pre-
dictions when there is a tie among the voters. The motivation for
both restrictions is that, when evaluating the rules on empirical
data, we want to measure the predictive accuracy of the rules with-
out the influence of chance.
In the following rules, we refer to the object inferred by the group
to have the larger criterion value as the group choice. Also, the size
of the smallest majority of a group with m members equals (m+1)/2
if m is odd and (m/2 + 1) if m is even. That is, the (minimal) size of
the majority of a three-member group is two people, and so on.

Majority rule: The group choice is the object inferred to have the larger criterion value by the majority of group members.

In modern Western societies, the simple majority rule is well known and often used. The rule has also been extensively studied
and there are many arguments for using it, such as fairness (Hastie
& Kameda, 2005). But it is not always used. The important ques-
tion from the perspective of ecological rationality is, in what
environments is it a reasonable rule?

In the search committee example, the simple majority rule would lead the committee to invite Ms. Unknown. But it may be
that a remark such as “I have never heard of this applicant” goes a
long way in eliminating a candidate, even if that candidate is sup-
ported by the majority. We next develop rules that give prominence
to members who partially lack recognition.

Recognition-based majority rule: The group choice is determined by the simple majority rule applied to those group
members who can use the recognition heuristic.

If the search committee in our initial example applied this rule, it would invite Ms. Known because she is the candidate selected
by those who can use the recognition heuristic (here, just one com-
mittee member). Thus, this rule can predict that a minority trumps
a majority. But it does not specify what to do if recognition alone
cannot be used. For this reason, we also tested the following
lexicographic rule where the group first attempts to combine the
inferences of those members who use the recognition heuristic
and then if that is not possible, to combine the inferences of those
members who use knowledge. The rule is lexicographic in that it
considers pieces of information in a strict order and stops searching
for further information as soon as a decision can be reached, akin
to the take-the-best heuristic for individual decision making dis-
cussed in other chapters.

Recognition-first lexicographic rule: If there are members who can use the recognition heuristic, the group uses the
recognition-based majority rule. If no members can use the
recognition heuristic (or in the rare case of recognition-based
ties), but there are members who can use general knowledge,
the group choice is determined by the simple majority rule
applied to those group members.

Note, however, that just as for the majority rule, this rule does
not necessarily describe the entire process leading to a group deci-
sion. For example, a minority may speak up before the majority
finally overwhelms them, and similarly, people who cannot use the
recognition heuristic may do the same before name recognition
ultimately gets its way.
We now propose two rules that assume that members who can
use their knowledge are more influential in the combination of
inferences than members who can use the recognition heuristic.

Knowledge-based majority rule: The group choice is determined by the simple majority rule applied to those group
members who can use general knowledge.

Knowledge-first lexicographic rule: If there are members who can use general knowledge, the group uses the knowledge-based
majority rule. If no members can use general knowledge (or in
the case of knowledge-based ties), but there are members who
can use the recognition heuristic, the group choice is determined
by the simple majority rule applied to those group members.

In sum, beyond the simple majority rule, we developed four variants: two restricted majority rules and two lexicographic rules.
None of the five rules uses any free parameters. All except the
simple majority rule are noncompensatory in the sense that some
particular members’ votes cannot be traded off with other mem-
bers’ votes—as a consequence, they predict that just one individual
can outvote a majority. They differ, however, in which individuals
are assumed to have a larger influence in the decision process. We
will now analytically determine what happens when groups of
individuals use these decision rules, seeing which are more accu-
rate under different distributions of information across individuals,
before examining which rules are used by real people put together
into groups.
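To make the five rules concrete, here is a minimal sketch of how they might be applied to a single decision, given each member's vote and the source of that vote (recognition heuristic, general knowledge, or guessing). The encoding and function names are ours; the logic follows the rule definitions above, with guessers excluded and no prediction (None) returned on ties.

```python
from collections import Counter

# Each vote is (source, choice): source is "R" (recognition heuristic),
# "K" (general knowledge), or "G" (guess); choice names the preferred object.

def majority(choices):
    """Most common choice, or None if there is a tie or nothing to count."""
    counts = Counter(choices).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return None
    return counts[0][0]

def simple_majority(votes):
    return majority([c for s, c in votes if s != "G"])

def recognition_based(votes):
    return majority([c for s, c in votes if s == "R"])

def knowledge_based(votes):
    return majority([c for s, c in votes if s == "K"])

def recognition_first(votes):
    return recognition_based(votes) or knowledge_based(votes)

def knowledge_first(votes):
    return knowledge_based(votes) or recognition_based(votes)

# The search committee: two members know both candidates and favor Ms. Unknown;
# the third has only heard of Ms. Known and so uses the recognition heuristic.
committee = [("K", "Unknown"), ("K", "Unknown"), ("R", "Known")]
print(simple_majority(committee))      # Unknown
print(recognition_based(committee))    # Known
print(recognition_first(committee))    # Known
print(knowledge_first(committee))      # Unknown
```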

How Accurate Are the Group Decision Rules?

We can analytically derive predictions of the group accuracy for the five decision rules if each group is assumed to satisfy assump-
tions of homogeneity and independence. That is, we assume that
the values of α, β, and n are constant across all members of a group.
And we assume that the recognition and inference processes of
each member are independent of these processes for other mem-
bers: Whether one member recognizes a city or not does not say
anything about whether other members recognize this city, and
which one of two cities one member infers to be larger does not say
anything about which one of the cities another member infers to
be larger.
Basic probability theory can be used for deriving the predictions
of the rules (Reimer & Katsikopoulos, 2004). For all rules, we first
determine the distribution of the number of correct votes (individ-
ual decisions) given values of α, β, and n for a group of a particular
size. The number of correct votes is binomially distributed. In gen-
eral, a binomial random variable counts the number of times a
target event out of two possible events occurred in a sequence of
independent trials. The parameters of a binomial variable are the
number of trials and the probability of obtaining the target event on
each trial. For example, the number of times a fair coin lands
“heads” when flipped 10 times is binomially distributed with
parameters 10 and .5. To determine the distribution of the number of correct votes, the number of trials equals the number of group
members and the probability of a correct vote equals individual
accuracy. Group accuracy, which equals the probability that at least
the majority of votes are correct, is a sum of probabilities involving
this binomial variable (see Reimer & Katsikopoulos, 2004, for
details).
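The following sketch conveys the flavor of this derivation for the simple majority rule in a three-member group. For simplicity, it treats each member's overall accuracy (guessing included) as the success probability of a binomial variable and ignores ties, so it is only an approximation of the full derivation in Reimer and Katsikopoulos (2004); even so, it reproduces, for instance, the roughly 10-percentage-point advantage of the middle triplet over the eldest triplet discussed below.

```python
from math import comb

N, ALPHA, BETA = 100, 0.8, 0.6

def individual_accuracy(n):
    """r(n)*alpha + k(n)*beta + g(n)*0.5 for a member recognizing n of N objects."""
    pairs = N * (N - 1)
    r = 2 * n * (N - n) / pairs
    k = n * (n - 1) / pairs
    g = (N - n) * (N - n - 1) / pairs
    return r * ALPHA + k * BETA + g * 0.5

def majority_group_accuracy(n, m=3):
    """Probability that more than half of m independent members are correct."""
    q = individual_accuracy(n)
    return sum(comb(m, i) * q**i * (1 - q)**(m - i)
               for i in range(m // 2 + 1, m + 1))

for n in (0, 50, 100):
    print(n, round(majority_group_accuracy(n), 2))
# n = 50 yields about .75 and n = 100 about .65: a less-is-more effect of
# roughly 10 percentage points for the simple majority rule.
```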
There are a couple of subtleties in the above: Because of the
assumed homogeneity, individual accuracy is constant across
group members. But this does not mean that, for a given pair of
objects, all group members make the same correct decision or the
same wrong one. Rather, because of independence, members in
general can recognize different objects and make different deci-
sions; thus, some members are correct and some are wrong. It is to
this distribution of correct and wrong answers that the group rules
are applied.
Skipping the remaining technicalities, we present some intu-
itions for the curves of group accuracy as a function of recognition
frequency for the different rules applied to the city comparison
task in Figure 7-2. All of the rules use guessing to break any ties
between cities. We model three-member groups where all individuals

[Figure 7-2: group accuracy (% correct, 50–100) plotted against the number of objects recognized, n (0–100), with curves for the recognition-first rule, the recognition-based majority rule, the knowledge-first rule, the simple majority rule, and the knowledge-based rule.]

Figure 7-2: Predicted accuracy of three-member groups using different decision rules, as a function of the number of objects recognized, n. All members in a group have α = .8 and β = .6 and the same n. All rules exhibit a strong less-is-more effect except for the knowledge-based majority rule. (Adapted from Reimer & Katsikopoulos, 2004.)

have α = .8 and β = .6. Imagine that there are 101 triplets of girls,
each triplet with its own n, and each girl in the triplet
recognizing n cities. Note that for n = 0 the predictions of all rules
coincide because no city is recognized by any sister in the triplet
and the group guesses on all pairs. For n = 100, the predictions
of all but the recognition-based majority rule coincide because all
cities are recognized by all sisters and the group ends up choosing
the knowledge-based majority. The recognition-based majority
rule falls behind in accuracy in this situation because it guesses on
all pairs.
The first thing we note in Figure 7-2 is that a strong less-is-more
effect is predicted for all rules save the knowledge-based majority
rule. Furthermore, the effect is more pronounced than in the indi-
vidual case (e.g., the β = .6 line in Figure 7-1) in the sense that
there is more accuracy gained at the peak of the curve compared to
the point of full recognition at n = 100. While the middle sister
individually was more accurate than the eldest sister by 8 percent-
age points, if triplets use the simple majority rule, the middle trip-
let is more accurate than the eldest triplet by 10 percentage points.
The difference increases to 14 percentage points for the recogni-
tion-first rule. Partially ignorant groups thus have it even better
than partially ignorant individuals!
This finding is an illustration of a statistical theorem, the
so-called Condorcet jury theorem from the inventor of “social math-
ematics,” Marquis de Condorcet (1785; Grofman & Owen, 1986).
This theorem states that the accuracy of a majority increases with
the number of voters when voters are accurate more than half of
the time. Condorcet presented this statement amidst the French
revolution but it was not formally proven until the second half of
the 20th century. Both Condorcet and modern writers have since
seen the jury theorem, and its extensions, as a formal justification
of using the majority rule.
In fact there are more benefits of belonging to a group. Whenever
less-is-more effects occur for groups, they are at least as prevalent
as for individuals: Recall that when α = .8 and β = .6, p = 33% for
individuals. We found the same prevalence for the majority rule.
This is not a coincidence but can be deduced: Under the majority
rule, group accuracy increases with the number of individuals who
are correct, which in turn tends to increase with individual accuracy. Thus the shape of the group and individual curves is the
same and this guarantees equal prevalence. For other group rules
producing a less-is-more effect, this effect can be more prevalent
than for the individuals in the group, so that the group in those
cases essentially amplifies the benefits of ignorance.
The prevalence of the less-is-more effect increases when mem-
bers who use the more accurate (α = .8) recognition heuristic are
given more influence: Prevalence equals 42% for the recognition-first rule and 50% for the recognition-based majority rule. But, sur-
prisingly, the prevalence peaks, at 64%, for the knowledge-first
rule. Why is this? How can it be that a rule that first looks for people
who use general knowledge ends up rewarding ignorance the
most? An intuitive answer can be given via Figure 7-2. What
appears to make the difference in prevalence between the two lexi-
cographic rules is that the accuracy of the knowledge-first rule
decreases for n between the low 30s and about 50, while the accu-
racy of the recognition-first rule increases in that range. Why?
Forget, for a moment, the influence of guessing. As n rises from the
low 30s to about 50, the probability that a member can use the recog-
nition heuristic, r(n) = 2n(N–n)/[N(N–1)], rises more steeply than the
probability that a member has to use general knowledge, k(n) =
n(n–1)/[N(N–1)]. It thus becomes more likely that there is a larger
subgroup of individuals who can use the recognition heuristic and a
smaller subgroup using other knowledge. Based on the Condorcet
jury theorem, this increases the accuracy of the recognition-based
majority used in the recognition-first rule and decreases the accuracy
of the knowledge-based majority used in the knowledge-first rule.
The formal results we have on the predictions of less-is-more
effects in groups can be summarized as follows (see Reimer &
Katsikopoulos, 2004):

Result 2: In homogeneous groups where the recognition and inference processes of members are independent given the
values of the criterion, the following statements hold: (a) If the
group uses the simple majority rule, the strong less-is-more
effect is predicted if and only if the recognition validity α is
larger than the knowledge validity β; furthermore, the preva-
lence of the effect equals the prevalence of the effect for one
member. (b) If the group uses the recognition-based majority
rule, the strong less-is-more effect is predicted for all values of
recognition validity and knowledge validity; furthermore the
prevalence of the effect quickly converges to one-half as the
number of objects increases. (c) If the group uses the knowl-
edge-based majority rule, the less-is-more effect never occurs,
that is, has zero prevalence for all values of recognition and
knowledge validity.

What Rules Do Groups of People Use?

The theoretical predictions about the effects of using recognition in groups are clear but do they match what happens in reality?
Would a partially ignorant recognition heuristic user have a special say in the search committee example? To test this, we gave the
city population comparison task to 28 groups of three students
each (see Reimer & Katsikopoulos, 2004). We used this task because
individuals facing it have been found to make decisions in accor-
dance with the recognition heuristic, allowing us to test our predic-
tions (Goldstein & Gigerenzer, 1999, 2002; see also chapter 5).
Before creating groups, we first quizzed the participants indi-
vidually about which of 40 U.S. cities they recognized. The
responses allowed us to estimate the recognition validity α for each
individual as the proportion of correct inferences made if that
individual used the recognition heuristic for all those pairs of
cities where only one city was recognized. For example, for an indi-
vidual who recognized Detroit but not Milwaukee, the inference
that the recognition heuristic would make for this pair of cities,
Detroit, would be correct and would increase the α estimate. Then,
participants were asked to perform the population comparison task
for those pairs of cities they recognized. We estimated each indi-
vidual’s general knowledge validity β to be the proportion of cor-
rect responses for these pairs. The averages of the individual
parameter estimates in this first session were α = .72 and β = .65.
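The sketch below illustrates this estimation procedure for a single hypothetical participant: α is the proportion of correct inferences the recognition heuristic would make on pairs where exactly one city is recognized, and β is the proportion of correct answers on pairs where both cities are recognized. The city names, populations, and responses are invented for illustration.

```python
from itertools import combinations

# Hypothetical data for one participant: city "populations" (invented values),
# the cities the participant recognized, and the participant's answers for
# pairs in which both cities were recognized (keys in alphabetical order,
# value = city judged more populous).
population = {"Detroit": 951, "Fresno": 428, "Memphis": 650, "Milwaukee": 597}
recognized = {"Detroit", "Fresno", "Memphis"}
answers = {("Detroit", "Fresno"): "Detroit",
           ("Detroit", "Memphis"): "Memphis",
           ("Fresno", "Memphis"): "Fresno"}

alpha_hits = alpha_pairs = beta_hits = beta_pairs = 0
for a, b in combinations(sorted(population), 2):
    larger = a if population[a] > population[b] else b
    if (a in recognized) != (b in recognized):      # exactly one city recognized
        alpha_pairs += 1
        inferred = a if a in recognized else b      # the recognition heuristic's choice
        alpha_hits += inferred == larger
    elif a in recognized and b in recognized:       # both cities recognized
        beta_pairs += 1
        beta_hits += answers[(a, b)] == larger

print("alpha =", round(alpha_hits / alpha_pairs, 2),
      "beta =", round(beta_hits / beta_pairs, 2))   # alpha = 0.67, beta = 0.33
```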
We then wanted to create groups of students so that we could see
how they came to a consensus and also to see if we could find evi-
dence for a less-is-more effect between groups. But this posed a
challenge: According to the theory developed so far, the less-is-
more effect between two groups can be properly assessed only
when α, β, and n are constant across members in each group, α and
β are equal in the two groups, and n is different in the two groups.
In practice, this turned out to be a very strong condition that could
not be met with the combinations of individuals we had. Thus, we
simplified the problem by using only 15 cities (105 pairs) and con-
sidering the average values of α, β, and n instead. That is, we put
the participants into 28 groups of three so that there would be pairs
of groups with approximately equal average α and β but different
average n. If the group with the smaller average n turned out to be
more accurate, we could interpret this as an instance of a less-is-
more effect for groups.

How Is Consensus Reached in Group Decisions?


Which rules describe how groups reach consensus better, those
assuming that members who use the recognition heuristic are
more influential or those assuming that members who use other
knowledge are more influential? Across all groups, 28 × 105 = 2,940
inferences were made. If (lack of) recognition by itself allows a
decision to be made, as described by the recognition-based majority
rule, then 90% of all group decisions are made in accordance with this rule. In contrast, if further knowledge alone allows a
decision to be made, as described by the knowledge-based majority
rule, then only 78% of the group decisions are made in this
way. Figure 7-3 shows this result individually for each of the
28 groups. This may indicate that lack of recognition is more impor-
tant for a consensus than further knowledge.
Recognition and further knowledge by themselves may often not
allow a decision to be made. In these cases, the two information
sources can both be used; even then, guessing may be necessary.
If the rules are applied to all inferences, using guessing to break
ties, group choices overall are described best by the recognition-
first (83%), majority (82%), and knowledge-first (81%) rules, fol-
lowed by the recognition-based (74%) and the knowledge-based
(70%) majority rules. The last two rules do worst because they have
to resort to guessing more often.
To find out which members, if any, have more influence in group
decisions, we examined those cases within groups where the
inference of the subgroup using the recognition heuristic differed
from the inference of the subgroup using other knowledge. These
cases break down into three different types: (a) One member uses

[Figure 7-3: prediction accuracy (% correct, 0–100) of the recognition-based majority rule and the knowledge-based majority rule for each of the 28 groups.]

Figure 7-3: Accuracy of group choice rules at predicting observed group choices. Shown are percent correct predictions for the recognition-based and the knowledge-based majority rules, without guessing, for each individual group. Groups are ordered left to right according to the performance of the recognition-based rule. (Adapted from Reimer & Katsikopoulos, 2004.)

the recognition heuristic and two members use other knowledge—34 cases, (b) two members use the recognition heuristic and one
member uses other knowledge—75 cases, and (c) one member uses
recognition, one member uses other knowledge, and one member
guesses—45 cases.
Consider first the situation where two members recognized both
cities and inferred that one city is larger while the third member
recognized only the other city. Surprisingly, the single individual
trumped the majority more often than not: In 59% of these cases,
the group decision matched the inference indicated by the recogni-
tion heuristic! Thanks to the recognition heuristic user, Ms. Known
might get her interview.
Now you could be wondering if a minority using knowledge
rather than recognition would be as successful. Is it just that minor-
ity subgroups are persuasive in this task? The answer is no. When
two members recognized only one city while the third member
recognized both cities and inferred that the city that was not
recognized by the other two members is larger, the group decision
matched the suggestion of the members using the recognition
heuristic in 76% of the cases. In the third type of situation we
looked at, one member recognized only one city and made the
opposite inference from a second member who recognized both
cities, while the third member did not recognize either city. Here,
groups decided in accordance with the recognition heuristic in 61%
of the cases.
Finally, we also looked at those cases where two members
recognized neither city and only one member did not guess. When
this individual used the recognition heuristic, the group decision
matched that person’s inference in 78% of these 106 cases. But in
the 27 cases where this individual used other knowledge, the
match between the group decision and that of this individual
dropped to 58%, indicating that the groups put less faith in general
knowledge than recognition knowledge.
All in all, the analysis of these 287 cases suggests that groups
seem to follow, most of the time, those group member(s) who can
use the recognition heuristic. This is an adaptive strategy in those
environments, like ours, where recognition (α = .72) is more
accurate than other knowledge (β = .65). Note, however, that we do
not know how sensitive groups are to this difference (although
there is evidence that individuals may be—see chapter 5), or
whether they typically default to following recognition.
The group decisions alone do not tell us anything about the
process by which recognition-heuristic users influence the reach-
ing of consensus. We can get some hints by inspecting the video-
taped group discussions in the minority of cases where the
members using the recognition heuristic were not more influential.
These discussions usually showed that there were exceptional reasons for not following recognition. For example, the city of
El Paso was chosen by a single individual using the recognition
heuristic but not chosen by the group as a whole in six cases. This
may be because Germans recognize El Paso from a well-known
country song and thus attribute their recognition to this particular
source and disregard it for deciding city size (see chapter 5). And in
the situation where two members using the recognition heuristic
disagreed with one using other knowledge, some of the cases where
the suggestion of other knowledge was followed relied on argu-
ments based on confidence. For example, when two members
recognized Indianapolis but not Fresno, the third member stated
that he “was 99% sure that Fresno was more populous.” In other
cases, reasons were used instead of confidence. An individual who
recognized both Raleigh and Oklahoma City managed to convince
two members who only recognized Oklahoma City by arguing that
Raleigh is a state capital and that it is on the east coast, which is
densely populated.

Are There Less-Is-More Effects Between Groups?


To test empirically whether there are less-is-more effects at the
group level we have to make comparisons between groups—that is,
we need to identify pairs of groups with similar recognition and
knowledge validities (α and β), but different amounts of knowledge
(number of cities recognized, n), and then compare their perfor-
mance. We observed seven pairs of groups with approximately
equal average α and β but unequal average n. Two groups were
considered to have approximately equal average α and β if these
averages differed by no more than three percentage points. This
threshold was chosen as the minimum one that, when increased by
one percentage point, did not increase the number of group pairs.
That is, using a threshold of two percentage points yielded fewer
than seven pairs while using a threshold of four points also yielded
seven pairs. In Figure 7-4, we graph the accuracy of the pairs so
that a line segment connects the accuracy of the group with the
smaller average recognition frequency to the accuracy of the group
with the larger average recognition frequency. The five segments
that slope downward represent less-is-more effects. This is the first
empirical demonstration that less-is-more effects occur between
groups of judges.
How well do the five combination rules predict when the effect
will occur and when it will not? Note that Result 2 does not apply
because it assumes homogeneous groups. We used the empirical
estimates of α, β, and n for each individual and, assuming indepen-
dence, derived point predictions using the same reasoning for

[Figure 7-4: group accuracy (% correct, 50–100) plotted against the average number of cities recognized by group members (7–15).]

Figure 7-4: Empirical demonstration of less-is-more effects in group decision making. Each point represents a group; the value on the x-axis is the average number of objects recognized by members in the group and the value on the y-axis is group accuracy. Points connected with a line segment correspond to pairs of groups with approximately equal mean α and β. The five segments pointing downward represent less-is-more effects.

deriving the idealized curves of Figure 7-2 (Reimer & Katsikopoulos, 2004). We found that the recognition-first rule and the recognition-
based majority rule correctly predicted whether the effect
occurred in all seven cases. On the other hand, the knowledge-
based majority rule and the simple majority rule made six
correct predictions, and the knowledge-first rule made five correct
predictions.
We also considered how well the rules captured the magnitude
of the effect and its inversion. For each rule, we computed the sum
of absolute values of the differences between observed and pre-
dicted accuracies in the two groups. The recognition-first lexico-
graphic rule again outperformed the other rules, with the index
equaling 12 percentage points. The index equaled 15, 19, 24, and
36 percentage points for the simple majority, recognition-based
majority, knowledge-first lexicographic, and knowledge-based
majority rule, respectively. Thus, we again found that the rules
assuming that members who use the recognition heuristic are more
influential have higher predictive accuracy than the rules assuming
that using other knowledge is more influential.

Conclusions: Groups Rely on Informative Ignorance

Individuals have been shown to rely on the recognition heuristic when recognition validity is high (chapter 5). Furthermore, if rec-
ognition validity is higher than knowledge validity, use of the rec-
ognition heuristic can lead to a less-is-more effect. In this chapter
we presented the first analogous findings for groups. Using mathe-
matical modeling of group inference, we showed under what con-
ditions the use of recognition in groups leads to less-is-more effects,
and then in an empirical study, we demonstrated that they do occur
when groups of people make decisions together.
If individuals are partially ignorant, then groups of individuals
may well be, too. Do the dangers of ignorance multiply when people
get together to reach a joint decision? The results of this work
argue no: We found that when individual inferences are combined,
groups seem intelligently to allocate more influence to those
members who are more accurate through using the recognition heu-
ristic. Marquis de Condorcet, who early on saw the applicability of
probability theory to social science, correctly conjectured that a
group of judges is often more accurate than an average individual
judge. We propose that this may not only be due to statistical rea-
sons but may also reflect people’s simple and smart reasoning about
reaching consensus. This heuristic consensus making can have dra-
matic effects, as when a single individual trumps a better-informed
majority. But this fits with other effects thought to be surprising.
Goldstein and Gigerenzer (1999, 2002) found that recognition is
applied in a noncompensatory fashion with respect to other cues,
and we found that recognition is applied in a noncompensatory
fashion with respect to other individuals.
We also found that groups recognizing fewer cities can outper-
form groups recognizing more cities, and we showed how this
less-is-more effect (and group behavior in general) can be modeled
using simple combination rules. We need to rethink the wide-
spread claim that groups make better decisions when they have
more information.
Part IV
REDUNDANCY AND VARIABILITY
IN THE WORLD
8
Redundancy
Environment Structure That Simple Heuristics Can Exploit

Jörg Rieskamp
Anja Dieckmann

There is a variety of “means” to each end, and this variety


is changing, both variety and change being forms of vicar-
ious functioning.
Egon Brunswik
There are many ways to skin a cat.
English proverb

Imagine searching for a house to buy. After comparing a few pos-


sibilities, you try to judge what would be reasonable prices for the
houses. To make this inference you could use information such as
the number of rooms, the current property taxes, the size of the
garage, and the age of each house. But if the house sellers have
themselves only recently bought the house, you could use a short-
cut for your estimate, namely, the previous selling price. This is an
example of a situation with high information redundancy: Although
the number of rooms or the size of the garage is important for evalu-
ating a house’s value, these cues might not offer much additional
information about price fairness beyond what the recent selling
price of the house can tell you.
Many decision mechanisms, such as heuristics, are adapted to
particular environments. The match between particular environ-
ment structures and heuristics can enable an individual to behave
in a computationally rapid, information-frugal, and comparatively
accurate manner in the face of environmental challenges. In this
chapter we focus on one specific aspect of environments: infor-
mation redundancy, which we argue is a main factor determining
how well simple heuristics perform compared to more complex
inference strategies. We define information redundancy in purely
statistical terms, that is, the statistical correlation between two


predictors or cues. In a situation with maximum redundancy,


the two predictors are perfectly correlated with each other, such
that knowing the value of one predictor allows us to infer the other
predictor’s value accurately. In a situation with minimum redun-
dancy, the correlation between the two predictors is zero, meaning
that knowing one cue’s value tells us nothing about the other—a
situation of statistical independence. (The situation of informa-
tion redundancy is different from the situation of information
conflict, where two predictors are negatively correlated with each
other and therefore tend to make opposite predictions—see Fasolo,
McClelland, & Todd, 2007.)
Here, we will explore the impact of information redundancy on
a typical forced-choice inference task requiring a decision between
two alternatives that are described by a number of dichotomous
cues (e.g., Gigerenzer & Goldstein, 1996). The main question is how
the information inherent in the different cues can best be used to
make accurate inferences. The answer to this question will, to a
large degree, depend on the redundancy of information in the envi-
ronment. As an illustration, imagine a situation in which the cues
are highly correlated with each other, where relying on the infor-
mation of a single cue is an accurate strategy. Contrast this with a
situation involving low information redundancy, in which, for
example, the most valid cue is not correlated with the other cues:
In this case, the other cues provide additional information, and
consequently, checking and using these other cues appears sensi-
ble. In this chapter, we report the results of a simulation study that
examines whether this intuitive expectation is correct, and if so, to
what extent the accuracy of different inference strategies is influ-
enced by information redundancy.
In the next section, we highlight the influence that different envi-
ronmental factors can have on strategies’ accuracies. After this, we
define several inference strategies that compete against each other
in our simulation under different environmental conditions. We
then explore the accuracies of the strategies in 27 environments
and show how information redundancy in the environment affects
those accuracies. From these results we predict that information
redundancy is an important factor that decision makers should
take into account when making inferences. This prediction is tested
in two experimental studies that we summarize before finally dis-
cussing the conclusions that can be drawn from our results.

Characteristics of Environments

People’s decision processes can be influenced by many aspects of


the decision situation. Commonly, characteristics of the decision

task are differentiated from characteristics of the decision environ-


ment (e.g., Payne, Bettman, & Johnson, 1993). Examples of task
characteristics that influence decisions are costs of information
acquisition and the time available for making a decision. When
searching for information incurs high costs, it is adaptive to select
a simple heuristic that requires only minimal information to
make an inference. Likewise, it is adaptive to use a fast and frugal
heuristic when under time pressure, as the application of more
complex strategies might consume too much time. Simple heuris-
tics describe people’s inferences well when information search
costs are high (Bröder, 2000a; Newell & Shanks, 2003; Newell,
Weston & Shanks, 2003), or when inferences have to be made under
extreme time pressure (Rieskamp & Hoffrage, 1999, 2008). In gen-
eral, people appear to select their strategies adaptively depending
on task characteristics and on the basis of individual learning
(Rieskamp, 2006, 2008; Rieskamp & Otto, 2006), such that their
decision behavior can often be best predicted by strategies that per-
form well under the given circumstances.
On the other hand, characteristics of the environment, which we
focus on here, refer to the objects to be considered, their criterion
values, and the cues describing them. Examples of environment
characteristics include the distribution of the criterion values
(e.g., normal vs. J-shaped distribution—see chapter 15), the disper-
sion of cue validities (see chapter 13), information conflict (i.e.,
negatively vs. positively correlated attributes), the predictability of
the criterion (i.e., the presence of errors in predictions—see chap-
ter 3), dispersion of the objects’ criterion values, the number of
available cues, the number of objects of an environment being con-
sidered, the granularity of the cues (e.g., dichotomous vs. continu-
ous cue values), and the information redundancy of cues. Hogarth
and Karelaia (2005a) have shown with artificially created environ-
ments that information redundancy of predictors can be a key fea-
ture for predicting when a simple heuristic performs well in
comparison to more complex strategies that rely on heavy informa-
tion integration. We follow and extend this work by examining
to what extent information redundancy plays an important role in
the performance of simple heuristics in natural environments as
well, and testing experimentally whether people respond adap-
tively to information redundancy by selecting appropriate decision
strategies.
The pair-comparison inference task we focus on in this chapter
can be conceptualized as follows: The environment consists of a
population of N objects, in which each object i is characterized by
a criterion value x_i. For all possible pair comparisons, the task is to
predict which object has the larger criterion value. Each object is
described by a set of M dichotomous cues. Each cue m can have a

positive or a zero cue value c_m (i.e., 1 or 0). Each cue has a specific
validity. The validity v_m is defined as the conditional probability
of making a correct inference based on cue m alone given that
cue m discriminates, that is, that one object has a positive cue
value (c_m = 1) and the other a value of zero (c_m = 0).
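To make this definition concrete, a cue's validity can be estimated with a short routine along the following lines (a minimal Python sketch; the function name, the handling of nondiscriminating pairs, and the example data are illustrative choices of our own, not taken from the simulations reported below):

```python
from itertools import combinations

def cue_validity(criterion, cue_values):
    """Proportion of correct inferences a single cue makes, counted only
    over the object pairs on which the cue discriminates."""
    correct = discriminating = 0
    for i, j in combinations(range(len(criterion)), 2):
        if cue_values[i] == cue_values[j]:
            continue  # cue does not discriminate between this pair
        discriminating += 1
        # infer that the object with the positive cue value (1) is larger
        inferred = i if cue_values[i] > cue_values[j] else j
        actually_larger = i if criterion[i] > criterion[j] else j
        correct += int(inferred == actually_larger)
    return correct / discriminating if discriminating else float("nan")

# Five objects with known criterion values and one dichotomous cue
print(cue_validity([10, 8, 6, 4, 2], [1, 1, 0, 1, 0]))  # -> 0.83...
```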
We are interested here in two particular characteristics of the
environment: information redundancy and the dispersion of the
validity of information. The overall redundancy in information
conveyed by the different cues in the environment can be measured
as the mean correlation between all pairs of cues assessed across
all the objects. We compute the correlation between two cues on
the basis of the object pair comparisons each cue makes; that is, we
first calculate the cue difference vector for each pair of objects
and then correlate the differences for one cue across all object
pairs with the differences for the other cue across all object pairs.
The mean correlation over all pairs of cues can vary from a high
value of 1, where all cues are the same and so are completely redun-
dant (and hence where only one cue ever needs to be considered
for an inference), to a low value of 0, where each cue provides
independent information. Environments with a positive mean cue
correlation near 1 can be called “friendly” with respect to the deci-
sion maker (Shanteau & Thomas, 2000), because the cues tend to
point toward the same decision, while environments with indepen-
dent cues and correlations nearer 0 have been called “unfriendly,”
because their cues often provide contradictory information.
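The redundancy measure just described can likewise be sketched in a few lines of Python (assuming NumPy; the function name is ours, and cues that never discriminate are assumed not to occur, since a constant difference vector would leave the correlation undefined):

```python
import numpy as np
from itertools import combinations

def information_redundancy(cue_matrix):
    """Mean Pearson correlation over all pairs of cues, computed on the
    cue-difference vectors formed from all object pair comparisons.
    cue_matrix: an objects x cues array of 0/1 cue values."""
    n_objects, n_cues = cue_matrix.shape
    object_pairs = list(combinations(range(n_objects), 2))
    # one row of cue differences per pair of objects
    diffs = np.array([cue_matrix[i] - cue_matrix[j] for i, j in object_pairs])
    pairwise_r = [np.corrcoef(diffs[:, a], diffs[:, b])[0, 1]
                  for a, b in combinations(range(n_cues), 2)]
    return float(np.mean(pairwise_r))
```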
The dispersion of the validity of information in an environment
can be characterized in terms of the range of the cues’ validities—
that is, how much the validities of the cues differ. For instance,
if cue validities differ widely from .55 to .95, this is a high-
dispersion environment, whereas if all cues have similar validities
between .80 and .85, this is a low-dispersion environment. The dis-
persion of cues’ validities and the cues’ redundancy in a particular
environment can both influence a strategy’s performance in that
environment. For instance, in a situation with low information
redundancy and low validity dispersion, after seeing the most
valid cue it is worthwhile to consider another cue that offers non-
redundant information and still has a validity near that of the
first cue. In contrast, in a situation with high information redun-
dancy and high validity dispersion, after seeing the most valid cue
it could be of little benefit to look up another cue that offers only
redundant and less valid information. Hogarth and Karelaia (2005a)
found that under high information redundancy and high validity
dispersion a heuristic relying on only one single cue outperformed
multiple regression in making new inferences.
In the next sections we follow a standard approach to study-
ing ecological rationality (Todd & Gigerenzer, 2000), first using

simulations to compare the performance of different heuristics


in different environment structures, replicating and extending
Hogarth and Karelaia’s (2005a) results for artificially created envi-
ronments and generalizing the findings to natural environments,
and then testing the predictions of the simulations in experiments
with human participants. We begin by adding strategies to the sim-
ulation competition, including a more challenging benchmark
model.

Strategies in the Competition

The number of strategies that can be applied to solve the inference


task we just described is large. We selected a representative sample
of strategies that vary in their computational complexity, the infor-
mation they require, and the way they process it. We consider one
strategy, take-the-best, that is noncompensatory (i.e., the decision
indicated by one cue cannot be overturned by any combination of
less valid cues), along with five compensatory strategies that inte-
grate cue information: logistic regression, naïve Bayes, Franklin’s
rule, Dawes’s rule (tallying), and take-two.
The first compensatory strategy, logistic regression (see, e.g.,
Cohen, Cohen, West, & Aiken, 2003; Menard, 2002), appears most
suitable as a benchmark model for the inference problem we are
considering. The cue values of the first object, A, minus the cue
values of the second object, B, yield the cue differences dm (which
can be −1, 0, or 1) required for the logistic regression. If all object
pair comparisons are arbitrarily composed in such a way that for
half of the comparisons, A has the higher criterion value and for the
other half, B has the higher criterion value, then the following logis-
tic regression equation can be specified:

\[
\ln\!\left(\frac{\hat{p}_{k}(A \succ B)}{1 - \hat{p}_{k}(A \succ B)}\right) = b_{1}d_{1} + \ldots + b_{m}d_{m} + \ldots + b_{M}d_{M} + b_{0}, \qquad (1)
\]

where k is a particular pair comparison and b_m are the regression
weights. The regression model estimates the probability p̂_k of
object A having a larger criterion value than object B on the basis of
all cue differences. The left-hand side of the logistic regression is
the so-called logit form. When the value on the left-hand side of the
regression is greater than 0, it implies that the estimated probability
of A having a larger criterion value than B is greater than .5, sug-
gesting that A should be selected (and vice versa if the value is less
than 0). (Using the cutoff probability of .5 is a reasonable choice,
but in principle, other cutoff values are possible—see Neter, Kutner,

Nachtsheim, & Wasserman, 1996.) Logistic regression integrates


the information from all available cues. It takes the intercorrela-
tions between cues into account by giving low weight to redundant
information, meaning that its accuracy should be less affected by
the degree of information redundancy than the accuracies of the
following strategies that ignore correlations between cues. While in
the past ordinary linear regression has been used as a benchmark
for comparing decision mechanism performance (e.g., Gigerenzer &
Goldstein, 1999; Hogarth & Karelaia, 2005a), logistic regression
could be more appropriate for predicting a dichotomous criterion
(see Cohen et al., 2003; Menard, 2002), because standard assump-
tions of linear regression are violated (e.g., no normally distributed
residuals—see Cohen et al., 2003). Thus, logistic regression can be
regarded as a benchmark model recommended by statisticians for
solving our inference task (see also Tatsuoka, 1988).
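For readers who want to reproduce this benchmark, the following sketch shows one way to fit and apply such a model; the chapter does not prescribe a particular implementation, so the use of scikit-learn and the toy data here are assumptions of our own:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row holds the cue differences d_m = c_m(A) - c_m(B) for one pair
# comparison in the training set; y = 1 if A has the larger criterion value.
X_train = np.array([[ 1,  0, -1,  1],
                    [ 0,  1,  1, -1],
                    [-1,  1,  0,  0],
                    [ 1, -1,  1,  0]])
y_train = np.array([1, 0, 0, 1])

model = LogisticRegression().fit(X_train, y_train)

# For a new pair, choose A when the estimated probability exceeds the
# .5 cutoff, that is, when the logit of Equation 1 is positive.
d_new = np.array([[1, 1, -1, 0]])
choose_A = model.predict_proba(d_new)[0, 1] > 0.5
```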
Naïve Bayes is related to logistic regression and has often been
used for classification problems where an object has to be assigned
to one of multiple exclusive categories (e.g., Friedman, Geiger, &
Goldszmidt, 1997), and as another benchmark for inferential per-
formance (Martignon & Laskey, 1999). Naïve Bayes also predicts the
probability that one of two objects has a higher criterion value, but
it makes the simplifying assumption that cues are independent of
each other. Its prediction can be determined by the posterior odds
that A has a larger criterion value than B, given a particular cue
profile. Transformed onto a log-odds scale, the posterior odds can
be computed by adding the log odds for each cue (derived from the
cue validities), multiplied by the cue difference encountered in the
problem. Thus, naïve Bayes can be defined as a special case of
Equation 1, when the regression constant b_0 is assumed to be zero
and the regression weights of Equation 1 are replaced by b_m =
ln(v_m/(1 − v_m)), where v_m is the validity of cue m. Naïve Bayes therefore
also integrates the information of all available cues, but unlike the
regression model it ignores correlations between cues (which logis-
tic regression takes into account in its search for the best regression
weights). Some authors have argued that naïve Bayes should be
regarded as the “rational” model for this pair-comparison inference
task (Lee & Cummins, 2004). Our simulations will show when naïve
Bayes works well and when it does not.
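A minimal sketch of this decision rule (the function name and example values are ours; validities are assumed to lie strictly between 0 and 1 so that the log odds are defined):

```python
import math

def naive_bayes_choice(cue_differences, validities):
    """Sum each cue's log odds ln(v_m / (1 - v_m)), weighted by its cue
    difference (-1, 0, or 1), and choose the object the sum favors."""
    score = sum(d * math.log(v / (1 - v))
                for d, v in zip(cue_differences, validities))
    return "A" if score > 0 else ("B" if score < 0 else "guess")

# Cue 1 (validity .9) favors B; cues 2 and 3 (.7 and .6) favor A.
print(naive_bayes_choice([-1, 1, 1], [0.9, 0.7, 0.6]))  # -> "B"
```

In this example the two less valid cues cannot outweigh the single highly valid cue, which illustrates how strongly dispersed validities can make naïve Bayes behave almost lexicographically.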
Franklin’s rule is a linear strategy that first determines a score for
each object by summing up the cue values multiplied by the cor-
responding cues’ validities and then selects the object with the
highest score. Franklin’s rule can also be defined by Equation 1 by
replacing the regression weights b_m with validities v_m and assuming
b_0 = 0. When the right-hand sum is positive, object A is selected;
otherwise, B is selected. Compared to logistic regression and naïve

Bayes, Franklin’s rule appears coarse: Even cues with a validity of


.50—which means they provide no information at all—influence
the score and the decision. Nevertheless, the computational sim-
plicity of Franklin’s rule relative to logistic regression and naïve
Bayes makes it more psychologically plausible for predicting
people’s inferences. In fact, Franklin’s rule is often a good model
for predicting people’s choices when they face low information-
processing costs (Bröder & Schiffer, 2003b; Rieskamp, 2006, 2008;
Rieskamp & Hoffrage, 2008; Rieskamp & Otto, 2006). Moreover,
the family of weighted additive models (of which Franklin’s rule
is one) is often regarded as providing the normative benchmark
for preferential choice (Payne, Bettman, & Johnson, 1988, 1993).
Because Franklin’s rule uses regular validities (as opposed to con-
ditional validities—see Martignon & Hoffrage, 1999) as weights, it
is also insensitive to correlations between cues.
A simpler linear model is Dawes’s rule, which determines a score
for each object by tallying, that is, summing up the (unit-weighted)
cue values and selecting the object with the highest score. Dawes’s
rule can also be defined by Equation 1 by replacing regression
weights with unit weights (±1) and assuming b_0 = 0. When the right-
hand sum is positive, A is selected; otherwise, B is selected.
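Both linear rules can be captured by one generic weighted additive sketch (names and example values are our own): with the cue validities as weights it implements Franklin's rule, and with unit weights it implements Dawes's rule.

```python
def weighted_additive_choice(cues_A, cues_B, weights):
    """Score each object as the weighted sum of its 0/1 cue values and
    choose the object with the higher score (guess on a tie)."""
    score_A = sum(w * c for w, c in zip(weights, cues_A))
    score_B = sum(w * c for w, c in zip(weights, cues_B))
    if score_A == score_B:
        return "guess"
    return "A" if score_A > score_B else "B"

validities = [0.89, 0.82, 0.76, 0.69, 0.62, 0.56]
cues_A = [1, 0, 0, 1, 1, 0]
cues_B = [0, 1, 1, 0, 0, 1]
print(weighted_additive_choice(cues_A, cues_B, validities))  # Franklin's rule -> "A"
print(weighted_additive_choice(cues_A, cues_B, [1] * 6))     # Dawes's rule  -> "guess"
```

The example also shows the difference between the two rules: under unit weights the three cues on each side cancel out, whereas the validity weights let the more valid cues decide.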
The fifth strategy in the competition, take-the-best, searches
through cues sequentially in the order of their validity. The search
is stopped as soon as one cue is found that discriminates between
objects, and take-the-best simply selects the object with the posi-
tive cue value, ignoring all other cues (in the case that no cue
discriminates, a random choice is made). In contrast to the other
four strategies, which integrate cue values, take-the-best is non-
compensatory, because a deciding cue cannot be outweighed (or
compensated for) by any combination of less valid cues. Whereas
the weighting and adding of all pieces of information is prescribed
by logistic regression, naïve Bayes, and Franklin’s rule (and unit-
weighted adding by Dawes’s rule), take-the-best relies instead
on ordered, sequential search and one-reason decision making,
rendering weighting and adding unnecessary. Its simplicity and
accuracy make take-the-best a psychologically plausible model
of people’s inferences (Gigerenzer & Goldstein, 1996, 1999).
Technically (though not psychologically), the outcome of take-the-
best’s inference process can also be generated by Equation 1, by
replacing the regression weights with noncompensatory weights,
that is, weights that do not allow cues with a lower validity to
compensate for cues with a higher validity (Martignon & Hoffrage,
1999). For instance, noncompensatory weights can be constructed
by taking 10 to the power of a cue’s order position according
to its validity, with the highest position given to the most valid cue

(e.g., 10^6 for the most valid of six cues). Again, when the right-hand
sum is positive, object A is selected, otherwise, B. It needs to be
stressed that this computational representation is very different
from the process predicted by take-the-best, with its sequential and
limited information search.
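The sequential process itself needs no weights at all; a direct sketch (with our own function name and a random guess as the tie-breaking convention when no cue discriminates):

```python
import random

def take_the_best(cues_A, cues_B, validities):
    """Check cues in order of decreasing validity and decide on the first
    cue that discriminates; guess if no cue discriminates."""
    order = sorted(range(len(validities)), key=lambda m: -validities[m])
    for m in order:
        if cues_A[m] != cues_B[m]:
            return "A" if cues_A[m] > cues_B[m] else "B"
    return random.choice(["A", "B"])  # no cue discriminates
```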
The sixth and last strategy in our competition, which we call
take-two, builds a bridge between the compensatory strategies
and take-the-best (cf. Dieckmann & Rieskamp, 2007): It searches for
the cues in the order of their validity and stops searching when it
finds two cues that favor the same object, which is then selected
regardless of whether, during search, a cue was found that favored
the other object (see chapter 10 on two-reason stopping). If take-two
does not find two cues that favor the same object, it selects the
object that is favored by the cue with the highest validity (or else
picks randomly if no cue discriminates). The strategy follows the
idea that people sometimes do not want to base their decision on
one single cue but nevertheless may want to limit their information
search; take-two satisfies both goals. Take-two has the interesting
property of being able to produce intransitive choices. Since the
predictions of logistic regression (and also of all the other strate-
gies) are always transitive, take-two is the only strategy in our com-
petition that cannot be represented as a special case of Equation 1.
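A sketch of take-two in the same style (again with names of our own; as described above, a random guess is made only when no cue discriminates at all):

```python
import random

def take_two(cues_A, cues_B, validities):
    """Search cues in validity order and stop as soon as two discriminating
    cues favor the same object; otherwise fall back on the most valid
    discriminating cue, or guess if no cue discriminates at all."""
    order = sorted(range(len(validities)), key=lambda m: -validities[m])
    votes = {"A": 0, "B": 0}
    first_favored = None
    for m in order:
        if cues_A[m] == cues_B[m]:
            continue
        favored = "A" if cues_A[m] > cues_B[m] else "B"
        if first_favored is None:
            first_favored = favored
        votes[favored] += 1
        if votes[favored] == 2:
            return favored  # two cues agree: stop search and decide
    return first_favored if first_favored else random.choice(["A", "B"])
```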

Testing the Strategies

Measures of Strategy Performance


Given this wide range of decision strategies, how well does each do
in different environments that vary in cue redundancy and validity
dispersion? Which approach to information processing fits best
to which structures of information? We answer these questions
by computing the proportion of correct inferences a strategy pro-
duces for all possible pair comparisons of a set of objects in some
environment. When examining the strategies’ performance, we
focus on their generalization ability or robustness, that is, their abil-
ity to make good predictions for new, independent data (Myung &
Pitt, 1997). A complex model such as logistic regression, with a
large number of free parameters (i.e., the regression weights), has a
high degree of flexibility to adapt to a particular environment. It is
not surprising that a model with high flexibility achieves high accu-
racy when fitted to a sample of data (Roberts & Pashler, 2000). The
drawback of a model’s complexity lies in the problem of overfitting:
High flexibility can lead a model to adjust its parameters to noise
instead of to reliable structures in the data. Therefore, we test

generalization performance via cross-validation by selecting a pro-


portion of the objects from an environment as a training set for
estimating the strategies’ parameters, while using the remaining
objects as a test set for assessing generalization accuracy (see also
chapter 2 for more on measuring strategy robustness). The size of
the randomly selected training sets is varied between 10% and
100% of the environment, with 100% representing pure data-fitting
performance.
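The cross-validation procedure amounts to a short loop; the sketch below assumes a strategy object exposing fit and percent_correct methods, an interface of our own invention, and is meant only to show the logic of the training/test split:

```python
import random

def generalization_accuracy(objects, strategy, train_fraction=0.5, samples=1000):
    """Average a strategy's percentage of correct inferences on held-out
    test objects over repeated random training/test splits."""
    scores = []
    for _ in range(samples):
        shuffled = random.sample(objects, len(objects))
        cut = max(1, int(train_fraction * len(objects)))
        train, test = shuffled[:cut], shuffled[cut:]
        strategy.fit(train)                      # estimate validities, weights, etc.
        scores.append(strategy.percent_correct(test))
    return sum(scores) / len(scores)
```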
In addition to examining the strategies’ accuracies, we measure
how frugal they are, that is, the average percentage of cues required
for making an inference. Whereas take-the-best and take-two spec-
ify how they search for information and when information search
stops, it is not clear how information acquisition should be thought
of for logistic regression, naïve Bayes, Franklin’s rule, or Dawes’s
rule. Previous research has assumed that these models need to
search for all available cues (e.g., Czerlinski, Gigerenzer, & Goldstein,
1999). However, even for these models limited information search
is, in principle, possible. Suppose the two most valid cues out of
three cues favor one object—then the third cue cannot change the
two-cue decision made by any of these strategies. Thus, search can
be limited by assuming that these compensatory strategies stop
search when additional cues cannot change a preliminary decision
based on the acquired cues. Additionally, it is assumed that cues
are checked in the order of their validities or beta weights, respec-
tively, as this search order allows for the earliest possible stopping.
Of course, except for perhaps Dawes’s rule, this search process
might not appear psychologically very plausible, since it requires
that a preliminary decision be determined after each acquired cue
and compared to a hypothetically determined final decision.
Nevertheless, we will assume this limited search for the strategies,
since it leads to a more demanding competition among the strate-
gies regarding their frugality and enables a stronger test of the
simple heuristics’ expected frugality advantages.
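The stopping rule we just granted the compensatory strategies can be made explicit as follows (a sketch; the function name is ours, and the cue differences are assumed to be passed already sorted by decreasing weight):

```python
def cues_needed(cue_diffs, weights):
    """Number of cues a compensatory strategy must look up when it checks
    cues in order of decreasing weight and stops as soon as the cues not
    yet seen can no longer overturn the preliminary decision.
    cue_diffs: -1, 0, or 1 per cue; weights: nonnegative, sorted descending."""
    score = 0.0
    for k, (d, w) in enumerate(zip(cue_diffs, weights)):
        score += w * d
        if abs(score) > sum(weights[k + 1:]):  # remaining cues cannot flip the sign
            return k + 1
    return len(weights)

# Dawes's rule with three cues: after two agreeing cues, the third is irrelevant.
print(cues_needed([1, 1, -1], [1, 1, 1]))  # -> 2
```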

The Artificial Environments


As a first step we tested how accurate and frugal the strategies
are in artificially created environments with either high or low
information redundancy and high or low cue validity dispersion.
Focusing on the artificial environments first has the advantage
that here we should observe the strongest effects of information
redundancy due to the very high and very low correlations we can
create between cues. The two factors, information redundancy
and validity dispersion, were crossed, providing four groups of
environments. In more detail, 500 artificial environments, each

consisting of 50 objects and six cues, were created for each of


the four conditions. For every cue, 25 objects had positive cue
values and 25 had cue values of zero. We aimed for an average
correlation as high as possible for the high-, and as close as possible
to 0 for the low-redundancy environments. For the low-dispersion
condition, we aimed for cue validities ranging between .62 and .82
with an equal validity difference of .04 between the cues. For the
high-dispersion condition, we aimed for cue validities ranging
between .54 and .89 with an equal validity difference of .07 between
the cues. The environments were constructed by first randomly
distributing cue values to the objects. Thereafter, the environ-
ments were modified repeatedly through two phases of many
steps. First, in every step, two randomly selected cue values of
two objects for the same cue were interchanged and we checked
whether this moved the validities toward the desired values. If it
did, the modified environment was taken as a new starting point;
otherwise we kept the previous environment. This iterative process
produced the required cue validities. Thereafter, in the second
phase, in each step, two randomly selected cue values of two objects
for the same cue were interchanged and we checked whether
this moved the average correlation between cues in the desired
direction while keeping the cue validities within an allowed devia-
tion of ±.01. If the change was successful, the modified environ-
ment was taken as a new starting point. This iterative process
was repeated until we did not achieve any improvement over
100 steps, at which point the final environment was kept. Table 8-1
summarizes the environments we created. Given the cue validities,

Table 8-1: Average Cue Validities and Average Correlation Between Cues in the Four Groups of Artificial Environments

                                   High information redundancy        Low information redundancy
                                   High dispersion   Low dispersion   High dispersion   Low dispersion
Validities of cues
  First cue                              .89               .82              .89               .81
  Second cue                             .82               .78              .82               .77
  Third cue                              .76               .74              .75               .73
  Fourth cue                             .69               .70              .68               .69
  Fifth cue                              .62               .66              .61               .65
  Sixth cue                              .56               .62              .54               .61
Average Pearson correlation
  between cues                         r = .51           r = .51          r = .01           r = .01

we were able to achieve average cue correlations of about .5 in


the high-redundancy environments and of about 0 in the low-
redundancy environments, so our artificial environments did
embody the particular combinations of characteristics we sought.
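The two-phase construction procedure described above is essentially a hill-climbing search over random swaps; the following sketch captures one such phase (all names are our own, and details such as the ±.01 tolerance on the validities during the second phase are omitted):

```python
import random

def adjust_environment(cues, distance_to_target, max_failures=100):
    """Repeatedly swap two randomly chosen values of one cue between two
    objects, keep the swap only if it reduces the distance to the target
    (e.g., desired validities or desired mean cue correlation), and stop
    after max_failures consecutive swaps without improvement.
    cues: list of lists, cues[m][i] is the 0/1 value of cue m for object i."""
    best = distance_to_target(cues)
    failures, n_objects = 0, len(cues[0])
    while failures < max_failures:
        m = random.randrange(len(cues))
        i, j = random.sample(range(n_objects), 2)
        cues[m][i], cues[m][j] = cues[m][j], cues[m][i]      # trial swap
        score = distance_to_target(cues)
        if score < best:
            best, failures = score, 0                        # keep the swap
        else:
            cues[m][i], cues[m][j] = cues[m][j], cues[m][i]  # undo it
            failures += 1
    return cues
```

Because each step only exchanges two values within the same cue, the number of positive values per cue (25 of 50 objects) is preserved throughout the search.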

How Information Redundancy Affects Strategies’ Inferences

Accuracy of Inferences
Figures 8-1 and 8-2 show the performance (fitting and generalization)
of the different strategies in our artificial high- and low-redundancy
environments, respectively. For each redundancy condition, we
plot the average percentage of correct inferences in the test set made
by the different strategies with (a) low dispersion, and (b) high dis-
persion of the cue validities, for training sets with sizes varying
between 10% and 100% of the environment. The 100% sample
shows the strategies’ accuracies when trained on the entire envi-
ronment, that is, the pure data-fitting case.
Consistent with previous results (e.g., Czerlinski et al., 1999;
Gigerenzer & Goldstein, 1996), take-the-best, the simplest strategy
under consideration, performs very well under high information
redundancy. In particular, in the condition with low dispersion
of the cue validities, take-the-best is the best strategy for all except
the 80–100% training sizes (see Figure 8-1a). Logistic regression,
the benchmark model, is strongly influenced by the size of the
training set: In this condition, if the set is relatively small, less than
40% of the environment, its accuracy drops substantially below
the average accuracy of the other strategies, apparently over-
fitting. However, logistic regression’s accuracy increases with larger
training sets.
With high dispersion of the cue validities, though, logistic
regression substantially outperforms the other strategies (see
Figure 8-1b). Take-the-best is still the second best strategy. The
remaining four compensatory strategies perform at relatively simi-
lar levels. The more complex strategies, naïve Bayes and Franklin’s
rule, outperform the simpler strategies, take-two and Dawes’s rule,
but not by much.
For the low redundancy environments, where the different cues
convey different (independent) information, the results look very
different: Take-the-best is now outperformed by the compensatory
strategies. In particular, when the cue validities have low disper-
sion, take-the-best performs poorly (see Figure 8-2a). However,
when the dispersion of cue validities is high, take-the-best still
reaches accuracies close to those of Dawes’s rule and take-two
(see Figure 8-2b). Logistic regression’s accuracy is again strongly


Figure 8-1: Strategies’ accuracies versus training set size in high


information redundancy environments for the test sets in (a) the low
validity dispersion case, and (b) the high validity dispersion case.
At 100%, the accuracy for the training set is provided (i.e., fitting
performance).

influenced by the size of the training set; when it is relatively large


(40% or above) the model is able to generalize well, reaching the
highest accuracy of all the strategies. Franklin’s rule and naïve
Bayes again, on average, outperform take-two and Dawes’s rule.
Franklin’s rule reaches a relatively high accuracy compared to
logistic regression, particularly when the dispersion of cue validi-
ties is low (see Figure 8-2a).


Figure 8-2: Strategies’ accuracies versus training set size in low


information redundancy environments for the test sets in (a) the
low validity dispersion case, and (b) the high validity dispersion
case. At 100%, the accuracy for the training set is provided (i.e.,
fitting performance).

In sum, take-the-best achieves high accuracy especially in the


high-redundancy conditions, where it performs as well as or even
better than all compensatory strategies, with the exception of
logistic regression. Under low-redundancy conditions, when the
dispersion of the cue validities is high, take-the-best performs on a
similar level to Dawes’s rule and take-two. The only situation in
which take-the-best suffers a clear loss in accuracy compared to

other strategies is when cues are low in redundancy and have sim-
ilar validities.
The results of this simulation allow us to specify part of the
ecological rationality of take-the-best: Environments that are
characterized by high information redundancy are exploitable by,
and hence friendly to, take-the-best. But even when redundancy is
low, as long as validities are widely dispersed, take-the-best can
perform at a level close to that of compensatory strategies. In con-
trast, environments with low information redundancy and low
validity dispersion are hostile for take-the-best in comparison to
compensatory strategies. These results appear reasonable: Take-
the-best often makes an inference by relying on the information
of a highly valid cue, which leads to high accuracy relative to com-
pensatory strategies when the remaining cues do not offer much
new information anyhow. In contrast, compensatory strategies gain
an advantage in low-redundancy situations in which different cues
offer new information, particularly if take-the-best cannot rely on
high-validity cues (i.e., in the low-dispersion environment).
A compensatory strategy can do better than take-the-best when
the combined information in the cues that are not considered by
take-the-best leads to a better decision, which requires that the
weights given by the compensatory strategy to the remaining cues
allow for compensation (i.e., overruling the decision of take-the-
best). To see just how often this compensation among cues actually
happens for our benchmark logistic regression model, we calcu-
lated a compensation index, defined as the proportion of all possi-
ble pair comparisons between the objects in one environment in
which the set of weights for logistic regression (or other models)
allows for a compensation. For example, a compensation index of
10% for a particular set of cue weights says that over all possible
cue value settings with those weights, a preliminary decision that
is based on the first discriminating cue (searching through the cues
in weight order, large to small) will be compensated (overruled) in
10% of all cases by the remaining cues with smaller weights. To
put these results in perspective we first determined the theoretical
maximum value for the compensation index. To do so, we con-
structed all possible cue configurations (i.e., 2^6 different configura-
tions), formed all possible comparisons between them, and applied
a unit weight strategy (i.e., Dawes’s rule) to decide between them.
This procedure results in a compensation index of 27%, meaning
that no compensation will occur in 73% of all cases. Compensatory
strategies that weight cues unequally cannot achieve a higher com-
pensation index, because later cue weights (coming in order of
decreasing magnitude) will by definition be smaller than Dawes’s
rule’s equal weights and so will lead less often to compensation.
We determined the compensation index for logistic regression,

first for the high information redundancy environments across


both validity dispersion conditions: Here the regression weights
allow for compensation in only 3.5% of all possible cue configura-
tions. In comparison, in the low information redundancy condition
compensation occurred in on average 9.7% of the decisions across
both validity dispersion conditions. Thus, compensatory strategies
can take advantage of the possibility to overrule a wrong decision
of a highly valid cue by combining less valid cues only appreciably
often in the environments with low redundancy.

Strategy Frugality
Beyond accuracy, another important characteristic of a strategy is
the cost of applying it. Here we ignore computational costs and
focus only on frugality, that is, the percentage of the available
cues looked up for making an inference; this is anyhow likely to be
the most pressing cost for most decision makers (Todd, 2001).
As described above we defined limited information search for
Franklin’s rule, naïve Bayes, and logistic regression by assuming
that they look up cues in the order of their importance (i.e., validi-
ties, log odds, or regression weights), or randomly for Dawes’s rule,
and stop search when a decision on the basis of the information
acquired so far cannot be overruled by any additional information
that might yet be looked up.
Figure 8-3 shows the percentage of cues looked up by the
strategies to reach a decision. Since the strategies’ frugality did not


Figure 8-3: Frugality of the six strategies in the four kinds of deci-
sion environments, in terms of percentage of cues needed to make a
decision.

differ between the training set and the test set, we only present fru-
gality based on the whole environments as samples. Take-the-best
required, on average, only 36% of the cues before reaching a deci-
sion, which is substantially less information than the compensa-
tory strategies that use on average 74% of the cues, even with the
limited information search assumed. Comparing different environ-
ment conditions, take-the-best required less information under
low information redundancy than under high information redun-
dancy. This was different for most of the compensatory strategies,
which required slightly more information under low compared
with high information redundancy.
How can these contradictory results for compensatory versus
noncompensatory strategies be explained? Under high information
redundancy, cues are positively correlated with each other such
that the cues a decision maker checks for will often support the
same object. Therefore, a second discriminating cue will very
often point to the same object as the first discriminating cue,
making it unlikely that a preliminary decision based on the cues
gathered so far could be changed by the remaining cues. Search is
therefore stopped relatively early by the search-stopping mecha-
nism we defined for compensatory strategies. But high information
redundancy also implies that when one cue does not discriminate
between two objects, a second cue is likely not to discriminate
between the objects either. Thus, take-the-best on average has to
search longer before encountering a discriminating cue under high
information redundancy than under low redundancy, where the
chance of finding a discriminating cue right after a nondiscrim-
inating cue is larger. This difference between take-the-best and
compensatory strategies provides an interesting prediction for
experimental tests: Participants favoring a compensatory strategy
should search for less information under high (versus low) infor-
mation redundancy, while participants favoring a noncompensa-
tory strategy should search for more.
Among the compensatory strategies, the simple Dawes’s rule
requires the most cues. This is not surprising since it does not give
larger weights to early cues, so they can be outvoted by later cues
right to the end. Franklin’s rule also requires many cues, in particu-
lar compared to naïve Bayes. Franklin’s rule uses the validities
as weights, which vary considerably less than in the weighting
structure used by naïve Bayes, whose high weight variation leads
it to require the least information among the compensatory strate-
gies (64%). The dispersion of the validities affects only naïve
Bayes’s frugality. This is the case because naïve Bayes’s weighting
structure depends on the validities and becomes extremely skewed
when cues with a relatively high validity exist, as is the case in the
high validity dispersion condition.

Testing the Strategies in Natural Environments

We started our analysis of the strategies’ performances with


artificially created environments so that we could amplify the pos-
sible effects of information redundancy. Having found such effects
there, we must now ask to what extent the results hold for natural
environments. To answer this question, we analyzed how informa-
tion redundancy affects the strategies’ accuracies across 27 real-
world environments, ranging from inferring a professor’s salary
to predicting gasoline consumption in the United States. Fourteen
of these environments have been used before to analyze the perfor-
mance of some of the strategies we consider (Czerlinski et al., 1999),
and now we have added 13 further environments from the domains
of computer science and economics to provide a more thorough
comparison (for an overview, see Box 8-1). For each environment,
the task was again to choose the object with the larger criterion

Box 8-1: Environments Analyzed

We examined 27 environments. The first 14 environments were also employed by Czerlinski et al. (1999). For all environments a continuous criterion was employed. As predictors, we used dichotomous cues. If a dichotomous cue had a missing value, it was replaced with a positive or negative cue value, where the probability of using a positive or negative value matched the frequency of positive and negative values for the particular cue. For continuous cues, missing cue values were first replaced with the mean cue value and afterward the cue was dichotomized according to the median.

1. Population size of German cities: Predicting the number of inhabitants of 83 German cities (Gigerenzer & Goldstein, 1996), described by the following nine cues: soccer team in the premier league, state capital, former East Germany, industrial belt, single character license plate prefix, exposition/trade fair site, intercity train station, national capital, and university.

2. Dropout rate at high schools: Predicting the dropout rate at 63 Chicago high schools (Rodkin, 1995), described by the following 11 most valid cues: attendance rate, graduation rate, percentage low-income students, average class size, percentage white students, percentage Asian students, average composite ACT scores in reading, math, science, social science, and writing.

3. Selling prices of houses: Predicting the selling price of 27 houses in Erie, Penn. (Narula & Wellington, 1977), described by the following nine cues: original price, number of fireplaces, current taxes, lot size, living space, number of garage spaces, number of rooms, number of bedrooms, and age of house.

4. Salary of professors: Predicting the salary of 52 college professors (Rice, 1995), described by the following five cues: sex, highest degree, rank, years in current rank, and year degree was earned.

5. Rent for farmland: Predicting the rent per acre for 67 land units in different counties in Minnesota used for alfalfa plantations (Weisberg, 1985), described by the following four cues: liming requirement, average rent for tillable land, density of dairy cows, and proportion of farmland used as pasture.

6. Lifespan of mammals: Predicting the lifespan of 58 mammals (Allison & Cicchetti, 1976), described by the following nine cues: body weight, brain weight, slow wave sleep, paradoxical sleep, total sleep, gestation time, predation index, sleep exposure index, and overall danger index.

7. Oxidants: Predicting the number of oxidants in 30 observations in Los Angeles (Rice, 1995), described by the following four cues: wind speed, temperature, humidity, and insulation.

8. Absorption of oxygen: Predicting the amount of oxygen absorbed by dairy wastes in 20 observations (Weisberg, 1985), described by the following six cues: biological oxygen demand, Kjeldahl nitrogen, total solids, total volatile solids, chemical oxygen demand, and day of the week.

9. Car accident rates: Predicting the accident rate (per million vehicle miles) for 39 observed segments of highways (Weisberg, 1985), described by the following 12 cues: federal aid interstate highway, principal arterial highway, major arterial highway, length of segment, daily traffic, truck volume, speed limit, lane width, width of outer shoulder, freeway-type interchanges, interchanges with signals, and access point.

10. Amount of rainfall after cloud seeding: Predicting the amount of rainfall after cloud seeding for 24 weather observations (Woodley, Simpson, Biondini, & Berkeley, 1977), described by the following six cues: action, days after experiment, suitability for seeding, percentage of cloud cover on day of experiment, pre-wetness, and echo motion.

11. Obesity: Predicting the leg circumference at age 18 for 58 men and women (Tuddenham & Snyder, 1954), described by the following 11 cues: sex, weight at age 2, height at age 2, weight at age 9, height at age 9, leg circumference at age 9, strength at age 9, weight at age 18, height at age 18, strength at age 18, and somatotype.

12. Number of species on the Galapagos Islands: Predicting the number of species for 29 Galapagos islands (Johnson & Raven, 1973), described by the following six cues: endemics, area, elevation, distance to next island, distance to coast, and area of adjacent island.

13. Fuel: Predicting the average motor fuel consumption (per person in gallons) of the 48 contiguous United States (Weisberg, 1985), described by the following seven cues: population, motor fuel tax, number of licensed drivers, per capita income, miles of highway, percent of population with driver's licenses, and percent of licensed drivers.

14. Homelessness: Predicting the rate of homelessness in 50 U.S. cities (Tucker, 1987), described by the following six cues: percentage of population in poverty, unemployment rate, public housing, mean temperature, vacancy rates, and population.

15. Total costs of firms: Predicting the total costs of 158 firms (Christensen & Greene, 1976), described by the following seven cues: total output, wage rate, cost share for labor, capital price index, cost share for capital, fuel price, and cost share for fuel.

16. Costs of U.S. airlines: Predicting 90 observations of the costs of six different U.S. airlines (Greene, 2003), described by the following three cues: revenue passenger miles, fuel price, and load factor.

17. Output of transportation firms: Predicting the output of transportation firms in 25 U.S. states (Zellner & Revankar, 1970), described by the following three cues: capital input, labor input, and number of firms.

18. People's income: Predicting the income of 100 people (Greene, 1992), described by the following five cues: credit card application accepted, average monthly credit card expenditure, age, owns or rents home, and self-employed.

19. U.S. manufacturing costs: Predicting total manufacturing costs for the U.S. from 25 yearly observations (1947–1971; Berndt & Wood, 1975), described by the following eight cues: capital cost share, labor cost share, energy cost share, materials cost share, capital price, labor price, energy price, and materials price.

20. Cost of electricity producers: Predicting the total costs of 181 electricity producers (Nerlove, 1963), described by the following seven cues: total output, wage rate, cost share for labor, capital price index, cost share for capital, fuel price.

21. Program effectiveness: Predicting the effectiveness of a new teaching method program for performance in a later intermediate macroeconomics course using 32 observations (Spector & Mazzeo, 1980), described by the following three cues: grade point average, economic pre-test score, and participation in the new teaching method program.

22. Mileage of cars: Predicting the mileage of 398 cars (Asuncion & Newman, 2007), described by the following four cues: displacement, horsepower, weight, and acceleration.

23. Liver disorders: Predicting the liver disorders (i.e., mean corpuscular volume) of 345 patients (Asuncion & Newman, 2007), described by the following five cues: alkaline phosphatase, alanine aminotransferase, aspartate aminotransferase, gamma-glutamyl transpeptidase, and number of half-pint equivalents of alcoholic beverages drunk per day.

24. CPU performance: Predicting the relative performance of the central processing unit (i.e., machine cycle time in nanoseconds) of 209 different CPUs (Asuncion & Newman, 2007), described by the following seven cues: minimum main memory in kilobytes, maximum main memory in kilobytes, cache memory in kilobytes, minimum channels in units, maximum channels in units, published relative performance, and estimated relative performance.

25. Refractivity of glass: Predicting the refractivity of 214 different types of glass (Asuncion & Newman, 2007), described by the following six cues: sodium, magnesium, aluminum, silicon, potassium, and calcium.

26. Alcohol level of wine: Predicting alcohol level of 178 kinds of wine (Asuncion & Newman, 2007), described by the following 12 cues: malic acid, ash, alkalinity of ash, magnesium, total phenols, flavanoids, nonflavanoids, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline.

27. Populations of African countries: Predicting the number of inhabitants of 54 African countries, described by the following seven cues: part of the Sahel zone, area size, belongs to OPEC, media citations in 2004, per capita income, number of inhabitants of capital, and illiteracy rate; data assembled on the basis of own research, partly based on the World Factbook (Central Intelligence Agency, 2005).

value from a pair of objects, described by several cues. Thus, we


first created all possible pair comparisons for each environment.
Again, we focused on robustness and examined strategies’ accura-
cies when trained and tested on different proportions of the envi-
ronments. We counteracted sampling biases by drawing 1,000
samples for each proportion and averaging the strategies’ accura-
cies across samples. Finally, we averaged the results across all envi-
ronments considered.
Consistent with past results, we found that logistic regression
did better than take-the-best on fitting, scoring on average 76% cor-
rect inferences versus 74% (averaged over the performance in ten
different sizes of training sets). The other strategies’ fitting accura-
cies were as follows: naïve Bayes, 73%; Franklin’s rule, 72%;
Dawes’s rule, 64%; and take-two, 64%. In contrast, when it comes
to the crucial test situation of generalizing to new independent
problems, take-the-best did better than logistic regression, with on
average 68% versus 64% correct predictions across all test sets with
different sizes. The other strategies’ generalization accuracies were:
naïve Bayes, 68%; Franklin’s rule, 67%; Dawes’s rule, 59%; and
take-two, 60%.
Did the 27 environments differ with respect to information
redundancy? For each environment we computed the average abso-
lute correlation between the cues in the training sets using 50% of
the data. The minimum average correlation observed was r = .11
and the maximum was r = .68, with an average across all environ-
ments of r = .32. Thus, in contrast to our artificial environments,
we did not observe any environment in which all cues were inde-
pendent of each other, and instead we found that the cues in natu-
ral environments were on average highly correlated. To examine
the influence of natural information redundancy on strategy perfor-
mance, we used a median split to create one group of environments
with relatively low correlations between cues (mean r = .22) and
another group of environments with relatively high correlations
between cues (mean r = .42).
How are the strategies’ accuracies affected by this information
redundancy? We focused on comparing the accuracy of take-
the-best with that of logistic regression—our benchmark model.
Figure 8-4 shows the difference between the percentage of correct
inferences by take-the-best and by logistic regression plotted against
training set size, differentiated for the environment groups with
low versus high information redundancy and for fitting and gener-
alization performance. A positive difference indicates that take-
the-best outperforms logistic regression, whereas a negative
difference means that logistic regression does better. Figure 8-4
clearly shows take-the-best’s advantage over logistic regression
when generalizing to new independent cases in the test sets. More


Figure 8-4: Difference in accuracy between take-the-best and logis-


tic regression (plotted as take-the-best’s accuracy minus logistic
regression’s) differentiated for the low- and high-redundancy natu-
ral environments and for fitting and generalization.

importantly for our current focus, take-the-best’s advantage is larger


for the environments with high information redundancy than for
the environments with low information redundancy (for both fit-
ting and generalization). Focusing on the 50% training sets and on
the crucial generalization situation, take-the-best outperformed
logistic regression in the high-redundancy environment by, on
average, 5.7% (SD = 5.4%), which is a significantly greater advan-
tage than for the environments with low information redundancy,
where take-the-best outperformed logistic regression by only 2.1%
(SD = 5.4%), t(25) = 2.4, p = .02, d = 0.93 (representing a large
effect size according to Cohen, 1988).
In sum, the strong dependency of strategy performance on infor-
mation redundancy demonstrated for artificially created decision
environments can also be found in natural environments. As a gen-
eral trend, take-the-best’s accuracy advantage over logistic regres-
sion increases with increasing average cue correlations. Moreover,
the natural environments in our sample are characterized by a rela-
tively high average correlation between cues. If information redun-
dancy is indeed a common characteristic of decision environments
and, as demonstrated, strongly affects performance of decision
strategies, one can expect people to pay attention to this environ-
mental feature and adapt their decision strategies accordingly. We
test this expectation experimentally in the next section.

How Do People Respond to Information Redundancy in Environments?

As demonstrated in the simulation studies, low information redun-


dancy can make compensatory strategies worthwhile, whereas high
redundancy benefits the much more frugal noncompensatory strat-
egies. Do people select strategies accordingly when confronted with
high- and low-redundancy environments? To test this, we con-
ducted two computer-based experiments (for details see also
Dieckmann & Rieskamp, 2007).
Participants were told to imagine they were geologists hired by
an oil-mining company to decide which of two potential drilling
sites will offer more oil. To assess the drilling sites, different tests,
such as chemical analysis of the ground stone, could be conducted.
These tests were represented by small icons on the screen. The
validity of the cues, that is, the “success probability” of the tests, as
well as the direction in which the dichotomous test results pointed,
was displayed under each of the icons. When participants wanted
to conduct a test, they had to click on the corresponding icon, and
the results were displayed simultaneously for both drilling sites
(see Figure 8-5 for a screenshot). After they had checked as many
cues as they wanted, participants chose which site to drill at.

Figure 8-5: Screenshot of the computerized information search and


decision task that participants faced (adapted from the experiments
by Dieckmann & Rieskamp, 2007).

In an initial training phase consisting of three blocks of 32 pair


comparisons each, participants were allowed to uncover informa-
tion about the drilling sites at no cost before selecting one of the
sites. After each decision, feedback was provided about whether
the right or wrong site had been chosen. Additionally, each correct
decision was rewarded with 20 cents, while for a wrong decision,
20 cents was deducted from the participant’s account. The training
phase was followed by a crucial test phase, also consisting of
three blocks of 32 pair comparisons, with the only difference being
that information search became costly: Participants now had to
pay 3 cents for each test they conducted.
Participants were assigned to either a high- or a low-redundancy
condition. These two experimental environments were created
using a procedure similar to the construction process for the
artificial environments in our first simulation study. In the high-
redundancy environment, the average correlation between cues
was r = .50. To produce a stronger experimental manipulation, the
average cue correlation in the low-redundancy environment was
set to r = −.15. Thus, under low redundancy the cues not only pro-
vided additional valid information, but also revealed pieces of
information that were often in conflict with each other. In line with
the simulation results, applying a compensatory strategy such as
Franklin’s rule in the high-redundancy environment would result
in the same accuracy as applying the more frugal take-the-best,
leading to a higher payoff for take-the-best in the test phase. In con-
trast, in the low-redundancy condition, applying take-the-best
would lead to inferior accuracy and, despite higher frugality, lower
payoff in the test phase, compared to Franklin’s rule.
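A back-of-the-envelope calculation shows why equal accuracy combined with greater frugality yields a payoff advantage in the test phase. With the payoffs described above, the expected gain per trial is p × 20 cents − (1 − p) × 20 cents − k × 3 cents, where p is the probability of a correct choice and k is the number of tests purchased. Assuming, purely for illustration, that both strategies reach p = .75 in the high-redundancy environment, a take-the-best user who typically stops after two tests earns 15 − 5 − 6 = 4 cents per trial, whereas a Franklin's rule user who purchases, say, six tests earns 15 − 5 − 18 = −8 cents per trial; the accuracy level and the numbers of tests are assumptions made only to illustrate the arithmetic.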
How did participants respond to the two redundancy condi-
tions? To elicit which strategy best described each participant’s
inferences, we tested the strategies’ process predictions, that is,
how participants should search for information. In the training
phase, the two groups of participants did not differ. Participants
rarely stopped their information search after finding a first discrim-
inating cue (as would be predicted by take-the-best): Stopping
consistent with take-the-best was observed in 26% of decisions
in the low-redundancy condition and 23% in the high-redundancy condition. This pattern
changed dramatically when information search costs were intro-
duced in the test phase. Participants in both conditions became
more frugal in their information search. In the low-redundancy
condition, participants still often continued search even after
finding a discriminating cue, stopping search in accordance with
take-the-best in only 44% of decisions. However, stopping search
on the first discriminating cue became the predominant pattern in
the high-redundancy condition, where it was observed for 77% of
all decisions (see Figure 8-6).

Figure 8-6: Proportion of nonguessing trials in which search stopped in accordance with take-the-best (i.e., when one discriminating cue was found) compared to the complement proportion of instances in which search continued beyond a first discriminating cue, across the six blocks of trials for (a) the high-redundancy condition and (b) the low-redundancy condition in Experiment 1 (adapted from Dieckmann & Rieskamp, 2007). (Error bars represent one standard error.)

In sum, the participants apparently learned that trusting the first discriminating cue is a successful
strategy under high information redundancy; they remained reluc-
tant to do so in the low-redundancy condition. They seem to have
discovered that compensation pays off in low-redundancy environ-
ments, even with search costs.
In a second experiment we assessed whether the participants simply learned from feedback to apply the most adaptive strategy
without deliberately noticing the information redundancy, or
whether, in fact, they realized that the available information was
highly redundant. We had the same two experimental conditions
using the same inference problems as before. The only difference
was that the participants did not receive any outcome feedback in
the training phase. Participants could still explore the cue structure
at no cost, but they were not told whether their inferences were
right or wrong. Outcome feedback was only introduced in the test
phase, along with search costs. Without any outcome feedback it
was not possible for participants to learn whether a specific strat-
egy performed well in comparison to an alternative strategy in the
training phase. Thus, if participants were able to respond adap-
tively in the test phase of the experiment, this could be attributed
to their success in uncovering the information redundancy of the
environments.
In the training phase, participants rarely stopped their informa-
tion search right after finding the first discriminating cue. They
only stopped in accordance with take-the-best in 19% of all deci-
sions under low information redundancy, and in 29% of all deci-
sions under high redundancy. However, stopping behavior again
changed profoundly in the test phase. Participants in the low-
redundancy condition still predominantly continued to search
beyond the first discriminating cue, and stopping consistent with
take-the-best was observed for only 42% of all decisions. In con-
trast, stopping right at the first discriminating cue became the most
frequent search behavior in the high-redundancy condition, in
63% of decisions. This effect was observed from the first block of
the test phase onward (see Figure 8-7).
Thus, even without outcome feedback in the learning phase,
participants were able to adapt their inference processes to infor-
mation redundancy in the environment, indicating that they
picked up on environment structure and not (just) strategy success.
But how could participants judge the degree of redundancy in the
environments they saw? In the learning phase, observing a frequent
occurrence of divergence between cues (i.e., cues supporting
different objects) could be used as a shortcut to identify a low-
redundancy environment, whereas seeing frequent accordance
between cues (i.e., cues supporting the same alternative) was indic-
ative of a high-redundancy environment. This experience of differ-
ent degrees of information redundancy obviously was sufficient to
trigger the selection of adaptive strategies. However, outcome feed-
back probably still enhances adaptivity, indicated by the fact that
the effects observed in the second experiment were smaller than
those in the first experiment.
Figure 8-7: Proportion of nonguessing trials in which search stopped in accordance with take-the-best (i.e., when one discriminating cue was found) compared to the complement proportion of instances in which search continued beyond a first discriminating cue, across the six blocks of trials for (a) the high-redundancy condition and (b) the low-redundancy condition in Experiment 2 (adapted from Dieckmann & Rieskamp, 2007). (Error bars represent one standard error.)

Conclusions

This chapter has focused on how the information redundancy of environments affects both strategy performance and decision
makers’ strategy use. We have demonstrated that information redun-
dancy strongly influences the ecological rationality of inference
strategies and acts as a key feature for people to use in selecting strategies adaptively in response to environmental demands.
How exactly are strategies’ accuracies influenced by information
redundancy? In our artificially created environments, logistic
regression, our benchmark model, was outperformed by the other
strategies under conditions of high information redundancy and
low validity dispersion. In contrast, under low information redun-
dancy, logistic regression outperformed the other strategies. Take-
the-best, the simplest (and most frugal) strategy we tested, performed
well when information redundancy was high but fell far behind
all the other strategies when faced with an environment with low
information redundancy and low dispersion of cue validities. Thus,
we can conclude that in an environment in which several cues
have similar validities and often offer new information (i.e., near
zero cue correlation), take-the-best is not an adaptive strategy to
apply. In contrast, in a situation in which the cues are, to a large
extent, correlated with each other, take-the-best will perform well.
However, when we analyzed the 27 natural environments, we
did not find any environment in which the average cue correlation
was near zero. The low information redundancy case of our artifi-
cially created environments (mean r = .01) thus may represent a
rather extreme situation that does not occur very often in reality. If,
instead, real environments are usually characterized by informa-
tion redundancy, they can be exploited by simple strategies such
as take-the-best. This could explain why take-the-best outperformed
logistic regression in generalization for the natural environments
both with low and high information redundancy. Nevertheless, the
advantage of take-the-best in comparison to logistic regression was
larger for high information redundancy environments than for low
information redundancy environments.
One reason why cues used in a natural environment may often
be correlated with each other is that decision makers typically con-
sider cues that are positively correlated with the criterion. This
provides a constraint on how independent of each other the cues
can be. For instance, if two cues are perfectly correlated with the
criterion, then they must be perfectly correlated with each other. In
contrast, the lower the correlation is between the cues and the
criterion, the more freedom there is in how strongly the two cues can be correlated with each other.
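This constraint can be stated more precisely, although the derivation is not spelled out in this chapter; it follows from the requirement that the correlation matrix of two cues and the criterion be internally consistent (positive semidefinite). If the cues correlate r1c and r2c with the criterion, then their intercorrelation must satisfy r12 ≥ r1c × r2c − √[(1 − r1c²)(1 − r2c²)]. When both cues correlate ρ with the criterion, this lower bound becomes 2ρ² − 1: two cues that each correlate .9 with the criterion must correlate at least .62 with each other, whereas two cues that each correlate .5 with the criterion may be intercorrelated anywhere between −.5 and 1.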
For preferential choices, the correlation between cues or attri-
butes plays a different, more complex role (Fasolo, McClelland, &
Todd, 2007). Redundancy per se does not matter as much, but it is
important to differentiate between positive and negative attributes.
The essential factor is not the overall correlation between the dif-
ferent attributes of the choice options, but the correlation between
attractive and unattractive attributes. This determines whether the
situation is characterized by concordance or conflict. In concordance situations, attributes that are highly valued by the decision
maker are positively correlated with each other, and negatively
correlated with unattractive attributes. Such a situation makes it
easy to differentiate good options from bad ones: Attractive options
are likely to have many good and few bad attributes, while unat-
tractive options tend to lack good attributes and negative features
prevail. It is like giving someone a choice between ten euros for
sure now versus one euro with a probability of 10% in 2 weeks.
As is obvious from this example, such simple choices are either
rare in our lives, or we simply do not experience them as choices,
because the better option is too obvious and does not require delib-
eration. Instead, we often experience conflict: Should we buy an
expensive digital camera with high resolution and lots of fancy
features, or spend less money on a slightly outdated model? Should
we accept a new job that offers more money but requires more
working hours? Highly correlated attributes can provide redundant
information in preference tasks as well. This can make information
search more frugal because we can infer from a few attributes which
other attributes are likely present. However, this does not tell us
what to choose. When there is conflict, we need to deal with—or
avoid—the trade-offs between attributes (Fasolo et al., 2007).
Past research has shown that the structure of an environment
can lead people to select different strategies for their inferences
(e.g., Bröder & Schiffer, 2003b; Rieskamp, 2006, 2008; Rieskamp &
Otto, 2006). The adaptive selection of strategies can sometimes be
conceptualized as a learning process in which people learn to select
the most successful strategy. The competition between various
strategies presented in this chapter illustrates that information
redundancy, by strongly affecting strategy accuracy, should be a
crucial factor in contingent strategy selection. And indeed, our
experimental results show that people respond adaptively to infor-
mation redundancy, selecting different strategies in high- or low-
redundancy environments to achieve ecological rationality.
9
The Quest for Take-the-Best
Insights and Outlooks From Experimental Research

Arndt Bröder

What is sometimes required is not more data or more refined data but a different conception of the problem.
Roger N. Shepard

Roger Shepard's (1987b) insight on new questions versus new data is an important reminder for all of us concerned with scientific
research, but it is frequently overlooked in the busy rush of “normal”
science. Rather than filling journals and textbooks with new exper-
iments apparently corroborating old claims or piling up data in
support of minuscule theories (and also considering that such
new data too often have little impact in changing other scientists’
views, as described in chapter 3), it can be fruitful (and may turn
out to be crucial) to question the very assumptions behind existing
paradigms and to reconceptualize the problems being studied. This
may either help to shatter old beliefs or lead to a more coherent
view of seemingly separate fields. The ecological rationality per-
spective developed in this book is a new look at the apparent “ratio-
nality paradox” typified by the observation that “we can put a man
on the moon, so why can’t we solve those logical-reasoning prob-
lems?” (O’Brien, 1993, p. 110). Instead of taking the pessimistic
view that empirical results imply errors in reasoning, this perspec-
tive suggests the optimistic view that errors may instead lie in
posing the wrong research questions (McClelland & Bolger, 1994).
We should not ask why people make so many mistakes but rather
what environments and tasks our minds are particularly suited to.
The study of ecological rationality does just this, seeking to identify
the cognitive mechanisms in the mind’s adaptive toolbox, which
are effective in defined ecological settings, precisely specified in an
algorithmic manner, and computationally tractable (the latter being
a precondition for psychological plausibility).

What must be added to Shepard's statement, though, is the obvious fact that inventing new conceptions is not enough—it is only a
starting point for new empirical investigations. Hence, however
pretty it might be, any new conception is “only” a new theory, and
it has to pass rigorous empirical tests like any other. As a conse-
quence, it will be pulled onto the dissection table by merciless
experimentalists (like myself), at least if such curious people find it
interesting in the first place. This has certainly been the case for the
new conception of ecological rationality. In the beginning, many
scholars bemoaned the limited empirical evidence for the adaptive
toolbox concept and one of its first-studied tools, the take-the-best
heuristic (see Allen, 2000; Bröder, 2000a; Chater, 2000; Cooper,
2000; Lipshitz, 2000; Luce, 2000; Newell & Shanks, 2003; Newstead,
2000; Oaksford, 2000; Shanks & Lagnado, 2000), or they criticized
the existing evidence for take-the-best (e.g., Gigerenzer, Hoffrage, &
Kleinbölting, 1991; Hoffrage, Hertwig, & Gigerenzer, 2000) as too
weak to be convincing (Bröder, 2000a). Since that time, however, a
few dozen experiments have been conducted that have increased
our understanding of why, when, and how people use simple heu-
ristics such as take-the-best in making inferences. This chapter will
present some of that empirical work—that is, my own efforts to dis-
sect the adaptive toolbox and take-the-best to see if they really have
anything of substance inside.
Although a number of researchers who have experimentally
investigated take-the-best and similar heuristics have significantly
influenced my thinking through a direct or indirect exchange of
ideas (Hausmann, 2004; Lee & Cummins, 2004; Newell & Shanks,
2003; Newell, Rakow, Weston, & Shanks, 2004; Newell, Weston, &
Shanks, 2003; Rieskamp & Hoffrage, 1999; Rieskamp & Otto, 2006),
here I will mainly focus on work from my own lab. I will provide a
synopsis of our results in an effort to bring together the scattered
messages of separate journal articles. Table 9-1 gives an overview
of the questions addressed and the experiments and results reported
in this chapter (which will be numbered consecutively in the
text and do not necessarily match the experiment numbers in the
original papers), together with the published sources that provide
more detailed information about procedures and data. Altogether,
the work reported here sheds some light on the following ques-
tions: Is take-the-best a universal theory of probabilistic inferences?
Are people adaptive decision makers? What personality factors
influence strategy use? And what is the role of cognitive and memory
limitations and capabilities in selecting strategies? One main feature
of my work has been that the research questions themselves changed
dynamically with new insights. My hope is to communicate the
spirit of this development and to distill some general conclusions about principles governing adaptive strategy selection and use. I will start with a few fundamental methodological remarks.

Table 9-1: Overview of the Experiments Mentioned in This Chapter

No.  Source                              Main research question                              Tentative answer
1    Bröder (2000c), Exp. 1              Do all people use take-the-best in all decisions?   No
2    Bröder (2000b), Exp. 1              Do all people use take-the-best, but possibly       No
                                         with errors?
3    Bröder (2000c), Exp. 2                                                                  No
4    Bröder (2000a), Exp. 2              Are people adaptive take-the-best users?            Probably
5    Bröder (2000a), Exp. 3                                                                  Probably
6    Bröder (2000a), Exp. 4                                                                  Probably
7    Bröder (2003), Exp. 1               Are people adaptive take-the-best users?            Yes
8    Bröder (2003), Exp. 2                                                                   Yes
9    Bröder & Schiffer (2006a), Exp. 1   Do routines hinder adaptivity?                      Yes
10   Bröder & Schiffer (2006a), Exp. 2                                                       Yes
11   Bröder & Eichler (2001)             Do take-the-best users have a particular            Probably not
                                         personality?
12   Bröder & Schiffer (2003a)           Does lowering cognitive capacity promote            No
                                         take-the-best?
13   Bröder (2005), Exp. 4a              Do take-the-best users have a particular            No
                                         personality?
14   Bröder (2005), Exp. 4c                                                                  No
15   Bröder & Schiffer (2003b), Exp. 1   Does memory retrieval induce cognitive costs?       Yes
16   Bröder & Schiffer (2003b), Exp. 2                                                       Yes
17   Bröder & Schiffer (2003b), Exp. 3                                                       Yes
18   Bröder & Schiffer (2003b), Exp. 4                                                       Yes
19   Bröder & Schiffer (2006b)           Does stimulus format influence strategy             Yes
                                         selection?
20   Bröder & Gaissmaier (2007)          Does take-the-best predict decision times?          Probably

The Man Who Mistook Take-the-Best for a Theory

Take-the-best can match the fitting accuracy of a wide range of linear models, such as multiple linear regression, Franklin's rule
(weighting cues by their importance and then summing them all),
and Dawes’s rule (tallying positive and negative cues and compar-
ing them), all of which involve combining cue values (Czerlinski,
Gigerenzer, & Goldstein, 1999; see also chapter 2). However, its
virtue of accuracy compared to linear models turns out to be a
curse for the experimenter, because the enormous overlap between
take-the-best’s predictions and those of linear models makes empir-
ical distinctions between the mechanisms difficult to achieve
(Bröder, 2000c; Rieskamp & Hoffrage, 1999). Hence, one has to rely
either on process tracing techniques, which monitor information
acquisition patterns that may distinguish between strategies (e.g.,
Payne, 1976; van Raaij, 1983), or on formalized methods for clas-
sifying choice outcome patterns by strategy (e.g., Bröder, 2002;
Bröder & Schiffer, 2003a). Because process tracing only allows very
limited conclusions concerning heuristic decision rules (see the
critiques of Abelson & Levi, 1985; Bröder, 2000b), I prefer outcome-
based assessments, but I use both techniques. Whether the search
patterns identified by process tracing and the decision strategies
specified by the formal methods fit together as coherent mecha-
nisms is then treated as an empirical question rather than an a priori
assumption. Box 9-1 contains a description of our experimental
method and the logic of our strategy classification.
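To convey the general logic of such outcome-based classification, here is a deliberately simplified sketch (in Python). It is not the exact maximum likelihood procedure of Bröder and Schiffer (2003a), which, among other refinements, estimates the error rate rather than fixing it; all names and choice patterns below are illustrative.

    from math import log

    def classify_strategy(observed, predictions, error_rate=0.2):
        # observed: one choice per trial, e.g. ['A', 'B', ...].
        # predictions: dict mapping strategy name -> list of predicted choices.
        # error_rate: assumed constant probability of an unsystematic response error.
        log_likelihood = {}
        for strategy, predicted in predictions.items():
            ll = 0.0
            for obs, pred in zip(observed, predicted):
                ll += log(1 - error_rate) if obs == pred else log(error_rate)
            log_likelihood[strategy] = ll
        # Assign the strategy under which the observed choices are most likely.
        return max(log_likelihood, key=log_likelihood.get)

    observed = ['A', 'A', 'B', 'A', 'B', 'B']
    predictions = {
        'take-the-best':   ['A', 'A', 'B', 'A', 'B', 'A'],
        "Franklin's rule": ['A', 'B', 'B', 'B', 'A', 'A'],
        "Dawes's rule":    ['B', 'A', 'B', 'B', 'B', 'A'],
    }
    print(classify_strategy(observed, predictions))  # take-the-best (5 of 6 choices match)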
Our first attempts to put take-the-best (as it was introduced in the
theory of probabilistic mental models by Gigerenzer et al., 1991) to
an empirical test were somewhat plagued by an incomplete under-
standing of its theoretical status. Take-the-best is a hypothesized
cognitive mechanism and a component in the theory of the adap-
tive toolbox. But I mistook it for a whole theory and set out to
destroy it because it seemed too simplistic (cf. chapter 3 on similar
reactions to simplicity by other researchers), and empirical argu-
ments to date were not convincing. A theory must have what Popper
(1959) called “empirical content” and make falsifiable predictions.
Whereas the falsifiability of take-the-best as a mechanism is rather
high because of its precise predictions, it is rather low when viewed
as a whole theory because Gigerenzer et al. (1991) and Gigerenzer
and Goldstein (1996) originally only broadly specified its domain
of application, namely, memory-based probabilistic inferences, and

Box 9-1: How We Conducted Experiments and Why We Did It This Way

If we want to know the manner in which people integrate cue information for induc-
tive inferences (i.e., their decision strategies), we must first know which cues people
use. One way to be sure of this in an experiment is to give people the cues to use
explicitly. We provided our participants with four (or five) binary cues (either seen on a
computer screen or learned in training for later recall and use in the experiment) and
cue validities (either by telling them directly or letting them acquire the knowledge
indirectly via frequency learning) and then had them make inferences by choosing
between two or three objects. The pattern of decisions allowed us to draw conclusions
about the strategy probably employed by each participant, using a maximum like-
lihood classification principle (see Bröder & Schiffer, 2003a, for details). We used
domains without much preexisting knowledge to prevent participants from relying on
cues they might bring in from outside the experiment. The tasks we used were:

Extraterrestrial ethnology: Participants were scientists judging the population sizes of beings on another planet by considering the existence or nonexistence of different
cultural achievements (Experiments 1–4).

Stock broker game: Participants inferred which one of multiple shares had the best
prospects for profit by considering different cues about the associated firms, such as
turnover growth (Experiments 5–13).

Criminal case: Participants were detectives judging which of two suspects was
more likely to have committed a murder, based on evidence found at the scene
of the crime. The features (cues) of the suspects had to be retrieved from memory
(Experiments 14–20).

did not specify how generally they thought it would apply: Did
they expect all people to use take-the-best whenever possible, or
all people to use it sometimes, or some people to use it always, or
even only some to use it sometimes? (At the time that I conducted
my first experiments, the notion that take-the-best is only one tool
in the mind’s adaptive toolbox had not been spelled out.) Hence,
our initial research question in approaching take-the-best empiri-
cally was, Is take-the-best a universal theory of inductive infer-
ences, that is, always used by everyone?
In the first three experiments I conducted with 130 participants
in total, I assumed either that all people use take-the-best all the
time (i.e., deterministic use with no errors, Experiment 1) or that
all people use it, but they occasionally make errors (Experiments 2
and 3). Both versions of the hypothesis were clearly rejected: First,
only 5 of the 130 participants used take-the-best all the time (in 15
or 24 trials; see Lee & Cummins, 2004, for a comparable result).
Second, for the other participants, choices were clearly influenced
by other cues than just the most valid discriminating one that
take-the-best would use; this systematic influence clearly showed
that the deviations from take-the-best’s predictions could not be
explained away as random response errors.
We could have stopped here and declared the heuristic a dead
end (some authors with similar critical results came close to this
conclusion, e.g., Lee & Cummins, 2004; Newell & Shanks, 2003;
Newell et al., 2003). However, we felt that this would be a prema-
ture burial, since no theory of decision making predicts behavior
correctly 100% of the time. A more realistic version of the theory
would probably allow for both (unsystematic) response errors
and a heterogeneous population of decision makers. For instance, a
small minority of people relying on other heuristics, averaged
together with a group of predominantly take-the-best users, could
have led to my results, as we will see in the next section.
Obvious conclusions of these first experiments were that (a) not
everybody uses take-the-best in every probabilistic inference task,
and (b) if some people do use take-the-best, one has to allow for
unsystematic response errors as psychologists routinely do in other
areas. Thus, I had a definitive—and negative—answer to my initial
research question about take-the-best’s universality, but I began to
doubt that it had been a good question in the first place! Before
claiming that take-the-best was not a reasonable cognitive model, I
thought it worthwhile to confront a more realistic version of the
hypothesis instead of a universal, deterministic straw man.

The Toolbox Assumption—Are People Adaptive Decision Makers?

I next asked, therefore, if a significant proportion of people use take-the-best. This, as we will soon see, was again not the best ques-
tion to ask. Nonetheless, to answer it, I had to develop methods to
assess individual decision strategies, which is challenging if one
wants to avoid arbitrary criteria (see Bröder, 2002). First, the unit of
analysis must be the individual rather than a group mean, because
the latter would obscure potential individual differences. Second,
one has to compare different strategies (or, technically, models)
rather than just assess the fit of one strategy of interest to each indi-
vidual’s choice data. A good model fit per se is not very informative
(Roberts & Pashler, 2000). Third, I preferred modeling based on
decision outcomes rather than process-tracing measures because
the latter rely on some questionable assumptions (see Abelson &
Levi, 1985; Bröder, 2000b) and focus on information search rules
instead of the decision rules in which I was primarily interested
(Bröder & Schiffer, 2003a). In a nutshell, the methods I and my col-
leagues developed assess which strategy (take-the-best, Franklin’s
rule, Dawes's rule, guessing) best fits an observed pattern of choices of a participant in an experiment. Experiment 4 was our first to
assess the number of participants whose best-fitting strategy was
take-the-best. In this experiment, participants were sent to a distant
planet as extraterrestrial scientists who had to judge the level of
development of different cultures (the same task as in the first
three experiments). For 11 of 40 participants (28%), their choices
could best be described by take-the-best’s decision rule. Is that a lot
or a little? To decide, we need to compare with the other possible
strategies we had tested. The proportion of participants whose
choices could be best described by Dawes’s rule was 0%, but pre-
sumed users of Franklin’s rule (72%) were more prevalent than
those of take-the-best. While the proportion of presumed take-the-
best users is not overwhelming, it is still comparatively large enough
that it should not be entirely ignored. So now what?
As we did not get a satisfying answer, we reexamined our ques-
tion. Rather than asking if there is a sufficient proportion of take-
the-best users to take the heuristic seriously, we turned to the
question of whether there are conditions under which take-the-best
use is boosted and whether these conditions fit the model of contin-
gent decision making or the concept of ecological rationality (i.e.,
that there are environment structures that take-the-best can exploit
to do well). Hence, we changed our research question by asking
now, Are people adaptive take-the-best users? To elaborate on the
second point, the ecological rationality of heuristics lies in their
match with a certain environment structure (Czerlinski et al., 1999;
Johnson & Payne, 1985; Martignon & Hoffrage, 2002) and according
to the adaptive toolbox assumption, people should use take-
the-best when it is appropriate. Hence, we began to examine envi-
ronment and task variables that could be expected to influence
take-the-best deployment. If the proportion of take-the-best users
was unaffected by such variables and continued to hover around
the 28% level found in Experiment 4, this would render the adap-
tive use of take-the-best questionable.
One potential criticism of Experiments 1 to 4 is that they all
involved the simultaneous presentation of cue values on a com-
puter screen during decision making. In contrast, Gigerenzer and
Goldstein (1996, p. 651) had explicitly defined the task of take-the-
best as one involving search for information, and specifically search
in memory. In my first experiments, there were no costs of search-
ing for or retrieving information, which if included would probably
shift the balance of ecological rationality in take-the-best’s favor
(see Gigerenzer & Todd, 1999). In addition, the experiments involved
neither feedback on successful choices nor incentives for good deci-
sions, possibly hindering the ability and desire of participants to
behave adaptively. We therefore changed the experimental setting to
a hypothetical stock broker game on a computer screen in which participants could acquire cue information about stocks before
choosing one of two or three alternatives to invest in (an idea mod-
ified after Rieskamp, 1997). The binary cues included information
about the firms (e.g., whether there was turnover growth during the
last year), and participants acquired this information by clicking
appropriate fields on the screen. This paradigm allows for monitor-
ing information search and manipulating the (relative) costs of
information acquisition. Furthermore, the success of the chosen
stock provides feedback that allows the participant to adjust strat-
egy choice accordingly. In the first experiment using this paradigm
(Experiment 5, N = 40) we used a crude manipulation of informa-
tion costs: To see a cue value in each trial, participants had to pay
either 1% or 10% of the maximum amount they could win in this
trial.1 This measure boosted the percentage of probable take-the-
best users to 40% in the low-cost condition and to 65% in the high-
cost condition. In Experiment 6 (N = 80), we replicated the result
of the 65% who were take-the-best users when the information
costs were high, and by isolating these variables we found that nei-
ther outcome feedback nor the successive cue retrieval per se was
responsible for the rise in take-the-best use. The message so far was
plain and simple: If you raise information costs, people become
reluctant to use all of the information and instead adhere to a frugal
lexicographic strategy such as take-the-best, using just the first cue
that allows a decision to be made.
This conclusion may not sound too surprising, and it is also
compatible with the assumption that people are miserly rather than
smart. But are monetary costs the only environmental factor to
which people adapt their strategy use? Earlier studies of the eco-
logical rationality of take-the-best showed other forms of environ-
ment structure that the heuristic could exploit, including high
variance of cue validities, high redundancy between cues (see
chapters 8 and 3), and scarce information (Martignon & Hoffrage,
1999, 2002). We next investigated an important instance of the first
form, namely noncompensatory versus compensatory environ-
ments. In noncompensatory environments, when cues are ranked
according to their importance (e.g., their weight in a linear combi-
nation), each cue cannot be outweighed by any combination of the
lower-ranked cues. In compensatory environments, some cues
can be outweighed—or compensated for—by a combination of
other, lesser cues. This has implications for the performance of dif-
ferent strategies, in that noncompensatory decision mechanisms

that do not combine cues work better in noncompensatory environments, where cue combinations cannot beat individual cues, than in compensatory environments. In particular, take-the-best, as a noncompensatory strategy, cannot be outperformed in terms of decision accuracy by a linear combination rule in a noncompensatory environment (if the order of the cues corresponds to that of the linear weights—see Martignon & Hoffrage, 1999, 2002).

1. The “amounts” were hypothetical, not real. In most experiments involving the stock market paradigm we had monetary prizes for the best brokers to increase motivation.
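A concrete weight pattern, chosen purely for illustration and not taken from the experiments, makes the distinction tangible. With four cues and the noncompensatory weights 8, 4, 2, and 1, each weight exceeds the sum of all lower weights (8 > 4 + 2 + 1, 4 > 2 + 1, 2 > 1), so a decision based on the most important discriminating cue can never be overturned by the cues below it, and take-the-best chooses exactly as the corresponding weighted linear rule would. With the compensatory weights 4, 3, 2, and 1, by contrast, the second and third cues together (3 + 2 = 5) outweigh the first, so ignoring them can reverse, and potentially worsen, the decision.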
To find out whether people are sensitive to the difference between
noncompensatory and compensatory environments, we ran four
further experiments (Experiments 7–10 with N = 100, 120, 121,
and 120, respectively), in which we kept the nominal prices for
acquiring cue information constant but varied the importance dis-
tribution of the cues as defined by their weights in the payoff func-
tion. This meant that in the noncompensatory environments, the
expected payoff of consistently using take-the-best was greater than
the expected payoff of Franklin’s rule, a compensatory strategy,
because the former paid for fewer cues than the latter; or in other
words, the cost of checking all the cues exceeded the expected
return of the information they provided. In contrast, in the compen-
satory environments, the acquisition of more than one cue value
was profitable in the long run, and it was detrimental to ignore
information. What we found in all four experiments was that the
majority of participants used the strategy appropriate for the given
environment: adaptive strategy use. However, more people used
compensatory strategies overall, which points to a slight preference
for compensatory strategies, at least in this stockbroker task. Hence,
while many people were quite efficient at figuring out appropriate
strategies based on feedback (payments) they received, others
seemed to rely on an apparently “safe” compensatory strategy.
We see these patterns in Figure 9-1. Across the experiments, a
clear adaptive trend in strategy use can be seen: The higher the
ratio of expected gains in favor of take-the-best, the more people
employ this strategy. At the same time, looking only at the circles
(experimental conditions without further significant manipula-
tions), one can see that in all three compensatory environments
(payoff ratio <1), compensatory strategies were most prevalent,
while take-the-best was the most prevalent strategy in only five of
the nine conditions with a noncompensatory environment. This
points to a conservative bias in favor of compensatory decision
making (see Rieskamp & Otto, 2006, for comparable results).
Figure 9-1: Percentages of take-the-best users in various settings of the stock market game with environment structure that is characterized by different expected payoff ratios for take-the-best versus Franklin's rule. Ratios less than 1 denote "compensatory" environments; ratios greater than 1 denote "noncompensatory" environments. The circles depict 12 conditions in Experiments 7–10; there is a clear adaptive trend (r = .87) as environment structure changes. Squares show the maladaptive tendency to maintain a routine that was established before a change of the environment's payoff structure (Experiments 9 and 10). The open and filled triangles depict the "high cognitive load" and the control condition, respectively, from Experiment 12. (Adapted from Bröder & Newell, 2008.)

Three of the four squares in Figure 9-1 are a profound exception to the adaptive strategy use trend. They all represent conditions from Experiments 9 and 10 in the second half of each experiment, after the payoff structure of the environment had changed from compensatory to noncompensatory or vice versa. Although these payoff changes were rather extreme, participants obviously did not react adequately—the level of take-the-best use mostly remained appropriate to the previous environment structure. Neither receiv-
ing a hint (Experiment 9) nor a change to a related but different task
along with an additional monetary incentive (Experiment 10)
helped much to overcome this maladaptive reliance on a previ-
ously established routine. Hence, we concluded that most people
readily and quickly adapt to the payoff structure of a new task (with
a slight compensatory bias), but they have difficulties in over-
coming a routine that they had established in the same or a similar
task before. These routine-retaining tendencies were particularly
extreme for the information acquisition behavior (e.g., number of
cues looked up, time spent on info search, or Payne’s, 1976, strat-
egy index). Maladaptive routine effects have been known in cogni-
tive psychology for a long time (see Luchins & Luchins, 1959) and
have also been demonstrated in the domain of decision making (see
Betsch & Haberstroh, 2005, for an overview). Nevertheless, we were
quite surprised to find their massive impact in our studies, contrasting with the participants' general ability to adapt.
To summarize, the evidence so far looked fairly supportive of
the idea of the adaptive decision maker (e.g., the contingency model
of Payne, Bettman, & Johnson, 1993) in general and the adaptive
toolbox in particular. Take-the-best seems to be a part of the mind’s
toolbox, and under appropriate circumstances many people will
employ this tool. However, the routine effects suggest that people
are reluctant to change a tool they have just gotten used to using.
Obviously, other cognitive processes play a role when an appar-
ently known situation changes. One may speculate that strategy
selection is more deliberate and effortful when people are first
confronted with a new situation, such as when entering a new
experiment session. They may switch to simpler and slower learn-
ing processes (e.g., reinforcement learning, see Rieskamp & Otto,
2006) when the situation is well known, such as after the session
has been underway for a while.

Who Are the People Who Use Take-the-Best?

Although the payoff structure of the environment was a major determinant of take-the-best use (see Figure 9-1), there were obvi-
ously individual differences—not everyone employed the same
strategy. In noncompensatory environments, a proportion of par-
ticipants continued to use Franklin’s rule or Dawes’s rule, whereas
others still used take-the-best if a compensatory strategy was more
favorable. Individual differences in decision-making strategies have
been widely reported (e.g., Brehmer, 1994; Einhorn, 1970; Lee &
Cummins, 2004; Newell & Shanks, 2003; Slovic & Lichtenstein,
1971). Zakay (1990) emphasized this individual variation and
hypothesized that “strategy selection in decision making is depen-
dent both on a basic tendency toward using a specific strategy
and a cost–benefit analysis” (p. 207). Subsequently, Shiloh, Koren,
and Zakay (2001) bemoaned a surprising lack of systematic studies
concerning these hypothesized “basic tendencies,” which they
conceptualized as presumably stable personality traits. Thus, in
addition to investigating “adaptivity,” as discussed above, we had
to think about the causes of individual differences: In other words,
do take-the-best users have a particular personality?
The way in which psychologists assess individual differences
is simple in principle: They administer well-validated personality
tests and correlate them with the behavior of interest. In our case,
the behavior of interest was the decision strategy people use. To
look for correlations with this behavior, we had our participants
play the stock market game in four experiments (11–14) and
additionally fill out self-descriptive questionnaires intended to measure different fundamental personality traits that we thought
could be plausible determinants of noncompensatory decision
behavior. In Experiment 11 (N = 61), the traits measured were action
orientation, achievement motive, self-efficacy, need for cognition,
impulsivity, and rigidity (see Table 9-2 for a list of traits tested
and references). Although we had no strong a priori hypotheses,
our intuition was that achievement motive, self-efficacy, need for
cognition, and rigidity would be associated with more elaborative
compensatory decision making, whereas take-the-best users might
show higher action orientation and impulsivity. (We also measured
rigidity and action orientation in Experiment 7, N = 100.) Next, in
the two similar Experiments 12 and 13 (N = 60 for each, analyzed
here together) we assessed the impact of the so-called “Big Five”
traits nowadays considered to be fundamental personality dimen-
sions (emotional stability, extraversion, openness, agreeableness,
and conscientiousness; see Costa & McCrae, 1992). In addition, we
assessed both facets of socially desirable responding, namely,
impression management and self-deception (Paulhus, 1984). In
each of the experiments we computed the multiple correlations
between the personality construct and the decision strategy
used, shown in Table 9-2. To make a long story short, none of the
personality measures showed a substantial correlation with selected
strategies.2
Thus, we did not find any evidence for a basic personality trait
that might be associated with the default tendency to use lexico-
graphic rather than compensatory decision strategies. Furthermore,
an experimental manipulation of achievement motivation did not
have any impact on strategy use (Experiment 14, N = 60): In one
group, we told participants that performance in the stock market
game is highly correlated with intelligence, whereas the control
group was told only that they were involved in a preliminary study
of a new experimental task. Of course, we cannot exclude the pos-
sibility that we were looking at the wrong personality traits the
whole time, while ignoring more important ones. However, given
the broad class of cognitive and motivational variables we exam-
ined, we consider this possibility unlikely. We tend to conclude
that the individual differences observed may be dependent on par-
ticipants’ transient states rather than stable traits. This possibility
should be examined in further studies investigating the stability of
strategy preferences.

2. The small but significant correlation with impulsivity (R2 = .08) in Experiment 11 was not replicated in Experiment 7 despite substantially higher statistical power.
Table 9-2: Multiple Correlations (Adjusted R2) Between Decision Rules and "Big Five" Personality Traits in Several Experiments

Study                          Scale                           Source                         Adjusted R2    p
Experiment 11 (N = 61)         Achievement motive (12 items)   Fahrenberg et al. (1994)           .03       .14
                               Action orientation (24 items)   Kuhl (1994)                        .02       .25
                               Self-efficacy (10 items)        Schwarzer & Jerusalem (1999)       .01       .30
                               Need for cognition (16 items)   Bless et al. (1994)                .01       .31
                               Impulsivity (16 items)          Stumpf et al. (1984)               .08       .04
                               Rigidity (8 items)              Zerssen (1994)                     .00       .42
Experiment 7 (N = 100)         Achievement motive (12 items)   Fahrenberg et al. (1994)          −.02       .79
                               Action orientation (24 items)   Kuhl (1994)                        .00       .36
                               Impulsivity (16 items)          Stumpf et al. (1984)              −.01       .51
Experiments 12 & 13 (N = 120)  Emotional stability (12 items)  Borkenau & Ostendorf (1993)       −.02       .84
                               Extraversion (12 items)                                           −.01       .55
                               Openness (12 items)                                                .00       .38
                               Agreeableness (12 items)                                           .00       .40
                               Conscientiousness (12 items)                                       .02       .19
                               Impression management           Musch et al. (2002)                .01       .24
                               Self-deception                                                    −.00       .27

Note. The strategy classification is a nominal variable that was dummy coded for these analyses. Adjusted R2 is a measure of the association between the dummy-coded strategy variables and the personality trait, and is a less biased estimate of the population association than R2.

So we still do not know who these take-the-best users are! One somewhat comforting fact is that other areas such as personality-
oriented consumer research have been no more successful in
answering this question (Foxall & Goldsmith, 1988). The inability
to find good predictors of individual decision-making strategies
seems to be widespread. But there may be a reason for this, namely,
that we again asked the wrong question. Rather than asking about
the relation between personality and default strategy use, the
adaptive question would be whether there is a correlation between
individual capacities and the ability to choose an appropriate strat-
egy in different environments.
One result of Experiment 11 left us somewhat puzzled and
helped us aim our individual differences question in a new direc-
tion: In addition to the personality measures, participants completed
several scales of an intelligence test (Berliner Intelligenz-Struktur-
Test—Jäger, Süß, & Beauducel, 1997; see Bröder, 2005, for details
of the subscales used), and the intelligence score was slightly, but
significantly, correlated with selected strategies (R2 = .10). However,
contrary to our expectation, it was the clever ones who used take-
the-best!

Individual Differences in Cognitive Capacity and Strategy Use

So our next question became, Do differences in intelligence help explain strategy use in different environments? Experiment 11 had
only one environment structure: a 10% expected payoff advantage
for using take-the-best compared with using Franklin’s rule, which
we thought would be negligible. But what we found was that
take-the-best use was positively correlated with intelligence: The
take-the-best users had an intelligence score on average about 0.3
standard deviations above Franklin’s rule users and about 1.0 stan-
dard deviation above Dawes’s rule users. The more intelligent par-
ticipants seemed to be better at figuring out the subtle payoff
difference between strategies and consequently using the more
frugal take-the-best strategy. We replicated this trend in two other
experiments (Experiments 7 and 8) with different environmental
payoff structures. In both experiments, there was a significant cor-
relation between selected strategies and intelligence for noncom-
pensatory environments (R2 = .20 and R2 = .14, respectively),
whereas a correlation was absent in environments with a compen-
satory payoff structure (R2 = .05 and R2 = .00). Apparently, the
smartest people used take-the-best in noncompensatory environ-
ments, while in compensatory environments there was no strategy
difference between participants with different intelligence scores.
So the answer to the question of a particular take-the-best personal-
ity was surprising: Concerning motivational variables and cogni-
tive style, we did not find a specific take-the-best user profile. On
the other hand, cognitive ability was related to strategy used, but in
an unexpected way—higher intelligence scores were related to
greater use of an appropriate strategy, not to greater use of a par-
ticular strategy.
How can the consistent pattern of strategy use that we found be explained? Our proposal is that most participants entered the
experiments with an initial “conservative” preference for compen-
satory decision making because they considered it risky to ignore
information. The feedback during the first decision trials would in
principle have enabled them to figure out the appropriate strategy,
but only the clever ones effectively used this information. In non-
compensatory environments, these participants adjusted their
strategy (and used take-the-best), whereas in compensatory envi-
ronments they stuck to the compensatory strategy almost everybody
used anyhow. (This explanation holds in situations such as ours
where people already know the order of cue importance or validity;
when this order must be learned, noncompensatory heuristic users
may take a long time to find the order—see chapter 11—which
could make it adaptive to start with a compensatory strategy and
greater cue exploration in those situations as well.)

Does Lowering Cognitive Capacity Promote Simpler Strategies?


According to the common reasoning about contingency models of
strategy selection, compensatory strategies are much more costly
than noncompensatory strategies to perform, but they are on aver-
age more accurate (Beach & Mitchell, 1978; Christensen-Szalanski,
1978; Chu & Spires, 2003; Payne et al., 1993). This traditional view
of decision making postulates an effort–accuracy trade-off, in which
to make better decisions people have to use more information
and processing—more is better. In this view, the reason why people
use simple heuristics is that we have limited cognitive capacities.
The effort–accuracy trade-off implies that people will have to
sacrifice some decision accuracy if their processing costs increase.
Typically, this will mean using simpler—for example, noncom-
pensatory—strategies. Since lowering cognitive capacity raises rel-
ative processing costs, simpler strategies such as take-the-best
should prevail when people are put under cognitive load. This kind
of effort–accuracy trade-off does not follow from the ecological
rationality perspective, focusing as it does on environment-driven
strategy selection, and furthermore seems to be at odds with our
results on individual differences in intelligence and strategy use.
This makes it interesting to test a more experimental manipula-
tion of capacity, allowing us to answer a new research question:
Does lowering a person’s cognitive capacity promote simpler
strategy use?
Experiment 12 (N = 60) was designed to test this implication of
contingency models, and it yielded another unexpected result.
In this experiment, the environment was set up to give a slight
advantage to take-the-best users. We had a control group of participants play the stock market game and make decisions while hear-
ing a series of digits they were instructed to ignore. The
experimental group, in contrast, was put under heavy attentional
demands: They had to attend to the digit string (while investing in
stocks!) and count the occurrences of the digit “nine.” Occasionally,
they were prompted to type in the number of nines presented since
the last prompt, and wrong answers were punished with charges
subtracted from their virtual bank account. This secondary task
massively decreased the cognitive resources available for the pri-
mary decision task. What did we expect to happen in terms of peo-
ple’s decision strategies? In accordance with the beliefs of
researchers who favor contingency models of decision making
(Beach & Mitchell, 1978; Christensen-Szalanski, 1978; Payne et al.,
1993) and of laypeople (Chu & Spires, 2003), we expected that a
decreased cognitive capacity would increase the relative process-
ing costs of elaborate (i.e., compensatory) strategies and therefore
shift the balance toward more take-the-best use.
Exactly the opposite happened: Only 27% of the people with
lowered cognitive capacity (high cognitive load) employed take-
the-best while 60% employed take-the-best in the low-load control
condition (depicted by the triangles in Figure 9-1). As with our IQ
results, greater cognitive capacity did not generally lead to more
“elaborated” compensatory decision making but rather, we believe,
to a more efficient detection of the tiny payoff advantage (less than
1%) of take-the-best in this environment (or to realizing that using
the less-effortful take-the-best would at least not harm their perfor-
mance). Correspondingly, limited cognitive capacity seems to have
hindered the detection of take-the-best’s advantage and so pre-
vented deviation from the default compensatory strategy. We there-
fore conclude that higher as opposed to lower cognitive capacity
(intelligence or working memory load) does not directly determine
the type of strategy used. Rather, cognitive capacity is helpful in
executing the metaprocess of strategy selection in an efficient
and adaptive manner (see Bröder & Newell, 2008, for a more exten-
sive discussion). To put it bluntly: If you have sufficient cognitive
resources, you do not always use more complex strategies. Rather,
you are better able to find out which information can safely be
ignored. Without sufficient resources, you may stick to an appar-
ently safe compensatory strategy. This interpretation implies that
the cognitive costs that matter most are those caused not by strategy
execution (as implied by Beach & Mitchell, 1978), but rather by
adaptive strategy selection. Hence, a high cognitive capacity does
not foster more “elaborate” strategies per se, but it enables people
to figure out appropriate strategies.
Unexpected Cognitive Costs: Memory Retrieval

The experiments reported so far followed a research tradition that Gigerenzer and Todd (1999) termed “inferences from givens.” The
reason for the popularity of this approach is mentioned in Box 9-1:
Researchers who study information integration have to know what
information participants use in their judgments. Hence, they pro-
vide participants with that information rather than not knowing
what participants might happen to pull from the environment or
from memory. Gigerenzer and Todd criticized this approach for
studying fast and frugal heuristics because it did not involve the
cue search common to much of daily decision making, as, for
instance, in “inferences from memory,” where each cue value must
be recalled. Although our results reported above clearly showed that
high information costs promoted the use of take-the-best, we did not
know whether cue retrieval from memory would itself induce suffi-
cient cognitive costs to influence people’s inference strategies. After
all, retrieving information from memory usually seems like one of
our most effortless everyday activities, and hence Gigerenzer and
Todd’s criticism seemed at least bold (if not implausible). This time,
our skeptical research question was this: Does memory retrieval
really induce cognitive costs that impact strategy selection?
Gigerenzer and Todd (1999) forgot to provide suggestions for
how their hypothesis could be tested. As just mentioned, there are
good methodological reasons for using an “inference from givens”
approach—but can we gain similar control over the information
that people use in an “inferences from memory” task? Our simple
solution to this methodological challenge was to let people learn a
set of objects and their respective features by heart (following a
related idea of Hoffrage et al., 2000). After that, they would make
decisions based on the cues they had to retrieve from memory.
There were two consequences of this method: First, we could only
rely on outcome-based strategy classifications because process-
tracing data would not be available. Second, we had to choose a
domain in which the cues themselves did not suggest a to-be-judged
target variable during learning, because that would probably lead
to inferences being made already in the learning phase rather than
in the decision phase where we wanted to observe them. After sev-
eral pilot experiment attempts that tortured innocent participants
with bizarre stimuli such as Pokémon creatures and geometric
shapes, some students of mine came up with the ingenious idea of
using an invented criminal case. This had the invaluable advan-
tages of much higher participant motivation and relatively easy-
to-learn material, namely, potential murder suspects and their
characteristics (such as clothes, perfume, cigarette brand, vehicle,
accompanying dog, etc.).

In a pilot experiment (Experiment 15, N = 50), my colleague
Stefanie Schiffer and I wanted merely to test the suitability of the
material—but we were very much surprised to find 74% of our par-
ticipants classified as take-the-best users! Because of the general
tendency to use compensatory strategies that we had observed
before and our disbelief in Gigerenzer and Todd’s claim of costly
memory retrieval (as also earlier expressed in Gigerenzer &
Goldstein, 1996), we had expected a low percentage of take-the-
best users. Before we accepted this perplexing result as a confirma-
tion of the memory-search hypothesis, though, we had to test the
possibility that this take-the-best upsurge was caused by some
peculiarity of the material we had used. In Experiment 16 (N = 50)
we directly compared two groups with identical material solving
the same criminal case after learning all suspects by heart. The
experimental group had to retrieve the cue information from
memory whereas the control group saw all the information on the
screen during decision making. Although the percentage of take-
the-best users in the experimental group was less than in Experiment
15 (44%), it was significantly higher than in the control condition
(20%) in which Franklin’s rule (60%) and Dawes’s rule (20%)
together were clearly dominant. Again, we were surprised because
the screen versus memory difference remained even when the
materials were made identical. In Experiment 17 (N = 50) we pre-
sented cue information either verbally or pictorially and expected
consequent processing differences, which we did not find (64%
take-the-best users in both conditions).
A more effective manipulation of the two presentation formats as
depicted in Figure 9-2 (Experiment 18, N = 114) had a dramatic
effect: 47% of participants appeared to use take-the-best in the
verbal condition, whereas only 21% used it when processing the
holistic pictorial stimuli. Recently, we found this format effect even
more strongly (Experiment 19, N = 151, 70% vs. 36% take-the-best
use—see Bröder & Schiffer, 2006b). At first, we interpreted this as
evidence for simultaneous parallel processing of feature matching
for holistically retrieved pictorial stimuli (Bröder & Schiffer, 2003b),
but one piece of evidence does not readily fit this interpretation:
The decision time patterns for verbal and pictorial stimuli are
virtually identical, which may indicate equally costly memory
retrieval in the two cases. In summary, the results of our memory-
based inference studies corroborate Gigerenzer and Todd’s (1999)
memory search hypothesis claiming appreciable cognitive costs of
information retrieval.
Outcome-based strategy classification allows for an assessment
of whether people are using the decision rule postulated by take-
the-best. But what about their search and stopping rules? The
results reported can also be accounted for by assuming people used

[Figure 9-2, panels (a) and (b), appears here.]

Figure 9-2: Stimuli presented during the learning trials for the
criminal case game to investigate memory-based decisions in the
pictorial (top) and verbal (bottom) conditions of Experiments 18
and 19. (Adapted from Bröder & Schiffer, 2006b; original stimuli
were in color, with labels in German.)

a weighted additive strategy (e.g., Franklin’s rule) that mimics
take-the-best’s performance when the cue weights are noncompen-
satory (see Martignon & Hoffrage, 2002). How do we know that
people did not just use a compensatory strategy with different cue
weights when deciding from memory? Although genuine process
tracing data are not available with memory-based decisions, one
can analyze the time used for each decision. We classified the deci-
sion trials in our experiments into different sets: The first set con-
tained choice pairs in which the most valid cue differentiated
between objects—that is, one suspect possessed the critical feature
and the other did not. The second set contained those decision
trials in which only the second most valid cue differentiated
between the suspects, while the most valid one did not. We pro-
ceeded similarly with the third and fourth cues to construct four
different decision type sets. Figure 9-3 shows the mean decision
times of the 415 participants from Experiments 15 to 19, split into
their different outcome-based strategy classifications and further
divided by the four decision types. The time patterns observed fit
the processing assumptions of the four strategies reasonably well:
Those participants classified as take-the-best users show a marked

[Figure 9-3 appears here. Plotted groups: take-the-best (N = 198), Franklin’s rule (N = 90), Dawes’s rule (N = 83), guessing (N = 44); y-axis: decision time in seconds; x-axis: most valid discriminating cue.]

Figure 9-3: Mean decision times (and SE) of participants with dif-
ferent outcome-based strategy classifications aggregated across
Experiments 15 to 19. The x-axis denotes different classes of deci-
sion trials in which the most valid cue discriminates (labeled cue
1), the second most valid cue (but not the first) discriminates (cue
2), and so forth. The decision time patterns roughly fit the process
descriptions of take-the-best, Franklin’s rule, Dawes’s rule, and
guessing (see text).

increase in decision time the more cues they have to retrieve to
make a decision (about one additional second per cue on average).
This fits the assumption that take-the-best users stop their informa-
tion search when they find the first discriminating cue (Bröder &
Gaissmaier, 2007).
Participants classified as guessing were the quickest decision
makers and did not show a significant increase in time depending
on the pattern of cue information, consistent with not systematically
searching for information but deciding randomly. Franklin’s rule
users showed a smaller decision time increase than take-the-best
users across the four decision types and needed much more time
for their decisions. This is compatible with the assumption that
these participants always retrieve more information than just
the most important cue (see footnote 3) and integrate it in an effortful manner.
The time increment of Dawes’s rule users was also small, as they
must usually also retrieve most available cues, and they were much
quicker than Franklin’s rule users overall, reflecting their much
simpler and faster integration rule. Hence, the decision times are an
“indirect” process indicator that corroborates the sequential
retrieval assumptions of take-the-best.
Skeptics may worry that in all our experiments validity and
retrieval ease were confounded: Participants learned the cues in a
fixed order, namely, in decreasing order of validity.
Therefore, we ran one more study (Experiment 20) to disentangle
cue validity and learning-order-based ease of retrieval by making
the cue-learning sequence differ from the validity order. Specifically,
the learning order of the cues was now cue 3–cue 1–cue 4–cue 2,
where the numbers denote the validity ranks. Because the two
orders no longer matched, this would make a validity-based retrieval
sequence cognitively harder to perform than in Experiments 15–19.
Nonetheless, the outcome-based strategy assessment suggested that
only 5 of 82 participants in this new experiment followed a take-
the-first decision rule ordering cues by learning order (and hence
retrieval ease), while 32 participants appeared to use take-the-best

3. Critics may complain that there should not be an increase in decision
times with Franklin’s rule and Dawes’s rule because these strategies always
use all information. But even if one follows a perfect compensatory decision
rule (e.g., Franklin’s rule with four cues), one does not always have to retrieve
the third and fourth cue. For instance, if the first two cues (when presented
in validity order) favor one option and speak against the other, this impact
cannot be overruled by two less valid cues. Hence, retrieving them is point-
less. This argument is valid, however, only for a fixed number of cues as in
our experiments. Another interpretation of the slight increase is that partici-
pants classified as using Franklin’s rule or Dawes’s rule did this in the major-
ity of trials, but they used take-the-best in some of the trials. The outcome-based
classification assumes constant strategies as a simplification.

[Figure 9-4 appears here. Plotted groups: take-the-best (N = 32), take-the-first (N = 5); y-axis: decision time in seconds; x-axis: most valid discriminating cue.]

Figure 9-4: Mean decision times (and SE) of participants classi-
fied as using take-the-best or take-the-first in Experiment 20. The
latter ad hoc strategy was apparently used by five participants who
ordered cues according to retrieval ease rather than validity. The
sequence of decision time means follows cue validity order for take-
the-best users and the sequence of cues during learning (3–1–4–2)
for take-the-first users.

and ordered cue retrieval by validity. Figure 9-4 shows the mean
decision times of both sets of participants. The order of decision
time means for take-the-first users exactly follows the expected
3–1–4–2 cue sequence, indicating that these people retrieved cues
in the order of retrieval ease and stopped search when a discrimi-
nating cue was found. This hardly seems like a coincidence—we
believe it is evidence that the decision times reflect part of the
retrieval process, particularly showing its sequential nature. The impor-
tant insight is that this reaction time evidence indicates that both
take-the-first users and take-the-best users process cues sequen-
tially and ignore further information in a lexicographic fashion.

Putting Things Together: Insights and Outlooks

The research program summarized here pursued an experimental


approach in order to put claims about the adaptive toolbox in gen-
eral and take-the-best in particular to a strict test. So now what do
we know, 20 experiments and 1,491 participants later? I started out
motivated to smash what I took to be an all-too-simple theory, and
my very first attempts did not look too promising for take-the-best.
But upon relaxing the unrealistic assumption that take-the-best
is used universally, I found that under certain conditions, the
behavior of a considerable proportion of people can best be
described by this simple heuristic. Hence, take-the-best definitely
seems to be a part of the mind’s toolbox. However, as Lee and
Cummins (2004, p. 344) have criticized, “although there is no inher-
ent contradiction in arguing that different people make decisions in
different ways under different conditions, it is a less than com-
pletely satisfying conclusion.” But there are two ways to be unsatis-
fied with this state of affairs: If it is because not everyone is behaving
the same way (e.g., using take-the-best), then we are back to the
position of expecting universal behavior, which we already aban-
doned. (There are other good arguments for expecting variation in
behavior, such as adaptive coin-flipping in which genetic “coins”
are “flipped” in individuals to get varied phenotypic traits—see
Gigerenzer, 2000, pp. 207–209.) On the other hand, we can be
dissatisfied because the picture is incomplete: We want to know
when and why people behave differently, choosing different strate-
gies. This is the path that the rest of our studies pursued, and it led
to a variety of important new questions and insights.
One of these new findings is that people do not seem to always
have a default preference for fast and frugal decision making.
Rather, with inferences from givens, people show a small initial
bias for compensatory decision making. At least in our experimen-
tal settings, participants’ general a priori belief seems to be that
all information could be relevant and needs to be explored, and
they must actively learn to ignore information. A second insight
is that people are generally able to adapt to payoff structures of
environments and use appropriate strategies, whether frugal or
greedy concerning information. Thus, our results regarding adap-
tive strategy selection for probabilistic inference tasks are in line
with those of Payne, Bettman, and Johnson (1988, 1993) for prefer-
ential choices. Inference strategy selection is influenced not only
by obvious task variables such as time pressure (Payne et al., 1988;
Rieskamp & Hoffrage, 1999) and information costs (Bröder, 2000a),
but also by subtle payoff differences. However, not everyone is
sensitive to these differences—adaptivity requires cognitive pro-
cesses that different people can deploy to varying degrees. Our
results regarding intelligence as well as the effects of the secondary
task in Experiment 12 clearly show that available cognitive capac-
ity fosters adaptivity of decision making.
The next insight is that even if this capacity is available in prin-
ciple, it is not always used. This conclusion is suggested by the
massive routine effects observed in Experiments 9 and 10. In these
experiments, most participants stuck to the strategy they had
learned in the first half of the experiment, regardless of whether
they were given a hint about an environmental change, an addi-
tional monetary incentive, or a switch to a different but similar
task. Why are people so reluctant to change a once-successful strat-
egy when it becomes suboptimal? This question can be investigated
at two levels. At the environmental level, one can look at different
task domains and assess whether abrupt changes are common. If
they are, the routine effects observed here are potentially a threat to
adaptivity because people would benefit from being sensitive to
such recurring changes (Todd & Goodie, 2002). If abrupt payoff
changes turn out to be rare, though, then routines may be viewed
as a successful adaptation to avoid switching strategies in response
to false alarms. At the psychological level, a model would have
to explain why people are quick adaptors in a brand-new task
but adjust slowly to payoff changes or task switches. One prelimi-
nary speculation is that after quick but effortful strategy selection,
people adopt a routine mode in which adjustment can better be
described by a slow incremental reinforcement learning process
(e.g., Rieskamp & Otto, 2006). Furthermore, if one accepts the notion
that our collection of personality traits investigated as potential
moderators of strategy preference was not completely out of place,
another finding is that there are probably no stable strategy prefer-
ences related to fundamental personality characteristics.
An additional insight is that memory retrieval of cue information
is indeed a crucial variable that can trigger the use of take-the-best.
For me, this was a most unexpected result. Given the default prefer-
ence for compensatory decision making in low-cost screen-based
experiments, one can hardly escape the conclusion that memory
retrieval appears to incur at least subjective cognitive costs.
With all this in mind, a major implication of the results as a
whole concerns the presumed cognitive costs of decision strategies.
All contingency models of decision making (especially Beach &
Mitchell, 1978; Payne et al., 1993) assume that compensatory strat-
egies are more effortful than noncompensatory strategies (see also
Christensen-Szalanski, 1978; Chu & Spires, 2003). According to our
results, though, it is not the integration of cue information as done
in compensatory strategies that is costly. Rather, it is the retrieval of
cue information from memory if necessary and the operation of the
meta-decision rule to select a strategy in the first place that together
induce cognitive costs. Note that even under heavy cognitive load,
73% of the participants in Experiment 12 were well able to apply
compensatory strategies (53% Franklin’s rule, 20% Dawes’s rule).
Hence, effort–accuracy trade-offs probably concern the strategy
selection process more than the use of the strategy itself.
The data and conclusions we have amassed naturally lead, as
Shepard’s quote intimated at the beginning of this chapter, to a
revised conception of the central questions to study within the
framework of the adaptive toolbox: How exactly do participants
figure out the heuristic–environment match that enables choosing
one appropriate tool from the adaptive toolbox? And when and
how is this apparently effortful process initiated? These questions
of strategy selection are at the heart of all contingency model frame-
works in decision making as well as the adaptive toolbox, but the
topic has hitherto largely been a blind spot in empirical research
except for some speculations (see Payne et al., 1993).
The adaptive toolbox metaphor of the mind as incorporating
a collection of discrete heuristics is attractive on theoretical
grounds, and our experiments have further given it empirical sup-
port. Other metaphors for how we make adaptive inferences have
also been proposed, for instance, a single adjustable power tool
with changeable thresholds for controlling information accumula-
tion (Hausmann, 2004; Lee & Cummins, 2004; Newell, 2005). But
models derived from such metaphors also face challenges, such as
the question of how their thresholds are adjusted. Hermann
Ebbinghaus (1885/1966) formulated a simple but important truth
about metaphors: The only thing we definitely know about our met-
aphors is that they are wrong. Metaphors have limits, and it makes
no sense to ask whether they are true or false. Rather, we should
ask whether they are fruitful for describing phenomena and for
stimulating new research. The results presented in this chapter
indicate that the adaptive toolbox has hitherto been a very fruitful
metaphor. As the chapters in this book demonstrate, filling out
this metaphor and the toolbox, and comparing them to other meta-
phors, remain wide-open empirical and theoretical challenges.
10
Efficient Cognition Through Limited Search
Gerd Gigerenzer
Anja Dieckmann
Wolfgang Gaissmaier

Look and you will find it—what is unsought will go
undetected.
Sophocles

Fight or Run?
Each year in autumn, small but loyal groups of people across
Europe listen to a spectacular concert: the roaring of male red deer
in their rutting season. It might be a matter of taste whether a person
enjoys listening to this performance (also available on CD, and
broadcast live on the Internet via webcam), but for the red deer
stags fighting for control of a harem of mates, it is a matter of genetic
survival. Typically, the roaring is only the first in a sequence of
contests in which the harem holder competes against challengers.
A male red deer has to be in good physical shape to be able to
roar at great volume for some time. If the harem holder roars
more impressively, the challenger may already give up at this point
and walk away. But if the challenger roars with comparable endur-
ance, the next contest is initiated, parallel walking. This contest
allows the competitors to assess each other’s physical fitness at a
closer distance. If this also fails to produce a clear winner, the
third contest is started: head butting. This is the riskiest activity, as
it can result in dangerous injuries (Clutton-Brock & Albon, 1979).
This step-by-step information search allows competitors that are
clearly different in strength to terminate the competition at early
stages, sparing the inferior male the risk of injuries and the superior
male exhausting quarrelling, so that it can save its energy for more
serious challengers (Enquist & Leimar, 1990).


Trust or Refuse?
New York City taxi drivers are much more likely to be murdered
than the average worker. When drivers are hailed in the Bronx or
other dangerous neighborhoods, they need to screen potential
passengers to decide whether they are trustworthy. Many drivers
report that in cases of doubt, they first drive past the person to check
him or her out before pickup. If they pick up the wrong client, they
may be robbed or even killed. If they refuse a harmless client, they
will lose money. How do drivers make up their minds? Unlike
the red deer, they must decide rather quickly, and different drivers
can use different cues depending on their past experience. One
New York driver said that a hailer in a suit and tie could be trusted
whereas “people that mostly do the robbing have tattoos on . . .
roughneck dressing with some hoodie, trying to hide some part of
their face.” Another, in contrast, was wary of overdressing, because
“a well-dressed man in a bad area, something has to be wrong”
(Gambetta & Hamill, 2005, pp. 158–159). Some cues for trust, how-
ever, were shared by virtually all drivers, including older over
younger, female over male, and self-absorbed over inquisitive.
Whatever cues were used, they had to be assessed rapidly on the
fly, putting a premium on searching for a few, accurate, readily
observed attributes.

Shoot or Pass?
Handball players face a constant stream of decisions about what to
do with the ball. Pass, shoot, lob, or fake? They have to search in
memory for past experiences to generate appropriate present
options. The question, then, is when to stop generating options and
act upon one of them. In an experiment designed to find out,
skilled players stood in front of a video screen, where 10-second
scenes from top games were shown, ending in a freeze-frame
(Johnson & Raab, 2003). The players were then asked to imagine
that they were the player with the ball and to choose the best play
option as quickly as possible. After this, they could inspect the
still-frozen frame for another 45 seconds and then choose a play
again. With this additional time and information, about half of the
players changed their mind about the best option. But which
was better: the quick intuitive judgment or the one after extended
reflection? On average, the first option that came to players’ minds
was best. If they continued to search in their memory to generate
further options, they were likely to come up with second-best and
third-best options and end up acting on one of those instead.
Sometimes, searching for fewer options is not only faster, but also
better.

Kinds of Search

The red deer, the cab driver, and the handball player all search for
information––in the outside world and inside memory. Human
brains seem to have a large appetite for information, and for that
reason Homo sapiens have been baptized as informavores (Dennett,
1991). In this chapter, we explore two aspects of information
foraging: how people search for information, and when they stop
searching.
Principles of searching and stopping are candidates for the basic
elements of cognition, a set of mental operations that can be
combined to produce the myriad of feats the mind is capable of
performing (Simon, 1979a). One might expect that psychologists
have been busy studying the question of how informavores forage
for information. Instead, there has been a puzzling preference for
theories that ignore search and stopping. In research on categoriza-
tion and inference, for instance, almost all theories––from exem-
plar models to Bayesian models––assume that all of the relevant
information about cues is already given (see Berretty, Todd, &
Martignon, 1999). Similarly, in judgment and decision-making
research, most approaches, including expected utility theories,
prospect theory, and multiple cue probability learning, do not
model information search. This is understandable given that most
experiments conveniently lay out all the information in front of the
experimental participant, obviating search (see Cooksey, 1996;
Fishburn, 2001; Holzworth, 2001). The question asked is, how do
people process given information—in an additive, multiplicative,
Bayesian, or other way? Yet in the real world, the required informa-
tion is not usually handed to the decision maker on a tray (or a
computer screen). Physicians must decide when to stop diagnostic
testing and administer treatment, just as the cab driver must decide
when to stop looking for more cues and then trust or refuse. People
search on the Internet for information about digital cameras, think
about which friend to ask for advice on personal matters, or try to
find a good reason to justify buying a Porsche, and in each case
have to ultimately stop their search and make a choice. These kinds
of search for information are usually sequential. The male deer does
not simultaneously listen to the competitor’s roar, check him out
while running alongside him, and assess the force of impact of
his head, and then weight and add the information to decide
whether to run away or stay. Instead, each cue is checked in order,
only if it is needed to make the decision. The advantage of such
sequential search is that the process can be stopped at any time,
which can save energy, costs, and lives.
Why then is search often neglected in psychological research?
One possible answer lies in the kinds of statistical models
psychologists have adopted, and those they have ignored. The
models used for statistical inference—Fisher’s null hypothesis
testing and Neyman–Pearson decision theory––rely on a fixed
sample of observations and do not involve search and stopping.
In contrast, Wald’s (1947) sequential statistics and its variants, a
development of Neyman–Pearson theory, model when to stop
searching for new observations. For instance, with this approach
the number of participants in an experiment is not fixed but is
determined by a stopping rule that assesses whether enough data
has been collected based on the data itself. But such sequential sta-
tistics are rarely if ever used for hypothesis testing in psychology.
Researchers are mostly familiar instead with the Fisher and
Neyman–Pearson methods, which have in turn inspired many theo-
ries of mind, such as causal attribution theory and signal detection
theory. The consequence is that search and stopping are absent
from these tool-inspired cognitive theories (Gigerenzer, 1991, 2000).
The way people process information in the real world or in a
laboratory experiment depends on whether and how they have to
search for that information. More precisely, the decision rule used—
such as one-reason decision making, additive integration, or
Bayesian classification––depends in part on whether the person
had to search for information or not. If an experiment eliminates
search by presenting all pieces of information simultaneously, par-
ticipants may readily perform some cognitive integration of all or
most of the cues presented. But this integration is less likely if cues
must be searched for in memory. This is why in the first publica-
tions on fast and frugal heuristics (Gigerenzer & Goldstein, 1996;
Gigerenzer & Todd, 1999), inferences from givens (all information
provided to the experimental participant at the time of decision
making, so no search is necessary) were strongly distinguished
from inferences from memory (no information provided at the time
of decision making, so search in memory is necessary). Since that
time, experimental evidence has accrued for the hypothesis that
information processing actually differs in these two situations, as
we shall review in this chapter (see also chapter 9 for more on
related studies). Table 10-1 shows the distinctions between these
two approaches.
The study of search is not simply a prelude to the study of infor-
mation integration that one can choose to ignore; rather, conclu-
sions such as that people integrate information in an additive or
Bayesian way are contingent on the very possibility of search. If
people appear to integrate cues in an additive way when tested
on inferences from givens, the same does not necessarily hold for
inferences from memory, that is, when search must be conducted.
Yet, the great majority of theories and experiments in psychology

Table 10-1: Paying Attention to Search Versus No Search Leads to
Different Perspectives on Cognitive Theories and Experimental Design

               Search                                No search (inferences from givens)

Theories       Models of search, stopping, and       Models of decision (integration)
               decision rules. Theories reflect      rules only. Theories are based on
               how search and stopping rules         the implicit assumption that search
               constrain decision rules.             and stopping rules do not constrain
                                                     decision rules.

Experiments    Participants make judgments           Participants make judgments based
               based on search in memory or          on information laid out in front
               the world.                            of them.

eliminate search by exclusively relying on inferences from givens
(the right-hand column of Table 10-1).
From the perspective of ecological rationality, the human mind
is equipped with a toolbox of several search and stopping rules
rather than just one, and these can be used adaptively in response
to different environmental structures. Some environments present
information simultaneously, while in others it is distributed, and
different decision tools will work better or worse depending on this
and other aspects of the environment structure. We address three
questions following from this:

1. Which rules for information search and stopping are in the
adaptive toolbox?
2. In which environmental structures is a given search or
stopping rule ecologically rational?
3. Do people adapt their search and stopping rules to environ-
mental structures?

The classical model for search and stopping is Wald’s (1947)
sequential decision theory, which has influenced many of the psy-
chological theories that model search and stopping (for an over-
view see Busemeyer & Johnson, 2004). The focus of Wald’s theory
is not on models of search—it typically assumes random sam-
pling—but on determining the optimal stopping rule to use to
minimize different types of errors. In this chapter, in contrast, we
focus on satisficing models of search and stopping that work well
without attempting to optimize.

Cues or Alternatives
The goal of search can be cues or alternatives (Table 10-2) or both.
The deer has two essentially predetermined alternatives, to con-
tinue challenging the opponent or to run away, and must just search
for cues as to which option is more promising. Similarly, the taxi
driver has two alternatives, to trust or refuse a potential passenger,
and also just needs to look for cues. How do they search for the cues
they need, and when do they stop? One possibility is with lexico-
graphic heuristics such as take-the-best (Gigerenzer & Goldstein,
1996) and elimination-by-aspects (Tversky, 1972), which model
search and stopping over cues, assuming that the alternatives are
given. Now consider a search for alternatives. The handball player
must search in memory for alternative plays in a game situation.
The hotel guest who keeps pressing the television remote control
is searching for alternatives consisting of the subjectively most
interesting programs. How do athletes and hotel guests search for
alternatives and when do they stop? A second group of heuristics,
including satisficing search using aspiration levels, describes this
sequential search for alternatives (Simon, 1955a; see also Seale &
Rapoport, 1997; Selten, 2001; Todd & Miller, 1999).
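As an illustration of this kind of sequential search over alternatives, here is a minimal satisficing sketch in Python. It is an illustrative simplification rather than a model from the chapter: the aspiration level is taken as fixed and known, and alternatives are represented by a single numeric value.

def satisfice(alternatives, aspiration_level):
    """Inspect alternatives one at a time, in the order they are encountered,
    and stop at the first one whose value meets the aspiration level.
    Returns that alternative's value, or None if search is exhausted."""
    for value in alternatives:
        if value >= aspiration_level:
            return value
    return None


# Hypothetical example: channel surfing with an aspiration level of 7
print(satisfice([3, 5, 8, 6, 9], aspiration_level=7))   # -> 8 (search stops early)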
This distinction between search for cues and alternatives is not a
mutually exclusive one. Often, both kinds of search are involved
in solving a problem: A challenger deer could look for more cues
about the strength of the current harem holder or could leave this
contest and look for another harem holder to challenge. Similarly,
when people search through alternatives such as potential houses
or spouses, they also collect information about their attributes
(Saad, Eba, & Sejean, 2009). And a channel surfer searching for
better alternatives also has to search for cues while sampling a few
seconds or minutes of each program in order to infer how interest-
ing it might be. Some research has been done that combines both
kinds of search, such as experiments where the information can be
uncovered from a matrix of alternatives and their cues. This design
allows researchers to determine to what degree a person employs

Table 10-2: What to Search for and Where

                        Search for cues                    Search for alternatives

Search inside memory    Recall of cues, based on past      Recall of alternatives, based on
                        experience or experimental         everyday experience or
                        training                           experimental training

Search outside memory   Search for cues in a feature       Search for alternative websites,
                        list, on the Internet, etc.        items in a store, etc.

alternative-wise search, that is, first looking up all the cue values
of one alternative before examining the second alternative, or cue-
wise search, that is, looking up the value of all alternatives on
one cue before proceeding to the next cue (e.g., Bettman, Johnson,
Luce, & Payne, 1993; Payne, Bettman, & Johnson, 1988; Russo &
Dosher, 1983).

Inside or Outside Memory


What the deer and the taxi driver have in common is that they
search for cues externally, in the environment as opposed to in
memory. In contrast, participants in experiments who answer
general-knowledge questions (as in most overconfidence and hind-
sight bias studies—see Juslin, Winman, & Olsson, 2000; chapter 4)
have to search internally, that is, in their memories, for cues that
might indicate the correct alternative. Table 10-2 crosses the dis-
tinction between search inside and outside memory with the dis-
tinction between cues and alternatives. Note that both kinds of
search are unlike inferences from givens, where no search takes
place. Yet, it is likely that internal search also differs from external
search in its process and outcome. Search inside memory is for
cues and alternatives learned about in the past, where the targets of
search are modified over time by forgetting (see chapter 6), and
retrieval of less active cues tends to take more time. In contrast,
search outside memory is for cues acquired in the present, where
forgetting does not apply and retrieval time plays a smaller role
compared to issues such as monetary costs or the sensory access to
cues, as in the case of the red deer. Furthermore, external search is
typically sequential, whereas internal search could be either
sequential or parallel. The distinction between search inside and
outside memory is again not mutually exclusive; both sources are
often mixed. A taxi driver, for instance, might retrieve information
about past encounters from memory to help assess alternative cus-
tomers and their attributes outside his cab.

Exhaustive or Limited Search


Amazon ranks millions of books. The sheer number prevents
exhaustive search before buying something to read. While a hotel
guest might have the time and inclination to zap through all 78 TV
channels, consumers confronted with overwhelming information
and options have no choice but to employ limited search. As men-
tioned, psychological theories often implicitly assume exhaustive
search; internal stopping rules or external deadlines are not part
of the theories (for exceptions, see Busemeyer & Johnson, 2004;
Payne, Bettman, & Johnson, 1993). Consequently, it is sometimes
implicitly (or even explicitly) further assumed that searching for
all available information and alternatives is the only way to make
the best decision. In this view, the reason that we actually limit
our searches is our limited cognitive abilities, and the outcome of
our ignoring information in this way is our frequent judgment
errors (Tversky & Kahneman, 1974). Yet, recent research has
shown that limited search can sometimes produce better decisions
than can exhaustive search in the appropriate environments (see
Gigerenzer, Todd, & the ABC Research Group, 1999, and various
chapters in this book, including chapter 2). In fact, ignoring infor-
mation is essential for cognitive functioning in an uncertain world,
and thus the cognitive questions are where to search (which cues or
alternatives first), and when to stop. The answers depend on the
environment.
In theories that assume exhaustive search, the order in which
cues or alternatives are searched through is irrelevant. Most models
of judgment and decision making in fact rely on mathematical
structures in which order of information acquisition plays no
role, such as weighted additive and multiplicative integration, or
expected utility maximization and prospect theory. In this view,
experimental findings that judgment is influenced by order are
taken to be aberrations, or even reasoning fallacies––as in primacy
and recency effects, such as order effects in contingency judgment
(Chapman, 1991) or response-order effects in surveys (Krosnick &
Alwin, 1987). Yet attention to order can be crucial in limited
search, and ordered search is one of the key building blocks of the
mind’s adaptive toolbox, as we will discuss below.

Heuristic Versus Optimal Search


There are two visions of how informavores forage: via optimal rules
or using heuristic rules of search and stopping. The term optimal
applies to theories whose goal is to model how to find the best solu-
tion, whereas the term heuristic refers to theories that model how
minds find good-enough solutions. A typical optimal rule for
searching through cues in memory has the following structure:

Search through cues in an order that maximizes a given
criterion.

Several psychological theories postulate versions of optimal
search rules (e.g., Anderson, 1990; Nosofsky & Palmeri, 1997). For
instance, Anderson assumes that memory contents are searched in
an order determined by their probability of being relevant for deci-
sions at the current moment.

Next, consider an optimal rule for stopping search that has this
form:

Stop search when the costs of further search outweigh its
benefits.

This stopping rule aims at the best stopping point, not just a good
one. Models for sequential search with optimal stopping rules have
been proposed in economics (e.g., Stigler, 1961) and psychology
(e.g., Anderson, 1990; Busemeyer & Rapoport, 1988). For instance,
in one of Anderson’s models, search in memory stops when costs of
retrieving the next record (in terms of retrieval time, etc.) exceed
the expected benefit of retrieving it.
Optimization can be feasible in the small world of an experiment
with three or four independent variables, yet it may become sci-
ence fiction for a mind embedded in the large, uncertain world.
Many interesting problems are out of reach of optimization meth-
ods, because optimization is computationally intractable, too slow,
or too expensive (Michalewicz & Fogel, 2000). For instance, this
computational limitation applies to the problem of finding the opti-
mal order of a set of cues for use with a lexicographic decision
heuristic: This problem is NP-complete (Martignon & Hoffrage,
1999), meaning that when there are many cues, finding the solution
becomes computationally intractable. Similarly, finding the opti-
mal stopping point for search can be impossible in real-world set-
tings, unlike in small experimental worlds where the costs of future
search are specified by the experimenter. For instance, economic
theories of optimization under constraints typically assume that
people behave as if they are omniscient, that is, have perfect knowl-
edge of all benefits and costs of further search, and can solve the
differential equations needed to determine the ideal stopping
point where the costs of further search exceed its benefits. To their
credit, proponents of this approach usually make clear that these
optimization models make “the agents in our models more like the
econometricians who estimate and use them” (Sargent, 1993, p. 4).
But when optimization is out of reach, real people can employ heu-
ristic rather than optimal search and stopping rules.
Computational intractability is one constraint on the ideal of
optimal search or stopping rules. Another constraint is estimation
error, or alternatively the need for robustness. Consider once more
the task of estimating the optimal order of cues. In a computer sim-
ulation of a real environment with nine cues for inferring population
sizes of cities, Martignon and Hoffrage (1999) determined the opti-
mal cue order. This involved evaluating 9! = 362,880 orders, which
is computationally tractable (for a computer at least), meaning that
the best order could be found (in this case by exhaustive search).
A heuristic search rule (validity vi, described shortly) led to a cue
order that was better than 98% of all possible orders, although
the optimal order was by definition better. Yet, how well did these
heuristic and optimal orders fare in a new sample, that is, when it
came to foresight (prediction or generalization) rather than hind-
sight (data fitting)? Martignon and Hoffrage split the set of cities
into two halves and for one half (the training set) calculated the
optimal cue order as well as the heuristic (vi) order. This procedure
was repeated 100 times to control for random sampling effects. In
each case, the two orders were tested for their fitting accuracy on
the training set, as well as their ability to make accurate predictions
on the second half of the cities (the test set), a procedure known as
cross-validation. The surprising result was that the heuristic order
led to higher predictive accuracy than the optimal order did. Because
the loss of accuracy when comparing fitting to prediction was lower
for the heuristic order (a decrease from 75% for fitting to 73% for
predicting) than for the optimal order (from 77% to 72%), the heuris-
tic search rule is referred to as being more robust in this situation.
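The logic of this comparison can be sketched in a few lines of Python. The sketch below is a simplified stand-in for the original simulation: the function names, the data format, the brute-force way of finding the "optimal" order, and the convention of scoring nondiscriminating pairs as 0.5 correct are all assumptions made for illustration.

import itertools
import random


def lexicographic_accuracy(order, pairs):
    """Proportion of pairs decided correctly by a lexicographic rule that
    checks binary cues in the given order and guesses if no cue discriminates.
    Each pair is (cues_a, cues_b, a_is_larger)."""
    correct = 0.0
    for cues_a, cues_b, a_is_larger in pairs:
        for i in order:
            if cues_a[i] != cues_b[i]:
                if (cues_a[i] > cues_b[i]) == a_is_larger:
                    correct += 1.0
                break
        else:                      # no cue discriminated between the alternatives
            correct += 0.5         # expected accuracy of guessing
    return correct / len(pairs)


def cue_validity(i, pairs):
    """Proportion of correct decisions by cue i among the pairs it discriminates."""
    right = wrong = 0
    for cues_a, cues_b, a_is_larger in pairs:
        if cues_a[i] != cues_b[i]:
            if (cues_a[i] > cues_b[i]) == a_is_larger:
                right += 1
            else:
                wrong += 1
    return right / (right + wrong) if right + wrong else 0.5


def compare_orders(pairs, n_cues, seed=0):
    """Split the pairs in half, derive an optimal order (brute force) and a
    validity order on the training half, and test both on the other half."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    train, test = pairs[:len(pairs) // 2], pairs[len(pairs) // 2:]
    optimal = max(itertools.permutations(range(n_cues)),
                  key=lambda order: lexicographic_accuracy(order, train))
    validity = sorted(range(n_cues), key=lambda i: -cue_validity(i, train))
    return {"optimal":  (lexicographic_accuracy(optimal, train),
                         lexicographic_accuracy(optimal, test)),
            "validity": (lexicographic_accuracy(validity, train),
                         lexicographic_accuracy(validity, test))}

Repeating such a split many times and averaging the fitting and prediction accuracies, as Martignon and Hoffrage did with 100 repetitions, shows whether the fitted advantage of the optimal order survives when it comes to prediction.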
The general point is that an optimal order for one sample of
observations is not necessarily the optimal order for a new sample
(called out-of-sample prediction—see chapter 2). In this sense, a
heuristic search rule can perform “better than optimally” in predic-
tion, although not in fitting. Robustness is also crucial if the train-
ing and test sets are not random samples from the same population
but from two populations that differ in unknown respects (out-of-
population prediction). Out-of-population prediction is the rule
rather than the exception in medicine, for instance, where a diag-
nostic system with various cues or predictors is developed using a
sample of patients in one hospital and then applied to patients in
other hospitals and geographic locations, who belong to popula-
tions that differ in unknown ways (see chapter 14 for a discussion
of this in the context of building robust medical decision trees).

Learning How to Search and When to Stop


How does the red deer know the order in which to search for cues
on whether to fight or run? There are three potential sources of this
knowledge: evolution, social learning, and individual learning.
Evolution is slowest in finding cue orders, but it removes the need
for any given individual to find the order independently. Each red
deer already comes equipped with a search rule for assessing com-
petitors and does not have to learn everything the hard way by
experience. In contrast, much of what a taxi driver knows and
believes about potential passengers is socially transmitted, learned
from others’ advice and experience. Similarly, medical students are
instructed by their teachers on what cues to look for and in what
order for diagnosing a heart condition. Social learning works
through mechanisms such as teaching, imitation, and observing
the behavior of others (Laland, 2001). It is faster than individual
learning—as a proverb says, only a fool learns from his own mis-
takes; a wise man learns from the mistakes of others. But there are
also many cases when individuals must learn for themselves first-
hand via immediate experience, as in the shaping of behavior
through operant conditioning or learning what cues to check first
when you are looking for a restaurant in a new city (see chapter 11
for an investigation of simple learning rules that are useful in this
situation). Finally, these three types of adaptation are not mutually
exclusive but can be combined.

Scope
This chapter presents models of how people search for cues to make
a decision when optimization is out of reach. The decision task we
focus on is an inference between two alternatives. An inference
has a clear criterion—that is, the decision can be proven to be right
or wrong. The male deer’s decision to fight can be proven right if
he wins and wrong if he loses, and the New York taxi driver will
soon find out whether someone is honest or dodgy after deciding
that the person looks trustworthy. We do not deal with preferential
choices where direct feedback about right or wrong does not exist,
such as whether to marry or not, or whether to have chocolate or
vanilla ice cream for dessert. It is nevertheless possible that the
search rules we consider for inference also apply to preference.
Experimental research conducted to test search and stopping
rules is more mundane than the risky real-life decisions of taxi
drivers. Typically, participants have to use a set of binary cues to
infer which of two alternatives has the higher value on some crite-
rion, such as which of two shares will earn more money, which of
two suspects committed a murder, or which baseball team will win
a game. But a number of the search and stopping rules we discuss
here have been generalized to other tasks as well, such as classifica-
tion (Berretty et al., 1999; see also chapter 14) and estimation
(Hertwig, Hoffrage, & Martignon, 1999; see also chapter 15).

Building Blocks of Heuristics

Search rules and stopping rules are two of the building blocks of
heuristics. Particular building blocks in a given heuristic specify
what information to look for (search), how long to search for it
(stopping), and what to do with the pieces of information found
(decision). Consider the task of inferring which of two alternatives
has the higher value on a criterion based on binary cues, where the
cue values “1” and “0” indicate high and low criterion values,
respectively. If the values of a cue differ for the two alternatives
(0 for one of them and 1 for the other), we say that this cue dis-
criminates between the alternatives. The take-the-best heuristic is a
model of how people solve these kinds of tasks using limited infor-
mation search, and it consists of the following three building blocks
(Gigerenzer & Goldstein, 1996, 1999):

Step 1. Search rule: Search for cues in order of their validity
(proportion of right answers they lead to).
Step 2. Stopping rule: If cue values discriminate between the
alternatives, then stop search and proceed to Step 3.
Otherwise return to Step 1 to search for another cue.
(If no cues are left, choose an alternative at random.)
Step 3. Decision rule: Predict that the alternative with the pos-
itive value (“1”) on the discriminating cue has the
higher criterion value.
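A minimal sketch of these three building blocks in Python, for illustration only; it assumes binary cue profiles that are already listed in decreasing order of validity.

import random


def take_the_best(cues_a, cues_b):
    """Infer which of two alternatives has the higher criterion value.
    Cue values are binary (1/0), ordered by decreasing validity."""
    for a, b in zip(cues_a, cues_b):      # Step 1: search cues in validity order
        if a != b:                        # Step 2: stop at the first discriminating cue
            return 'A' if a > b else 'B'  # Step 3: decide for the alternative with the 1
    return random.choice(['A', 'B'])      # no cue discriminates: choose at random


# Hypothetical example: the second cue is the first one that discriminates
print(take_the_best((1, 1, 0), (1, 0, 1)))   # -> 'A'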

Now consider a different process model that relies on exhaustive
rather than limited information search, stopping search only after
all cue values are considered. It is known as tallying or Dawes’s
rule (after the seminal work by Dawes, 1979, on unit-weight linear
models) and employs these specific building blocks:

Search rule: Search for cues in random order.
Stopping rule: Stop only after all cue values have been
looked up.
Decision rule: Predict that the alternative with the larger number
of positive cue values (“1”s) has the higher criterion value. In
case of a tie, choose an alternative at random.
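The same task solved by tallying, again as an illustrative sketch rather than a definitive implementation; cue order is irrelevant here, so the cues can be looked up in any sequence.

import random


def dawes_rule(cues_a, cues_b):
    """Tallying: look up all binary cue values (exhaustive search), then choose
    the alternative with more positive values; guess in case of a tie."""
    tally_a, tally_b = sum(cues_a), sum(cues_b)
    if tally_a == tally_b:
        return random.choice(['A', 'B'])
    return 'A' if tally_a > tally_b else 'B'


print(dawes_rule((1, 0, 1, 1), (1, 1, 0, 0)))   # -> 'A' (3 vs. 2 positive cue values)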

In both of these process models, the search and stopping rules
are “tuned” to each other. Dawes’s rule does not search for cues in
any particular order, just as order of processing is arbitrary for
weighted linear models. To compensate for this, it relies on exhaus-
tive search. Take-the-best, in contrast, searches for cues in order of
their validity, so that the one-reason stopping rule that ends search
on the first discriminating cue becomes reasonable. Each of the
three building blocks describes a process step, and whether or not
people use each of them can be tested independently. For instance,
the results of a number of experiments indicate that people’s
decisions are often consistent with the one-reason stopping rule of
take-the-best when it is ecologically rational (e.g., Bröder, 2003;
Bröder & Schiffer, 2003a, 2003b, 2006b; Rieskamp & Otto, 2006; see
chapter 9), and the use of this stopping rule predicts decision times
in individual judgments (Bröder & Gaissmaier, 2007).
In this chapter, we analyze adaptive decision making at the
level of search and stopping rules rather than of entire heuristics.
We will not deal with decision rules, except for the constraints
that particular stopping rules impose on decision rules. In what fol-
lows, we first describe several alternative search and stopping
rules and then address their ecological rationality and the empiri-
cal evidence for their use in different situations.

Models of Search


random search and ordered search. Order in turn can be based on
discrimination, validity, recency, fluency, or ecological accessibility.
(Many of these classes of rules also apply to search for alternatives.)

Random Search
An elementary form of search is:

Random search: Go through cues in a random (or unsystem-
atic) order.

Random search for cues is part of the minimalist heuristic (which
has the same stopping and decision rules as take-the-best; Gigerenzer
& Goldstein, 1999). It is also an element of some heuristics that
search through alternatives, such as satisficing, where the alterna-
tives are encountered arbitrarily and not under the decision maker’s
control (Selten, 2001).
When is random cue search ecologically rational? Our hypothe-
sis is that random search is as accurate as or better than ordered
search (as in take-the-best) in environments where:

1. One needs to explore because one knows little about cue
validities, or
2. Cue validities are equal.

Imagine a physician working for Doctors Without Borders con-
fronted with an unknown infectious disease. She has to learn indi-
vidually by feedback and cannot rely on social heuristics such as
advice to order cues according to their validity. She is confronted
with two tasks: to learn which cues (symptoms) are valid predic-
tors, and to determine whether patients are infected or not. If the
physician started immediately with applying a lexicographic heu-
ristic for diagnosing patients while at the same time having to learn
a good order of cues, learning could be slow and some good cues
might get “stuck” low in the cue order so that they would seldom
be used (Todd & Dieckmann, 2005; see chapter 11). One way to
alleviate these problems would be to start with random rather than
ordered search, which promotes learning about all cues equally,
and not just those that happened to be ranked high in the initial
orderings. After a head start with random search, the physician
should then switch to ordered search. Random search corresponds
to an exploration phase, to be distinguished from an exploitation
phase, in which a heuristic exploits the cue structure in the envi-
ronment. Similarly, when learning samples are small and many cue
values are unknown, random search can lead to predictions as
accurate as those produced by search by validity and substantially
more accurate than those of multiple regression (Gigerenzer &
Goldstein, 1999).
For the second condition, it is easy to see that if cue validities are
about equal, then order of cues does not matter for accuracy. This is
one of the conditions that make Dawes’s rule ecologically rational
and enable it to predict more accurately than multiple regression
(Hogarth & Karelaia, 2005a, 2006b, 2007; Martignon & Hoffrage,
2002; see also chapter 3).

Search Using Discrimination and Validity


Red deer stags behave in a different way from the random search
just described, searching for information about their competitors in
a particular order. Ordering can be determined by several factors,
including how accurate, cheap, and accessible the cues are, and
how well they discriminate between alternatives. The discrimina-
tory ability of a cue, that is, its discrimination rate, corresponds to
the variance of the alternatives’ values on this cue. The discrimina-
tion rate di of a cue i is

di = Di / P

where Di is the number of pairs where the values of cue i differ
between alternatives, and P is the total number of pairs that can
be formed out of N alternatives. This leads to the following
search rule:

Search by discrimination: Go through cues in the order of di.
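A small Python sketch of this definition (illustrative; the function name and data format are assumptions). It takes the cue's binary value for each of the N alternatives, so that P = N(N - 1)/2 pairs can be formed.

from itertools import combinations


def discrimination_rate(cue_values):
    """d_i = D_i / P: the proportion of all pairs of alternatives on which
    cue i takes different values. cue_values holds the cue's value (0 or 1)
    for each of the N alternatives."""
    pairs = list(combinations(cue_values, 2))
    differing = sum(1 for a, b in pairs if a != b)   # D_i
    return differing / len(pairs)                    # P = N(N - 1) / 2


# Hypothetical example: four alternatives, the cue is positive for two of them
print(discrimination_rate([1, 1, 0, 0]))             # 4/6, i.e., about 0.67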

The ability of a cue to discriminate is not the same as its ability
to predict well. In fact, they may be inversely correlated. For
instance, when a university department wants to find the best
among a set of applicants for a professorship, gender is a cue that
discriminates highly––if there are approximately as many female
applicants as male ones––but has low predictive power for future
performance. In contrast, having a Nobel Prize or not is a cue with
a rare positive value and thus has little discriminatory power
but high validity.
The predictive power of a cue i can be defined by its validity vi:

vi = Ri / (Ri + Wi) = Ri / Di

where Di is the number of pairs where the values of cue i differ
between alternatives, and Ri and Wi are the number of right and
wrong predictions, respectively, by cue i among the Di pairs. Thus,
cue validity is the probability that the decision will be correct if the
cue discriminates. If vi = .50, the predictive power of cue i is at
chance level; values greater than .50 mean that the cue can predict
better than chance. The following heuristic search rule results:

Search by validity: Go through cues in the order of vi.
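The corresponding computation can be sketched as follows (illustrative only; the data format is an assumption, and criterion ties as well as nondiscriminating pairs are simply skipped, which is one possible convention).

from itertools import combinations


def cue_validity(cue_and_criterion):
    """v_i = R_i / (R_i + W_i): the proportion of correct inferences among
    the pairs on which cue i discriminates. cue_and_criterion holds one
    (cue_value, criterion_value) tuple per alternative."""
    right = wrong = 0
    for (c_a, y_a), (c_b, y_b) in combinations(cue_and_criterion, 2):
        if c_a == c_b or y_a == y_b:
            continue                      # cue does not discriminate, or criterion tie
        if (c_a > c_b) == (y_a > y_b):    # cue points to the larger criterion value
            right += 1
        else:
            wrong += 1
    return right / (right + wrong) if right + wrong else 0.5


# Hypothetical data: the cue usually, but not always, marks the larger value
data = [(1, 100), (1, 80), (0, 90), (0, 20)]
print(cue_validity(data))                 # 3 right, 1 wrong -> 0.75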

In many experiments reporting that people often search on the
basis of cue validity, the discrimination rate was set to be the
same for all cues so that it could not be used to order search (e.g.,
Bröder, 2003; Rieskamp & Otto, 2006; see chapter 11 for a discus-
sion and alternate experimental design). When discrimination rates
vary widely, however, an adaptive decision maker might use a
search rule that reflects both validity and discrimination. Let us
call the blending of validity and discrimination rate the usefulness
ui of cue i:

ui = vi × di

This leads to the following search rule:

Search by usefulness: Go through cues in the order of ui.

The multiplication of validity by discrimination rate might
appear to create an unrealistically complex search rule. But there
exists a simple shortcut for computing usefulness that the mind
can implement without performing multiplication. By multiplying
the discrimination rate and validity, Di cancels out and we get

ui = (Ri / Di) × (Di / P) = Ri / P

In words, the usefulness of cue i is the proportion of correct
predictions made by that cue among all P pairs of alternatives
(i.e., Ri correct predictions out of P pairs). An agent
therefore does not need to mentally compute and multiply the
two rates v and d but can simply search for cues in order of their
number of correct answers Ri (assuming that each cue has been
assessed in equal-sized samples of the population of P pairs of
alternatives, e.g., during exploration; see chapter 11 for similar
count-based ordering rules).
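In code, the shortcut amounts to counting correct pairwise predictions and dividing by the number of pairs. This is an illustrative sketch using the same hypothetical data format as the validity example above.

from itertools import combinations


def usefulness(cue_and_criterion):
    """u_i = v_i * d_i, which reduces to R_i / P: the proportion of all pairs
    of alternatives that cue i decides correctly on its own."""
    pairs = list(combinations(cue_and_criterion, 2))
    right = sum(1 for (c_a, y_a), (c_b, y_b) in pairs
                if c_a != c_b and (c_a > c_b) == (y_a > y_b))   # R_i
    return right / len(pairs)                                    # divide by P


data = [(1, 100), (1, 80), (0, 90), (0, 20)]
print(usefulness(data))   # R_i = 3 correct pairs out of P = 6 -> 0.5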
An alternative way of balancing validity and discrimination rate
was proposed by Martignon and Hoffrage (1999) and called search
by success. Like usefulness, success incorporates both the validity
and the discriminating power of cues but also includes successful
guesses. The success si of a cue i amounts to its usefulness plus the
proportion of correct decisions expected from guessing:

si = [Ri + (P − Di)/2] / P = ui + (1 − di)/2

where P−Di is the number of pairs in which a cue i does not dis-
criminate. The corresponding search rule is:

Search by success: Go through cues in the order of si.

There is also a computational shortcut that results in the same


order of cues. If cues are presented one at a time, and inferences are
made based on only this one cue, the decision maker can simply
monitor the total number Ti of correct inferences (including correct
guesses) made with each cue. That is, instead of monitoring Ri
(the shortcut for ui), one has to monitor Ti = Ri + P(1–di)/2 (again
assuming equal sample sizes for all cues). Ordering cues by success
is isomorphic to ordering cues by expected information gain
(Oaksford & Chater, 1994), by expected change in belief (Klayman &
Ha, 1987), and by the information measured in Shannon’s (1948)
information theory (see Rakow, Hinvest, Jackson, & Palmer, 2004).
Despite their similarities, usefulness and success do not generally
lead to the same rank order of cues. Different rankings can occur
when the discrimination rates differ.
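
A hypothetical pair of cues illustrates the divergence: a cue with vi = .60 and di = .80 has usefulness ui = .48 and success si = .48 + (1 − .80)/2 = .58, whereas a cue with vi = .90 and di = .40 has ui = .36 but si = .36 + (1 − .40)/2 = .66. Usefulness ranks the first cue higher; success ranks the second cue higher.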
These forms of ordered search that incorporate validity (e.g., the
validity, usefulness, and success search rules), when combined
with one-reason stopping, do well in environments where

1. Cue validities show a large variability,


2. Cues are moderately to highly correlated (i.e., redundant),
and
3. Cues are costly.

The reason for conditions (1) and (2) can be intuitively under-
stood by considering their converse: If cue validities were all equal,
then ordering them would be pointless (as in the condition for
random search earlier), and if all cues were independent of each
other, then relying on one reason alone would result in inferior
predictions compared to combining multiple independent pieces
of information. In environments where (1) and (2) hold, search by
validity, combined with one-reason stopping, typically generates
more accurate predictions than multiple regression and other linear
strategies that weight and add all pieces of information (Czerlinski,
Gigerenzer, & Goldstein, 1999; Dieckmann & Rieskamp, 2007; see
more on the effects of redundancy in chapter 8). Note, however,
that condition (1) only benefits ordered search when the decision
maker knows the order of validities. If this order has to be estimated
from small samples (as discussed in chapters 2 and 11), then ordered
search in combination with one-reason stopping runs the risk of
betting on the wrong cue. Condition (3) has been frequently dis-
cussed in the literature (e.g., Payne et al., 1993). When search is
limited, it is important to rank cues by validity, discrimination rate,
or a combination of both in order to save costs by increasing the
chance of finding a good cue first, or a cue that allows for a quick
decision. We present a formal definition of costs later in the section
on stopping rules.
When are usefulness and success search rules more appropriate
than validity alone? Note that they only differ from validity when
the discrimination rates vary. If an organism’s goal is solely accu-
racy, these search orders cannot beat validity order. But when fru-
gality, defined as the number of cues looked up before search is
stopped, and decision speed matter in addition to accuracy, then
search by usefulness and by success could be preferable to search
by validity if

1. Discrimination rates vary highly, and


2. Cues are costly.

Under these conditions, search by usefulness and success can stop


sooner than search by validity and so lead to lower search costs.
Finally, when is heuristic search preferable to computing opti-
mal cue weights? Note that these rules for ordered heuristic search
have a common feature that distinguishes them from conventional
models of rational inference: They ignore dependencies between
cues, whereas “rational” models typically compute cue weights
conditional on other cues, as exemplified by partial correlations in
multiple regression or conditional probabilities in Bayes’s rule.
That is, each search rule orders cues in a simple way that looks at
each cue independently of the others, consistent with the empirical

evidence that people are well able to use ecological validities,


but not the correlations between cues in multiple cue learning
(Armelius & Armelius, 1974). To the theorist who works with com-
plex decision trees or regression models, these search rules might
appear laughably simple and doomed to fail. However, the seminal
work by Dawes (1979; Dawes & Corrigan, 1974; see chapter 3)
showed that computing beta weights in regression is not necessary
to achieve predictive accuracy in complex real-world situations,
and that unit weights can do as well or better. Thus, what we term
Dawes’s rule can be more accurate than multiple regression, despite
the latter using estimates of the optimal beta weights. More recently,
others have shown that search by validity and other simple orders,
combined with a one-reason stopping rule, can often match and sur-
pass the predictive accuracy of multiple regression (Czerlinski et al.,
1999; Gigerenzer & Goldstein, 1999), neural networks, exemplar
models, and decision trees (Brighton & Gigerenzer, 2008; Chater,
Oaksford, Nakisa, & Redington, 2003; see also chapter 2 for details).

Recency Search
The search rules proposed so far are adapted to environments that
are relatively stable. This is the case for many tasks, such as judging
the trustworthiness of potential passengers. But if a New York
taxi driver were to move his business to Belfast or Cairo, some of
the relevant cues or their order would probably change. How can
search adapt to such a changing environment? When environments
change quickly, social learning of cues, by advice or imitation, is
one option (Boyd & Richerson, 1985). Individual learning could be
possible by resetting the reference class, from New York to, say,
Belfast, and starting fresh. Another way an individual could learn
would be to exploit a cognitive limitation, namely, a short memory
window produced by forgetting (Anderson & Schooler, 2000; see
chapter 6). The simplest such strategy is to search through the most
recent experiences in the following way:

Recency search: Search for cues in order of their most recent


discrimination (i.e., whichever cue discriminated a decision
pair most recently is checked first).

This search rule is implemented in the take-the-last heuristic


(Gigerenzer & Goldstein, 1996, 1999). Unlike the search rules
already discussed, search by recency does not involve any count of
frequencies. When the environment provides feedback, recency
search can alternatively attend just to correct decisions:

Recency search by correct decisions: Search for cues in order


of most recent correct decision.

This search rule is identical to recency search by discrimination,


except that it is guided by whether a cue yielded a correct answer
and not by whether a cue merely discriminated and led to either a
correct or incorrect answer. It provides a model for the mechanism
underlying one of the central findings in the problem-solving lit-
erature, that people tackling a new problem tend to try the solution
that was successful the last time they faced a similar problem
(Luchins, 1942)—a strategy sometimes referred to as “Einstellung”
or “mental set.”
Recency search does not need an extended learning phase; it
adapts on-line to every change in cue performance and requires
only a small cue order memory. Thus it is ecologically rational in
variable environments where cue validities and discrimination
rates can change rapidly.
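
A minimal sketch of this kind of recency updating (hypothetical cue names and values; the move-to-front update is one simple way to implement "most recently discriminating cue first") could look as follows:

```python
def recency_search(cue_order, cue_values_a, cue_values_b):
    """Check cues in their current order; move the first discriminating cue to the
    front of the order, so it will be checked first on the next decision."""
    for cue in list(cue_order):
        if cue_values_a[cue] != cue_values_b[cue]:
            cue_order.remove(cue)
            cue_order.insert(0, cue)  # most recently discriminating cue goes first
            # Decision: choose the alternative with the positive cue value.
            return "A" if cue_values_a[cue] > cue_values_b[cue] else "B"
    return None  # no cue discriminated: guess

# Usage with hypothetical cues:
order = ["airport", "university", "soccer_team"]
a = {"airport": 1, "university": 1, "soccer_team": 0}
b = {"airport": 1, "university": 0, "soccer_team": 1}
print(recency_search(order, a, b), order)  # 'A', with 'university' now first in the order
```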

Fluency Search
Related to recency search is ordering cues based on their fluency,
that is, by how quickly they come to mind. Fluency is in some
sense a passive rather than an active ordering principle driven
by the experiences that an individual has had. As illustrated earlier
with the example of the handball players (Johnson & Raab, 2003),
there are situations in which experts are well advised to rely on the
first option they think of (Gigerenzer, 2007). The idea that fluency
of recall can be used as a cue in inferential judgment goes back to
research on the availability heuristic (Tversky & Kahneman, 1974),
which assumes that people use the ease of retrieving instances, or
the frequency of the instances they retrieve, to assess the probabil-
ity of events. The availability heuristic has been criticized because
its underlying processes are not precisely defined (e.g., Fiedler,
1983; Gigerenzer, 1996). But Schooler and Hertwig (2005) demon-
strated a way forward, using the ACT-R framework (Anderson et al.,
2004) to specify the related fluency heuristic and thereby produce
testable predictions about the efficacy of using fluency information
in different environments (see chapter 6 for details). The fluency of
retrieving information is often informative because it correlates
with the frequency and recency of encountering information in the
environment (Anderson & Schooler, 1991). This has primarily
been studied regarding comparisons between alternatives, such as
which music artists have higher album sales (Hertwig, Herzog,
Schooler, & Reimer, 2008) or which of two cities is larger (Schooler
& Hertwig, 2005). Gaissmaier (2008) extended the idea of informa-
tive fluency use from alternatives to the ordering of cues.
An important distinction needs to be made between fluency
search and the ordering principles described so far, in that fluency
is an attribute of particular cue values and not of cues per se. That
is, while other orders assume that the values of two alternatives are

compared on each cue, the fluency order dispenses with a cue-wise


search. Instead, evidence for and against each alternative is gath-
ered sequentially until search stops (defined by a stopping rule,
e.g., when a threshold of difference in evidence is reached), so that
alternatives are not necessarily compared on the same cues. For
example, someone could quickly retrieve the fact that Hamburg has
an airport and that Heidelberg does not have a soccer team in the
premier league and use this information to infer that Hamburg is
the larger city. Fluency search is defined as follows:

Fluency search: Rely on cue values in the order in which they


come to mind.

In general, fluency-based cue search is likely to be ecologically


rational when fluency of recalling particular cue values is corre-
lated with the decision criterion. For example, Gaissmaier (2008)
found that strategies using fluency search were both highly accu-
rate and frugal in predicting which of two German cities is larger.
In this environment, positive cue values (indicating the presence of
a feature, such as an airport) were more fluently recalled for larger
cities. At the same time, negative cue values (indicating the absence
of a feature) were more fluent for smaller cities. Furthermore, flu-
ency search had the potential to protect people against using incor-
rect cue values (i.e., values remembered incorrectly), because they
were retrieved more slowly on average than correct ones, and so
were less likely to be used in making a decision. This is congruent
with findings in the memory literature that producing incorrect
answers often takes longer (e.g., Ratcliff & Smith, 2004).
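
A minimal sketch of such fluency-driven evidence accumulation (hypothetical retrieval order and stopping threshold; this is an illustration, not a model of the ACT-R retrieval dynamics used by Schooler and Hertwig) might look like this:

```python
def fluency_search(retrieved, threshold=2):
    """Accumulate evidence from cue values in the order they come to mind.
    'retrieved' is a list of (alternative, value) items, already ordered by how
    quickly they were recalled; search stops once the evidence difference
    between the two alternatives reaches the threshold."""
    evidence = {"A": 0, "B": 0}
    for alternative, value in retrieved:
        other = "B" if alternative == "A" else "A"
        # A positive value (1) counts for that alternative on the criterion;
        # a negative value (0) counts against it, i.e., for the other alternative.
        evidence[alternative if value == 1 else other] += 1
        if abs(evidence["A"] - evidence["B"]) >= threshold:
            break
    return max(evidence, key=evidence.get)

# Hamburg ("A") has an airport; Heidelberg ("B") has no premier-league soccer team.
print(fluency_search([("A", 1), ("B", 0)]))  # infers that "A" (Hamburg) is larger
```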

Search by Accessibility
Fluency search refers to the situation where memory access orders
the cues for us. When cues are available externally rather than in
memory, often the environment orders information for us. Some
cues, for instance, could be more readily accessible to the senses
than others, as the case of the red deer illustrates. The red deer
checks cues in a fixed order, following the reach of the senses:
Acoustic signals are available first (particularly in a forest environ-
ment), visual cues require closer proximity, and tactile signals
necessitate physical contact. This sensory order is also adaptive
because it correlates with the risk of being hurt while assessing the
cues. In mate choice, speed of accessibility can order cues: Physical
attractiveness of a potential mate is easy and quick to assess, while
it takes time and effort to find out about intelligence and sexual
fidelity (Miller & Todd, 1998). Although cues frequently vary in
their accessibility, we know of no studies that systematically

address the question of how this influences information search.


This topic deserves more attention.

Models of Stopping Search

Search rules give search a direction, whereas stopping rules specify


when to stop looking for more cues. Together, they define Simon’s
(1990) concept of limited search. As mentioned earlier, heuristic
rules for stopping do not try to optimize, such as by finding the
point where the expected costs of further search equal the expected
benefits of search. Rather, heuristic stopping is guided by simple
principles, two of which we consider next: the number of discrimi-
nating cues found, and the number of cues searched for overall.

Stopping After Discriminating Cues


A most frugal stopping rule forms the foundation of the family of
one-reason decision heuristics, including take-the-best, take-the-
last, and the minimalist heuristic (Gigerenzer & Goldstein, 1996):

One-reason stopping: Stop search after the first discrimi-


nating cue.

This rule is frugal because it only allows cues to be checked until


one is found on which a decision can be based. It enables fast deci-
sions as well, because only a single cue need be considered for
making a choice, so no combining or comparing is required. It is
also part of fast and frugal trees for classification (see chapter 14).
When is one-reason stopping ecologically rational? We discuss
four features of appropriate environments: noncompensatory dis-
tribution of cue weights, scarce environments, cue redundancy,
and costly cues.

Noncompensatory Information Consider an environment with M binary


cues c1, . . ., cM, with regression beta weights w1, . . ., wM. Cues
are ordered by their weight, with cue c1 having the greatest weight.
An environment is called noncompensatory if every weight wi is
larger than the sum of the weights wi+1 + wi+2 + . . . + wM (Martignon
& Hoffrage, 2002). An example is the set of weights 1, 1/2, 1/4,
1/8, 1/16. In a noncompensatory environment, search by validity
combined with one-reason stopping (and basing the decision on
that cue), as in take-the-best, leads to an accuracy that no linear
strategy can beat when using the same order of cues (Martignon &
Hoffrage, 1999, 2002). Note that this result assumes that the order
of cue validities is known (as when participants are informed about

that order) as opposed to when the order has to be estimated from


sampling (Gigerenzer & Brighton, 2009). If the cue weights are
skewed rather than strictly noncompensatory, this combination of
heuristic rules still on average matches or beats Dawes’s rule and
multiple regression (Hogarth & Karelaia, 2007; see also Hogarth &
Karelaia, 2005b, for similar results on a generalization of take-the-
best to more than two options, called deterministic elimination-by-
aspects).
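
The noncompensatory condition is easy to check directly; the following sketch tests whether each weight in a (hypothetical) decreasing weight vector exceeds the sum of all the weights after it:

```python
def is_noncompensatory(weights):
    """True if every weight exceeds the sum of all smaller weights."""
    w = sorted(weights, reverse=True)
    return all(w[i] > sum(w[i + 1:]) for i in range(len(w) - 1))

print(is_noncompensatory([1, 1/2, 1/4, 1/8, 1/16]))  # True
print(is_noncompensatory([1, 0.4, 0.3, 0.2]))        # False: 0.4 is not greater than 0.3 + 0.2
```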

Scarce Information In the language of information theory, M alterna-


tives can be coded by log₂M binary cues. If there are fewer cues than
this, we say that this environment has scarce information. In such
environments, search by validity and one-reason stopping (as in
take-the-best) is, on average, more accurate than using all M cues
(e.g., by tallying; see Martignon & Hoffrage, 1999).
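
For example, distinguishing among 32 alternatives requires at least log₂32 = 5 binary cues, so an environment offering only three or four cues for such a task would count as having scarce information.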

Redundant Information Essentially, two cues are said to be redundant if


knowing the value of one tells us something about the value of the
other. In contrast, two cues are nonredundant or independent if
knowing the value of one tells us nothing about the other. One way
to define redundancy is by means of the correlation between cues.
It is easy to see that the higher the redundancy between cues, the
less information any further cues can add to the first cue, which
makes one-reason stopping ecologically rational (Hogarth &
Karelaia, 2005a; see chapter 8). An extreme case of cue redundancy
can be observed in what Gigerenzer and Brighton (2009; see chap-
ter 2) called the Guttman environment, where all the cues have
validity 1.0 and are highly correlated with both other cues and the
criterion but vary in their discrimination rates. In Guttman envi-
ronments, take-the-best outperforms a greedy counterpart of itself,
which orders cues by conditional validities (i.e., modifying cue
validity computations as a function of what cues come earlier in
the cue order).

Information Costs We define the relative information cost I as the mon-


etary cost c of a piece of information (e.g., the values of two alterna-
tives on a particular cue) relative to the monetary gain g of a correct
answer:

I = c/g

Consider paired comparison tasks in which a correct decision


yields $1, and a cue costs 50 cents, so that I = 0.50/1 = 1/2. Here, the
expected value of guessing (0.5 × $1 = 50 cents) cannot be surpassed
by looking up a cue (which costs 50 cents, and may or may not be
offset by a gain of $1, so that the maximum possible net gain would

be 50 cents). Therefore, to induce any motivation for search (as


opposed to mere guessing), an environment must have the feature
I < 1/2. Moreover, the higher the relative information cost I, the
more one-reason stopping saves costs compared to stopping after
two or more cues.1
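
The threshold can be illustrated with a small sketch (hypothetical payoffs; a simplified best-case comparison rather than a full expected-value analysis): at I = 1/2 even a perfectly valid cue cannot do better than guessing.

```python
def expected_net_gain(gain, cost, cues_checked, p_correct):
    """Expected payoff after paying for some number of cues, given the
    probability of then making a correct decision."""
    return p_correct * gain - cues_checked * cost

gain, cost = 1.0, 0.5                         # relative cost I = cost / gain = 1/2
print(expected_net_gain(gain, cost, 0, 0.5))  # guessing: 0.50
print(expected_net_gain(gain, cost, 1, 1.0))  # best case with one cue: also 0.50
```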
One-reason stopping is of course not ecologically rational in
every situation. The logical extension of one-reason stopping is to
stop if two pieces of useful evidence are found:

Two-reason stopping rule (confirmative rule): Stop as soon as


two discriminating cues are found that point to the same alter-
native.

The intuition behind this “delayed” stopping rule (called take-two in


chapter 8) is that one searches until finding a confirmation of what
the first cue indicated. Bruner, Goodnow, and Austin (1956) were
among the first to present experimental evidence that people some-
times look for confirming evidence so that their conclusion is doubly
sure. This stopping rule underlies what is known as the conjunctive
rule, a noncompensatory rule that requires that the values of two
cues reach a certain threshold, whereas a high value on only one
cannot compensate for a low value on the other (Einhorn, 1970).
When is the two-reason confirmative rule ecologically rational?
Karelaia (2006) showed that this stopping rule, which she calls CONF,
is remarkably robust and insensitive to cue ordering. Thus, it hedges
bets when it is difficult to estimate a good cue order, as when a person
is confronted with a new task. From Karelaia’s analysis, one can con-
clude that the confirmative rule works well in situations in which the
decision maker knows little about the validity of the cues, and the
costs of cues are rather low (see chapter 8 for more analysis).
A further extension is to stop after finding m discriminating cues
that are all in agreement. For instance, the work by Meyers-Levy
(1989) on advertisement and consumer behavior suggests that men
stop earlier than women do when looking for cues regarding poten-
tial purchases and base their judgments on only one reason, good or
bad, whereas women are less selective and search for more cues.
Meyers-Levy argues that companies tend to design one-good-reason
advertisements for men, but more elaborate ones for women.

1. Another way to see this is to start with the requirement that the
maximum possible gain minus the cost of checking one cue, g−c, must be greater
than the expected reward from guessing, g/2, or g−c > g/2. Then g/2 > c,
1/2 > c/g = I, and I < 1/2.

Stopping After a Fixed Number of Cues


Diagnostic guidelines sometimes contain the rule “always check the
following two (or three, . . .) cues before you make an inference.”
Here one looks at the same fixed number of cues for each decision,
whether they discriminate between alternatives or not. For avoiding
avalanche accidents, for instance, there exist several decision aids
to help evaluate the current avalanche hazard. One of these, the
“obvious clues” method, says to check seven cues around the slope
of interest (McCammon & Hägeli, 2007). These cues include whether
there has been an avalanche in the last 48 hours and whether there
is liquid water present on the snow surface as a result of recent
sudden warming. When more than three of these cues are present,
the slope is considered dangerous—a simple method that could
have prevented 92% of the historical accidents where it was appli-
cable. Similarly, medical students are often taught to look at a cer-
tain complete set of cues before making a diagnosis, and often in a
prescribed order. Stopping after a fixed number of cues amounts to:

Fixed-number stopping rule: Stop search after the values of m


cues have been looked up (whether the cues discriminate
between the alternatives or not).

The simplest case is:

Single-cue stopping rule: Stop search after the values of one


cue have been looked up (whether the cue values discriminate
or not).

Unlike in one-reason stopping, the cue values may or may not


discriminate. For instance, the minimax heuristic is based on a sin-
gle-cue stopping rule: When choosing between alternatives, only
look at the minimum outcomes associated with each, ignoring
everything else, and choose the one with the largest (i.e., maximum)
minimum outcome. The motivation is to avoid the alternative
with the worst possible outcome. (See Thorngate, 1980, for other
heuristics that use single-cue or two-cue stopping.) A related
rule in the machine learning literature, known as 1R (Holte, 1993),
operates with continuous cues. In contrast, lexicographic heuris-
tics use a one-reason stopping rule, which allows different deci-
sions to be based on different reasons (see, e.g., the priority
heuristic—Brandstätter, Gigerenzer, & Hertwig, 2006).2

2. Hybrid stopping rules also exist that combine stopping based on a


number of discriminating cues with stopping after a fixed number of cues.
For example, when information is costly, decision makers might aspire

When is the single-cue rule ecologically rational? Computer sim-


ulations indicate that when used together with search by validity, it
has a higher predictive accuracy than Dawes’s rule and multiple
regression if the variability of cue weights is high, and the cue
redundancy is moderate (average r = .5; see Hogarth & Karelaia,
2005a). These are essentially the same ecological rationality condi-
tions as for the one-reason stopping rule. The key factor that distin-
guishes these two stopping rules is cue cost. If the relative cost I of
each cue exceeds half the expected relative payoff of guessing (i.e., half of 1/2),
that is, if I > 1/4, then there is no monetary incentive to search for a
second cue, even if the first did not discriminate.3 Here, the single-cue rule
is superior to the one-reason stopping rule, because the latter will
often lead to checking a second cue.
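
The behavioral differences between these stopping rules can be made concrete in a short sketch (hypothetical cue values; after delayed stopping the decision here is made by tallying the discriminating cues found, which is only one of several possible decision rules):

```python
def search_and_decide(cue_order, a, b, rule, m=2):
    """Look up cues in order and stop according to the chosen stopping rule:
    'one_reason'   - stop at the first discriminating cue;
    'two_confirm'  - stop once two discriminating cues favor the same alternative;
    'fixed_number' - stop after m cues, whether they discriminate or not
                     (the single-cue stopping rule is this rule with m = 1)."""
    favored = []                                   # +1 favors A, -1 favors B
    checked = 0
    for cue in cue_order:
        checked += 1
        if a[cue] != b[cue]:
            favored.append(+1 if a[cue] > b[cue] else -1)
        if rule == "one_reason" and favored:
            break
        if rule == "two_confirm" and max(favored.count(+1), favored.count(-1)) >= 2:
            break
        if rule == "fixed_number" and checked >= m:
            break
    total = sum(favored)                           # tally the discriminating cues found
    return ("A" if total > 0 else "B" if total < 0 else "guess"), checked

a = {"c1": 1, "c2": 0, "c3": 1, "c4": 1}
b = {"c1": 0, "c2": 1, "c3": 0, "c4": 0}
order = ["c1", "c2", "c3", "c4"]
print(search_and_decide(order, a, b, "one_reason"))         # ('A', 1)
print(search_and_decide(order, a, b, "two_confirm"))        # ('A', 3)
print(search_and_decide(order, a, b, "fixed_number", m=2))  # ('guess', 2)
```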

Dependencies Between Search, Stopping, and Decision Rules


The process models discussed in this chapter incorporate a tem-
poral sequence: searching first, stopping search second, and making
the decision last. Thus, the question arises whether a specific search
rule can constrain the range of the stopping and decision rules it
is used with, and whether the stopping rule constrains the deci-
sion rule. It is easy to see that search rules do not constrain stop-
ping rules: Each stopping rule can be combined with each of the
search rules defined earlier. Similarly, because decision rules—such
as one-reason decision making, adding, averaging, Bayesian pro-
cessing, and so on—do not depend on the order of the cues looked
up, search rules also impose no constraints on the decision rule.
However, because decision rules do depend on what cues (includ-
ing how many) they have available, stopping rules can constrain
the range of possible decision rules. Specifically, one-reason stop-
ping implies one-reason decision making, as in take-the-best, as it
excludes all decision rules that weight and add the values of
multiple cues. The reverse does not hold: After looking at many
reasons (i.e., having a more lenient stopping rule), one can employ
a decision rule that integrates all this information, or one can nev-
ertheless rely on only one reason in making a decision.

to use one-reason decision making but at the same time set themselves a
limit as to how much information they will maximally purchase, such as
“Stop when a discriminating cue is found, but only look for a maximum
of m cues. If no discriminating cue is found by that point, then stop search
and guess.”
3. Again we can see this by starting with the requirement that the maxi-
mum possible gain after checking two cues, g−2c, must be greater than
the gain from just guessing (without checking any cues), g/2, or g−2c > g/2,
g/2 > 2c, g > 4c, 1/4 > c/g = I.

The constraint imposed by stopping rules on decision rules leads


to the hypothesis that experimental designs that call for search
and stopping will induce different behavior from those that display
all information in front of the participant and so require no stop-
ping rule. Specifically, designs that involve search in memory or in
the environment should lead more often to one-reason decision
making than designs where all pieces of information are displayed.
In the next section, we explore how these implications have played
out in a range of experiments.

Do People Adapt Search and Stopping Rules to Environmental Structures?

Can we use the heuristic rules and their match with specific envi-
ronmental structures, as defined in the previous sections, to predict
what heuristics people will use in particular environments? The
logic is first to analyze the match between various search and stop-
ping rules and particular experimental settings and then to see
whether people use these in an adaptive, ecologically rational way.
The ideal study of the adaptive use of search and stopping rules
would implement different relevant environmental structures as
independent variables and then analyze whether the distribution of
search and stopping rules used by people in those environments
changes as predicted by their ecological rationality. Such studies
exist, but the majority have tested only one rule or one heuristic
(often just take-the-best) in one or two environments. Therefore,
part of the evidence concerning the adaptive use of search and
stopping rules is indirect, based on comparisons between experi-
ments. We look first at experiments pitting inferences requiring
search against inferences from givens, and then turn to experiments
involving search for cues in environments with particular types of
structure.

Search Versus Givens

Search in Memory Versus Inferences From Givens


The search and stopping rules of take-the-best and other heuristics
were formulated to contrast these heuristics with decision making
without search, as in inferences from givens. Originally, these
rules were conceived of as models of search in semantic long-term
memory (rather than of search outside memory; see Gigerenzer
& Goldstein, 1996; Gigerenzer & Todd, 1999). The prototype of
take-the-best was a process model for inferences from memory
(Gigerenzer, Hoffrage, & Kleinbölting, 1991). The experimental tests

of these theories supported the use of the building blocks of take-


the-best but did not involve direct tests of whether inferences from
memory promote specific search and stopping rules compared to
inferences from givens. Because searching for cues in memory
involves significant search costs, including retrieval times that
tend to increase with each successive cue value one recalls and the
possibility of retrieval failures and forgetting, we can hypothesize
that decision makers will aim to limit their memory search:

Hypothesis 1: Inferences involving internal memory search


increase one-reason stopping, compared to inferences from
givens.

A series of experiments by Bröder and colleagues (see chapter 9)


consistently demonstrated that inferences from memory are better
predicted by the search and stopping rules of take-the-best than
by those of Dawes’s rule or guessing. They also indicate that
inferences from memory tend to elicit more ordered search and
one-reason stopping than inferences from givens, supporting this
hypothesis.
However, one methodological problem with studying inferences
from memory is that one cannot observe search directly. Bergert
and Nosofsky (2007) provided decision time analyses suggesting
that a vast majority of participants followed one-reason stopping
in an inferences-from-givens paradigm in which search was simi-
larly not observable directly. Building on this idea, Bröder and
Gaissmaier (2007) reanalyzed decision times from five published
experiments (and conducted one new experiment). Congruent with
take-the-best’s search rule, in all instances in which decision out-
comes indicated the use of take-the-best’s decision rule, decision
times increased monotonically with the number of cues that had to
be searched in memory according to take-the-best. In contrast, par-
ticipants classified as using compensatory strategies were expected
to search for all cues, and those using nonsystematic guessing
were expected to search for none—and in line with this, there was
little or no increase in decision times for those participants (see
chapter 9 for more details).

External Search Versus Inferences From Givens


When people have to search for information in the external world,
in contrast to memory search, one can typically access all cues with
some modest speed and cost (perhaps even the same cost, in set-
tings like information boards), and forgetting can be overcome by
looking up cues again. While this makes external search much more
like inferences from (external) givens, the crucial difference is that

in the former there are still noticeable costs associated with deter-
mining what information to seek next and then actually obtaining
it, even if this just means clicking on it with a mouse, rather than
merely casting one’s eyes over a table of cues already laid out. These
appreciable costs lead to a hypothesis that exactly parallels the one
for memory search:

Hypothesis 2: Inferences involving external search increase


one-reason stopping, compared to inferences from givens.

Bröder (2000a; see also chapter 9) directly compared the effect of


external search versus inferences from givens and found that infer-
ences from givens led to one-reason decision making in just 15%
of the cases. The same low rates held for search where cues did
not cost anything. But when cues had to be searched for (and paid
for), the percentage of participants classified as using take-the-best
rose to 65%. This suggests that take-the-best can describe the cogni-
tive processes of decision making based on search outside of
memory, not only search inside of memory, as originally envisaged
by Gigerenzer and Goldstein (1996). And it provides some support
for the hypothesis that people’s stopping rules differ between
inferences from givens and inferences using external search. We
next consider some structures of those external environments in
more detail.

Search in Structured Environments

Noncompensatory Environments
We have seen that in a noncompensatory environment, search by
validity and one-reason stopping are ecologically rational when the
order of cue validities is known (as opposed to being estimated
from samples—see Gigerenzer & Brighton, 2009). This result leads
to the following hypothesis:

Hypothesis 3: Search by validity and one-reason stopping


rules are used more frequently in a noncompensatory envi-
ronment than in a compensatory one (if order of validities is
known).

Several experiments provide a test of this hypothesis. Participants


in Rieskamp and Otto’s (2006) first study took on the role of bank
consultants who had to evaluate which of two companies applying
for a loan was more creditworthy. The values of six cues about
each company, such as qualification of employees, were provided

in an information matrix where participants could click to reveal


the values they wanted to see. One group of participants encoun-
tered a noncompensatory environment with some noise added,
meaning that in about 90% of the cases, the outcome feedback they
received was determined by the first discriminating cue rather
than an integration of several cues. For the second group, feedback
was determined in a compensatory way, meaning that in about 90%
of the cases, the more creditworthy company was determined by a
weighted additive rule as the one with the greater sum of cue values
multiplied by the corresponding cue validities. Did people use
different heuristics based on the structure of the environment they
encountered? They did: In the noncompensatory environment, the
choices consistent with take-the-best increased over the course of
168 trials with feedback from 28% to 71%, whereas in the compen-
satory environment, they decreased to 12%.
This sensitivity to compensatory versus noncompensatory envi-
ronments was replicated in several other studies. Oliveira (2005)
extended the findings to changing environments: When the structure
was switched from noncompensatory to compensatory, or
vice versa, participants adapted their rate of use of take-the-best
accordingly. In a task involving allocation decisions by a marketing
department that wants to sell vacuum cleaners in a foreign country,
Persson (2003) reported strong evidence that participants used take-
the-best when the cue structure was noncompensatory. In summary,
experimental evidence so far suggests that people adapt to non-
compensatory environments by relying on ordered search (e.g., by
validity) and one-reason stopping rules, whereas in compensatory
environments they rely more on compensatory strategies.

Variability in Cue Validity


How common are strictly noncompensatory environments outside
of experimental tasks? The answer to this question is not known.
Among the 20 real-world environments studied in Czerlinski et al.
(1999), a reanalysis revealed just three that were strictly noncom-
pensatory in terms of the cue regression weights. Noncompensatory
information is a special case of the more general situation of high
variability in cue validity (see chapter 8 for more on measures of
cue variability). With large variability of cue validities, one-reason
stopping is more likely to be more accurate than multiple regres-
sion and tallying in predictive accuracy, particularly when cues are
moderately correlated with each other (Hogarth & Karelaia, 2005a).
From this, we derive the following hypothesis:

Hypothesis 4: The larger the variability of cue validities, the


more often one-reason stopping is used.

In an experiment in which participants were asked to infer


based on various cues which of two shares will be more profitable,
Bröder (2003, Experiment 2) manipulated the variability of cue
validities. Consistent with this hypothesis, 77% of the participants
in the high-variability environment were classified as using take-
the-best, but only 15% in a compensatory environment with low
variability of cue validities.
This hypothesis also seems to generalize from inferences to pref-
erence judgments. In choices between gambles, high dispersion of
probabilities was associated with less information acquisition (i.e.,
higher frugality) and, more generally, processing patterns consis-
tent with lexicographic decision heuristics, that is, one-reason
stopping rules (Payne et al., 1988). Furthermore, simulations have
shown that with widely varying cue weights, continued informa-
tion search beyond the first cue rarely leads to different preferential
choices from one-reason stopping (Fasolo, McClelland, & Todd,
2007).

Cue Redundancy
As discussed earlier, one-reason stopping is adaptive relative to
compensatory strategies when the redundancy (e.g., correlation)
between cues is high. This suggests the following hypothesis:

Hypothesis 5: The more redundant cues are, the more often


one-reason stopping is used.

To test this hypothesis, Dieckmann and Rieskamp (2007; see chap-


ter 8) varied cue redundancy in an inference task. After an initial
learning phase to familiarize themselves with the environmental
structures, participants under the high-redundancy condition
(average intercue correlation = .50) followed one-reason stopping
in 77% of nonguessing trials, while under low redundancy (aver-
age intercue correlation = −.15) the rate was 44%. The second most
frequent stopping rule was the two-reason (confirmative) rule, again
used more under low redundancy (31%) than high (20%). Also
consistent with this hypothesis, Shanteau and Thomas (2000)
reported that one-reason stopping is less accurate in environments
with negative or no intercue correlations.

Costs of Cues
What influence do monetary costs have on stopping rules? In all
studies we are aware of, all available cues have had the same cost
(unlike in the red deer’s situation), so we restrict our analysis to

this condition. From the ecological analysis reported earlier, one


can derive the following hypothesis:

Hypothesis 6: The higher the relative information cost I (I < 1/2),


the more frequently people rely on one-reason stopping.

To test this hypothesis, we need studies that varied information


costs within one experiment. Experiment 3 of Bröder (2000a)
showed that when the relative information costs I increased from
1/100 to 1/10, the proportion of people classified as using take-the-
best increased from 40% to 65%. Whereas Bröder (2002) relied on
a regression-based classification method, Newell and Shanks (2003,
Experiment 1) investigated the process of stopping using an infor-
mation matrix design. In the low-cost condition (I = 1/10), adher-
ence to a one-reason stopping rule was only observed in 36% of
the trials (not including guessing), but when I was increased to 1/5,
this proportion rose to 85%. Dieckmann and Todd (see chapter 11)
also found a higher proportion of one-reason stopping under higher
(I = 3/20) compared to lower (I = 1/20) relative costs (70% and 51%,
respectively). All three studies support this hypothesis.
In those cases where participants did not rely on one-reason
stopping, what other stopping rule did they rely on? We reanalyzed
the cases in Dieckmann and Todd’s data where search continued
and a second discriminating cue was found (five cues were avail-
able in this experiment). In half of the cases, the second discrimi-
nating cue pointed to the same alternative as the first. Here, 86% of
participants stopped at this point, consistent with the two-reason
confirmative stopping rule. In the other half of the cases, the second
discriminating cue pointed to the other alternative, and partici-
pants then continued to search in 83% of the cases, again consis-
tent with this rule.
Earlier, we also showed that when cue costs get high, one-reason
(and two-reason) stopping becomes unprofitable. This leads to the
following hypothesis:

Hypothesis 7: The closer the relative information cost I


approaches 1/2, the more frequently people rely on single-cue
(rather than one-reason) stopping.

Newell, Weston, and Shanks (2003, Experiment 2) set I to 1/5 per


cue and reported that 29% of their participants relied on the single-
cue stopping rule and simply guessed whenever the first cue did
not discriminate. Note that these participants consistently did this
for all decisions they made. In a study by Läge, Hausmann, Christen,
and Daub (2005), the relative information costs were half of those

in the Newell experiment, I = 1/10. These authors found that none


of their participants used single-cue stopping in all tasks or as the
predominant stopping rule, and that participants stopped search
after one or more cues without having found a discriminating cue
in only 5% of the cases. These results are consistent with this
hypothesis. However, the comparison is across experiments, and
therefore the evidence is indirect. Another study compared the
search costs more directly (Läge, Hausmann, & Christen, 2005).
Also consistent with this hypothesis, the rate of stopping search
after one or more cues without having found a discriminating cue
increased dramatically with increasing costs. To illustrate, when
there were no costs at all (I = 0), stopping search without having
found a discriminating cue only occurred in 0.4% of the cases, and
this number increased to 11.6% for I ≈ 1/10, to 20.7% for I ≈ 2/10,
and to 31.3% for I = 5/10.
If cues become expensive, their discriminatory power becomes
relevant in addition to their validity. Thus, we can formulate the
further hypothesis:

Hypothesis 8: The higher the relative information costs I (0 ≤ I


≤ 1/2), the greater the increase in search by usefulness or suc-
cess compared to search by validity.

Experiments explicitly designed to test this hypothesis remain to


be done.

Time Pressure
Direct monetary costs are not the only means of favoring frugality
in information search. Time pressure should also increase the use
of a stopping rule that ends search quickly. Thus, we hypothesize:

Hypothesis 9: The higher the time pressure, the more fre-


quently people rely on one-reason stopping.

Presenting participants with a choice between different compa-


nies, either under low time pressure (50 seconds for each choice) or
high time pressure (20 seconds), Rieskamp and Hoffrage (1999)
found that under high time pressure, the largest group of partici-
pants relied on search by validity and one-reason stopping (46%).
Under low time pressure, a weighted additive strategy was
consistent with the choices of the largest group (42%). The phe-
nomenon that time pressure tends to increase lexicographic heuris-
tics, and noncompensatory heuristics in general, is well documented
in the literature (Payne et al., 1988, 1993; Rieskamp & Hoffrage,
2008).

To conclude our consideration of experimental evidence for


hypotheses concerning limited search and stopping, we consider
the idea that experiments and theories are independent entities of
research, with the experiment providing the test of the theory. The
research on information search shows that the situation is not that
simple. As we mentioned in the beginning of this chapter, theories
of cognition often assume an experimental situation where all rel-
evant information is conveniently displayed in front of a person, in
inferences from givens. But the process of stopping search occurs
earlier in time than the process of decision making, and thus the
stopping rule can constrain the decision rule. As a consequence,
experiments that exclude search and stopping and those that do not
are likely to come to different conclusions about the nature of deci-
sion processes. It is important to keep this in mind when general-
izing the results from particular experiments to others, and to
real-world settings.

Limited Search Is a Basic Element of Cognition

Limited search is central to Herbert Simon’s concept of bounded


rationality. In this chapter, we explicated limited search in terms of
two processes, how to search for information and when to stop, and
provided models of search rules and stopping rules. But different
search and stopping rules work best in different environments,
making them critical to the notion of ecological rationality, as well.
We have outlined what we know at this point about the ecological
rationality of search and stopping rules and introduced the growing
body of experimental evidence for the adaptive use of these
rules, but it is also clear that much of the evidence is still tentative
and there is need for many more extensive analyses, conclusive
experiments, and empirical studies. The study of search and stop-
ping rules should hence become an integral part of cognitive sci-
ence. We hope this challenge will be taken up by a new generation
of researchers.
11
Simple Rules for Ordering Cues in
One-Reason Decision Making
Anja Dieckmann
Peter M. Todd

Life is too complicated not to be orderly.


Martha Stewart

How can we determine what information to consider when


making a decision in an unfamiliar environment? Imagine moving
to a new country and trying to decide where to buy your morning
coffee. There are many coffeehouse options, and many cues you
might use to decide between them: the bright warm colors of some
establishments, whether they have an Italian espresso machine,
how many sparrows are lingering around outside waiting to
snatch the croissants of unwary customers. Which of these cues
lead to the best choices in this town? To find out, you could ask
others with more experience—but in this case, all of your local
friends are tea drinkers and of no use, so you are on your own to
learn about the useful information in this new domain.
You could start exhaustively (and exhaustingly) trying different
places, but you would rather come up with a quicker method.
Luckily, you know that you may not need much information to
make your choice. Rather than having to weight and sum all the
cues you can find for your various coffee alternatives, you plan to
use a simpler lexicographic decision rule, looking at one cue at a
time in a particular order until you find a cue that discriminates
between the options and indicates a choice (Fishburn, 1974). But
you still have to come up with that particular order, by learning
about the cues as you gain experience choosing coffee locations.
What learning rules will work?
Lexicographic rules are used by people in a variety of decision
tasks (Bröder, 2000a, 2003; Payne, Bettman, & Johnson, 1993)
and have been shown to be both accurate in their inferences and
frugal in the amount of information they consider before making a


decision. For instance, Gigerenzer, Todd, and the ABC Research


Group (1999) demonstrated the high accuracy and low information
use of several decision heuristics that stop information search as
soon as one discriminating cue is found. Because only that cue is
used to make the decision, and no integration of information is
involved, they called these heuristics “one-reason” decision mech-
anisms. Given some set of cues that can be looked up to make the
decision, these heuristics differ mainly in the search rule that deter-
mines the order in which the information is searched. And particu-
lar cue orders make a difference in the performance of these
heuristics, in terms of both their accuracy and frugality. This can
be seen, for instance, in a comparison of the take-the-best and
minimalist heuristics (Gigerenzer & Goldstein, 1996, 1999). Both
consist of three building blocks:

Search rule: Search through cues in some order. For the mini-
malist heuristic, order is random, while for take-the-best,
order is in terms of ecological validity, that is, the propor-
tion of correct decisions made by a cue out of all the times
that cue discriminates between pairs of options.
Stopping rule: Stop search as soon as one cue is found that
discriminates between the two options.
Decision rule: Select the option to which the discriminating
cue points, that is, the option that has the cue value associ-
ated with higher criterion values.
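
The three building blocks translate almost directly into code. The sketch below (hypothetical learning sample; positive cue values are assumed to indicate the alternative with the larger criterion value) orders cues by ecological validity for take-the-best and randomly for minimalist, then applies the shared stopping and decision rules:

```python
import random
from itertools import combinations

def validity(values, criterion, pairs):
    """Ecological validity: proportion of correct inferences among the pairs
    on which the cue discriminates."""
    right = wrong = 0
    for x, y in pairs:
        if values[x] == values[y]:
            continue
        predicted = x if values[x] > values[y] else y
        actual = x if criterion[x] > criterion[y] else y
        right += predicted == actual
        wrong += predicted != actual
    return right / (right + wrong) if right + wrong else 0.5

def one_reason_choice(cue_order, cues, x, y):
    """Stopping and decision building blocks: the first discriminating cue decides."""
    for cue in cue_order:
        if cues[cue][x] != cues[cue][y]:
            return x if cues[cue][x] > cues[cue][y] else y
    return random.choice([x, y])  # guess if no cue discriminates

# Hypothetical learning sample: city populations (in thousands) and two binary cues.
criterion = [1700, 600, 300, 120]
cues = {"intercity_train": [1, 1, 0, 0], "university": [1, 0, 1, 0]}
pairs = list(combinations(range(len(criterion)), 2))

take_the_best_order = sorted(cues, key=lambda c: validity(cues[c], criterion, pairs),
                             reverse=True)
minimalist_order = random.sample(list(cues), k=len(cues))
print(one_reason_choice(take_the_best_order, cues, 1, 2))  # compares alternatives 1 and 2
print(one_reason_choice(minimalist_order, cues, 1, 2))
```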

The performance of these heuristics has been tested on several


real-world data sets, ranging from professors’ salaries to fish fertil-
ity (Czerlinski, Gigerenzer, & Goldstein, 1999), in cross-validation
comparisons with other more complex strategies. Across 20 data
sets, take-the-best used on average only a third of the available cues
(2.4 out of 7.7), yet still outperformed multiple linear regression in
generalization accuracy (71% vs. 68%). The even simpler minimal-
ist heuristic was more frugal (using 2.2 cues on average) and still
achieved a reasonable 65% generalization accuracy. But this was
6 percentage points behind take-the-best’s performance, indicating
that part of the secret of take-the-best’s success lies in its ordered
cue search. As we will show, an agent need not use exactly take-
the-best’s ecological validity order to do well in making fast and
frugal decisions—there are many cue orders besides validity that
yield good performance. This is a good thing, because it can be
challenging to compute the validity order of a set of cues, as we
discuss in the next section. Still, it leaves us with the question of
how an accurate and frugal cue order can be obtained without

assuming full a priori knowledge of ecological cue validities or


other environmental statistics.
There are a number of possible routes to good cue orders. The
environment of interest may be structured such that any (including
random) cue order will perform well, as, for instance, when the
available cues are of equal validity and discrimination ability.
Alternatively, for adaptively important tasks, evolution may have
provided individuals with domain-specific cue-ordering informa-
tion (e.g., for food choice, edibility and calorie content should be
assessed before temperature and texture). Institutions may also
have developed to provide information about what cues should be
checked first (e.g., for traffic right-of-way, look for a stop sign before
looking for oncoming traffic—see chapter 16). Finally, other indi-
viduals may provide useful advice as to what pieces of information
to use, in what order, when making a decision (e.g., “check for rust
first when buying a used car”; see Garcia-Retamero, Takezawa, &
Gigerenzer, 2006). But in many situations, individuals must dis-
cover a useful cue order on their own, through their own limited
experience with the environment. This is the situation we con-
sider here.
Related research in computer science has demonstrated the effi-
cacy of a range of simple ordering rules for a similar search prob-
lem. We first describe these rules and expand on them to create a
set of psychologically plausible cue order learning mechanisms
which we compare through simulation tests. These tests reveal that
simple mechanisms at the cue-order learning stage can enable
simple mechanisms at the decision stage, such as lexicographic
one-reason decision heuristics, to perform well. We then describe
an experimental study through which we explored how well vari-
ous of these proposed simple cue-ordering rules account for
how people actually order cues in different environmental settings
of a challenging decision task.

Search Order Construction—The Hard Way

Although take-the-best is a very simple heuristic to apply, the setup


of its search rule requires knowledge of the ecological validities of
cues. When this knowledge is not available a priori via social or evo-
lutionary transmission, it must be computed from stored or ongoing
individual experience. Gigerenzer and colleagues (e.g., Gigerenzer
& Goldstein, 1999) have been relatively silent about the process
by which people might derive validities and other search orders,
an omission that several peers have commented on (e.g., Lipshitz,
2000; Wallin & Gärdenfors, 2000). The criticism that take-the-best
owes much of its strength to rather comprehensive computations

necessary for deriving the search order cannot be easily dismissed.


Juslin and Persson (2002) questioned how simple and information-
ally frugal take-the-best actually is, because of the need to take into
account the computation of cue validities for deriving the search
order. They differentiate two main possibilities for determining cue
validities based on when they are computed: precomputation during
experience, and calculation from memory when needed.
When potential decision criteria are already known at the time
objects are encountered in the environment, then relevant validi-
ties can be continuously computed, updated, and stored with each
new object seen. But if it is difficult to predict what decision tasks
may arise in the future, this precomputation of cue validities
runs into problems. In this case, at the time of object exposure,
all attributes should be treated the same, because any one could
later be either a criterion or a cue depending on the decision being
made. To use the well-known domain of German cities (Gigerenzer
& Goldstein, 1996, 1999), the task that one encounters need not be
the usual prediction of city populations based on cues such as train
connections but could just as well be which of two cities has an
intercity train line based on cues that include city population. To
keep track of all possible validities indicating how accurately one
attribute can predict another, the number of precomputed validities
would have to be A²−A, with A denoting the number of attributes
available. In the German cities example, there are 10 attributes
(9 cues plus the original criterion population size); thus 90 validi-
ties would have to be continuously computed, updated, and stored.
This number rises rapidly with an increasing number of attributes.
Even ignoring computational complexity, this precomputation
approach is not frugal in terms of information storage.
As a second possibility, Juslin and Persson (2002) considered
storing all objects (exemplars) encountered along with their attri-
bute values and postponing computation of validities to the point
in time when an actual judgment is required. This, however, makes
take-the-best considerably less frugal during its application. The
number of pieces of information that would have to be accessed at
the time of judgment is the number of attributes (cues and criterion
values) times the number of stored objects; in our city example, it is
10 times the number of known objects. With regard to computing
validities, for each of the N(N−1)/2 possible pairs that can be
formed between the N known objects, each of the C cues has to be
checked to see if it discriminated, and did so correctly. Thus the
number of checks to be performed to compute validities before a
decision can be made is CN(N−1)/2, which grows with the square
of the number of objects.
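
To attach hypothetical numbers to this: with C = 9 cues and N = 100 stored objects, CN(N−1)/2 = 9 × 100 × 99/2 = 44,550 pairwise cue checks would precede a single decision.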
Juslin and Persson (2002) assumed worst-case scenarios in terms
of computational complexity for the sake of their argument, and

they focused on calculations of precise ecological validity values


when all that take-the-best relies on is the corresponding cue
ordering (and all that some other effective lexicographic strategy
would need is a different effective ordering). But they raise an
important point, highlighting one of the fundamental questions for
research on simple heuristics: How can the cue search orders used
by heuristics be found in psychologically plausible ways?

Simple Approaches to Constructing Cue Search Orders

To compare different cue-ordering rules, we need to know how


good the decisions made with the different cue orders they produce
are. Therefore, we first evaluate the performance of different cue
orders when used by a one-reason decision heuristic within a par-
ticular well-studied sample domain: large German cities, compared
on the criterion of population size using nine cues ranging from
having a university to the presence of an intercity train line
(Gigerenzer & Goldstein, 1996, 1999). Examining this domain makes
it clear that there are many good cue orders out of the 362,880 (or
9!) possible orders (Martignon & Hoffrage, 1999). When used with
one-reason stopping and decision building blocks, the mean accu-
racy across all of the cue orders is 70%, equivalent to the perfor-
mance expected from the minimalist heuristic, which uses a
random cue order. The accuracy of the ecological validity order
used in take-the-best, 74.2%, falls toward the upper end of the accu-
racy range (62–75.8%), but there are still 7,421 cue orders that do
better than it. The frugality of the search orders ranges from
2.53 to 4.67 cues per decision, with a mean of 3.34 cues, again
corresponding to using minimalist; take-the-best’s validity order
yields a frugality of 4.23, implying that most orders are more frugal.
Thus, there are many accurate and frugal cue orders that could be
found—and a satisficing decision maker who does not require
optimal performance only needs to find one of them. (Figure 11-1
shows the range of cue orders for this task in terms of accuracy vs.
frugality.)
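
The accuracy and frugality figures just given come from applying a cue order lexicographically to every pair of objects; a minimal sketch of that evaluation (Python, with hypothetical data structures, not the original simulation code) looks roughly like this:

    from itertools import combinations

    def evaluate_cue_order(objects, cue_order, criterion):
        """Accuracy and frugality of one-reason decisions over all object pairs."""
        correct = cues_used = pairs = 0
        for a, b in combinations(objects, 2):
            pairs += 1
            for position, cue in enumerate(cue_order, start=1):
                if a[cue] != b[cue]:                 # first discriminating cue: stop search
                    cues_used += position
                    choice = a if a[cue] > b[cue] else b
                    truth = a if a[criterion] > b[criterion] else b
                    correct += int(choice is truth)
                    break
            else:                                    # no cue discriminates: guess
                cues_used += len(cue_order)
                correct += 0.5
        return correct / pairs, cues_used / pairs
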
An ordering problem of this kind has been studied in computer
science for nearly four decades, and this research provides us
with a set of potential heuristics to test. Imagine a set of data records
arranged in a list, with a probability p_i that a particular record i will
be required during a sequence of retrievals. On each retrieval, the
searcher provides a key (e.g., a record’s title) and the list is searched
from the front to the end until the desired record, matching that
key, is found. The goal is to minimize the mean search time for
accessing the records in this list, for which the optimal ordering
[Figure 11-1 here. Axes: Frugality (Number of Cues Used) on the x-axis, Accuracy (Proportion Correct) on the y-axis; plotted rules: Delta Rule/Validity, Tally, Tally Swap, Simple Swap, Selective Move-to-front, Move-to-front.]

Figure 11-1: Mean final offline accuracy and frugality after 100
learning trials for various cue order learning rules. In gray, all
362,880 possible search orders for the city comparison task are plot-
ted in terms of their frugality and accuracy. The open star indicates
the performance of ecological validity ordering in take-the-best and
the black star shows random cue ordering in minimalist, corre-
sponding to the mean cue order where all learning rules begin. The
mean offline performance of all of the learning rules has improved
after 100 trials in comparison to that benchmark (greater frugality
and mostly higher accuracy).

is in decreasing order of p_i.¹ But if these retrieval probabilities are
not known ahead of time, how can the list be ordered after each
successive retrieval to achieve fast access? This is the problem
of self-organizing sequential search (Bentley & McGeoch, 1985;
Rivest, 1976).
A variety of simple sequential search heuristics have been pro-
posed for this problem, centering on three main approaches:
(a) transpose, in which a retrieved record is moved one position
closer to the front of the list (i.e., swapping with the record in front
of it); (b) move-to-front, in which a retrieved record is put at the
front of the list, and all other records remain in the same relative

1. Note that prominent memory models assume that memory retrieval
proceeds in a similar fashion. ACT-R (e.g., Anderson, 1990), for instance,
assumes that records are searched for sequentially in the order of their
need probabilities until the needed record is found.
order; and (c) count, in which a tally is kept of the number of times
each record is retrieved, and the list is reordered in decreasing
order of this tally after each retrieval. Because count rules require
storing additional information, more attention has focused on the
memory-free transposition and move-to-front rules. Analytic and
simulation results (reviewed in Bentley & McGeoch, 1985) have
shown that while transposition rules can come closer to the
optimal order asymptotically, in the short run move-to-front rules
converge more quickly (as can count rules). This may make
move-to-front (and count) rules more appealing as models of cue
order learning by humans facing small numbers of decision trials.
Furthermore, move-to-front rules are more responsive to local
structure in the environment (e.g., able to capitalize immediately
on a particular record becoming temporarily “popular”), while
transposition can result in very poor performance under some cir-
cumstances (e.g., when neighboring pairs of “popular” records get
trapped at the far end of the list by repeatedly swapping places
with each other).
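
As an illustration (a sketch of the three schemes, not code drawn from Bentley and McGeoch), the list-update rules can be written as small functions operating on a Python list of record keys after each retrieval:

    def transpose(records, key):
        """Move the retrieved record one position closer to the front."""
        i = records.index(key)
        if i > 0:
            records[i - 1], records[i] = records[i], records[i - 1]

    def move_to_front(records, key):
        """Put the retrieved record at the front; the rest keep their relative order."""
        records.remove(key)
        records.insert(0, key)

    def count_reorder(records, key, tallies):
        """Tally each retrieval and keep the list sorted by decreasing tally."""
        tallies[key] = tallies.get(key, 0) + 1
        records.sort(key=lambda r: tallies.get(r, 0), reverse=True)
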
It is important to note, however, that there are some critical dif-
ferences between the self-organizing sequential search problem
and the cue-ordering problem we address here. First, when a record
is sought that matches a particular key, search proceeds until
the correct record is found. In contrast, when a decision is made
lexicographically and the list of cues is searched through, there
is no one “correct” cue to find—search stops at the first cue that
discriminates between the decision alternatives (i.e., allows a deci-
sion to be made), and there may be several such cues. Furthermore,
once a discriminating cue is found, it may not even make the
correct decision (the lower its validity, the more likely it is to indi-
cate the wrong choice). Thus, given feedback about whether a deci-
sion was right or wrong, a discriminating cue could potentially be
moved up or down, respectively, in the ordered list. This dissocia-
tion between making a decision or not (based on the cue discrimi-
nation rates, that is, the proportion of all decisions on which
the cue makes a distinction between alternatives), and making a
right or wrong decision (based on the cue validities), means that
there are two performance criteria in our problem—frugality and
accuracy—as opposed to the single criterion of search time for
records. Because record search time corresponds to cue frugality,
the heuristics that work well for the self-organizing sequential
search task are likely to produce orders that emphasize frugality
(reflecting cue discrimination rates) over accuracy when they are
applied to the cue-ordering task. With this tendency in mind,
these heuristics offer a useful starting point for exploring cue-
ordering rules.
Simulation Study of Simple Ordering Rules

The Cue-Ordering Rules


We used computer simulation to compare a set of cue search-order
construction processes that are psychologically plausible by
being frugal in terms of both information storage and computation
(Todd & Dieckmann, 2005). The decision situation we explore
is different from the one assumed by Juslin and Persson (2002),
who differentiated learning about objects from later making deci-
sions about them. Instead, we assume a learning-while-doing
setting, consisting of tasks that have to be done repeatedly with
feedback after each instance about the adequacy of one’s decision.
For instance, on multiple occasions at the supermarket we can
choose one of two checkout lines, then get the feedback on whether
the one we have chosen or (more likely) the other one is faster,
and finally learn to associate this outcome with cues, including
the lines’ lengths and the ages of their respective cashiers. In
such situations, decision makers can learn about the differential
usefulness of cues for solving a task via the feedback received over
time.
Our explicitly defined ordering rules operate in a learning-while-
doing situation to construct cue orders for use by lexicographic
decision mechanisms. These mechanisms are applied to a particu-
lar probabilistic inference task: a forced-choice paired comparison
in which a decision maker has to infer which of two objects, each
described by a set of binary cues, is “bigger” on a criterion, as in the
city size comparison task for take-the-best described above. After
an inference has been made, feedback is provided as to whether a
decision was right or wrong. Therefore, the order-learning rule has
information about which cues were looked up, whether a cue dis-
criminated, and whether a discriminating cue led to the right or
wrong decision. The learning rules we propose differ in which
pieces of information they use and how they use them. Note that
all of the rules assume that learning in each decision trial only
occurs for the single first cue that was found to discriminate—that
is, even though other cues might also discriminate in this trial,
the lexicographic decision rule would not bother to check them,
and so no learning about them can occur. We classify the rules
based on their memory requirement—high versus low—and their
computational requirements in terms of full or partial reordering
(see Table 11-1).
The validity rule, a type of count rule, is the most demanding of the
rules we consider in terms of combined memory requirements and
computational complexity. It keeps a count of all discriminations
282 REDUNDANCY AND VARIABILITY IN THE WORLD

Table 11-1: Learning Rules Classified by Memory and Computational Requirements

High memory load, complete reordering:
  Validity: Reorder cues by their current estimated validity.
  Tally: Reorder cues by number of correct minus incorrect decisions made so far.
  Delta rule: Reorder cues by learned association strength.

High memory load, local reordering:
  Tally swap: Move cue up (down) one position if it made a correct (incorrect) decision and its tally of correct minus incorrect decisions is ≥ (≤) that of the next higher (lower) cue.

Low memory load, local reordering:
  Simple swap: Move cue up one position if it made a correct decision and down if it made an incorrect decision.
  Move-to-front: Move cue to front if it discriminated.
  Selective move-to-front: Move cue to front only if it discriminated correctly.

made by a cue so far (in all the times that the cue was looked up)
and a separate count of all the correct discriminations (i.e., those
decisions where the cue discriminated and indicated the alterna-
tive with the higher criterion value). Therefore, its memory load
is comparatively high. The validity of each cue is determined
by dividing its current correct discrimination count by its total
discrimination count. Based on these values computed after each
decision, the rule reorders the whole set of cues from highest to
lowest validity.
The tally rule only keeps one count per cue, storing the differ-
ence between the number of correct decisions and incorrect
decisions made by that cue so far. If a cue discriminates correctly
on a given trial, one point is added to its tally, and if it leads to an
incorrect decision, one point is subtracted. The tally rule is thus
less demanding than the validity rule in terms of both memory and
computation: Only one count is kept, and no division is required.
Note that the tally rule with its single count is sensitive to the
number of discriminations while the validity rule is not. For
instance, the validity rule would rank a cue that made 5 discrimina-
tions, 4 of them correct and 1 incorrect, the same as a cue that made
25 discriminations, 20 correct and 5 incorrect (because 4/5 = 20/25),
while the tally rule would rank the latter higher (4 − 1 < 20 − 5).
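In code, the two count-based rules differ only in what is stored per cue and in how the ordering key is derived; the sketch below (Python, with hypothetical bookkeeping structures initialized elsewhere) assumes feedback about whether the single discriminating cue on a trial decided correctly:

    def update_validity_rule(order, stats, cue, correct):
        # stats[cue] is a two-element list: [correct discriminations, all discriminations]
        hits, total = stats.setdefault(cue, [0, 0])
        stats[cue] = [hits + int(correct), total + 1]
        # Complete reordering by current estimated validity; cues with no
        # discriminations yet are treated as chance level (.5). Ties keep their
        # current relative order because Python's sort is stable.
        def validity(c):
            h, t = stats.get(c, (0, 0))
            return h / t if t else 0.5
        order.sort(key=validity, reverse=True)

    def update_tally_rule(order, tally, cue, correct):
        # tally[cue] = correct minus incorrect decisions made with that cue so far
        tally[cue] = tally.get(cue, 0) + (1 if correct else -1)
        # Complete reordering by the raw tally; no division and no base rates needed.
        order.sort(key=lambda c: tally.get(c, 0), reverse=True)
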
The simple swap rule uses the transposition rather than the
count approach. This rule has no memory of cue performance other
than an ordered list of all cues and just moves a cue up one position
in this list whenever it leads to a correct decision, and down if it
leads to an incorrect decision. In other words, a correctly deciding
cue swaps positions with its nearest neighbor upward in the cue
order, and an incorrectly deciding cue swaps positions with its
nearest neighbor downward.
The tally swap rule is a hybrid of the simple swap rule and the
tally rule. It keeps a tally of correct minus incorrect discriminations
per cue so far (so memory load is high) but only moves cues by
swapping: When a cue makes a correct decision and its discrimina-
tion tally is greater than or equal to that of its upward neighbor, the
two cues swap positions. When a cue makes an incorrect decision
and its tally is smaller than or equal to that of its downward neigh-
bor, the two cues also swap positions. Otherwise, the tallies of the
neighboring cues suggest that the current cue order is reasonable
and no change is made, providing a degree of stabilizing inertia.
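A sketch of the two swap rules just described (Python; list index 0 is the front of the search order; whether the tally is updated before or after the neighbor comparison is not specified in the text, so updating it first is an assumption here):

    def update_simple_swap(order, cue, correct):
        """Swap the deciding cue with its upward neighbor after a correct decision,
        or with its downward neighbor after an incorrect one."""
        i = order.index(cue)
        if correct and i > 0:
            order[i - 1], order[i] = order[i], order[i - 1]
        elif not correct and i < len(order) - 1:
            order[i], order[i + 1] = order[i + 1], order[i]

    def update_tally_swap(order, tally, cue, correct):
        """As simple swap, but a swap only happens when the running tallies of the
        neighboring cues no longer justify their current relative order."""
        tally[cue] = tally.get(cue, 0) + (1 if correct else -1)
        i = order.index(cue)
        if correct and i > 0 and tally[cue] >= tally.get(order[i - 1], 0):
            order[i - 1], order[i] = order[i], order[i - 1]
        elif not correct and i < len(order) - 1 and tally[cue] <= tally.get(order[i + 1], 0):
            order[i], order[i + 1] = order[i + 1], order[i]
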
We also test two types of rules that move cues to the top of the
rank order. First, the move-to-front rule moves the last discriminat-
ing cue (i.e., whichever cue was found to discriminate for the cur-
rent decision) to the front of the order. This is equivalent to the
cue-ordering building block employed by the take-the-last heuristic
(Gigerenzer & Goldstein, 1996, 1999), which uses a memory of cues
that discriminated most recently in the past to determine cue search
order for subsequent decisions. Second, selective move-to-front
moves the last (most recent) discriminating cue to the front of the
order only if it correctly discriminated; otherwise, the cue order
remains unchanged. This rule thus takes accuracy as well as dis-
crimination-based frugality into account.
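The two move-to-front variants are the simplest to state; in the same sketch notation:

    def update_move_to_front(order, cue, correct):
        """The cue that discriminated on this trial goes to the front, right or wrong."""
        order.remove(cue)
        order.insert(0, cue)

    def update_selective_move_to_front(order, cue, correct):
        """Move the discriminating cue to the front only if its decision was correct."""
        if correct:
            order.remove(cue)
            order.insert(0, cue)
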
Finally, we consider an associative learning rule that uses the
delta rule (Widrow & Hoff, 1960) to update cue weights according
to whether they make correct or incorrect discriminations and then
reorders all cues in decreasing order of this weight after each deci-
sion. This corresponds to a simple network with K (in our dataset,
9) input units encoding the difference in cue value between the two
objects (A and B) being compared (i.e., in_i = −1 if cue_i(A) < cue_i(B), 1 if cue_i(A) > cue_i(B), and 0 if cue_i(A) = cue_i(B) or cue_i was not checked), and one output unit whose target value encodes the correct decision (t = 1 if criterion(A) > criterion(B), otherwise −1). The weights between inputs and output are updated according to the delta rule Δw_i = lr · (t − Σ_{k=1..K} in_k · w_k) · in_i, with learning rate lr = 0.1. We expect this
rule to behave similarly to selective move-to-front initially (moving
a correctly discriminating cue to the front of the list by giving it the
largest weight when weights are small) and to tally swap later on
(moving cues only a short distance in the list once weights are larger).
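A sketch of this update, following the formula above (Python; starting the weights at zero is an assumption, since the chapter does not state the initialization):

    def update_delta_rule(order, weights, inputs, target, lr=0.1):
        # inputs[c] is -1, 0, or 1: the cue-value difference between objects A and B
        # (0 for cues that did not differ or were not looked up); target is 1 if
        # A has the higher criterion value and -1 otherwise.
        output = sum(inputs.get(c, 0) * weights.get(c, 0.0) for c in order)
        for c in order:
            weights[c] = weights.get(c, 0.0) + lr * (target - output) * inputs.get(c, 0)
        # Complete reordering by descending association strength (weight).
        order.sort(key=lambda c: weights.get(c, 0.0), reverse=True)
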
To test the performance of these cue-order learning rules when
applied to small samples of data, we used the German cities data
set (Gigerenzer & Goldstein, 1996, 1999) consisting of the 83 largest
German cities (those with more than 100,000 inhabitants in 1990)
described on 9 cues that give some information about population
size. We present results averaged over 10,000 learning sequences
for each rule, starting from random initial cue orders. Each sequence
consisted of 100 comparisons to decide the larger of two randomly
selected cities. For each decision, the current cue order was used
to look up cues until a discriminating cue was found, which was
used to make the decision (employing a lexicographic one-reason
stopping rule and decision rule as in take-the-best). After each
decision, the cue order was updated using the particular order-
learning rule. We consider two measures of accuracy: The cumula-
tive accuracy (i.e., online or amortized performance—Bentley &
McGeoch, 1985) of the rules is defined as the total percentage of
correct decisions made so far at any point in the learning process,
which captures the essence of learning-while-doing. The contrast-
ing measure of offline accuracy indicates how well the current
learned cue order would do if it were applied to the entire test set
(also known as batch learning).
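
Putting these pieces together, the simulation procedure just described might be sketched as follows (Python; evaluate_cue_order is the hypothetical helper sketched earlier, "population" is an assumed criterion field name, and each update rule is assumed to have been wrapped to share the signature shown; the delta rule, which needs the full input vector, would require a slightly different wrapper):

    import random

    def run_learning_sequence(objects, cues, update_rule, state, trials=100,
                              criterion="population"):
        # Learning-while-doing: decide random pairs, receive feedback, update the order.
        order = random.sample(list(cues), len(cues))        # random initial cue order
        correct_so_far, online, offline = 0.0, [], []
        for t in range(1, trials + 1):
            a, b = random.sample(objects, 2)
            for cue in order:
                if a[cue] != b[cue]:                         # first discriminating cue decides
                    choice = a if a[cue] > b[cue] else b
                    truth = a if a[criterion] > b[criterion] else b
                    right = choice is truth
                    update_rule(order, state, cue, right)    # feedback-driven reordering
                    correct_so_far += right
                    break
            else:                                            # no cue discriminated: guess
                correct_so_far += 0.5
            online.append(correct_so_far / t)                # cumulative (amortized) accuracy
            offline.append(evaluate_cue_order(objects, order, criterion)[0])
        return online, offline
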

Results of the Simulations


For all but the move-to-front rules, cumulative accuracies during
the 100 learning trials soon rose above that of the random cue order
used by the minimalist heuristic (proportion correct = .70), which
serves as a lower benchmark. However, at least throughout the first
100 decisions, cumulative accuracies stayed well below the accu-
racy that would be achieved by using cues in ecological validity
order, as in take-the-best, for all decisions (proportion correct = .74)—
though this is no surprise as there were too few trials to learn the
precise validity order. Except for the move-to-front rules, whose
cumulative accuracies remained very close to random ordering, all
learning rules performed on a surprisingly similar level (.71–.72),
with less than one percentage point difference in favor of the most
demanding rules compared to the least. Offline accuracies (shown
in Figure 11-2) are very similar overall, though slightly higher,
again with the exception of the move-to-front rules.
While not as accurate, all learning rules result in cue orders that
are more frugal than ecological validity (mean number of cues
looked up: 4.23), and even more frugal than random order (3.34
cues), in terms of both online and offline frugality. We focus on
their offline performance (see Figure 11-3): The most frugal, as
[Figure 11-2 here. Axes: Decision (x-axis) and Proportion Correct (y-axis); lines: TTB, Delta Rule, Validity, Tally, Tally Swap, Simple Swap, Selective Move-to-front, Move-to-front, random.]

Figure 11-2: Mean offline accuracy of cue-order learning rules used in lexicographic decision making over 100 learning trials, averaged across 10,000 sequences. Performance of random cue order is indicated, while the line marked TTB shows the (stable) accuracy of the take-the-best heuristic using a precalculated cue validity order. (Adapted from Todd & Dieckmann, 2005.)

[Figure 11-3 here. Axes: Decision (x-axis) and Frugality (Number of Cues Used, y-axis); lines: TTB, Delta Rule, Validity, Tally, Tally Swap, Simple Swap, Selective Move-to-front, Move-to-front, random.]

Figure 11-3: Mean offline frugality of cue-order learning rules used in lexicographic decision making over 100 learning trials, averaged across 10,000 sequences. Performance of random cue order is indicated, while the line marked TTB shows the (stable) frugality of the take-the-best heuristic using a precalculated cue validity order. (Adapted from Todd & Dieckmann, 2005.)

expected, are the two move-to-front rules. There is little difference
between the rest of the rules.
The combined offline accuracy and frugality of the orders result-
ing after 100 trials for each rule is summarized in Figure 11-1, com-
pared with the performance of ecological validity in take-the-best
and of random cue order in minimalist, the benchmark equivalent
to the (mean) starting point of all the learning rules. The learning
rules move cue orders over time in the appropriate direction, that is,
toward both greater frugality and, except for the move-to-front rules,
higher accuracy.
Consistent with the finding that all learning rules produce
cue orders of high frugality, the resulting cue orders show positive
correlations with the order specified by cue discrimination rate
(reaching the following mean values after 100 decisions: delta
rule: r = .12; validity learning rule: r = .18; tally: r = .29; tally
swap: r = .24; simple swap: r = .18; selective move-to-front: r = .48;
move-to-front: r = .56). This means that cues that often discrimi-
nate between alternatives are more likely to end up in the first
positions of the order. This is especially true for the move-to-front
rules. In contrast, the cue orders resulting from all learning rules
but the validity learning rule do not correlate, or correlate nega-
tively, with the ecological validity cue order after being exposed to
this small sample of decisions, and even the correlations of the cue
orders resulting from the validity learning rule only reach an aver-
age correlation of r = .12.
Will any of these learning rules finally reach the accuracy
achieved by take-the-best and its validity cue order when simula-
tions are extended, giving them a chance to encounter more deci-
sion pairs? To test this, we ran simulations (starting from 1,000
random initial cue orders) for 10,000 rather than 100 decisions
between randomly selected cities. Only the validity rule produced
orders that on average reached take-the-best’s accuracy level within
this long series of trials (and not surprisingly, it usually did so
within around 3,000 trials, which are enough for it to have seen
nearly all pairs of cities and so learn the full validity order). The
other rules reached their asymptotes after between 800 trials (for
selective move-to-front, at .704) and 2,500 trials (for tally and tally
swap, both at .727). Beyond that point, changes in offline accuracy
were less than 0.01 percentage points per 100 trials.

Accounting for Differences in Rule Performance


Most of the simple cue-order learning rules we have proposed do
not fall far behind the validity learning rule in accuracy. The excep-
tions are the move-to-front rules, but they compensate for this
failure by being highly frugal. All the other rules achieve higher
accuracy and at the same time also beat minimalist’s random cue
selection in terms of frugality.
In fact, it could be that the frugality-determining discrimination
rates of cues generally exert more of a pull on cue order than
validity. One reason to expect this is the fact that in the city data
set we used for the simulations (as in other natural environments;
see Gigerenzer & Goldstein, 1999), the validities and discrimi-
nation rates of cues are negatively correlated. A cue with a low
discrimination rate along with a high validity has little chance
of being used and hence, of demonstrating its high validity.
Whatever learning rule is used, if such a cue is displaced down-
ward to the lower end of the order by other cues, it may never be
able to escape to the higher ranks where it belongs. The problem is
that when a decision pair is finally encountered for which that
cue would lead to a correct decision, it is unlikely to be checked
because other, more discriminating although less valid, cues are
looked up before it and already bring about a decision.
our learning rules, when combined with one-reason decision
making, are sensitive to the order of experiences, an effect described
in the incremental learning literature (e.g., Langley, 1995). Because
one-reason decision making is intertwined with the learning
mechanism in learning-while-doing scenarios, and so influences
which cues can be learned about, across these learning rules what
mainly makes a cue come early in the order is producing a high
absolute surplus of correct over incorrect decisions (which the tally
rule in particular is tracking) and not so much a high ratio of correct
discriminations to total discriminations regardless of base rates
(which validity tracks).
Overall, the tally and tally swap rules emerge as a good compro-
mise between performance, computational requirements, learning
speed, and psychological plausibility considerations. Remember
that the tally and tally swap rules assume a memory store of the
counts of correct minus incorrect decisions made by each cue so
far. But this does not make them implausible for use by natural
minds, even though computer scientists were reluctant to adopt
such counting approaches for their artificial systems in the past
because of their extra memory requirements. There is considerable
evidence that people are actually very good at remembering the
frequencies of events—even human babies and nonhuman animals
seem sensitive to differences in the frequency of observed or expe-
rienced events, at least for small numbers. For instance, Hauser,
Feigenson, Mastro, and Carey (1999) showed that both 10- to
12-month-old babies and rhesus monkeys preferred containers with
more food items in them after they had observed the experimenter
putting in the items one after another. In this situation, babies and
monkeys could discriminate between a container with up to three
items and a container with four items. Other studies have shown
that after extensive training, animals can even learn to discriminate
between much larger numbers. Rilling and McDiarmid (1965)
trained pigeons to repetitively peck one illuminated lever, and
when the light went out, to change to one of two adjacent levers
depending on the number of pecks they had made. In this way,
the pigeons were shown to be able to discriminate between 35 and
50 pecks. Hasher and Zacks (1984) concluded from a wide range
of studies that frequencies are encoded in an automatic way,
implying that people are sensitive to this information without
intention or special effort. This capacity is usually demonstrated in
experiments that involve tracking the frequency of many different
items (e.g., Flexser & Bower, 1975; Underwood, Zimmerman, &
Freund, 1971; Zacks, Hasher, & Sanft, 1982; for reviews see also
Nieder & Dehaene, 2009; Sedlmeier & Betsch, 2002).
Consequently, the tally-based rules seem simple enough for a
wide range of organisms—including college students in their role
as experimental participants—to implement easily. In comparison,
the simple swap and move-to-front rules may not be much simpler,
because storing a cue order may be about as demanding as storing a
set of tallies. The tally-based rules are also computationally simple
because they do not have to keep track of base rates or perform
divisions, as does the validity rule.
Estes (1976) provided empirical evidence for the use of tally-
based strategies, arguing that people often base decisions on raw
frequencies rather than converting them into base-rate-adjusted
probabilities. In a series of experiments, participants first observed
outcomes of an imaginary survey about people’s preferences for a
number of products. They saw pairs of products and were told
which one was preferred by a fictional consumer. By showing
participants different pairs of products (e.g., A vs. B, C vs. D, etc.)
with varying frequency, Estes could pit the probability of a product
being preferred against the number of times it was preferred. In the
subsequent test phase, critical pairs were formed (e.g., A vs. C, with
A having a higher probability of preference, say, in 8 out of 10 pairs,
and C having the higher absolute frequency of preference, say, in
12 out of 24 pairs). Participants then had to indicate which product
was more likely to be preferred by a new sample of people from the
same population. In this test phase, participants showed a strong
tendency to predict that the winner would be the product that had
been preferred more frequently in the observation phase, even when
it had a lower probability of preference (i.e., C over A in our exam-
ple). This result supports the idea that people may keep track of
the number of correct discriminations that a cue makes rather
than utilizing a conditional measure such as its validity when
determining a cue order to use. We next turn to a set of experiments
designed to test whether people do follow such a tally-based cue-
order learning strategy, or one of the others we have introduced.

An Experimental Study of Cue-Order Learning

Before we ask how people learn cue orders, we should ask what
cue orders they actually end up using when making decisions.
This question has been partly addressed in research on the use of
the take-the-best heuristic, with its fixed validity-ordered cue
search. In situations where information must be searched for
sequentially in the external environment, particularly when there
are direct search costs for accessing each successive cue, con-
siderable use of take-the-best has been demonstrated (Bröder,
2000a, Experiments 3 & 4; Bröder, 2003; see also chapter 9). Take-
the-best is also employed when there are indirect costs, such as
from time pressure (Rieskamp & Hoffrage, 1999) or from internal
search in memory (Bröder & Schiffer, 2003b). The particular search
order used by people in these experiments has not always been
tested separately, but when such an analysis has been done, search
by cue validity order has been found (Newell & Shanks, 2003;
Newell, Weston, & Shanks, 2003).
However, none of these experiments tested whether people were
ever using search orders other than validity. A closer look into
the experimental designs of the studies cited above reveals that
they would not even have been able to show the use of many other
search orders: They all used systematically constructed environ-
ments in which the discrimination rates of the cues were held con-
stant. Such fixed discrimination rates make several alternative
ordering criteria that combine discrimination rate and validity all
lead to the same cue order, namely, just validity again. Examples of
such criteria (see Martignon & Hoffrage, 1999; chapter 10) are suc-
cess, which is the proportion of correct discriminations that a cue
makes plus the proportion of correct decisions expected from guess-
ing on the nondiscriminated trials [i.e., success = v·d + 0.5(1 − d), where v is validity and d is discrimination rate of the cue], and usefulness, the proportion of correct decisions not including guessing (usefulness = v·d).
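For illustration (with an arbitrarily chosen value): if every cue shares the discrimination rate d = .5, success reduces to .5·v + .25 and usefulness to .5·v, both increasing functions of v alone, so ranking cues by success, usefulness, or validity produces exactly the same order.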

Because these criteria collapse to a single order (validity) in the
reported experiments, nothing can be said about how validity
and discrimination rate may interact to determine the search orders
that participants applied. There are hints that when information is
costly, making it sensible to consider both the chance that a cue
will enable an immediate decision (i.e., its discrimination rate) and
the validity of those decisions, other search orders such as success
that combine the two measures show a better fit to empirical data
(e.g., Läge, Hausmann, & Christen, 2005; Newell, Rakow, Weston, &
Shanks, 2004). But these and all the other studies on cue order use
remain silent about how any cue order could possibly be learned by
participants.
In sum, despite accumulating evidence for the use of one-reason
decision-making heuristics, the learning processes that underlie
people’s search through information when employing such heuris-
tics remain a mystery. Additionally, in most previous experimental
studies on the use of take-the-best, cue-order learning was at best
greatly simplified—if not totally obviated—by encouraging par-
ticipants to use cues in order of their validity either directly, by
informing them about cue validities or the validity order (Bröder,
2000a, Experiments 3 and 4; Bröder, 2003; Bröder & Schiffer, 2003b;
Newell & Shanks, 2003; Newell et al., 2003; Rieskamp & Hoffrage,
1999), or indirectly, through the presentation of graphs that depicted
cue validities (Bröder, 2000a, Experiments 1 and 2). Thus, to find
out how people construct and adjust cue search orders in unfamil-
iar task environments, we had to design a new experiment.
In our experimental setup, we carefully controlled what infor-
mation participants had access to from the beginning. First, as it is
the cue-order learning process we are mainly interested in, we did
not tell people what the cue validities were in our task. Second,
many of the existing experiments on take-the-best framed the task
as a choice between differentially profitable shares or stocks from
companies that were described on several cues indicative of their
profitability (Bröder, 2000a, Experiments 3 and 4; Bröder, 2003;
Newell & Shanks, 2003; Newell et al., 2003; Rieskamp & Hoffrage,
1999). Because of the potential existence of rather strong initial
preferences for certain cues in this familiar domain, we instead
created a task about a subject most people know very little about:
oil mining. Participants had to find out how cues differed in their
usefulness for making correct decisions about where to drill for
oil. And finally, to highlight the importance of searching for the
right information in the right order, participants had to pay for each
cue they wanted to consider in making their decision. Using this
setup, we aimed to find out how people build and adjust their cue
orders as a result of feedback over the course of several decisions,
and how well their final learned cue orders would perform.
Different types of cue orders are appropriate for different types of
environments; for instance, as mentioned above, in an environment
in which all cues have the same discrimination rate but different
validities, a validity-based ordering makes sense. To study how
environmental structure might influence the cue-ordering process,
we constructed three different environments, each consisting of
100 decision pairs that could be decided on the basis of five cues
about the two alternatives (locations to drill for oil) being compared.
In the first environment (called VAL), cues differed strongly in
validity with values of .90, .82, .73, .65, and .57, but all had the
same discrimination rate of .51. In the second environment (DR),
discrimination rates varied, with values .56, .49, .43, .36, and .24,
while validity was kept constant at .75. Finally in a third environ-
ment (VAL*DR), both discrimination rates and validities varied
and were negatively correlated: Validities were .57, .66, .74, .83,
and .91, while the respective discrimination rates, following the
opposite order, were .56, .50, .43, .36, and .22. (Given these values,
exact validity or discrimination rate orders cannot be determined
by participants in only 100 trials, but the learning processes used
by participants can still be observed.)
Costs for cues were also varied and were either high or low rela-
tive to gains. Participants received performance-contingent payoff
expressed in an artificial currency called “petros.” For each correct
decision in the 100 pair comparisons, participants received 2,000
petros (corresponding to 20 eurocents). In the high-cost conditions,
they had to pay 300 petros per cue (i.e., costs relative to benefits of
3/20), compared to 100 petros in the low-cost conditions (i.e., rela-
tive costs of 1/20). This resulted in a 3 (environments) by 2 (cost
conditions) design. To decrease the final payoff differences between
high- and low-cost conditions, participants in the different condi-
tions had accounts that started with different balances: 10,000
petros in the low-cost conditions, and 30,000 petros in the high-
cost conditions.
We expected that individuals’ search order would move toward
the cue order that led to the highest performance in the environ-
ment they faced: validity in the VAL environment, discrimination
rate in DR environment, and a combination of both (e.g., usefulness
and success) in the VAL*DR environment. Furthermore, we
expected that the process of search-order construction would best
be described by simple learning rules, particularly the tally and
tally swap rules using simple frequency counts, as supported by
our earlier simulations.
We asked 120 participants, run individually in the lab, to imag-
ine they were geologists hired by an oil-mining company. Their
basic task was to decide at which of two sites, labeled X and Y,
more oil was to be found underground, based on various test results.
Five different tests (Mobil Oil AG, 1997) could be conducted, and
each had two possible outcomes; for instance, the “chemical analy-
sis” test (for measuring proportion of organic material in stone)
could return the answer “low” or “high” for each site. The tests
were first described to participants one at a time, with cue direc-
tions revealed by telling participants that more oil is to be found
more often at a site with a particular label (e.g., “high” for chemical
analysis) than at a site with the opposite label (e.g., “low”).
Participants were further told that the tests differed in how reliable
they were (i.e., their validity) and in how often they discriminated
between sites (i.e., their discrimination rate). To facilitate memori-
zation, the stronger adjective (e.g., “big,” “strong,” “fast,” etc.) was
consistently used as the positive cue value, indicating more oil.
Before the actual decisions started, participants were asked to
rank the five tests according to how useful they thought the tests
were going to be in the experiment. This was done to be able to
check for effects of any preexisting ideas about cue orders. The def-
inition of the word “usefulness” was left open intentionally.
Participants had to choose between two new oil sites, X and Y,
based on the values of test cues that they chose to see. Cue values
were always revealed pairwise, that is, simultaneously for both
alternatives. Participants had to conduct at least one test (i.e., one
cue had to be selected and revealed). After a test had been con-
ducted, participants could either go on with testing or decide in
favor of one of the sites right away by clicking on the “X” or “Y”
button. As soon as a decision between the sites had been made
and entered, outcome feedback was given: Either a box appeared
displaying the word “correct” and the chosen alternative was
circled in green, or a box appeared that said “wrong” and the
chosen alternative was crossed out in red. Furthermore, a cumula-
tive account of the participant’s earnings in petros so far was
displayed on the screen throughout the decision phase, updated
with each cue purchase and correct decision. A screenshot is shown
in Figure 11-4.
Finally, after the 100 decisions had been completed, participants
were asked again to rank the tests according to their usefulness.
Depending on the order they entered, they could increase their
gains by up to 20,000 petros (i.e., €2). Participants were told about
this opportunity for extra reward at the beginning of the experi-
ment to additionally motivate cue-order learning. The actual payoff
was determined by computing the correlation between the par-
ticipants’ final stated rank order and the order that yielded the
highest payoff in the particular environment they experienced and
multiplying this correlation by 20,000 petros. Negative payoffs
were treated as zero.

Results of the Experiment


General Performance and Use of Stopping and Decision Rules

People did quite
well on this task: Overall accuracy ranged between 69% and 77%
across the different environments, being lowest in the VAL*DR
environment where both validity and discrimination rate varied.
And they performed rather frugally: In the high-cost conditions,
Figure 11-4: Screenshot from the experimental program depicting
the task participants faced (translated from German). This partici-
pant, on her first trial, has decided to perform the test “microscopic
analysis” first. Although it discriminates, she performs another test,
“geophones,” which shows the same result for both options X and
Y. Two hundred petros has been withdrawn from her account for
conducting these two tests, indicating that she participates in the
low-cost condition.

fewer cues were bought on average (2.2) than in the low-cost condi-
tions (2.8), although even with this frugality, participants earned
less in the more challenging high-cost conditions. On the majority
of trials, search was stopped immediately after having found one
discriminating cue, as specified by one-reason decision mecha-
nisms such as take-the-best: The proportion of one-reason stopping
was substantially higher in the high-cost conditions (at 70%) com-
pared to the low-cost conditions (51%) but did not differ between
environments. Participants made choices in accordance with
take-the-best’s decision rule, deciding in line with the first dis-
criminating cue they encountered, on 87% of the trials (including
cases where they went on searching beyond the first discriminating
cue). Both the stopping and decision patterns indicate the strong
impact of the first discriminating cue on the choice that was
ultimately made, and thus both also point to the importance of
the order in which cues are considered. What orders did people
end up using, and were they matched to the structure of the differ-
ent environments?
What Cue Orders Do People Use?

As an indicator of the search rule partici-
pants actually used by the end of 100 decisions, we focus on the
cue-order ranking that participants explicitly stated after the decision
phase. First, we checked whether the initial explicit ranking par-
ticipants were asked for was reflected in the final explicit cue order.
The correlation between the first stated and last stated cue order
was on average low (mean r = .27). Participants did not even start to
search cues in the order they initially stated—the correlation
between this and the order in which participants initially looked
up cues was only r = –.05. The correlation between the last stated
cue order and the cue positions on the screen from left to right was
also low (mean r = .10). It can thus be concluded that neither initial
ideas about cue usefulness nor the order in which cues were dis-
played on the screen had a major impact on the search order that
participants used.
At a minimum, we expected participants’ final stated cue orders,
when used in one-reason decision making, to beat looking up
cues in random order. This is indeed the case for all environments
except the most challenging one that combined high cost and a
trade-off between validity and discrimination rate. The average
performance of each participant’s final stated cue order if applied
to all decision pairs they had seen, assuming one-reason stopping
and deciding, is summarized in Table 11-2.
Overall, the analysis of the general performance of the stated cue
orders supports the notion that many participants were able to
learn an adaptive search order. As a next step, we correlated par-
ticipants’ cue orders with four search orders previously proposed
in the literature—validity, discrimination rate, usefulness, and
success—to see if participants approached the expected order in
each environment. However, the average rank-order correlations
are quite low, and sometimes even negative. Only in the first envi-
ronment (VAL) where discrimination rate was kept constant—and
high—while validity varied were participants’ search orders moder-
ately correlated on average with the ecological validity order (mean
rho = .36 in the low-cost and .30 in the high-cost condition).
Of course, participants did not look up all cues on all trials, lim-
iting their ability to estimate orderings by ecological validity,
discrimination rate, success, and usefulness. Also, they checked
different cues unequally often. By taking these different base rates—
the frequency with which a cue has been checked—into account,
we computed the subjective validity (Gigerenzer & Goldstein, 1996),
discrimination rate, success, and usefulness experienced by each
participant as that person chose which cues (and hence cue values)
to observe during decision making. However, there were even
lower correlations (and overlaps) between these subjective mea-
sures and participants’ stated final cue orders.
Table 11-2: Average Performance of Participants’ Final Stated Cue Orders if Applied to All Decision Pairs in a Given Environment, Assuming One-Reason Stopping and Deciding

                                  Environment 1    Environment 2    Environment 3
                                      (VAL)            (DR)           (VAL*DR)
                                  Low     High     Low     High     Low     High
                                  cost    cost     cost    cost     cost    cost
Percentage correct
  Mean                            79%     77%      [79%]   [79%]    76%     72%
  SD                              4.5     5.8      1.6     2.1      5.1     5.5
Number of cues checked per trial
  Mean                            [1.84]  [1.84]   2.06    2.00     2.27    2.08
  SD                              0.02    0.03     0.17    0.17     0.23    0.21
Payoff, €
  Mean                            13.88   9.82     13.66   9.77     12.87   8.11
  SD                              0.91    1.16     0.42    0.83     0.85    0.62

Note. Values in brackets refer to numbers that are not expected to be different from
random cue order, because cue validity and discrimination rate were held constant
in the first and second environment, respectively. VAL: Environment with varying
validities and equal discrimination rates; DR: environment with varying discrimina-
tion rates and equal validities; VAL*DR: environment with varying validities and
discrimination rates.

Given the surprising lack of a match between participants’ cue
orders and these subjective measures, we reconsidered whether
measures like validity might be too complex for this task and
whether the cue orders might reflect some simpler environmental
attribute. Validity is, after all, a conditional probability (or relative
frequency): the chance that a cue makes a correct decision given
that it discriminates. The way the subjective measures were com-
puted also resembles conditional probabilities: Subjective discrim-
ination rate, for example, can be understood as the probability that
a cue discriminates given that it had been checked. If participants
were creating their cue order by taking into account the number
of times they had checked each cue, these subjective measures
would end up being very similar to ecological validity in terms of
computational and memory demands. But there are simpler uncon-
ditional measures that people could be using instead. For instance,
they could be ordering cues based on the number of correct deci-
sions, the number of discriminations, and the number of correct
minus the number of wrong decisions they experienced for each
cue. Again, we ordered cues based on these experienced tallies
separately for each participant and compared them with each par-
ticipant’s final stated cue order.
When we looked at these simpler ways of ordering cues, we
found that they matched participants’ stated cue orders much more
closely. The correlations between the participants’ stated orders and
cue orders based on the absolute number of correct decisions that
each participant made with that cue are considerably higher (.42 ≤
rho ≤ .66 across environments and conditions; see Table 11-3).
Surprisingly, orders based on a tally of mere discriminations
made with each cue, regardless of whether they indicated a right or
wrong decision, also show strong correlations (.28 ≤ rho ≤ .77).
Only the orders by correct minus wrong tally partly fall behind in
terms of the size of the correlations (−.03 ≤ rho ≤ .55).
Thus, overall, people do seem sensitive to the performance of
the cues they have seen, but in a simple way that does not adjust
for how often they have seen each cue. The differences between
conditions, however, do not show a consistent pattern. One might
expect, for instance, that in the high-cost conditions, participants
would value the number of mere discriminations made by a cue
more than in the low-cost conditions, where accuracy should be
of primary concern. The correlations we found, though, do not
provide any evidence that correct decisions and discriminations
are treated differently depending on the condition.
Very simple tallying of raw frequencies of cue performance thus
seems to best match participants’ learned cue orders. But this does
not tell us how that frequency information is used to build a cue
order. The final cue order alone may be a poor clue to the process
that created it, because the learning-while-deciding setting leads to

Table 11-3: Average Spearman Rank Correlation Coefficients of Participants’ Final Stated Cue Orders with Rank Orders Based on Unconditional Cue Performance Measures Each Participant Experienced in the Course of the Experiment

Average correlation               Environment 1    Environment 2    Environment 3
with order based on:                  (VAL)            (DR)           (VAL*DR)
                                  Low     High     Low     High     Low     High
                                  cost    cost     cost    cost     cost    cost
Number of correct decisions       .66     .66      .54     .57      .42     .67
Correct minus wrong decisions     .36     .39      .55     .54      −.03    .29
Number of discriminations         .77     .67      .61     .59      .28     .61
unintuitive interactions between the positions of cues and the
amount of information that is gathered about them. We therefore
need to examine the learning process more closely and find out
how participants translate the feedback they receive about the
cues they see during decision making into a cue order. In the fol-
lowing section, we first describe the participants’ ordering process
in more detail. Then we compute how well several cue-order learn-
ing rules predict participants’ information search.

How Do People Construct Their Cue Order?

To get an idea of when and how
participants move cues around in their current cue order, we look
for changes in the order used from one decision trial to the next.
On any given trial t, we assume that participants have a current
cue order, which we infer in the following way: The cues used on
the present trial t are put in a list in the order in which they were
checked. Any missing cues (not checked in the present trial) are
added to the end of the list in order of most recent use, so, for
instance, if cue 4 was used on trial t–1 but cue 2 had not been used
since trial t–3, then cue 4 would be followed by cue 2 in the con-
structed order list. Then we look at the N cues used in trial t+1
and see if they are ordered differently from the first N cues in
the current cue order list. If so, we relate these cue order changes
to the cue values and decision outcome seen on trial t, update the
current assumed cue order for trial t+1, and proceed to consider
trial t+2.
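This bookkeeping can be made explicit with a small sketch (Python; the per-trial look-up records are hypothetical lists of cue labels in the order they were checked):

    def inferred_order(lookup_history, all_cues):
        # lookup_history is a list of per-trial look-up sequences, most recent last.
        order = list(lookup_history[-1])                 # cues checked on the present trial
        for past_trial in reversed(lookup_history[:-1]): # then missing cues by most recent use
            for cue in past_trial:
                if cue not in order:
                    order.append(cue)
        for cue in all_cues:                             # cues never looked up go last
            if cue not in order:
                order.append(cue)
        return order

    def order_changed(current_order, next_lookups):
        # Were the N cues of trial t+1 checked in a different order than the
        # first N cues of the order inferred for trial t?
        n = len(next_lookups)
        return list(next_lookups) != list(current_order)[:n]
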
The foremost pattern that emerges from this analysis is that cue
order usually does not change. On 60% of the trials across all par-
ticipants, no change in cue position was observed, regardless of
the previous decision outcome. Some participants did make many
more changes than others, though—the rate of cue-order change
ranged from 1% to 98% of trials for individuals. This is congruent
with a tendency of some participants to converge more quickly and
others less quickly to a particular cue order and then use it for the
remaining trials, mostly without further influence from feedback.
We will come back to this point below.
When cues are used in a different order, to what extent does their
direction of movement follow from their impact on the previous
trial? We only considered cues that were checked at the third posi-
tion in the search order, because for these, both upward and down-
ward movements are equally possible.
When a cue that was looked up at the third position discrimi-
nated and indicated a correct decision, it is 1.5 times more likely to
move up in the order (so it will be checked sooner) than to move
down. In other words, it moved up 28% of the time, stayed in place
54% of the time, and moved down 18% of the time. In contrast,
after wrong discriminations a cue is 1.4 times more likely to move
down. When the third cue is checked but does not discriminate, it
is also more likely (1.6 times) to move down.
How far do moving cues travel in the order? We again concen-
trated on cues that were checked at the third position in the search
order. We found that after correct discriminations, a step size of
+1 is the most frequent (besides a step size of 0, i.e., no movement),
at 17%. After nondiscriminations, a step size of −1 is most fre-
quently observed (21%) and the same holds for wrong discrimina-
tions (21%). Step sizes of +2 and −2 are observed rarely (in 8% and
6% of the cases, respectively, on average across correct, wrong, and
nondiscriminations).
These descriptive analyses provide initial hints that people
might respond to outcome feedback via adaptive changes to the cue
order, that is, moving cues up in the order after they make correct
discriminations, and down after wrong discriminations or after
they failed to discriminate. The finding that there is most often no
change in a cue’s position in the search order, regardless of what
kind of impact the cue had on the previous trial, potentially speaks
against the use of swapping and move-to-front rules and instead
supports rules that converge to (relatively) stable orders. Because
tally and tally swap rules count up correct decisions or discrimina-
tions across all decisions made so far, the relative impact of the
single current decision decreases over the course of the decision
phase, so that cues move less and cue orders become more stable
over trials. As a consequence, these rules might, as we predicted
based on our simulation results, fit behavior better than the simple
swap rule. In addition, the relatively high prevalence of step size +1
after correct discriminations and −1 after wrong discriminations
could be a hint that tally swap rules might fit behavior somewhat
better than complete-reordering tally rules. We find out if that is the
case by next testing the fit of particular learning rules to partici-
pants’ cue search data.

Fit of Learning Rules

We tested how well different cue-ordering rules
could account for our participants’ ongoing cue search behavior
using the same basic types of learning rules as those that were
tested in the simulations reported earlier. Motivated by the correla-
tion results reported above, we added two more variants to both
the tally rule and the tally swap rule. These variants count correct
decisions only and discriminations only, instead of counting cor-
rect minus wrong discriminations as in the original tally and tally-
swap rules. With these four additions we have 10 different learning
rules.
We computed the fit of the learning rules (i.e., correctly predicted
matches to an individual’s data) for each participant separately.
For each decision trial, we compared the cue order predicted by
each learning rule with the order in which participants actually
looked up cues. After each decision, the current cue order predic-
tions of each learning rule were updated based on the information
the participant had encountered in that trial. Unpredictable cases
in the first few trials, when no information about a particular cue
had yet been gathered (because it had not been looked up), were
excluded. That is, the fit was only computed for the cases in which
the learning rule made a precise testable prediction about the posi-
tion of a particular cue.
We measure fit as a proportion: Of all cues looked up by a par-
ticipant on a given trial, how many were checked at exactly
the order position predicted by the learning rule? The average
proportions of cue look-up positions correctly predicted by the 10
learning rules are reported in Table 11-4. Across all conditions,
the tally swap rules achieve the highest fit, particularly in the VAL
environment (though they do well in the other environments, too,
consistently beating the other rules). Within this set, the rule that
keeps a tally of just the correct decisions per cue fits best. It cor-
rectly predicts the exact position of half of the cues that were looked

Table 11-4: Proportion of Cues Looked Up by Participants at Exactly the Position Predicted by the Respective Learning Rule, and, for Comparison, the Corresponding Proportion Expected Randomly

                            Overall   Environment 1    Environment 2    Environment 3
                             mean         (VAL)            (DR)           (VAL*DR)
Learning rule                         Low     High     Low     High     Low     High
                                      cost    cost     cost    cost     cost    cost
Validity                      .23     .26     .28      .20     .22      .24     .19
Tally:
  Correct − wrong             .39     .43     .53      .32     .39      .29     .39
  Correct                     .42     .47     .53      .34     .42      .33     .45
  Discriminations             .41     .50     .51      .34     .41      .28     .41
Tally swap:
  Correct − wrong             .49     .58     .51      .50     .50      .41     .46
  Correct                     .50     .59     .52      .51     .50      .41     .49
  Discriminations             .49     .58     .52      .47     .49      .38     .48
Simple swap                   .40     .44     .41      .43     .39      .35     .37
Move-to-front                 .32     .35     .38      .31     .31      .26     .31
Selective move-to-front       .33     .36     .40      .31     .30      .27     .31
Random model                  .20     .20     .20      .20     .20      .20     .20
up (proportion = .50). The mean distance between its predicted
positions and where each cue was actually looked up was less than
one position (0.87). The validity learning rule achieves the lowest
fit, with proportion correct .23 and distance measure 1.51.
How do the learning rules compare to a random model, with a
new random cue order being generated and applied on each trial?
This random model would lead to an expected proportion of
.20 correct position predictions, and an expected distance of 1.6
positions. All of the proposed learning rules achieve a higher fit to
participants’ data than does this random cue-ordering model,
although the validity rule’s fit is very close to random level, with no
difference to be found in some conditions.
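As an illustrative sketch (not the authors' analysis code), the per-trial fit and distance measures just described can be computed as follows, assuming 0-based list positions:

    def fit_on_trial(predicted_order, looked_up):
        # Of the cues looked up on this trial, what proportion sat at exactly the
        # position the learning rule predicted, and how far off were they on average?
        exact = distance = 0
        for actual_pos, cue in enumerate(looked_up):
            predicted_pos = predicted_order.index(cue)
            exact += int(predicted_pos == actual_pos)
            distance += abs(predicted_pos - actual_pos)
        n = len(looked_up)
        return exact / n, distance / n
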
In a second step, we classified participants uniquely according
to the learning rule that predicted the most cue positions correctly
for that participant and that additionally fulfilled the criterion
that the proportion of correctly predicted positions is greater than
.25.² (For nine participants, two learning rules had the same pro-
portion of correctly predicted positions. These participants were
thus counted as 0.5 for each of the respective rules.) Mirroring
the average results across participants, these findings suggest that
more than half of our participants fall into the class of tally
swap rules (see Table 11-5). Within that set, most participants are
classified with the rule that keeps a tally of correct decisions alone,
closely followed by the rule that tracks correct minus wrong deci-
sions. The plain tally rules that assume complete reordering are
best at predicting the cue search orders used by just over a further
third of the participants. In stark contrast, only very few par-
ticipants are classified as following the validity learning rule, the
simple-swap rule and the two move-to-front rules. The average
proportion of correct cue position predictions achieved for these
participants are also lower than those in the other categories.
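The classification step itself is straightforward; the sketch below is an
illustration under the assumptions just described, with invented fit values
and shortened rule names, assigning each participant to the rule with the
highest fit above the .25 threshold and splitting exact ties:

from collections import defaultdict

def classify(fits, threshold=0.25):
    """fits: {participant: {rule: proportion of correctly predicted
    positions}}. Returns {rule: (possibly fractional) participant count};
    ties among best-fitting rules are counted as 1/n each."""
    counts = defaultdict(float)
    for participant, rule_fits in fits.items():
        best = max(rule_fits.values())
        if best <= threshold:
            counts["fit below threshold"] += 1
            continue
        winners = [rule for rule, fit in rule_fits.items() if fit == best]
        for rule in winners:
            counts[rule] += 1 / len(winners)
    return dict(counts)

# Invented example with two participants
fits = {
    "p1": {"validity": .23, "tally swap (correct)": .55, "simple swap": .40},
    "p2": {"validity": .24, "tally swap (correct)": .24, "simple swap": .22},
}
print(classify(fits))
# {'tally swap (correct)': 1.0, 'fit below threshold': 1.0}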
Overall, these results indicate that participants generally came up
with adaptive cue orders that worked in a range of environmental
conditions. The cue orders participants explicitly stated at the end
of the experiment achieved better than random performance
in most experimental conditions, even though the decision envi-
ronments participants encountered had very different statistical
characteristics. But at the same time, correlations between participants'
stated cue orders and the standard search orders that would have worked
well in the different experimental environments (e.g., search by ecological
validity in the first environment) were quite low on average.

2. This threshold was chosen based on the distribution of the proportion
of matches expected from a random model. The mean of this distri-
bution is .20, and the standard deviation is .02, so our threshold is more
than two standard deviations away from the mean and thus expected to be
exceeded by random chance with a probability less than .02.
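The probability quoted in the footnote can be checked in a few lines,
assuming (as stated there) that the random model's match proportion is
approximately normal with mean .20 and standard deviation .02:

import math

mean, sd, threshold = 0.20, 0.02, 0.25
z = (threshold - mean) / sd                     # 2.5 standard deviations
p_exceed = 0.5 * math.erfc(z / math.sqrt(2))    # upper-tail normal probability
print(round(z, 2), round(p_exceed, 4))          # 2.5 0.0062 — well below .02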
Table 11-5: Number of Participants (Across All Experimental Conditions)
Classified According to the Learning Rule that Achieves the Maximum
Proportion of Correctly Predicted Positions for a Particular Participant,
Under the Condition that the Proportion (Whose Means for Each Rule Are
Shown) Exceeds the Threshold of 0.25

Learning rule               Number of      Mean proportion correctly
                            participants   predicted positions
Validity                     2             .41
Tally:
  Correct − wrong            9.5           .56
  Correct                   11             .62
  Discriminations           14.5           .51
Tally swap:
  Correct − wrong           23.5           .59
  Correct                   26.5           .62
  Discriminations           16.5           .59
Simple swap                  1.5           .38
Move-to-front                2             .40
Selective move-to-front      3             .47
Fit below threshold         10             .22

Rather, participants' cue orders were more
positively correlated with orders based on simple tallies (e.g., of
correct decisions made by each cue). Thus, participants’ cue orders,
though often beating the random cue order used in the minimalist
heuristic, could have done better, lagging behind the respective
environmentally matched orders by a noticeable margin.
These correlational results suggest that participants’ cue-order
construction processes may correspond to learning rules based on
simple, unconditional tallies of cues’ performance. Both the tally
swap rules and, to a slightly lesser extent, the tally rules that
completely reorder the set of cues predict participants’ trial-by-trial
cue search well. Thus, these rules may be psychologically plausible
descriptions of the cue-ordering process. This conclusion is in line
with the research cited earlier on the well-developed human
capacity for frequency processing (Hasher & Zacks, 1984) and the
tendency to sometimes base decisions on raw frequencies of par-
ticular outcomes rather than base-rate-adjusted probabilities (Estes,
1976). If frequencies are indeed recorded and recalled with ease,
then this might also explain why these rules account for partici-
pants’ behavior better than the supposedly simplest, frequency-
ignoring rules, simple swap and the move-to-front rules. As argued
before, storing a cue order, as required by those rules, may be about
as demanding in terms of memory resources as storing a set of tal-
lies, while providing lower performance (as seen from the results of
the simulation study). Thus, it is not surprising that participants
may have behaved instead in accordance with the more accurate
tally and tally swap rules. Additionally, learning rules based on tal-
lies are less and less likely to make changes over time, thus leading
to highly stable cue search orders. In contrast, the simple swap and
move-to-front rules do not stabilize the search order, and changes
are just as likely after many learning trials as they are at the begin-
ning of learning, which may be undesirable.
It was striking how poorly the validity learning rule predicted
participants’ behavior in comparison to the other rules. This was
found even though participants in our experiment could have esti-
mated validity in the online process of decision making in a
relatively simple way, as the ratio of two tallies, contrary to the com-
plex worst-case assumptions made by Juslin and Persson (2002).
Nevertheless, almost nobody appeared to use the validity learning
rule in our experimental setting.
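For illustration, this is all the "ratio of two tallies" amounts to: a
per-cue validity estimate updated online from two counters. The sketch
below uses placeholder cue names and is not the chapter's simulation code:

# Two counters per cue, updated only on trials where the cue
# discriminates between the two options.
correct = {"cue1": 0, "cue2": 0}
discriminations = {"cue1": 0, "cue2": 0}

def update(cue, discriminated, was_correct):
    """Update the tallies for one cue after feedback on a trial."""
    if discriminated:
        discriminations[cue] += 1
        if was_correct:
            correct[cue] += 1

def estimated_validity(cue):
    """Online validity estimate: correct decisions / discriminations."""
    if discriminations[cue] == 0:
        return None  # no evidence about this cue yet
    return correct[cue] / discriminations[cue]

update("cue1", discriminated=True, was_correct=True)
update("cue1", discriminated=True, was_correct=False)
print(estimated_validity("cue1"))  # 0.5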
Taken together with the general result that the search orders
used by participants mostly achieved a higher payoff than random
cue ordering, our findings suggest that people adaptively update
their cue search order when engaged in one-reason decision making,
but they do it in simpler ways than prescribed by the validity learn-
ing rule.

When—and Why—People Use Simple Cue-Ordering Rules

Simple one-reason decision heuristics gain much of their power
from searching through cues in a particular order, so that they find
a single good reason to use in making a decision. But finding good
cue orders themselves to guide heuristic search can be computa-
tionally complex. Here, we have investigated ways to reduce the
computational complexity of the setup of one-reason decision heu-
ristics by suggesting simple rules for the construction of cue search
orders. These rules are inspired by early work in computer science
(e.g., Bentley & McGeoch, 1985; Rivest, 1976) on the problem of
ordering a sequential list of repeatedly retrieved items whose
relative importance is not known a priori. Motivated by the severe
constraint on computer memory capacity at that time, simple order-
ing rules relying on small amounts of stored information were
developed. Given that this memory constraint remains important
when it comes to human cognitive capacities, we focused our
exploration on related mechanisms.
Our simulations of a range of simple cue order learning rules
showed that several of them enable one-reason decision making to
quickly perform better than random cue search in terms of accuracy
and frugality. Among these rules, those that reorder cues based on
their tallies of correct minus wrong decisions perform especially
well and very close to the more complex validity learning rule that
reorders cues based on current validity estimates.
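To illustrate how little bookkeeping these rules require, here is a minimal
sketch of the two tally-based families under one plausible reading of the
rules described above: the tally rule re-sorts all cues by their tally of
correct minus wrong decisions, whereas the tally swap rule only moves a cue
up one position when its tally exceeds that of the cue directly above it.
Cue names and tallies are invented.

def tally_order(cues, tally):
    """Tally rule: completely reorder the cues by descending tally
    (e.g., correct minus wrong decisions per cue)."""
    return sorted(cues, key=lambda cue: tally[cue], reverse=True)

def tally_swap(cues, tally):
    """Tally swap rule (one plausible variant): in a single pass, swap a
    cue with its predecessor whenever its tally is higher."""
    order = list(cues)
    for i in range(1, len(order)):
        if tally[order[i]] > tally[order[i - 1]]:
            order[i - 1], order[i] = order[i], order[i - 1]
    return order

tally = {"cueA": 1, "cueB": 2, "cueC": 5}   # invented tallies
print(tally_order(["cueA", "cueB", "cueC"], tally))  # ['cueC', 'cueB', 'cueA']
print(tally_swap(["cueA", "cueB", "cueC"], tally))   # ['cueB', 'cueC', 'cueA']

The swap variant adjusts the order more conservatively on each trial than a
full re-sort does.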
We explored whether people actually use such simple cue order
learning rules in an experimental study. We found that those rules
based on simple, unconditional tallies like the number of correct
decisions per cue showed the highest fit with participants’ cue
order construction processes. Moreover, these simple order-learning
rules produced ecologically rational cue orders that yielded both
accuracy and frugality across environments and gave participants
reasonable payoffs, for the most part considerably better than
chance levels.
At the same time, it was clear that participants did not order
cues by estimated validity. Searching through cues in order of
validity is one of the building blocks of the take-the-best heuristic,
whose performance and use in decision tasks has been widely
explored as a mainstay of the simple heuristics approach. Therefore,
given our results, it is important to determine environments and
situations in which we could expect validity-based cue ordering
to be used, as distinct from situations such as the small-sample
learning-while-doing setting that we explored here.
First, validity orders may be found and used more when there is
greater payoff for decision accuracy. It could be that our experimen-
tal setting provided too little pressure for coming up with the best
search order in a given environment, leading to little use of the
validity learning rule. Applying search by validity even in our most
favorable experimental environment—that in which cue validity
varied while discrimination rates were constant—would have
increased the expected payoff by less than €2 compared to random
search, assuming one-reason stopping and deciding. Some partici-
pants might not have cared about a possible reward increment of
this magnitude in exchange for the effort of continuously improv-
ing their search order through careful monitoring of feedback.
This may have been true especially given the already high benefits
resulting from adaptive stopping alone, which are also more imme-
diately noticeable (i.e., in each trial) without the necessity of slow
learning from feedback. Thus, participants might have settled
instead for relatively simple cue-order learning rules.
Second, validity orders may be learned more when there are
greater costs for lower accuracy. Wrong decisions did not incur
a loss of money in our experiment (besides the money spent on
information search). This might have made mistakes less salient,
leading people not to actively punish cues for making mistakes.
(Furthermore, if a cue’s mistaken decisions were not noticed as
much, this could also explain why tally and tally swap rules based
on a count of correct decisions alone showed a higher average fit
than rules that count correct minus wrong decisions per cue.) In
future experiments, it can be tested whether the validity learning
rules achieve a higher fit when a wrong decision would entail a
loss of money just as a correct decision would involve a gain, and
accuracy would thus become even more important.
Third, evolution may have determined some environmental
domains in which it is important to learn valid cues. Such domains
will involve decision problems of adaptive relevance but where
there could also be environment-specific variation that requires
individual learning (as opposed to more stable environments
where knowledge of the cue order itself could be “built in” by evo-
lution). These could include food choice (where cues to what is
edible or poisonous could vary regionally and seasonally) and
avoidance of dangerous animals (where predator prevalence can
vary over space and time). For example, rhesus monkeys can
quickly learn to associate a snake-shape cue, even in a snake-shaped
toy, with a fear response, which then strongly supports the decision
not to approach an animal with that form (Cook & Mineka, 1989,
1990). Note that in these domains motivational and emotional
responses play a role, possibly making the cues more powerful or
even establishing a noncompensatory cue structure that allows
quick decisions based on little information—a quick and hence
adaptive design in high-risk decision domains. However, it is not
easy to tell whether the cues that are used in these decisions really
follow an order by validity, because validity for criteria relevant in
our evolutionary past can be difficult to determine in an objective
way in some of these cases. Moreover, validity may not be the prime
concern in these domains, but rather making quick decisions or
avoiding costly mistakes (Bullock & Todd, 1999).
Fourth, individuals could also learn a validity ordering from
others (or records created by others), in environments that enable
social learning or cultural transmission. In many cases people can
just look up indications of highly valid cues in books or on the
Internet or can directly ask experts. Especially in important and
high-stakes domains, it is likely that someone already has taken the
effort to compute validities based on large data sets, such as the
predictive accuracies of diverse diagnostic cues in medicine, or, as
in our experimental task, the validity of various potential indica-
tors of oil deposits (though such information is unlikely to be made
publicly available). However, for such important decisions, when
the decision maker will probably be held accountable and have to
justify the choice, people are less likely to engage in one-reason
decision making and more likely to gather additional information
before making a decision (Siegel-Jacobs & Yates, 1996; Tetlock &
Boettger, 1989). This can reduce the advantages of having a good
cue search order (but see Dhami & Ayton, 2001, for a contrasting
result in a legal domain).
Finally, and more in reach of further experimental investigation,
there could be situations in which individual learning could lead
to good estimations of the relative validity of cues, particularly
when there is more opportunity to explore the environment. Our
learning-while-doing setting constrained the exploration that par-
ticipants would engage in, because each cue checked cost them
money and could also lead to making worse decisions (if cues, par-
ticularly low-validity ones, indicate the wrong choice). If instead
people were in an explore-first situation, or explore-while-learning,
where low-cost checking of different cues could be done, this could
lead to better estimates of cue validity order. Such exploration
could allow better validity-order learning because under some cir-
cumstances, people can judge correlations quite well (see Alloy &
Tabachnik, 1984, for a review), and ecological validity is a mono-
tonic transform of the Goodman–Kruskal rank correlation (γ = 2v − 1;
see Martignon & Hoffrage, 1999), meaning that both measures pro-
duce the same cue order. Even if people cannot keep track of the
correlations among multiple cues simultaneously, focusing on just
the relationship of two variables is more manageable. For example,
research on multiple cue probability learning suggests that there
might be interference effects when cues are concurrently learned
(e.g., Castellan, 1973; Edgell & Hennessey, 1980) that are dimin-
ished when cue–criterion relationships are learned one at a time
(Brehmer, 1973). Validities of cues for certain criteria could thus
possibly be learned one at a time, and when required by the deci-
sion-making task, an order of cues based on these validities could
be assembled ad hoc.
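The monotonic relation mentioned above is easy to verify numerically: for a
binary cue, γ = 2v − 1, so ordering cues by estimated validity and ordering
them by the corresponding rank correlation must coincide. A short check with
invented validities:

def gamma_from_validity(v):
    """Goodman–Kruskal gamma for a binary cue with ecological validity v
    (gamma = 2v − 1; Martignon & Hoffrage, 1999)."""
    return 2 * v - 1

validities = [0.9, 0.7, 0.55]   # invented validities
print([round(gamma_from_validity(v), 2) for v in validities])  # [0.8, 0.4, 0.1]
# Sorting cues by validity or by gamma gives the same order, because the
# transform is strictly increasing.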
Along these lines, people’s general knowledge about the world
could help them focus on certain cue–criterion relationships and
thus identify valid cues. Research by García-Retamero, Wallin, and
Dieckmann (2007) suggests that people make use of causal informa-
tion about cue–criterion relations as an indicator of highly valid
cues. The researchers found that participants looked up cues that
can be causally linked to the criterion first, before cues for which
such a link was less easily established. Furthermore, participants
were more likely to base their decisions on these causally plausible
cues than on others, and were more accurate in estimating their
validities. Causal knowledge might thus reduce the number of
correlations to keep track of, through targeting particular cues from
an otherwise often wide range of possible cues. This would make
the task more similar to a single-cue learning task, in which, as
mentioned above, people are better able to learn cue validities than
in multiple-cue settings.
In short, in real-world situations that do not involve learning-
while-doing, there could be many possible ways to learn an order
of cues by validity. However, in the online process of simultaneous
learning and decision making we studied here, we did not find evi-
dence that people apply the necessary computations to construct
the validity order for the cues they use.

Conclusions

Individuals facing new decision environments on their own need
to determine what information to use and how to process it to make
good decisions. In an early paper on one-reason decision making,
Gigerenzer and Goldstein (1999) wrote: “If people can order cues
according to their perceived validities—whether or not this subjec-
tive order corresponds to the ecological order—then search can
follow this order of cues” (p. 81). However, it is important to under-
stand when people actually perform such validity ordering, and
how, along with how they arrive at good cue orders the rest of the
time. We have found that in at least one kind of situation where
people have the opportunity to learn to order cues according to
their subjective experienced validity, they instead use the other
simpler tally-based orders. These learning rules appear to be
applied across a range of environment structures and result in cue
orders that produce good accuracy and frugality in one-reason
decision strategies. As such, they help to answer the question of
how ecological rationality can be achieved by individuals encoun-
tering new environments, through an adaptive process of learning
cue orders and making decisions at the same time.
Part V
RARITY AND SKEWNESS IN THE WORLD
12
Why Rare Things Are Precious
How Rarity Benefits Inference

Craig R. M. McKenzie
Valerie M. Chase

Only with the help of . . . bold conjectures can we hope to
discover interesting and relevant truth.
Karl Popper

Imagine that you have just moved to a desert town and are trying
to determine if the local weather forecaster can accurately predict
whether it will be sunny or rainy. The forecaster often predicts sun-
shine and rarely predicts rain. On one day, you observe that the
forecaster predicts sunshine and is correct. On another day, she
predicts rain and is correct. Which of these correct predictions
would leave you more convinced that the forecaster can accurately
predict the weather? According to a variety of information-theoretic
accounts, including Bayesian statistics, the more informative of the
two observations is the correct prediction of rain (Horwich, 1982;
Howson & Urbach, 1989). As we show in more detail later, this is
because a correct prediction of sunshine is not surprising in the
desert, where it is sunny almost every day. That is, even if the
forecaster knew only that the desert is sunny, you would expect her
to make lots of correct predictions of sunshine just by chance.
Because rainy days are rare in the desert, a correct prediction of
rain is less likely to occur by chance and therefore provides stron-
ger evidence that the forecaster can distinguish between future
sunny and rainy days. The same reasoning applies to incorrect pre-
dictions: Those that are least likely to occur by chance alone are
most informative with respect to the forecaster’s (in)accuracy.
In short, rarity is valuable. Whether your expectations are
confirmed or violated as a result, observing a rare conjunction of
events is more revealing than observing a common one. Trying to
assess the desert forecaster’s accuracy by checking the weather only
after she predicts sunshine would be like looking for the proverbial
needle in a haystack: Because nearly every day is sunny, the more
informative rainy days would be few and far between. Of course, if
you had nothing else to do, you could compare her daily forecasts
of rain or sunshine with the actual weather for hundreds of days in
succession in order to assess her performance. But in case you do
have other things to do, it would be a lot easier just to wait until a
rainy day and check whether the forecaster predicted that day’s
weather correctly. Event rarity matters in the real world because
people are boundedly rational; that is, they have limited time, lim-
ited opportunities to gather information, and limited cognitive
capacities for processing information. Gravitating toward rare
events like rainy days in the desert enables people to zero in quickly
on the most information-rich regions of their environment. Of
course, what is rare depends on the specific setting. For instance, if
the forecaster were predicting weather in a rain forest rather than a
desert, then, assuming the forecaster usually predicts rain, correctly
predicting sunshine would be rarer and therefore more informative
than correctly predicting rain.
Given that rare conjunctions of events are more informative than
common ones, a question naturally arises: Are people sensitive to
event rarity when making inferences? Anecdotal evidence that at
least some people are comes from observing scientists, who strive
to predict events that are unlikely a priori, presumably because
they believe that correct predictions of unlikely events provide rel-
atively strong support for their hypothesis or theory. Consider, for
example, Galileo’s surprising—and famously correct—prediction
that light and heavy objects fall at the same rate. Of course, scien-
tists may be sensitive to rarity when conducting research not
because it is intuitive but because it is prescribed by some philoso-
phers of science (e.g., Lakatos, 1978). That is, professional research-
ers might behave differently from people making inferences in their
everyday lives. Are laypeople also influenced by rarity?
In this chapter, we review evidence showing that people are
remarkably sensitive to the rarity of events when making infer-
ences. Indeed, people are so attuned to event rarity that their
implicit assumptions about rarity guide their thinking even in labo-
ratory tasks where experimenters have implicitly assumed that
rarity would not matter. Participants’ sensitivity to, and assump-
tions about, rarity have important implications for understanding
lay inference.
Much as physicists study falling objects in a vacuum, psycholo-
gists who study intuitive inference typically present participants
with tasks that are abstract or unfamiliar in an attempt to eliminate
real-world influences that are not of theoretical interest. Viewing
the experimental tasks from this perspective, psychologists often
turn to content- and context-independent models of inference—
such as logic or probability theory—to determine what constitutes
optimal, or rational, responses in the task. Because participants’
behavior consistently departs from the predictions of these models,
it has been generally concluded that people are poor inference
makers. Psychologists have only recently begun to realize that,
faced with laboratory tasks stripped of content and context, partici-
pants fall back on ecologically rational assumptions, that is, default
assumptions based on their real-world experience. The mismatch
between these assumptions and the content- and context-free
tasks presented to them in the laboratory can make their adaptive
behavior in these experiments appear irrational (for reviews, see
Funder, 1987; Hogarth, 1981; McKenzie, 2005). When observed in
laboratory tasks in which, unbeknownst to participants, these
assumptions are violated, lay inference can look maladaptive.
An important assumption about task environments that is made
by experts and laypeople alike is that events that stand out and are
therefore spoken and thought about—in the context of weather
forecasting, personal health, corporate performance, or any other
realm—are generally rare rather than common (see this also in the
context of recognized vs. unrecognized objects in chapter 5). We
argue that it is adaptive for people to make this rarity assumption
in situations without information to the contrary, because it reflects
the ecology of the real world (see also Dawes, 1993; Einhorn &
Hogarth, 1986; Klayman & Ha, 1987; Oaksford & Chater, 1994). But
we also present a wide range of evidence that people’s behavior is
adaptable in the sense that it is sensitive to violations of the rarity
assumption (McKenzie, 2005). In other words, when the rarity
assumption clearly does not hold, people’s behavior changes largely
in accord with Bayesian prescriptions, often erasing inferential
“errors” or “biases.”
In the next section, we define rarity more precisely and illustrate
the normative importance of rarity in inference. In the four
sections thereafter, we demonstrate the psychological importance
of rarity when people assess covariation between events, evaluate
hypotheses after receiving data, and search for information about
causes and effects. Hypothesis testing and covariation judgment
have been major research topics over the past few decades, but only
recently has it become evident that participants’ assumptions and
knowledge about rarity strongly influence their behavior. After
reviewing the evidence, we argue that, despite the computational
complexity assumed by a Bayesian analysis, simply being influ-
enced more by rare events than by common ones is a boundedly
rational strategy for making inferences that is qualitatively consis-
tent with Bayesian norms.
A Bayesian Analysis of Rarity

What makes an event or observation rare? Because we are con-
cerned with events that either do or do not occur, we define an
event as rare if it is absent more often than not, that is, if it has a
probability of occurrence of less than .50. Of course, events are
more rare (or common) to the extent that they occur with probabil-
ity closer to 0 (or 1).
Imagine again the desert weather forecaster attempting to pre-
dict rain or sunshine. The four possible observations are shown
in Figure 12-1: A correct prediction of rain (Cell A), an incorrect
prediction of rain (Cell B), an incorrect prediction of sunshine
(Cell C), and a correct prediction of sunshine (Cell D). The column
marginal probabilities indicate that rain is rare, occurring on 10%
of days, and that sunshine is common, occurring on the remaining
90% of days. (We use this relatively high rate of desert rain because
using smaller probabilities makes the numbers in our example
inconveniently small.) The row marginal probabilities indicate
that the forecaster predicts rain just as often as it occurs, that is, on
10% of days (i.e., rarely), and predicts sunshine on 90% of days.
Recall that you are trying to determine whether the forecaster
can predict the weather at better than chance-level performance.
The values in each cell in the left matrix in Figure 12-1 indicate the
probability of each observation, given H0, the “chance-level” hypoth-
esis (i.e., that predictions and events are independent, or that ρ, the
true correlation between the forecaster’s predictions and actual out-
comes, is 0). Under this hypothesis, the probabilities in the cells in
the left matrix in Figure 12-1 are the result of simply multiplying the
respective marginal probabilities, which is appropriate if the predictions
and events are assumed to be independent. For example, if the forecaster
merely guesses that it will rain on 10% of days and it does rain on 10% of
days, the forecaster would be expected to correctly predict rain (by chance)
on 1% of days (Cell A). Let the competing hypothesis, H1, be that there is a
positive relationship between predictions and events (say, ρ = .5; details
about computing correlations for 2 × 2 matrices can be found later in the
section on covariation assessment). In this case you would expect that there
is a moderate contingency between the forecaster's predictions and events
rather than no contingency. The right matrix in Figure 12-1 shows the
probabilities under H1.

                H0: Prediction and Event          H1: Prediction and Event
                    Independent (ρ = 0)               Dependent (ρ = .5)

                          Event                             Event
                      Rain      Sun                    Rain       Sun
Prediction  Rain   A  0.01   B  0.09   0.1   Rain   A  0.055   B  0.045   0.1
            Sun    C  0.09   D  0.81   0.9   Sun    C  0.045   D  0.855   0.9
                      0.1       0.9                    0.1        0.9

Figure 12-1: Cell proportions when predictions and events are independent
(H0; left matrix) and when there is a moderate correlation ρ between them
(H1; right matrix). In both cases, rain is predicted to occur, and rain
actually occurs, 10% of the time.
Now we can ask how informative each of the four possible obser-
vations, or event conjunctions, is given these hypotheses. From a
Bayesian perspective, data are informative, or diagnostic, to the
extent that they help distinguish between the hypotheses under
consideration. Informativeness can be captured using likelihood
ratios. In this chapter, we concentrate on how informative a given
observation is—regardless of the hypothesis it favors—in situations
where the qualitative impact of each observation is clear (A and D
observations always favor one hypothesis, and B and C observa-
tions always favor the other).
Let the numerator of the ratio be the probability of observing
the data assuming that H1 is true, and let the denominator be
the probability of observing the same data assuming that H0 is
true. A datum is diagnostic to the extent that its likelihood ratio
differs from 1. In this example, the likelihood ratio for a Cell A
observation is p(A|H1)/p(A|H0) = .055/.01 = 5.5. That is, a correct
prediction of rain is 5.5 times more likely if there is a moderate
contingency between the forecaster’s predictions and the actual
events than if the forecaster is merely guessing. For the remaining
cells, p(B|H1)/p(B|H0) = p(C|H1)/p(C|H0) = .045/.09 = .5, and
p(D|H1)/p(D|H0) = .855/.81 = 1.06. The fact that the likelihood
ratios for A and D observations (correct predictions) are greater
than 1 indicates that they are evidence in favor of H1, and the like-
lihood ratios of less than 1 for B and C observations (incorrect pre-
dictions) show that they are evidence in favor of H0. The log
likelihood ratio (LLR) is a traditional Bayesian measure that con-
verts the likelihood ratio into bits of information: LLRj =
Abs(log2[p(j|H1)/p(j|H0)]), where j corresponds to Cell A, B, C, or
D (e.g., Evans & Over, 1996; Good, 1983; Klayman & Ha, 1987). The
measure is bounded below by zero and unbounded above. For the
A through D observations in this example, LLR equals 2.46, 1.0, 1.0,
and 0.08 bits, respectively.
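These numbers can be reproduced with a few lines of code. The sketch below
is ours; the chapter does not give its computational details, so the
construction of the ρ = .5 matrix uses the standard 2 × 2 identity that the
Cell A probability equals the product of the marginals plus ρ times the
square root of the product of the four marginal variances. It then computes
each cell's likelihood ratio and LLR:

import math

def joint_cells(p_pred_rain, p_rain, rho):
    """2 x 2 joint probabilities (Cells A-D) with the given marginal
    probabilities and correlation rho between prediction and event."""
    a = p_pred_rain * p_rain + rho * math.sqrt(
        p_pred_rain * (1 - p_pred_rain) * p_rain * (1 - p_rain))
    b = p_pred_rain - a     # predicts rain, but it is sunny
    c = p_rain - a          # predicts sun, but it rains
    d = 1 - a - b - c       # predicts sun, and it is sunny
    return {"A": a, "B": b, "C": c, "D": d}

h0 = joint_cells(0.1, 0.1, rho=0.0)   # independence
h1 = joint_cells(0.1, 0.1, rho=0.5)   # moderate contingency

for cell in "ABCD":
    lr = h1[cell] / h0[cell]
    llr = abs(math.log2(lr))
    print(cell, round(lr, 2), round(llr, 2))
# A 5.5 2.46
# B 0.5 1.0
# C 0.5 1.0
# D 1.06 0.08

Cell A's large ratio and Cell D's nearly uninformative ratio fall out of
nothing more than the marginal probabilities.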
Consider first the relationship between the correct predictions
of rain and sunshine, Cells A and D, respectively. Consistent with
the intuitive analysis offered earlier, the correct prediction of rain
is much more informative than the correct prediction of sunshine.
Indeed, the correct prediction of sunshine is virtually uninforma-
tive. Several assumptions were made in the above analysis, how-
ever, including that H1 was ρ = .5, H0 was ρ = 0, and p(predict rain)
= p(rain) = .1. How sensitive to these assumptions is the result that
LLRA>LLRD? As it turns out, the competing hypotheses are irrele-
vant. If the marginal probabilities are the same under
the competing hypotheses, all that is necessary for the correct
prediction of rain to be more informative than the correct predic-
tion of sunshine is that p(predicted rain) < 1−p(rain) (McKenzie &
Mikkelsen, 2007, provide a proof; see also Horwich, 1982; Mackie,
1963; McKenzie & Mikkelsen, 2000; Forster, 1994, provides a
non-Bayesian account of inference in which rarity plays an impor-
tant role). Thus, if rain and predictions of rain are both rare by our
definition—that is, if each has a probability of less than .50—Cell
A is more informative than Cell D.
What about the informativeness of the two wrong predictions?
Under the assumptions outlined earlier, the two wrong predic-
tions are equally informative. All that matters is the relationship
between p(predicted rain) and p(rain). (Again, the competing hypo-
theses are irrelevant.) Because these two probabilities are equal
in the above example, LLRB = LLRC. However, if p(predict rain)
< p(rain), the wrong prediction of rain is more informative, and if
p(predict rain) > p(rain), the wrong prediction of sunshine is more
informative. Put differently, if the forecaster is biased to predict
sunshine, then a wrong prediction of rain is the strongest dis-
confirming outcome, and if the forecaster is biased to predict rain,
then a wrong prediction of sunshine is the strongest disconfirming
outcome.
The four panels in Figure 12-2 show each of the four cells’
informativeness (LLRj) as a function of p(predict rain) and p(rain),
which were orthogonally varied between .1 and .9 in steps of .1
(resulting in 81 data points in each panel). H1 was ρ = .1 and
H0 was ρ = 0. [The low ρ value for H1 was used because there are
low upper bounds on positive values of ρ when p(predict rain) or
p(rain) is low and the other is high.] The top left panel shows that
a Cell A observation is most informative when both probabilities
are low; the top right panel shows that Cell B is most informative
when p(predict rain) is low and p(rain) is high; the bottom left
panel shows that Cell C is most informative when p(predict rain)
is high and p(rain) is low; and the bottom right panel shows that
Cell D is most informative when both probabilities are high.
The important point is that rarity—how often the forecaster pre-
dicts rain versus sunshine and how often it is rainy versus sunny—
almost single-handedly determines the informativeness of the
different outcomes.

Figure 12-2: The log likelihood ratio (LLR) of a datum in each of the four
cells (A, B, C, and D) as a function of p(predict rain) and p(rain). The
informativeness measure used is LLRj = Abs(log2[p(j|H1)/p(j|H0)]), where j
corresponds to the particular cell. To generate the data in the figure,
hypothesis H1 was that ρ = .1 (i.e., there was a weak positive relationship
between predictions of rain and actual rain) and H0 was that ρ = 0 (i.e.,
predictions of rain and actual rain were independent).

Of course, this analysis generalizes beyond
the forecasting example and is applicable to any situation in which
one is trying to ascertain whether two binary variables are related
(e.g., a symptom and a disease; handedness and a personality trait;
for similar analyses, see Evans & Over, 1996; Nickerson, 1996;
Oaksford & Chater, 1994).
Note that this analysis is incomplete in the sense that it consid-
ers only likelihood ratios and ignores the inference maker’s degree
of belief in H1 as opposed to H0 before and after observing particu-
lar events, which in Bayesian terminology are referred to as the
prior probability and the posterior probability, respectively. A more
complete analysis would take into account, for example, prior
beliefs regarding the weather forecaster’s ability to predict at better-
than-chance-level performance. We emphasize the likelihood ratio
because of our interest in how people perceive data informative-
ness rather than how they incorporate information, once garnered,
into their beliefs. Moreover, as will be shown later, not only the
prior and posterior probabilities but the specific dependence
hypothesis (the hypothesis specifying that there is a relationship
between the variables) under consideration has surprisingly little
impact on this and other measures of informativeness.
Furthermore, our analysis thus far has concentrated on the
informativeness of passively witnessed outcomes. What about situ-
ations in which one must decide how to actively search for infor-
mation (as discussed for cue search in chapter 10)? If you had to
choose between checking whether a prediction of rain is correct
and checking whether a prediction of sunshine is correct, for exam-
ple, which would better help you determine whether or not the
forecaster is capable? Because you do not know which outcome
will occur (e.g., when checking a prediction of rain, you do not
know whether you will find that it subsequently rained or was
sunny), considerations of expected informativeness come into play.
Here, too, event rarity is crucial. We present a more formal analysis
of information search in a later section.
We now briefly review several areas of research in which partici-
pants’ sensitivity to rarity has turned out to be key to understand-
ing their inference-making behavior. In several cases, what have
traditionally been interpreted as biases on the part of participants
have turned out instead to be adaptive behavior, driven to a large
extent by participants’ reasonable ecological assumptions about
event rarity.

Covariation Assessment

Imagine that, after moving to the desert town, you occasionally
experience allergic reactions, but you do not know why. You might
attempt to discern which events tend to precede the reactions. That
is, you might try to figure out what events covary, or tend to go
together, with your allergic reactions. One can think of this in terms
of the familiar 2 × 2 matrix (e.g., Figure 12-1): For example, when
you are around cats, how often do you have a reaction, and how
often do you not have a reaction? And when you are not around
cats, how often do you have a reaction and how often not? Accurately
assessing how variables covary is crucial to our ability to learn
(Hilgard & Bower, 1975), categorize objects (Smith & Medin, 1981),
and judge causation (Cheng, 1997; Cheng & Novick, 1990, 1992;
Einhorn & Hogarth, 1986; for reviews, see Allan, 1993; McKenzie,
1994). In a typical covariation task, participants are asked to
assess whether (or how strongly) two variables, both of which can
be present or absent, are related. Consider the following scenario
used by McKenzie and Mikkelsen (2007). Participants were asked
to uncover the factors that determine whether people have person-
ality type X or personality type Y. They were informed that every-
one has either one personality type or the other. The factor to be
examined was genotype, and participants were told that everyone
has either genotype A or genotype B. To find out if there was a
relationship between genotype and personality type, participants
viewed records that stated whether each person had genotype A
(yes or no) and personality type X (yes or no). Note that these
records were described in terms of the presence and absence of
genotype A and personality type X.
Participants were shown two different (purportedly random)
samples of nine people, given at the top of Table 12-1 (Condition
1). The frequencies indicate the number of people falling into
each category for each sample. For instance, six of the nine people
in Sample 1 had genotype A and personality type X. Participants
were asked whether Sample 1 or Sample 2 provided stronger
support for a relationship between genotype and personality type.
Most (76%) selected Sample 1, in which the large frequency corre-
sponded to the joint presence of the two variables (the yes/yes
category), traditionally labeled Cell A in the covariation literature.
In another condition (Condition 2 in Table 12-1), the labeling of
the observations in terms of yes and no was reversed without
altering the logical identity of each observation. Rather than indi-
cating whether or not each person had genotype A and personality
type X, the records showed whether each person had genotype B
(yes or no) and personality type Y (yes or no). For example, a person
identified in Condition 1 as genotype A/personality type X (Cell A)
was instead identified in Condition 2 as not-genotype B/not-per-
sonality type Y (Cell D). Participants in this condition were pre-
sented with the two samples of nine people shown in Table 12-1,
Condition 2.
Note that these two samples are equivalent to their counterparts
presented earlier (Condition 1). For example, the two Sample 1s are
the same; the categories are simply labeled differently. Nonetheless,
Table 12-1 shows that participants’ preferences reversed: Now
most participants reported that Sample 2 provided stronger evi-
dence of a relationship between genotype and personality type.
These results replicate what has been found in numerous previous
studies, namely, that the number of Cell A (joint presence) observa-
tions has a much larger impact on judgments of covariation than
does the number of Cell D (joint absence) observations (Kao &
Wasserman, 1993; Levin, Wasserman, & Kao, 1993; Lipe, 1990;
Schustack & Sternberg, 1981; Wasserman, Dorner, & Kao, 1990). In
terms of impact, the ordering of the cells is often A>B≈C>D.
Table 12-1: Composition of Conditions Along With Results From
McKenzie and Mikkelsen's (2007) Covariation Study

                                 Factor      Sample 1   Sample 2   Cell
                                 present?
Condition 1 (Abstract)
Genotype A/Personality X         Yes/Yes         6          1       A
                                 Yes/No          1          1       B
                                 No/Yes          1          1       C
                                 No/No           1          6       D
Participants (%) choosing sample
as strongest evidence of
relationship                                  76.3       23.7

Condition 2 (Abstract)
Genotype B/Personality Y         No/No           6          1       D
                                 No/Yes          1          1       C
                                 Yes/No          1          1       B
                                 Yes/Yes         1          6       A
Participants (%) choosing sample
as strongest evidence of
relationship                                  26.3       73.7

Condition 3 (Concrete)
Disturbed/Dropout                Yes/Yes         6          1       A
                                 Yes/No          1          1       B
                                 No/Yes          1          1       C
                                 No/No           1          6       D
Participants (%) choosing sample
as strongest evidence of
relationship                                  73.1       26.9

Condition 4 (Concrete)
Healthy/Graduate                 No/No           6          1       D
                                 No/Yes          1          1       C
                                 Yes/No          1          1       B
                                 Yes/Yes         1          6       A
Participants (%) choosing sample
as strongest evidence of
relationship                                  67.1       32.9

Note. Sample columns indicate number of fictional people in each sample with
indicated factors present or absent. Participants considered the sample in
which the large frequency corresponded to Cell A (rather than Cell D) to
provide the strongest evidence of a relationship—except in Condition 4,
where participants knew that Cell A observations were common. In that
condition, participants considered the large Cell D sample to provide the
strongest support.

A model considered normative by covariation researchers is the
phi coefficient: φ = (AD − BC)/√[(A+B)(C+D)(A+C)(B+D)], where A,
B, C, and D correspond to the respective cell frequencies. Phi is a
special case of Pearson’s product-moment correlation coefficient,
ranging between −1 and 1. (Whereas ρ, discussed earlier, is a popu-
lation parameter, φ is a sample statistic.) The closer this coefficient
is to 1 (−1), the stronger the positive (negative) relationship between
the variables: One variable is more (less) likely to be present when
the other variable is present rather than absent. When φ = 0, the
variables are independent. In Table 12-1, reversing the frequencies
in Cells A and D (both of which provide evidence of a positive
relationship) leaves φ unchanged. Thus, all the samples show the
same objective phi correlation, namely, .36. Because the four cells
contribute equally to φ, their differential impact on perceived cor-
relation has been routinely interpreted as a fallacy in people’s rea-
soning. For example, Kao and Wasserman (1993, p. 1365) stated,
“It is important to recognize that unequal utilization of cell infor-
mation implies that nonnormative processes are at work,” and
Mandel and Lehman (1998) attempted to explain differential cell
utilization in terms of a combination of two reasoning biases.
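For reference, the phi coefficients of the two samples in Table 12-1 can be
computed directly from the formula above; a short sketch confirms that both
come out at .36:

import math

def phi(a, b, c, d):
    """Phi coefficient for a 2 x 2 table with cell frequencies
    A (joint presence), B, C, and D (joint absence)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(round(phi(6, 1, 1, 1), 2))   # Sample 1 of Condition 1: 0.36
print(round(phi(1, 1, 1, 6), 2))   # Sample 2 of Condition 1: 0.36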
Note that the traditional normative view of the task is a logical
one that leaves no room for ecological variables, such as how
rare the events are. Phi is a descriptive statistic that merely sum-
marizes the presented information. No information beyond the
four cell frequencies is considered relevant; it would be considered
an error if any additional information or beliefs were to influence
judgment.
An ecological Bayesian account can explain, in contrast, the
larger impact on perceived correlation of joint presence relative
to joint absence. If it is assumed that the presence of events is
rarer than their absence, then joint presence is more informative
than joint absence. The assumption is that (a) the observations
in the matrix are sampled from a larger population of interest, and
(b) there are competing hypotheses, for example, that there is either
a positive relationship (ρ = .5) or no relationship (ρ = 0) between the
variables. Observing the rare observation, Cell A, distinguishes
better between the competing hypotheses. If presence of the two
variables were rare, then it would not be surprising to see both vari-
ables absent, a Cell D observation, even if the variables were inde-
pendent. In contrast, observing their joint presence would be
surprising, especially if the variables were independent. Joint pres-
ence provides stronger support than joint absence for the hypothe-
sis that the variables are related.
Note, then, that if the presence of the two variables is rare, Cell
A is more informative than Cell D. Furthermore, depending on the
competing hypotheses, Cells B and C can fall between Cells A and
D in terms of informativeness (see Figure 12-2). Of course, this is
consistent with the robust finding that, in terms of participants’
reported subjective impact of different cells on judgment, the
ordering is A>B≈C>D. Thus, assuming that presence is rare, a nor-
mative Bayesian account can naturally explain the perceived dif-
ferences in cell informativeness (see also Anderson, 1990).
Does the presence of an event of interest tend to be rarer than its
absence? That is, might it be adaptive to assume that presence is
rare? The answer will probably vary across specific domains, but we
believe that in the vast majority of cases the answer is yes. Most
things are not red, most things are not mammals, most people do not
have a fever, and so on. Moreover, most things people bother to
remark on—whether “remarking on” something means noticing it or
communicating about it—are rare, or else they would not be worth
remarking on (see chapters 4 and 15 on such skewed environment
distributions and chapter 5 on what people talk about and thus rec-
ognize). We are not making a claim about metaphysics, but about
how people use language. Imagine two terms, “X” and “not-X” (e.g.,
red things and non-red things), where there is no simple non-negated
term for not-X. If (as we expect) not-X is usually a larger category
than X, then it is plausible that people learn early on that the pres-
ence of an event of interest is usually rarer than its absence, and
furthermore that observing the joint presence of two such events
is therefore usually more informative than observing their joint
absence. What looks like a bias in the laboratory might reflect deeply
rooted tendencies that are highly adaptive in the real world.
Is it possible to get participants to reverse their preference for
Cell A over Cell D? That is, might participants’ approach to
covariation assessment be adaptable as well as generally adaptive?
The most likely way to demonstrate adaptability would be to use
concrete variables that participants are familiar with. Ideally, par-
ticipants would already know how common the levels of each
variable are. Tapping into participants’ real-world knowledge about
rarity can have large effects on behavior in the direction predicted
by the Bayesian account (McKenzie & Mikkelsen, 2000; see also
McKenzie, 2006). To test this idea, McKenzie and Mikkelsen (2007)
asked participants in the concrete condition of their experiment to
imagine that they worked at a large high school and were trying to
uncover factors that determine students’ “high school outcome”:
whether they drop out or graduate. The factor being examined was
students’ “emotional status.” All students were said to undergo a
thorough psychological examination during their freshman year
and to be categorized as either emotionally disturbed or emotion-
ally healthy. Though it was assumed that participants knew that
dropping out and being emotionally disturbed are both rare events,
this was reinforced in the task instructions.
These concrete participants were told that they had access to the
records of former students in order to find out if there was a rela-
tionship between students’ emotional status and high school out-
come. Half of these participants were told that each record listed
whether the student was emotionally disturbed (yes or no) and
whether the student dropped out (yes or no). Thus, the presence
(i.e., the “yes” level) of each variable was rare, making a Cell A
observation rarer than a Cell D observation. When presented with
the two samples of nine observations (see Condition 3 in Table
12-1), one with many Cell A observations and one with many Cell
D observations, the Bayesian account predicts the same results that
have been found in earlier covariation studies, including the ones
reported above: Because presence is rare in this condition, partici-
pants should find the large Cell A sample as providing stronger
evidence of a relationship between emotional health and high
school outcome. Indeed, this is what McKenzie and Mikkelsen
(2007) found: Table 12-1 shows that more than 70% of participants
selected the large Cell A sample.
The key condition was the one remaining: Some participants
were presented with the same concrete scenario but simply had the
labeling reversed, just as in the abstract condition (see Condition 4
in Table 12-1). Rather than indicating whether each student was
emotionally disturbed and dropped out, the records indicated
whether each was emotionally healthy (yes or no) and whether each
graduated (yes or no). Thus, the absence of each of these variables
was rare, making Cell A more common than Cell D. The Bayesian
perspective leads to a prediction for this condition that is the oppo-
site of all previous covariation findings: Participants will find Cell D
information most informative. McKenzie and Mikkelsen (2007) again
found that the results were consistent with the Bayesian account. As
shown in Table 12-1, only 33% of these participants selected the
sample with the large Cell A frequency as providing stronger sup-
port; that is, most found the large Cell D sample most supportive.
This is the first demonstration of such a reversal of which we
are aware. The results provide strong evidence for the hypothesis
that the robust Cell A bias demonstrated over the past four decades
stems from (a) participants’ ecological approach to the task (consis-
tent with the Bayesian perspective), and (b) their default assump-
tion (perhaps implicit) that presence is rare. When there is good
reason to believe that absence is rare, Cell D is deemed more
informative, just as the Bayesian approach predicts. Note that the
behavior of both the concrete and the abstract groups is explained
in terms of their sensitivity to rarity: The former exploited real-
world knowledge about which observations were rare, and the
latter exploited knowledge about how labeling indicates what is
(usually) rare (see also McKenzie, 2004a).
Hypothesis Evaluation With Passive Observation

Suppose you are at an art museum with a friend who is unfamiliar
with art, and she occasionally remarks that she likes particular
pieces. Based on this information, you try to figure out what she
likes, and you are beginning to think, or hypothesize, that she likes
modern art. The next piece you encounter is from the Renaissance,
and your friend says nothing. Would this affect your confidence in
your hypothesis that your friend likes modern art?
In this example, you passively receive data and update confi-
dence in your hypothesis. Such hypothesis evaluation is a passive
form of hypothesis testing, to be distinguished from active hypoth-
esis testing (discussed in the next section), where you actively
choose which information to gather (e.g., you would decide which
pieces to ask your friend about; for reviews, see Klayman, 1995;
McKenzie, 2004b; Poletiek, 2001). Like covariation assessment,
hypothesis evaluation is concerned with the passive receipt of
information and can be thought of in terms of a 2 × 2 matrix. Your
friend not commenting on a Renaissance piece could be seen as a
Cell D observation, and her announcement of liking a piece of
modern art could be seen as a Cell A observation. Despite some
similarities, hypothesis evaluation and covariation assessment
tasks differ in potentially important ways. One is that the levels of
the variables in hypothesis evaluation are often symmetrical (e.g.,
introvert/extrovert), whereas in covariation assessment they are
traditionally asymmetrical (e.g., treatment/no treatment). In addi-
tion, the task instructions are different. In hypothesis evaluation,
participants are often asked to evaluate “If X, then Y” statements,
whereas in covariation assessment, participants are asked to assess
a relationship between variables.
Now imagine that you are a researcher investigating a possible
relationship between genetics and personality type. Assume that
everyone has either genotype A or genotype B and that everyone
has either personality type X or personality type Y. You are evaluat-
ing the following hypothesis: “If a person has personality type Y,
then he/she has genotype B” (or “Y → B”). Of the first two people
you observe, one has genotype A and personality type X (which we
will call AX) and one has genotype B and personality type Y (BY).
Both of these observations support the hypothesis, but which do
you think provides stronger support?
When McKenzie and Mikkelsen (2000) presented this unfamil-
iar, rather abstract task to participants, more than 70% of them
chose the BY observation as most supportive when forced to
choose between the BY and AX observations. Of participants asked
to evaluate the hypothesis “If a person has genotype A, then he/she
has personality type X” (or “A → X”), almost 80% selected the AX
observation as most supportive.

Figure 12-3: Results for the hypothesis evaluation study (McKenzie &
Mikkelsen, 2000). Shown is the percentage of participants selecting the rare
observation as a function of whether the task was abstract or concrete,
whether statistical information about rarity/commonality was provided, and
whether the rare observation was mentioned in the hypothesis. (The
"abstract" group had no information about rarity.) Generally, participants
were more likely to correctly select the rare observation as more
informative when the task was concrete, statistical information was
provided, and the rare observation was mentioned in the hypothesis. Most
interesting is that participants in the "concrete + statistics" group (far
right) often selected the rare observation regardless of whether it was
mentioned in the hypothesis, which is in contrast to the traditional finding
that observations mentioned in hypotheses are considered most informative.

The results for these two "abstract"
groups are illustrated on the left side of Figure 12-3. The tall first
column shows that most participants selected the BY observation
when testing Y → B, and the short second column shows that few
selected the BY observation when testing A → X. (Although the
abstract groups had no information regarding rarity, the BY obser-
vation is referred to as the “rare observation” in Figure 12-3 for
reasons that will become clear shortly.)
From the perspective of the logical approach to this problem,
these participants’ behavior is peculiar. The two hypotheses are
logically equivalent (one is the contrapositive of the other), and
therefore whichever observation supports one hypothesis most
strongly must also support the other hypothesis most strongly.
Nonetheless, participants selected different observations depending
on which logically equivalent hypothesis was presented to them. In
particular, note that the observation mentioned in the hypothesis is
usually considered most supportive in each case. That is, when test-
ing Y → B, the BY observation is seen as most supportive, and when
testing A → X, the AX observation is seen as most supportive. First
demonstrated decades ago, this phenomenon is known, depending
on the inferential context, as “confirmation bias,” “matching bias,”
or “positive testing” (Evans, 1989; Fischhoff & Beyth-Marom, 1983;
Klayman & Ha, 1987; McKenzie, 1994; Mynatt, Doherty, & Tweney,
1977; Wason, 1960; see also McKenzie, 1998, 1999; McKenzie,
Wixted, Noelle, & Gyurjyan, 2001). It is perhaps the most commonly
reported finding in the hypothesis-testing literature.
Note that the logical view of the task leaves no room for eco-
logical variables, such as how rare the events mentioned in the
hypothesis are. When testing P → Q, the logical perspective
considers irrelevant what P and Q are and any knowledge the
tester has about P and Q. An ecological Bayesian perspective, by
contrast, leaves room for considerations such as rarity. Are lay
hypothesis testers influenced by rarity information? To address this
question, McKenzie and Mikkelsen (2000) told additional par-
ticipants evaluating each of the above hypotheses that few people
have genotype B (and most have genotype A) and few have per-
sonality type Y (and most have personality type X)—information
that, from a Bayesian perspective, makes the BY observation most
supportive because it is rare. As shown in Figure 12-3, these
“abstract + statistics” participants were about as likely as the
“abstract” participants to select the BY observation when testing
Y → B (compare the first two light gray columns). However, this is
not too surprising because the BY observation is mentioned in the
hypothesis in both cases. More interesting are the results when the
BY observation was not mentioned in the hypothesis (dark gray col-
umns). As can be seen, the “abstract + statistics” group was about
twice as likely as the “abstract” group to select the BY observation
when testing A → X (compare the first two dark gray columns). That
is, participants were more likely to select the unmentioned obser-
vation if they were told that it was rare rather than told nothing.
The above results were for abstract, unfamiliar hypotheses. Even
the rarity information provided was arbitrary and probably had
little meaning for participants. One might expect that sensitivity
to rarity would increase when participants are presented with
familiar variables that tap into their real-world knowledge regard-
ing rarity. To this end, additional participants were told that
they were researchers examining a possible relationship between
mental health and AIDS. These participants tested one of two
concrete hypotheses: “If a person is HIV-positive (HIV+), then he/she
is psychotic” or “If a person is mentally healthy, then he/she is
HIV-negative (HIV−).” They then selected whether a person who is
HIV+ and psychotic or a person who is HIV− and mentally healthy
provided stronger support for the hypothesis they were evaluating.
Again, the two hypotheses are logically equivalent and both
observations support both hypotheses. However, the HIV+/psy-
chotic observation is relatively rare—and participants presumably
knew this. Figure 12-3 shows that when these “concrete” partici-
pants tested “mentally healthy → HIV−” almost half of them selected
the rare HIV+/psychotic person (dark gray column). That is, the
unmentioned observation was often seen as most supportive if it
was rare.
A final group of participants was given one of the two concrete
hypotheses to evaluate but was “reminded” that few people are
HIV+ (and most are HIV−) and that few are psychotic (and most are
mentally healthy). Figure 12-3 shows that almost 70% of these
“concrete + statistics” participants testing “mentally healthy →
HIV−” selected the HIV+/psychotic person—the unmentioned
observation—as most supportive (dark gray column). Regardless of
which hypothesis they were testing, “concrete + statistics” partici-
pants were about equally likely to select the HIV+/psychotic person.
When real-world knowledge was combined with a validation of
their beliefs about rarity, participants preferred the normatively
more supportive rare observation, regardless of whether it was
mentioned in the hypothesis.
In short, then, these results indicate that when participants eval-
uate abstract, unfamiliar variables and there is no explicit informa-
tion about rarity—that is, in the usual laboratory task—participants
deem the mentioned confirming observation most informative.
However, the unmentioned confirming observation was more likely
to be chosen (a) when concrete hypotheses were used, which
allowed participants to exploit their knowledge about rarity, and
(b) when explicit information about rarity was provided. The com-
bination of the concrete hypothesis and rarity “reminder” led most
participants to correctly select the rare confirming observation,
regardless of whether it was mentioned in the hypothesis.
Knowledge about rarity—which is traditionally considered irrele-
vant to the task but is crucial in an ecological framework—virtually
erased the common bias found in hypothesis testing.
One question remains. The above findings show that partici-
pants’ hypothesis-testing strategies are adaptable in that they
change in a qualitatively appropriate manner when information
about rarity is provided. However, what about the apparent default
strategy of deeming the mentioned confirming observation most
informative? Why is this the default strategy? Is it adaptive, reflect-
ing how the world usually works?
Indeed, one can make normative sense out of the default strategy
if, when testing X1 → Y1, X1 and Y1 (the mentioned events) are
assumed to be rare relative to X2 and Y2 (the unmentioned events).
If this were so, then the mentioned confirming observation
would be normatively more informative than the unmentioned
confirming observation. In other words, it would be adaptive to
treat mentioned observations as most informative if hypotheses
tend to be phrased in terms of rare events. Do laypeople tend to
phrase conditional hypotheses in terms of rare events?
Consider the following scenario: A prestigious college receives
many applications but admits few applicants. Listed in Table 12-2
is information regarding five high school seniors who applied last
year. Next to each applicant is a rating from the college in five
categories. In each category, one candidate was rated “high” and
the other four were rated “low.” On the far right is shown that only
one of the five candidates was accepted. Given the information,
how would you complete the statement: “If applicants ________,
then ________”?
You probably noticed that only SAT scores correlate perfectly
with whether the applicants were rejected or accepted. Importantly,
however, a choice still remains as to how to complete the state-
ment. You could write, “If applicants have high SAT scores, then
they will be accepted” or “If applicants have low SAT scores, then
they will be rejected.” Both are accurate, but the former phrasing
targets the rare events, and the latter targets the common ones.
McKenzie, Ferreira, Mikkelsen, McDermott, and Skrable (2001)
presented such a task to participants, and 88% filled in the condi-
tional with, “If applicants have high SAT scores, then they will be
accepted”—that is, they mentioned the rare rather than the common
events. Another group was presented with the same task, but the
college was said to be a local one that did not receive many appli-
cations and admitted most applicants.

Table 12-2: Example of a Scenario Used to Study How People
Phrase Conditional Hypotheses (McKenzie, Ferreira, et al., 2001)

          GPA    SAT      Letters of        Interview   Extracurricular   Application
                 scores   recommendation                activities        outcome
  Alice   Low    Low      High              Low         Low               Rejected
  Bill    Low    High     Low               Low         Low               Accepted
  Cindy   Low    Low      Low               Low         High              Rejected
  Dennis  Low    Low      Low               High        Low               Rejected
  Emily   High   Low      Low               Low         Low               Rejected

“Accepted” and “rejected” were merely reversed in the above scenario,
as were “high” and “low.” Everything else was the same. Now only
35% filled in the
conditional with “If applicants have high SAT scores, then they
will be accepted.” Most participants targeted the rare events, “If
applicants have low SAT scores, then they will be rejected.” Thus,
whether particular events were mentioned depended on whether
they were rare. Virtually identical results were found using other
scenarios with different content.
Thus, people appear to have a tendency—often a very strong
one—to phrase conditional hypotheses in terms of rare rather than
common events. We believe this answers the question of why
people consider mentioned confirming observations to be more
informative than unmentioned confirming observations: Mentioned
observations generally are more informative because they are rare.
The findings discussed earlier in this section indicate that
people are sensitive to rarity when evaluating hypotheses, that is,
that people’s intuitive hypothesis-evaluation strategies are adapt-
able. The findings discussed immediately above indicate that a
default strategy of deeming the mentioned confirming observation
most informative is also adaptive because such observations usu-
ally are most informative in the real world (see also McKenzie,
2004b). Understanding the environmental conditions under which
people typically operate, together with normative principles that
make sense given these conditions, thus can help explain why
people behave as they do.

Hypothesis Testing With Active Search

Suppose you think that hormone replacement therapy, which is
administered to some postmenopausal women, causes breast
cancer. How should you go about gathering information to test this
hypothesis? For example, would it be more useful to find out what
percentage of women who receive hormone replacement therapy
develop breast cancer or what percentage of women with breast
cancer have received hormone replacement therapy?
As every statistics textbook impresses on its readers, correlation
is not causation. But experts and laypeople alike take covariation
information such as that presented in Figure 12-1 into account
when making inferences. Whereas previous sections have exam-
ined how people make use of passively received data, the topic of
this section is how people should and do search actively for cova-
riation data in testing hypotheses about cause–effect relationships.
From an ecological perspective, rarity matters as much when people
actively search for information as when they observe it passively
(e.g., in the hypothesis-evaluation case from the previous section).
In fact, because searching for information is costlier than merely
registering it (see chapter 10 on inferences using search vs. infer-
ences from givens), sensitivity to the relationship between informa-
tiveness and rarity would seem to be even more advantageous in
active search contexts. Why expend effort looking for relatively
nondiagnostic data if more diagnostic data are available? Below we
explore whether people are sensitive to rarity under conditions of
active information search.

Hypothesis Testing the Hard Way


Adapted from a classic reasoning problem designed by Wason
(1968), the causal selection task simulates real-world information
search in a laboratory context. In its most common form, it allows
participants to perform up to four tests of a causal hypothesis relat-
ing a possible cause to an effect by examining up to four samples
of events: (a) events in which the cause is known to be present
(cause test), (b) events in which the effect is known to be present
(effect test), (c) events in which the cause is known to be absent
(not-cause test), and (d) events in which the effect is known to be
absent (not-effect test). In each case, the participant will find out
about the unspecified information (presence of cause or effect) in
the items in the sample. If in testing your hypothesis that hormone
replacement therapy causes breast cancer you chose the cause test,
you could ask, say, 100 women who received long-term hormone
replacement therapy (cause) whether they now have breast cancer
(effect). If you chose the not-effect test, you could ask 100 women
who do not have breast cancer whether they ever received long-
term hormone replacement therapy. When choosing what data to
gather, it is not informativeness but expected informativeness that
you should maximize.
To understand where the “expected” in expected informative-
ness comes from, let us first flesh out our ecological Bayesian
analysis by mapping the data relevant in causal hypothesis testing
onto a 2 × 2 matrix. In each matrix depicted in Figure 12-4, the
top and bottom rows correspond to the presence and absence of
the cause, respectively, while the left and right columns correspond
to the presence and absence of the effect, respectively. The cells
representing the four possible conjunctive pairs of these events can
be denoted A, B, C, and D and expressed as joint probabilities, as
in Figure 12-1. Let the effect be 10 times rarer than the cause:
p(cause) = .1 and p(effect) = .01. Assume for the moment that the
hypothesis under test, H0, corresponds to ρ = 0 (see the left panel
of Figure 12-4) and that the hypothesis against which it is being
compared, H1, corresponds to ρ = .1 (right panel in Figure 12-4).
(Because the cause occurs 10 times more often than the effect
and a correlation of 1 would mean the cause and effect always
co-occur, the highest possible correlation between them is con-
siderably less than 1—specifically, ρ = .3. Relative to this maxi-
mum, the correlation under H1 is thus fairly strong.) To represent
the fact that the hypothesis tester has a sense of the marginal event
probabilities—from sources including daily experience and media
coverage—these remain the same regardless of which hypothesis
holds.

H0: Cause and Effect Independent (ρ = 0)

                        Effect?
                        Yes          No
   Cause?   Yes         A: 0.001     B: 0.099      0.1
            No          C: 0.009     D: 0.891      0.9
                           0.01         0.99

H1: Cause and Effect Dependent (ρ = .1)

                        Effect?
                        Yes          No
   Cause?   Yes         A: 0.004     B: 0.096      0.1
            No          C: 0.006     D: 0.894      0.9
                           0.01         0.99

Figure 12-4: Joint probability distributions representing a causal
hypothesis (H1) and its alternative (H0). In this example, hypoth-
esis H0 is that ρ = 0, and H1 is that ρ = .1. Note that p(cause) = 0.1
and p(effect) = 0.01 regardless of which hypothesis holds.

Comparison of the tables in Figure 12-4 makes it clear that the
data in the four cells discriminate between the hypotheses to differ-
ent degrees. For example, because both the cause and the effect
are rare (p < .5), Cell A is more informative than Cell D. The LLR
for Cell A is 2, whereas that for Cell D is 0.0049.
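
As a quick check (our own illustration, not part of the original chapter), the two joint distributions in Figure 12-4 and the informativeness of the individual cells can be recomputed in a few lines of Python. Here a cell's informativeness is read as the magnitude of its log-likelihood ratio, which reproduces the values just cited.

```python
from math import sqrt, log2

def phi(A, B, C, D):
    # Phi correlation of a 2x2 joint distribution with cells A, B, C, D.
    return (A * D - B * C) / sqrt((A + B) * (C + D) * (A + C) * (B + D))

H0 = dict(A=0.001, B=0.099, C=0.009, D=0.891)  # cause and effect independent
H1 = dict(A=0.004, B=0.096, C=0.006, D=0.894)  # cause and effect correlated

print(round(phi(**H0), 2), round(phi(**H1), 2))   # 0.0 and 0.1, as in Figure 12-4
print(round(phi(0.01, 0.09, 0.0, 0.90), 2))       # 0.3: the maximum given these margins

# Informativeness of a cell, taken here as |log2 p(cell|H1) / p(cell|H0)| in bits:
llr = {cell: abs(log2(H1[cell] / H0[cell])) for cell in "ABCD"}
print(round(llr["A"], 1), round(llr["D"], 3))     # 2.0 bits versus roughly 0.005 bits
```
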
Not only do the four cells in the 2 × 2 matrix differ with respect
to how well they discriminate between hypotheses, but the four
tests in the causal selection task differ with respect to the pro-
bability of revealing cases in each cell. A cause test, for example,
can only reveal a case in Cell A or Cell B, whereas an effect test can
only reveal a case in Cell A or Cell C; neither can uncover a case in
Cell D. Moreover, although either test can turn up a case in Cell A,
the probabilities of observing a case in Cell A differ between them.
How can we express the probability of observing a case in Cell
A—that is, the conjunction of the cause and the effect—for each test
given that we do not know whether hormone replacement contrib-
utes to breast cancer (i.e., whether H0 or H1 is correct)? The answer,
as always in an information-theoretic analysis of a decision prob-
lem, is to calculate an average across the hypotheses weighted by
their prior probabilities.
The probability of observing a case in Cell A given that one per-
forms a cause test, p(A|cause test), is captured by the following
equation:

p(A|H0 ∩ cause test) p(H0) + p(A|H1 ∩ cause test) p(H1)

Assuming for the moment that the prior probabilities of H0 and
H1 are both .5, we obtain (1/100)(.5) + (4/100)(.5), or .025. The
probability of observing a case in Cell B given that one performs
a cause test is computed in the same way: (.99)(.5) + (.96)(.5),
or .975. The probabilities of observing a case in Cell A and a case
in Cell C given that one performs an effect test are .25 and .75,
respectively.
Using the probabilities of each datum given each test and the
definition of informativeness already presented, we can now com-
pute the expected LLR of the cause test using this equation:

expected LLR(cause test) =
    p(A|cause test) log2 [p(A|H1 ∩ cause test) / p(A|H0 ∩ cause test)]
  + p(B|cause test) log2 [p(B|H1 ∩ cause test) / p(B|H0 ∩ cause test)]

Substituting in the appropriate values, we find that the expected
LLR of the cause test (which reveals either Cell A or B) is 0.093.
By the same procedure, the expected LLR of the effect test (which
reveals either Cell A or C) is 0.939. In this example, then, a Bayesian
hypothesis tester should prefer to perform the effect test because
it will reveal an average of 10 times as many bits of information
as the cause test (for alternative, but nonetheless similar, measures
of a test’s informativeness, see Baron, 1985, chapter 4; Klayman &
Ha, 1987; Nelson, 2005; Oaksford & Chater, 1994). As the equation
above illustrates, the expected LLR is harder to calculate than the
LLR because the expected LLR takes into account the probabilities
of a hypothesis test’s possible outcomes (e.g., observing a case in
Cell A) as well as the informativeness of those outcomes.
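
To make the arithmetic concrete, here is a minimal Python sketch (ours, not the chapter's) of the expected-LLR calculation for both tests, using the joint probabilities from Figure 12-4 and equal priors. It treats each outcome's informativeness as the magnitude of its LLR; with that reading it reproduces the 0.093 and 0.939 reported above.

```python
from math import log2

# Joint probabilities from Figure 12-4 (cells A, B, C, D) under each hypothesis.
H0 = dict(A=0.001, B=0.099, C=0.009, D=0.891)   # rho = 0
H1 = dict(A=0.004, B=0.096, C=0.006, D=0.894)   # rho = .1
PRIOR_H0, PRIOR_H1 = 0.5, 0.5

def expected_llr(cells, condition_prob):
    """Expected LLR of a test that can only reveal the given cells, where
    condition_prob is the probability of the test's conditioning event
    (e.g., p(cause) = .1 for the cause test)."""
    total = 0.0
    for cell in cells:
        p_h0 = H0[cell] / condition_prob          # p(cell | H0 and this test)
        p_h1 = H1[cell] / condition_prob          # p(cell | H1 and this test)
        p_cell = PRIOR_H0 * p_h0 + PRIOR_H1 * p_h1
        total += p_cell * abs(log2(p_h1 / p_h0))  # informativeness as |LLR|
    return total

print(round(expected_llr("AB", 0.1), 3))    # cause test:  0.093
print(round(expected_llr("AC", 0.01), 3))   # effect test: 0.939
```
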
We have already presented considerable evidence that people
are sensitive to the relative informativeness of known data. Are
they also sensitive to the relative informativeness of unknown data,
for which the Bayesian calculations are considerably more com-
plex? And if so, what processes—complex Bayesian formulae or
simple heuristics—might people use to guide their information
search in this setting?

Hypothesis Testing Using Rarity-Sensitive Heuristics


To address this question, participants in a series of studies by
Chase (1999) were presented with scenarios involving events
hypothesized to cause health risks. In each scenario, the probabili-
ties of the effect and the possible cause were given in numerical
form and manipulated between participants. In most cases, par-
ticipants had to choose between performing a cause test and an
effect test. The measure of primary interest was the proportion of
participants who selected the cause test when the cause test had the
higher expected LLR minus the proportion of participants who did
so when the effect test had the higher expected LLR. A positive dif-
ference indicates sensitivity to changes in expected informative-
ness; a negative difference suggests a form of sensitivity to expected
informativeness that departs systematically from information-theo-
retic prescriptions; and a difference of zero indicates insensitivity
to informativeness. Chase predicted that the difference would be
positive—that is, that people would be sensitive to the expected
informativeness of the cause test relative to that of the effect test.
We use the results from the first of Chase’s studies to illustrate
the broader set of findings. Each participant received the same two
scenarios, one of them involving the possible relationship between
doing shift work and suffering from insomnia and the other between
drinking a specific beverage and having high blood pressure. The
expected LLR of the cause and effect tests was manipulated between
participants such that, for each scenario, some participants received
a version in which the cause test had the higher expected LLR and
other participants received a version in which the effect test had
the higher expected LLR. As already indicated, the expected LLR
was manipulated by varying the cause and effect probabilities
provided in the scenario. For example, some participants were
told that the probability of shift work was .1 and the probability
of insomnia was .01, while others were told that the probability of
shift work was .01 and the probability of insomnia was .1. Thus,
there were four unique problems, two of which were seen by each
participant. Consistent with our argument that people are sensitive
to the (expected) informativeness of data, the proportion of partici-
pants who chose the cause test when it had the higher expected LLR
was 29 percentage points higher than when the effect test had the
higher expected LLR in the shift work–insomnia scenario; in the
other scenario (where it was predicted that the difference would be
smaller, but still positive), the difference was 18 percentage points.
Other studies of causal hypothesis testing have likewise indi-
cated that lay hypothesis testing reflects an at least implicit under-
standing of expected informativeness (for a theoretical analysis of
the causal context, see Over & Jessop, 1998). In a causal selection
task similar to those used by Chase (1999), for example, Green and
Over (2000) asked participants to test the hypothesis that drinking
from a particular well causes cholera. Participants could choose
one or more of all four tests: the cause test, the effect test, the not-
cause test, and the not-effect test. The probabilities of people’s
drinking from the well and having cholera were manipulated
between participants with the verbal labels “most” and “few” (e.g.,
“Most people drink from the well”). Consistent with the evidence
already reviewed, Green and Over found that participants’ likeli-
hood of choosing a test increased with the test’s expected informa-
tiveness. Taken together, the results indicate that people are
sensitive to rarity not only when making inferences on the basis of
known data, but also when deciding what data to seek.

How Boundedly Rational Minds Can Act Like Ecological Bayesians

Earlier in the chapter, we argued that (a) in the absence of knowl-
edge about rarity, people are justified in behaving as if the events
mentioned in a hypothesis are rare (McKenzie, Ferreira, et al.,
2001); and (b) in the presence of knowledge about rarity, they
should abandon that rarity assumption, instead searching for and
weighting most heavily whatever events are rarest and therefore
most diagnostic. The literature review at the beginning of this chap-
ter indicates that people indeed seem to make a rarity assumption
but that they can also adapt their behavior in contexts where it is
clear that the assumption is violated (see also McKenzie & Mikkelsen,
2007). Adaptability in the causal selection task is particularly
impressive because it seems to call for highly complex Bayesian
calculations. But can we instead account for this within the frame-
work of bounded rationality, as the outcome of using simple heuris-
tics from the adaptive toolbox?
The most plausible explanation, in our view, is that people make
their choice of information to seek (in test cases) using rarity as a
cue to informativeness (Chase, 1999). Indeed, this behavior is
consistent with philosophies of science and formal models of
hypothesis testing for which the rarity of data is crucial (Poletiek,
2001, chapters 1 and 2). In the case of passively observing data in
order to discriminate between competing hypotheses, one need
only give more weight to rare conjunctions of events (e.g., a rare
prediction of a rare outcome). Recall, for example, that joint pres-
ence provides stronger evidence of a relationship between two
variables than does joint absence if the presence of the variables is
rare (p < .5). In the case of deciding which of two hypothesis
tests is more likely to reveal informative data, people need only
compare the probabilities of the tests’ conditioning events, where
the conditioning event is that known to have occurred. In the
causal context, for example, the conditioning event of a cause test
is the cause itself: The hypothesis tester knows that the cause
has occurred, and what remains to be discovered is whether the
effect has also occurred. Thus, people using the rarity heuristic to
estimate the relative informativeness of causal hypothesis tests
need only look at how rare the conditioning events are relative to
one another—p(cause)/p(effect)—to choose the test(s) with the
highest expected informativeness (Chase, 1999). This simple heu-
ristic enables them to behave in a way loosely consistent with the
complex Bayesian calculations shown in the previous section.
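
A sketch of this shortcut (the function name and toy call are ours, for illustration only): the heuristic simply favors the test conditioned on the rarer event.

```python
def rarity_heuristic(p_cause, p_effect):
    """Choose the causal test whose conditioning event is rarer: examine cases
    where the effect is present if the effect is rarer than the cause,
    otherwise examine cases where the cause is present."""
    return "effect test" if p_effect < p_cause else "cause test"

# With the chapter's example, p(cause) = .1 and p(effect) = .01, the heuristic
# picks the effect test, the same choice favored by the full expected-LLR
# analysis (0.939 vs. 0.093 bits).
print(rarity_heuristic(0.1, 0.01))   # "effect test"
```
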

Conclusion

Our review of several areas of research indicates that the rarity of
events in the environment is an important factor in inference,
although logical (traditionally normative) and descriptive
approaches have ignored this crucial ecological variable. When
assessing covariation, participants usually deem the joint presence
of two events to be more informative than their joint absence.
Because the four cells of a 2 × 2 matrix contribute equally to calcu-
lations of correlation widely considered normative, the stronger
influence of joint presence has been routinely interpreted as non-
normative. A focus on joint presence makes sense from an ecologi-
cal perspective, however, if the presence of events is assumed to be
rarer than their absence (Anderson, 1990). Indeed, when it is made
clear to participants that presence is common rather than rare, their
preference for joint presence over joint absence can be reversed
(McKenzie & Mikkelsen, 2007). Thus, covariation assessment is
adaptable in that behavior changes when it is clear that presence is
common, and it is adaptive in that, in the absence of evidence to
the contrary, the rarity assumption is reasonable (see also Klayman
& Ha, 1987).
When testing hypotheses, participants typically find the men-
tioned confirming observation to be more informative than the
unmentioned confirming observation, and the traditional conclu-
sion has been that this is an error of logic. Guided by an ecologi-
cally informed Bayesian framework, we showed that this tendency
is drastically reduced when participants know that the unmen-
tioned observation is rare (McKenzie & Mikkelsen, 2000). Thus, lay
hypothesis evaluation is also adaptable: People’s testing behavior
changes in a qualitatively normative manner when it is clear
to them that the rarity assumption (Oaksford & Chater, 1994)—
which takes for granted that the observations mentioned in hypoth-
eses are rare—is violated. In addition, the results of McKenzie,
Ferreira, et al. (2001) indicate that, as a default strategy, assuming
that mentioned observations are more informative than unmen-
tioned observations is adaptive: Hypotheses tend to be phrased
in terms of rare events, and therefore mentioned observations usu-
ally are more informative (Grice, 1975). Finally, when testing hypo-
theses about cause–effect relationships, people tend to choose
tests conditioned on the rarest events available. In performing
those causal hypothesis tests that are most likely to discriminate
between the competing hypotheses (Chase, 1999; Green & Over,
2000), people thus behave roughly in accord with much more com-
plex Bayesian calculations.
Although the explanations and predictions of behavior in the
tasks reviewed here were derived from a Bayesian perspective
(together with the rarity assumption), there are good empirical
reasons (e.g., McKenzie, 1994) as well as good theoretical reasons
(e.g., Charniak & McDermott, 1985; Dagum & Luby, 1993) to doubt
that people perform the computations prescribed by Bayesian anal-
yses. However, to behave in a way that is qualitatively consistent
with Bayesian norms, participants need only consider rare events
more informative than common ones when interpreting data and
seek rare over common data when choosing among conditional
observations. In other words, mere sensitivity to rarity leads to
behavior that is qualitatively Bayesian.
Rarity is a factor in inference that can no longer be ignored by
experimenters. The results reviewed here have shown that using
abstract and unfamiliar materials in an attempt to eliminate real-
world interference simply leads participants to fall back on default
assumptions about rarity based on how the world usually works
(e.g., hypotheses mention rare events; the presence of events is rarer
than their absence). Lack of awareness of this problem has led many
experimenters to misinterpret adaptive responses as irrational.
Finally, our ecological account of human inference shows the
importance of taking context into account, something that is unnec-
essary from a logical point of view. People’s knowledge or assump-
tions about rarity are crucial, and what is perceived as rare depends
on the particular decision environment. A deeper understanding
of human inference can thus be achieved only through a better
understanding of environmental and task structures in conjunction
with decision mechanisms that make sense in a world where rare
things are precious.
13
Ecological Rationality for Teams
and Committees
Heuristics in Group Decision Making

Torsten Reimer
Ulrich Hoffrage

Good decision processes are the best hope for good decision outcomes.
Jay Edward Russo and Paul Schoemaker

When was your last meeting? By one estimate, the number of
meetings held per day in the United States is more than 25 million
(Massachusetts Institute of Technology, 2003), and during the 1980s
executives spent, on average, as much as 40–50% of their profes-
sional time in meetings (Monge, McSween, & Wyer, 1989). In fact,
meetings play a key role in today’s world of business and politics,
in which many decisions are formed by work teams and commit-
tees. Yet, meetings often have a bad reputation. “Meeting’s over,
let’s get back to work”—who has not heard or made such a com-
ment at the end of a session? Participants in meetings often report
that too much of the time during their meetings is wasted. Their
estimates range from a third (Green & Lazarus, 1991) to half (Monge
et al., 1989; Mosvick & Nelson, 1987) of the time. Typical com-
plaints include that meetings are often called with too short notice,
last too long, and too often end without concrete results (Romano &
Nunamaker, 2001).
In this chapter, we ask whether there are efficient and effective
decision strategies that can be used by committees and groups to
come to a joint decision. We address this question in a series of
simulation studies, in which we compare the accuracy of informa-
tion-laden strategies that require intense processing with the accu-
racy of frugal heuristics that limit information processing (Reimer
& Hoffrage, 2005, 2006). We focus on one of the most popular
decision rules that is often used as a default strategy when groups

have to make a joint decision but cannot reach unanimity: the
majority rule. A nice feature of this rule is that it does not require
that a group exchanges, discusses, and integrates any cue informa-
tion and, thus, it can effectively help a group to stop endless
discussions. The only thing the majority rule requires is that group
members have formed individual opinions, which can then be
(implicitly or explicitly) integrated by the group.
Astonishingly, group researchers have not given much attention
to the first phase of majority-based decisions during which indi-
vidual group members form their own opinions, even though it is
obvious that the way in which the members of a group process
their information individually affects the group outcome (Davis,
1973; Sorkin, Hays, & West, 2001). For example, we know from the
literature on the Condorcet theorem (Grofman & Owen, 1986) that
the majority rule accentuates differences between individuals.
Specifically, the theorem states that when a group consists of mem-
bers who tend to favor a right (or wrong) decision, the majority
rule will yield even better (or worse) decisions on average (see
chapter 7; Reimer, Bornstein, & Opwis, 2005). Our simulations were
designed to test how sensitive the majority rule is to the informa-
tion-processing strategies that are used by group members to form
their individual opinions.
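
To illustrate the accentuation described by the Condorcet theorem (a textbook-style example of ours, not taken from the chapter): with three independent members who are each correct with probability .6, a simple majority is correct with probability .648, whereas with members at .4 it drops to .352.

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a simple majority of n independent members is correct,
    when each member is correct with probability p (n odd, so no ties)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(round(majority_accuracy(0.6, 3), 3))   # 0.648: better than any single member
print(round(majority_accuracy(0.4, 3), 3))   # 0.352: errors are amplified as well
```
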
Our chapter is structured as follows: We first give examples from
the group literature indicating that groups often fail to take optimal
advantage of their resources, and we then describe a general frame-
work for classifying various types of information environments
and group decision strategies. We next report the results of our
simulations on the accuracy of the majority rule in groups where
members used different decision strategies to come to an individual
opinion; we also explore to what extent the group’s performance
depends on features of the information environment. Finally, we
discuss the widespread assumption that good group decisions
require information-intense strategies.

Benefits and Risks of Group Decision Making

Social and organizational psychologists have conducted extensive
research in the pursuit of identifying when groups are better
at solving problems and making decisions than a sum of indepen-
dent individuals (Vroom, 1969). Groups often have higher legiti-
macy than individual decision makers and group settings can
increase individuals’ motivation (Hertel, Kerr, & Messe, 2000), their
learning capabilities, and their performance (Brand, Reimer, & Opwis,
2003; Slavin, 1995). The most important reason why teamwork
has become increasingly popular in recent decades, though, can be
seen in the widespread belief that groups make better decisions
than individuals—research on group decision making and
problem solving suggests that groups usually perform better than
their average members (Hinsz, Tindale, & Vollrath, 1997; for excep-
tions see Janis, 1982; Reimer et al., 2005). This advantage is often
attributed to the fact that groups have access to more task-relevant
resources than individuals (Davis, 1973). When faced with a deci-
sion task, for example, groups typically have more information
about the choice alternatives and bring more background knowl-
edge and expertise to the table than any individual group member.
While groups usually perform better than their average member,
they do not perform as well as they could. A great number of
empirical studies have indicated that groups only rarely make full
use of their greater access to resources when solving cognitive
tasks (Hinsz et al., 1997; Steiner, 1972). The worst performance can
occur when a group is led by a person with high status but low
expertise and other group members are under pressure to conform
to this leader’s views (Janis, 1982). Such cases of group-think are
not the only situations, though, in which decision performance
in groups can suffer. So-called process losses can also occur in situ-
ations in which group members share a common goal and are
motivated to produce high-quality outcomes such as good deci-
sions or solutions to problems. For instance, several studies dem-
onstrate that brainstorming in groups, despite its positive reputation,
tends to suppress the production of ideas: People generate more
ideas if they brainstorm on their own than in a group setting (Paulus,
Dugosh, Dzindolet, Coskun, & Putman, 2002; Stroebe & Diehl,
1994). As a consequence, groups typically produce more ideas than
any one of their individual group members but fewer ideas than if
the very same members brainstorm on their own and pool their
ideas. Another well-studied example of process losses in groups is
the hidden-profile effect, which suggests that groups only rarely
find the best solution to a decision task if this solution was not pre-
ferred by at least one member prior to group discussion (Reimer,
1999; Wittenbaum & Stasser, 1996).

Which Intuition Do You Trust?

A common interpretation of these empirical findings is that good
group decisions require extensive information exchange among
group members. At first glance, this assumption has some intuitive
appeal. The argument goes as follows: The major advantage of a
group is its greater access to information resources, which can
only be capitalized on if these resources are pooled by members.
For decision tasks, this means that an ideal group would exchange
and consider as many pieces of relevant information as possible.
If group members do not share their unique knowledge on a task,
it cannot be considered in the group decision process and, as a
consequence, the group might overlook important aspects of a deci-
sion even though this knowledge is available to it. The idea that
good group decisions require information-intense strategies is
widespread in the literature, and several researchers have tried
to find interventions that stimulate the exchange of unique infor-
mation in group discussions (e.g., Larson, Foster-Fishman, & Keys,
1994; Mennecke, 1997; Schittekatte & van Hiel, 1996; Stasser,
Stewart, & Wittenbaum, 1995; Stasser, Taylor, & Hanna, 1989;
Stewart, Billings, & Stasser, 1998). Stasser and Birchmeier (2003,
p. 85) describe this mindset as follows:

From an information processing view…, the more informed a
decision, the better the decision. The more fully the group
explores the merits of the options, the more likely that they
will eliminate bad choices and pick good ones.

However, intuitions rarely come alone—they usually have a
twin that suggests the opposite. In this case, even though the notion
that ideal groups should be exhaustive information processors has
some intuitive appeal, it is at odds with two observations that also
accord with our intuition: First, it does not fit with the complaint
mentioned earlier that meetings often suffer from the lack of a stop-
ping rule that tells a group when to quit discussion—once initiated
it is hard to stop them. Second, it conflicts with recent findings
on individual decision making and the benefits of fast and frugal
heuristics as described in the chapters of this book.
This puzzle of contradictory observations and intuitions led to
our research question: Are there fast and frugal heuristics that
allow groups to make efficient and effective decisions? The idea
that individual group members might use noncompensatory heu-
ristics has been considered before (e.g., Gigone & Hastie, 1997;
Stasser, 1992), but this issue has not been systematically addressed.
Given the difficulty that groups have in pooling information
(Wittenbaum & Stasser, 1996) and the time-consuming nature of
communicating and processing unique information items that are
only known to individual group members, we decided to investi-
gate the benefits of fast and frugal heuristics in groups.

The Information-Processing Cube

There are several ways that one can extend the approach of fast
and frugal heuristics to a group context (Todd & Gigerenzer, 1999).
For example, members of a group may imitate other members by
choosing positions that are common in their group (“imitate the
majority”) or that are proposed by successful or high-status group
members (“imitate the successful”; Boyd & Richerson, 1985;
Laughlin & Ellis, 1986). Aside from classic group-think situations,
the imitation of successful group members may yield reasonable
decisions in many situations because success and status are often
positively correlated with expertise (e.g., Henrich & Gil-White,
2001). Here, we focus on a classic distinction in the group litera-
ture, between social combination-based and communication-based
heuristics (Baron, Kerr, & Miller, 1992; Reimer & Hoffrage, 2005,
2006). These two classes of decision strategies can be best charac-
terized in the framework of the information-processing cube dis-
played in Figure 13-1 (cf. Adamowicz et al., 2005).
The information-processing cube has three dimensions: the
members of the group, the choice alternatives, and the cues used
to describe the alternatives. A cell in this cube refers to the knowl-
edge a certain member has about a certain choice alternative on a
certain cue. For example, group members may belong to a hiring
committee and have some knowledge about a set of potential can-
didates for a position. One of the group members may know that a
certain candidate has specific language or computer skills, whereas
another group member may know which of the candidates has
prior work experience. In short, the cube represents the informa-
tion the group has on the decision task and how this information is
distributed among its members. From a formal perspective, a group
decision rule can be defined as a mathematical function that maps
such an information cube into a single group decision (see
Adamowicz et al., 2005). However, not every logically possible
aggregation rule is psychologically plausible.

Figure 13-1: The information-processing cube. Group members
(M) decide among alternatives (A) that are described by the values
(V) these alternatives have on cues (C). A decision strategy can
be interpreted as a rule that maps the cube onto a single group
decision.

In the group literature, two types of group decision mechanisms
have been distinguished that aggregate across the dimensions of
the cube in different orders. Social combination rules such as the
majority rule assume that each group member first aggregates
across the cues and alternatives to form an individual decision, and
in a second step, the group aggregates across the preferences or
opinions of the individual members to form a group decision. Social
communication rules, in contrast, capture the idea that the mem-
bers of a group may pool their knowledge on the decision alterna-
tives. For example, group members may first all aggregate their
knowledge about each candidate and then collectively choose the
one with the best overall evaluation. Or a group may compare the
alternatives cue-wise by first communicating to reach a consensus
on which the most important cues are and by choosing the alterna-
tive that scores highest on the most important cues (Reimer &
Hoffrage, 2005). In short, social combination rules require that
group members form individual opinions first; social communica-
tion rules require group members to exchange and pool cue values
first. Other ways in which a group may aggregate across the
dimensions of the cube include combinations of the building blocks
of both of these types of strategies—for example, group members
could first individually limit the choice set and then together
apply a cue-based or alternative-based strategy. Other researchers
have focused on how groups select and identify the most successful
group members to follow (e.g., Baumann & Bonner, 2004; Laughlin
& Ellis, 1986). For instance, groups might first try to find out if one
of their members knows the solution to a problem and then go with
the decision of this more experienced member.
We developed and tested various heuristics for group decisions
that draw upon the building blocks of heuristics for individual
decision making (Reimer & Hoffrage, 2005, 2006). In the remainder
of this chapter, we describe our simulation-based comparison of
strategies that fit into the social combination approach. Because
we were interested in the effects of the individual group members’
strategies on group performance, we held the aggregation rule con-
stant by combining group members’ opinions on the basis of the
majority rule. This is different from the usual approach in the lit-
erature on social combination processes, which typically focuses
on comparisons of combination rules but does not pay much atten-
tion to the decision strategies of the individual members and their
influence on group performance (e.g., see Hastie & Kameda, 2005,
for a comparison of nine different combination rules). Similarly, in
chapter 7 on the role of the recognition heuristic in group decision
making, the focus is on aggregation rules that differ with respect to
the influence of members who can use the recognition heuristic
and those who cannot. Here, we assume that all group members
cannot use the recognition heuristic because they recognize all the
alternatives in the choice set. Gigerenzer, Todd, and the ABC
Research Group (1999) specified heuristics that can be used in such
a situation and compared their performance with decision strate-
gies that assume exhaustive information processing. We next test
how these strategies perform in a group decision setting when
combined by a majority rule, and how their accuracy is affected by
different distributions of information among group members—that
is, when applied to different information-processing cubes.

Testing the Group Impact of Individual Strategies

The Group Decision Task


To compare the effects of individual decision-making strategies on
group accuracy, we need a well-specified task in which different
strategies can be put to work. For this purpose, we adapted a classic
task from Davis (1973; see Reimer & Hoffrage, 2006, for details):
A four-member personnel committee has to decide which of three
candidates is best suited for a position. Usually, we do not have an
unequivocal external criterion when choosing whom to hire. In
research on group decision making, the alternative with the highest
overall sum score is often defined as the best choice (e.g., Stasser &
Titus, 1985, p. 1469), but this has several disadvantages (Reimer &
Hoffrage, 2006). In particular, groups that limit their information
search or use a different decision criterion will necessarily perform
worse than groups that use a model that is based on summing
all available cues. Therefore, in our simulations we deviated from
this procedure and generated instead a world in which the external
criterion was known. We constructed a reference class consisting
of 20 potential job applicants who were assigned different criterion
values such that they could be unequivocally rank ordered. This
means there was a correct decision for every potential triplet of
job candidates that the simulated hiring committee could face.
As in real life, the simulated group members did not know the
criterion values, but each had information on 20 dichotomous cues
for the candidates. The full matrix of 20 candidates each described
by their criterion value and these 20 cues make up what we refer to
as the information environment. Cue values were +1 or −1 and were
coded such that the cues were positively correlated with the crite-
rion, that is, a positive cue value indicated a higher criterion value
than a negative cue value did. The cube displayed in Figure 13-2
represents the knowledge a group might have about a given triplet
of candidates. It can be cut into four slices, where each slice repre-
sents a member’s knowledge of candidates’ cue values.

Figure 13-2: The social-combination approach to group decision
making via the majority rule. Group members (here, M1 to M4) decide
among alternatives (A1 to A3) that are described by the values (V)
these alternatives have on cues (C1 to C20). First, each group member
makes an individual decision, and then the group integrates the
individual choices using the majority rule.

The Competing Individual Strategies


For the competing individual decision-making mechanisms we
used two strategies that are usually taken as benchmarks in group
research, and two simple heuristics. The benchmarks were the unit
weight model (or tallying, a linear model and also a heuristic) that
selects the candidate with the highest summed cue score, and the
weighted additive model (WADD), a linear model that sums up
weighted cue values and selects the candidate with the highest
score. The heuristics were minimalist, which selects candidates on
the basis of a randomly chosen cue, and take-the-best, which looks
up cue values in order of cue validities and decides on the basis of
the first discriminating cue (see Box 13-1 for an overview).

Box 13-1: Decision Strategies Used in the Simulations

Compensatory strategies
Tallying (alternatively, the unit weight model or “Dawes’s rule”) sums up the (equally
weighted) cue values of each candidate and chooses the candidate with the highest sum
score.

WADD (the weighted additive model or “Franklin’s rule”) proceeds like tallying, except
that cue values are weighted (multiplied) by their Goodman–Kruskal validities before
they are summed.

Noncompensatory heuristics
Minimalist looks up a randomly chosen cue. If one candidate has a positive value and
the remaining two candidates have negative values on this cue then information
search stops and the candidate with the positive value is chosen. If two candidates
have positive values and one negative, the latter is excluded from the choice set and
new cues are randomly drawn until one is found that discriminates between the two
remaining candidates, and the one with the positive value is chosen. If all cues have
been looked up and there is still more than one candidate left, choice between the
remaining candidates is made randomly.

Take-the-best is another lexicographic strategy that differs from minimalist only in
that cues are not chosen randomly but in the order of their validities.

Social-combination rule
The majority rule chooses the candidate with the most votes. If a tie arises because
two candidates are each favored by two members, one of these is chosen at random.

Minimalist corresponds to tallying in the way both treat cues:
Whereas the former selects cues with an equal probability, the
latter treats cues equally by using the same unit weight for each cue
before adding them up. A similar relationship holds between the
two other strategies, which both take cue validity into account:
Take-the-best selects cues in an order established by their validity,
and the version of WADD we used multiplies cue values by weights
that are a linear function of the cue validities. Specifically, the
weight we used was 2v–1, where v is the cue’s validity (this trans-
formation maps a validity of 50%, that is, chance level, to a weight
of zero in WADD; Martignon & Hoffrage, 2002). We also gener-
alized the strategies from a pair-comparison task to operate in a
situation in which the best out of three options had to be inferred
(see also Rieskamp & Hoffrage, 1999, p. 145).
The individual opinions were integrated by using the following
rule: Infer that the candidate with the most votes is the best. If there
is a tie with respect to the number of votes (two for one candidate
and two for another), then randomly choose the decision of one
group member.
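
The following Python sketch is our own illustrative implementation of the strategies in Box 13-1 and of this majority rule, with function names of our choosing; it is not the authors' simulation code. It covers the homogeneous case in which every member sees the full cue matrix: candidates are lists of +1/−1 cue values, and the cue validities (here taken as given) are what WADD and take-the-best use.

```python
import random

def argmax_random_tie(scores):
    # Index of the highest score; ties broken at random.
    best = max(scores)
    return random.choice([i for i, s in enumerate(scores) if s == best])

def tallying(candidates, validities):
    # Unit-weight model: sum the cue values of each candidate.
    return argmax_random_tie([sum(c) for c in candidates])

def wadd(candidates, validities):
    # Weighted additive model: weight each cue value by 2v - 1 before summing.
    weights = [2 * v - 1 for v in validities]
    return argmax_random_tie(
        [sum(w * x for w, x in zip(weights, c)) for c in candidates])

def lexicographic(candidates, cue_order):
    # Keep the candidates with a positive value whenever a cue discriminates;
    # stop as soon as only one candidate is left (Box 13-1).
    remaining = list(range(len(candidates)))
    for j in cue_order:
        positive = [i for i in remaining if candidates[i][j] == 1]
        if 0 < len(positive) < len(remaining):
            remaining = positive
        if len(remaining) == 1:
            return remaining[0]
    return random.choice(remaining)   # no cue settled it: guess

def take_the_best(candidates, validities):
    order = sorted(range(len(validities)), key=lambda j: -validities[j])
    return lexicographic(candidates, order)

def minimalist(candidates, validities):
    order = list(range(len(validities)))
    random.shuffle(order)
    return lexicographic(candidates, order)

def majority_choice(member_strategies, candidates, validities):
    # Social combination: members decide individually, then the candidate
    # with the most votes wins; ties between candidates are broken at random.
    votes = [decide(candidates, validities) for decide in member_strategies]
    counts = {c: votes.count(c) for c in set(votes)}
    top = max(counts.values())
    return random.choice([c for c, n in counts.items() if n == top])

# Example: a four-member committee, all using take-the-best, choosing among
# three candidates described by five cues (values invented for illustration).
candidates = [[+1, -1, +1, -1, -1],
              [+1, +1, -1, -1, +1],
              [-1, +1, +1, +1, -1]]
validities = [0.9, 0.8, 0.7, 0.6, 0.55]
print(majority_choice([take_the_best] * 4, candidates, validities))
```
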

The Information Environments

According to the concept of ecological rationality, the performance
of a particular strategy and hence the result of a comparison of
strategies depends on the environment in which this performance
is evaluated. Martignon and Hoffrage (2002) identified conditions
favoring simple, lexicographic individual decision strategies—
particularly, environments with scarce information—and condi-
tions favoring compensatory models—environments with abundant
information (see also chapter 8 for other conditions). They also
drew attention to another important dimension on which an envi-
ronment can be described, namely, the distribution of cue validi-
ties. Specifically, they pointed out (and proved) that a lexicographic
strategy such as take-the-best and a linear strategy such as WADD
always reach the same decision if the cues’ weights, as used in
the linear model, are exponentially decreasing in the order they
are used in the lexicographic model (their theorem 1). In the pres-
ent study, we extended this work by exploring the strategies’ eco-
logical rationality with respect to the distribution of cue weights (as
seen through the eyes of a linear model) or the cue order (as seen
through the eyes of a lexicographic model)—Martignon and
Hoffrage’s theorem does not predict which of these models perform
better in environments in which cue weights do not exponentially
decrease.
What do cue validity distributions in real-world environments
look like? To answer this question we reanalyzed the 20 environ-
ments used in an earlier strategy comparison (Czerlinski, Gigerenzer,
& Goldstein, 1999; see also chapter 8). These environments covered
disparate domains such as psychology, sociology, economics, and
demography. Most of them were taken from statistics textbooks
where they were used as examples for applying multiple regres-
sion, and they were not selected with respect to a comparison
between strategies or with respect to a particular distribution of
cue validities. Our reanalysis of those 20 environments revealed
that the cue validities tend to follow a J-shaped distribution, at least
on an aggregate level (see Figure 13-3). J-shaped distributions are
commonly seen for many continuous variables (such as income
across individuals or number of citations of scientific papers; see
chapter 15; Hertwig, Hoffrage, & Martignon, 1999). Moreover, it
can be proven that the cue validities of an artificially generated
environment where each of the dichotomous cue values is ran-
domly generated are expected to follow a J-shaped distribution (see
Hoffrage, 2008).

Figure 13-3: Distribution of the cue validities in the 20 real-world
environments used by Czerlinski et al. (1999). Each line shows
the cue validities for a given environment plotted against their
rank when rank ordered. The dark line shows the average validity
conditioned on those ranks; this average does not decrease mono-
tonically because the number of cues across which these averages
have been computed is not constant. (Note that the line starting at
the top border of the graph represents an environment having four
cues of validity 1.)

To explore the ecological rationality of the strategies introduced
above and to check for the robustness of their performance across
environments, we ran our simulations in four different environments
that varied in their distribution of cue validities (see Figure 13-4)
to cover the range of distributions in the natural environments
(Figure 13-3). In two of the four environments, the function of cue
validities plotted against their rank was linear (L), and in the
other two this function was J-shaped (J). The cue values in these
environments were generated randomly, but with two constraints.
First, the validity of a cue was determined according to the curves
in Figure 13-4 and subsequently, the values of the 20 candidates
on this cue were determined randomly such that the desired valid-
ity resulted. The process of randomly generating values for one
cue was independent of the random process for the other cues; that
is, we did not manipulate or control for cue intercorrelations.
Second, to ensure that the four environments did not consist of
cues that systematically differ in their discrimination rates, we
standardized them as follows: In each environment and for each
cue, 10 of the candidates had a positive value and 10 had a negative
value on each cue throughout and thus each cue discriminated in
100 of the 190 possible pairs of candidates (the maximum possible
discrimination rate).

Figure 13-4: Distributions of (rank-ordered) cue validities defining
four different environments. L denotes “linear” and J denotes
“J-shaped.” For each of the four environments, the mean validities (M)
and standard deviations (SD), computed across all 20 cues, are:
L-high (M = 0.80, SD = 0.06), L-low (M = 0.60, SD = 0.06),
J-flat (M = 0.56, SD = 0.09), and J-steep (M = 0.60, SD = 0.12).

Ecological Rationality of the Heuristics in Groups

Our first simulation served as a control condition, in which
we compared the accuracy of the decision strategies for the four
distributions of cue validities when groups were homogeneous.
Because each group member knew all available information in this
simulation, members had identical knowledge. Simulation 2 intro-
duced missing information, thereby creating heterogeneous
groups, in which members could have some common knowledge
and some unique knowledge that was not shared by other members
of their group. Finally, simulation 3 tested to what extent group
accuracy increases when values on the most valid cues have a
higher likelihood of being shared among group members than values
on less valid cues. This captures the idea that group members
may exchange and search for more information on high-validity
than on low-validity cues.

Figure 13-5: Accuracy in percent correct of four strategies
(tallying, WADD, minimalist, and take-the-best) in four types of
environments with different distributions of cue validities (shown
in Figure 13-4). The task is to select the best of three job
candidates. The dashed line indicates performance at chance level
(33.3%).

Simulation 1: Does the Distribution of Cue Validities Matter When All Group
Members Are Omniscient?
Accuracy of the decision strategies was determined by producing
all possible triplets of candidates in each of the four environments
and by counting how often the simulated groups made a correct
decision (see Reimer & Hoffrage, 2006, for details). As expected, the
performance of the strategies depended on the environment in
which they were evaluated—see Figure 13-5. Except in the linear-
high condition, in which cues had a much higher validity on aver-
age than in the other three environments (see Figure 13-4), the
strategies that took cue validity into account had a higher accuracy
than the corresponding strategies that ignored cue validities:
Among the linear strategies, the WADD strategy outperformed tally-
ing, and among the limited-search heuristics, take-the-best outper-
formed minimalist in these three environments. Thus, overall,
assigning higher weights to better cues paid off unless most cues
had a relatively high validity.
The comparison between the compensatory strategies (tallying,
WADD) and their corresponding noncompensatory heuristics (min-
imalist, take-the-best) revealed an interesting pattern. For the strat-
egies that ignored cue weights or cue validities, we observed that
tallying outperformed minimalist in each environment. In contrast,
the winner of the competition between WADD and take-the-best
depended on the environment: WADD performed better when cue
validities followed a linear distribution, whereas take-the-best had
a slightly higher accuracy than WADD when they followed a
J-shaped distribution. This finding adds to the simulation results
reported in Gigerenzer et al. (1999) as well as to the theoretical der-
ivations in Martignon and Hoffrage (2002) in that it demonstrates
that the distribution of cue validities is an important feature of infor-
mation environments that affects the comparison between compen-
satory and noncompensatory decision strategies. Specifically, if cue
validities were linearly distributed, the compensatory strategies out-
performed the frugal noncompensatory heuristics. Conversely, in
environments with a J-shaped distribution of cue validities, take-
the-best performed the best.

Simulation 2: Does the Amount of Missing Information Matter When the Individual
Group Members Have Incomplete Knowledge?
In the second simulation, we allowed missing information by deter-
mining randomly which group member had access to which cue
values. Importantly, the information that was known to the groups
as a whole was always held constant: Each cue value for each
alternative was known to at least one group member in every case.
This manipulation created heterogeneous groups, in which mem-
bers had different knowledge, thus capturing an important aspect
of group decision making. Introducing missing information and
thereby individual differences provides an interesting test of the
strategies’ ecological rationality: To what extent does their perfor-
mance depend on the amount of missing information and, con-
versely, shared knowledge? Intuitively, one would expect that
group performance declines when less information is available,
and, as a consequence, when less information is shared. But would
all strategies suffer to the same degree, and how is their loss in per-
formance modified by the information structure of the environment
in which the strategies are tested? In other words, are the differ-
ences between the strategies and their dependence on environmen-
tal structures that we observed in simulation 1 robust across
different percentages of missing information?
We tested two conditions: In the first, each of the four members
received 15 (25%) of the 60 cue values, and thus no single piece of
information was shared by group members. In the second condi-
tion, each group member knew 30 (50%) of the 60 cue values,
and thus a given piece of information was shared, on average, by
two group members. In addition, we compared the results of the
respective conditions from simulation 1, in which all members
knew all 60 cue values (100%) and each piece of information was
shared by all four members (Reimer & Hoffrage, 2006).

Table 13-1: Accuracy in Percent Correct of Group Decisions (and
Individual Decisions in Parentheses) in Four Types of Environments
With Different Distributions of Cue Validities and for Different
Amounts of Available Information (Percentage of the 60 Cue Values
Available to Each Member)

Distribution of    % of cue values    Tallying    WADD       Minimalist    Take-the-best
cue validities     available
L-high             100                89 (88)     90 (90)    77 (70)       78 (78)
                   50                 85 (80)     84 (80)    64 (55)       65 (61)
                   25                 82 (72)     82 (72)    62 (53)       68 (59)
L-low              100                61 (60)     70 (70)    51 (46)       56 (56)
                   50                 58 (55)     66 (60)    45 (41)       51 (49)
                   25                 57 (50)     63 (53)    43 (40)       54 (47)
J-flat             100                55 (55)     71 (71)    47 (43)       73 (73)
                   50                 53 (50)     68 (64)    42 (40)       67 (62)
                   25                 51 (46)     65 (57)    42 (39)       66 (56)
J-steep            100                46 (46)     59 (59)    39 (38)       61 (61)
                   50                 45 (42)     58 (55)    37 (36)       60 (56)
                   25                 43 (40)     55 (48)    37 (36)       56 (49)

As indicated in Table 13-1, missing information somewhat
impaired group performance, but much less than we had expected.
On average, across all decision strategies and environments, the
simulated groups performed 6 percentage points worse if each
member knew only 25% of the information and thus no cue value
was shared, as compared to the full knowledge condition of simula-
tion 1. Such a drop in performance may matter; however, compared
to studies on the hidden-profile effect in which information is dis-
tributed in a biased way and in which, as a consequence, up to 80%
of groups change their decisions when members do not have access
to all available information, the drop we observed seems surpris-
ingly small (Reimer, Hoffrage, & Katsikopoulos, 2007; Stasser &
Titus, 1985). If one considers that a reduction of accessible cue
values from 100% to 25%, that is, by 75 percentage points, leads to
a reduction in performance of only 6 percentage points, the strate-
gies’ performances appear to be quite robust against this manipula-
tion. The relationships between the decision strategies for the four
distributions of cue validities reported in the first simulation also
remained relatively stable and robust across different amounts of
missing information. Specifically, for all missing knowledge condi-
tions, the compensatory strategies outperformed the noncompensa-
tory heuristics in the two linear environments, whereas this was
again no longer true for the environments where the cue validities
followed a J-shaped distribution. Thus, the findings of simulation 1
can be generalized to various amounts of missing information, in
which groups aggregate across individual decisions by using a
majority rule.
One major reason for the finding that group performance was not
strongly affected by the amount of missing information is that the
majority rule compensated for impairments of performance on the
individual level. As shown in Table 13-1 (numbers in parentheses),
the effect of missing information was much larger on the level of
individual decisions: The average accuracy difference between
the conditions of 100% and 25% cue values was 12 percentage
points—twice as high as the observed difference on the group
level.
To understand this difference in the effect of missing informa-
tion on the accuracy of groups versus individuals, first consider
the case of no missing information. When the individuals had
access to all pieces of information and used either tallying, WADD,
or take-the-best, performance of the individual group members
and the group was almost identical (see the 100% conditions in
Table 13-1). The reason for this was that there was not much varia-
tion among the individual decisions in these cases—the four
members of a group formed identical decisions whenever their
decision strategy yielded an unequivocal decision (they could only
differ with respect to their decision when they had to guess).
Conversely, if the individual group members used minimalist or if
they did not share all available information items, there was much
more variation among the individual group members’ decisions.
Then, because the likelihood of an individual being correct was
above chance, the majority rule increased accuracy and groups out-
performed their members on average (see Reimer et al., 2005;
Reimer & Katsikopoulos, 2004).
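The logic of these group simulations can be sketched in a few lines of code. The sketch below assumes that each member applies tallying to whatever cue values she happens to know and that the group then takes a plurality vote over the members' choices, with ties broken by guessing; details such as tie-breaking and the handling of unknown values may differ from the implementation of Reimer and Hoffrage (2006), and all names are ours.

```python
import random
from collections import Counter

def tallying_choice(candidates, known):
    """One member's decision: pick the candidate with the most positive
    values among the cues this member knows; guess among ties."""
    scores = {c: sum(known.get(c, {}).values()) for c in candidates}
    best = max(scores.values())
    return random.choice([c for c, s in scores.items() if s == best])

def group_choice(candidates, members_knowledge):
    """Plurality rule: every member votes, the group adopts the most
    frequent choice, and remaining ties are broken at random."""
    votes = Counter(tallying_choice(candidates, k) for k in members_knowledge)
    top = max(votes.values())
    return random.choice([c for c, n in votes.items() if n == top])

# Three candidates, two cues, and two knowledge sets spread over four members
candidates = ["A", "B", "C"]
knows_cue1 = {"A": {"cue1": 1}, "B": {"cue1": 0}, "C": {"cue1": 1}}
knows_cue2 = {"A": {"cue2": 1}, "B": {"cue2": 1}, "C": {"cue2": 0}}
print(group_choice(candidates, [knows_cue1, knows_cue2, knows_cue1, knows_cue2]))
```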

Simulation 3: Does It Matter Which Information Is Shared?


So far, we have considered situations in which the members of a
committee shared all available information or in which informa-
tion on the candidates was randomly distributed among members
irrespective of whether values referred to cues with a high or
with a low validity. The rationale of the third simulation was to see
if it matters more which information is shared among the members
of a group rather than how many pieces of information are shared.
Specifically, what happens if group members talk more with
each other about the best cues, and so end up sharing more high-
validity cue knowledge? We posit that groups, particularly commit-
tees consisting of experts, usually know what the good cues are and
that they tend to possess, gather, and communicate more informa-
tion on those cues compared to the less valid cues.
To see what happens if members share more knowledge on
valid cues, in the third simulation, the available information was
distributed such that the most valid cue had a higher chance of
being shared. To facilitate comparison, the first line for each environ-
ment in Table 13-2 (situation A) again shows the result of the 50%
knowledge condition of simulation 2, where missing knowledge was
distributed over the 20 cues with equal probability. The second line
(situation B) represents a situation in which the available informa-
tion was first randomly distributed among group members as before
so that each group member had 25% of the cue values, which were
completely unshared. Each member then filled up her knowledge
set to 50% by randomly receiving additional informa-
tion about values of 5 of the 10 most valid cues. Thus, each group
member knew half of the available cue values as in the 50% condi-
tion of simulation 2, but group members were more likely to share
information on the 10 most valid cues. Further, we set up a condi-
tion (situation C) in which three of the four group members received
all cue values on the 10 most valid cues but no information on the
10 least valid cues and where all the remaining cue values were
given to the fourth group member. The last line for each environ-
ment in Table 13-2 (situation D) pushed this idea even further by
giving all information on the 3 most valid cues and nothing beyond
that to three group members, while giving all information on the
remaining 17 cues to the last member. Note that in each situation,
the group as a whole always possessed the entire set of information,
and in situations A–C, each group member had access to 50% of the
information. Situations B–D represent variants of the realistic case
in which committee members systematically differ in expertise.
The main results of this simulation were as follows (see Table 13-2):

Table 13-2: Accuracy (Percent Correct) of Group Decisions Based on
Four Strategies in Information Environments That Vary in How
Information Was Distributed Among Group Members

Distribution of    Overlap of group          Tallying    WADD    Minimalist    Take-the-best
cue validities     members' knowledge
L-high             (A) All cues likely       85          84      64            65
                   (B) 10 cues likely        85          84      67            68
                   (C) 10 cues shared        83          82      78            78
                   (D) 3 cues shared         76          75      73            73
L-low              (A) All cues likely       58          66      45            51
                   (B) 10 cues likely        69          71      47            52
                   (C) 10 cues shared        73          71      56            56
                   (D) 3 cues shared         56          55      53            53
J-flat             (A) All cues likely       53          68      42            67
                   (B) 10 cues likely        60          70      46            70
                   (C) 10 cues shared        65          73      52            73
                   (D) 3 cues shared         69          72      68            71
J-steep            (A) All cues likely       45          58      37            60
                   (B) 10 cues likely        44          59      40            62
                   (C) 10 cues shared        46          58      42            61
                   (D) 3 cues shared         58          62      56            61

Note. (A) All cues equally likely to be known; (B) more information
known on the 10 most valid cues; (C) three group members share all
values on the 10 most valid cues; (D) three group members share all
values on the 3 most valid cues. See text for details.

First, in the high cue validity (L-high) environment, the two
compensatory strategies (tallying and WADD) were largely unaffected
by unequally distributing the knowledge of cue values. Only in
situation D was a drop in performance seen. The two noncompensatory
heuristics (minimalist and take-the-best), in contrast, benefited
from having a majority of expert members.

Second, the effects of distributing more information on the most
valid cues (particularly situations B and C compared to A) were
much smaller in the J-steep environment than in the other three
environments. A comparison between situations A and D within
this environment revealed that the two strategies that considered
cue validities (WADD and take-the-best) were relatively unaffected
by this manipulation, whereas the two strategies that did not con-
sider validity information (tallying and minimalist) could benefit
from having a majority of members with very limited knowledge
restricted to the best cues.
Third, the comparison between situations A and D also revealed
a general pattern that is interesting from an ecological point of view:
Whereas there were large differences between the strategies when
cues were shared equally as reported above (linear strategies out-
performed take-the-best in environments with a linear distribution
of cue validities but take-the-best matched or outperformed the
linear strategies in environments with J-shaped distributions), the
differences between the strategies’ performance shrank when cue
knowledge was concentrated in situation D. In fact, in this
situation in which a majority of group members had full informa-
tion on the best three cues, group performance was almost unaf-
fected by the strategy individual members were using to arrive at
their individual decisions.
Taken together, these analyses show that it might matter more
which pieces of information are shared by the members of a com-
mittee than the mere amount of information group members
have access to. Further, the results indicate that the effects of the
quantity of shared information on group accuracy depend on the
information environment, sometimes much more than on the informa-
tion-processing strategy used by individuals.

Implications for Group Decision Making

Social combination rules for group decision making can be said to
rely on four facets: what there is to know, which of the individual
group members knows what, how group members process this
knowledge, and how the group puts the individual conclusions
together. While the first two facets describe environmental struc-
tures, the last two relate to the human mind, acting both alone and
socially. In this chapter we have concentrated on the first three
facets, and hence also on the interrelation between environment
and mind: What there is to know was manipulated such that
the information environments had different distributions of cue
validities, who knows what was considered by generating various
distributions of knowledge across the group, and individual group
members’ information processing was systematically varied by
using four different decision strategies. The fourth facet, how indi-
vidual group members integrate their individual decisions, was
held constant by using the majority rule. Here, we consider some of
the implications of what we have found.

The Ecological Rationality of Group Members’ Strategies


The relative performance of the individual decision strategies in
our simulations depended on the environment in which they
were tested. The results of our first simulation revealed that the
distribution of cue validities is an important feature of the informa-
tion environment that can strongly affect the accuracy of compen-
satory and noncompensatory strategies. This simulation extends
the work by Martignon and Hoffrage (2002) and is consistent with
findings by Hogarth and Karelaia (2005b, 2006b; see also chapter 3),
who classified environments according to the degree to which the
weights of the cues are noncompensatory. Hogarth and Karelaia
(2005b) found that the superiority of take-the-best over tallying
was most pronounced in environments that were strictly noncom-
pensatory as defined by Martignon and Hoffrage (2002), that is,
where the weight of any particular cue is higher than the sum of all
lower cue weights. The more the distribution of cue weights devi-
ated from such a noncompensatory set, becoming more evenly
spread, the less pronounced was the superiority of take-the-best
over tallying, until at some point tallying started to yield better
performance. This result is consistent with our observation that
take-the-best performed better than tallying in environments in
which cue validities followed a J-shaped distribution, while the
latter was superior in those environments in which cue validities
were linearly distributed. On Hogarth and Karelaia’s (2005b) scale
ranging from strictly noncompensatory to fully compensatory
environments, our J-shaped environments would be located toward
the noncompensatory end, whereas our environments with linear
distributions of cue validities would be located toward the com-
pensatory end.
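The notion of a strictly noncompensatory set of weights can be written as a one-line check; the following small illustration uses a function name of our own.

```python
def is_noncompensatory(weights):
    """True if every weight exceeds the sum of all smaller weights
    (Martignon & Hoffrage, 2002), as with 1, 1/2, 1/4, 1/8, ..."""
    w = sorted(weights, reverse=True)
    return all(w[i] > sum(w[i + 1:]) for i in range(len(w) - 1))

print(is_noncompensatory([1, 0.5, 0.25, 0.12]))  # True: earlier cues cannot be outweighed
print(is_noncompensatory([1, 0.9, 0.8, 0.7]))    # False: later cues can compensate
```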
How far are the results of our simulation likely to generalize?
Keep in mind that we fixed the number of alternatives and the
number of cues. Previous research has shown that in environments
with scarce information, that is, environments with fewer (binary)
cues than the logarithm of the number of objects, take-the-best is
likely to outperform tallying (Martignon & Hoffrage, 2002). Thus,
increasing the number of candidates and decreasing the number
of cues compared to the environment we used here will likely favor
take-the-best more than in our results.

The Ecological Rationality of the Majority Rule


Group decisions can often be well predicted on the basis of a major-
ity rule, and it yields robust decisions in many environments
(see Hastie & Kameda, 2005; Sorkin et al., 2001). The majority rule
compensates for errors of individual members as long as the erring
members do not form a majority faction. However, as with indi-
vidual rules, there is not one single omnipotent group decision
strategy that performs best across all types of information environ-
ments. The majority rule itself has systematic weaknesses (see
Hastie & Kameda, 2005, for various examples); for instance, when
information about alternatives is distributed among group mem-
bers in a biased way such that shared information favors one alter-
native but unshared information favors another, the majority rule
will systematically fail (Stasser, 1992). When the best alternative
has a hidden profile in this way, with the information about this
alternative being unshared, no single group member is likely to
infer that this is the best choice. As a consequence, when members
integrate their individual opinions on the basis of a majority rule,
this tendency to miss the hidden best alternative will be accentu-
ated and groups will be even less accurate than the average indi-
vidual. Simulation studies as well as empirical studies indicate
that groups are better off when they use a communication-based
strategy in such a situation (Reimer & Hoffrage, 2005; Reimer,
Kuendig, Hoffrage, Park, & Hinsz, 2007; Reimer, Reimer, & Hinsz,
2010).
Even though we held the social combination rule constant in our
investigations, some of our results on the individual group mem-
bers’ accuracies are relevant to the question of when groups should
use the majority rule and when they should use another combina-
tion rule. The basic insight is that a cue is for the individual member
what the individual member is for the group.

Table 13-3: Decision Strategies for Individuals and Corresponding
Social Combination Rules at the Group Level, With Interpretations
and Examples for the Social Combination Rules

Individual strategy     Social combination rule    Interpretation                  Example
Tallying (unit          Majority/plurality         Choose the option that is       Voting
weight model)           rule (a)                   preferred by most
Take-the-best           Best member rule           Follow the most experienced     Group leader decides
                                                   or highest status member
Minimalist              Random member rule         Adopt the decision of a         Group decides on the basis
                        (proportionality) (b)      randomly chosen member          of a raffle among members
Weighted additive       Weighted plurality         Weight and add the              Votes are weighted on the
model (WADD)                                       individual votes                basis of seniority or shares

(a) The majority rule requires an absolute majority of votes in favor of one
alternative, whereas the plurality rule requires only a relative majority.
(b) If a group follows the opinion of a randomly chosen member, the
probability of an alternative being chosen equals the relative frequency of
group members who favor the respective alternative (proportionality; see
Davis, 1973).

The information-
processing cube displayed in Figure 13-1 is helpful in understand-
ing similarities between strategies that aggregate across cues and
strategies that aggregate across member opinions. Thus, each of the
four strategies for the individual group members has a sibling—a
corresponding social combination rule (see Table 13-3). In this
regard, the majority rule is nothing but tallying implemented on a
higher level: Group members’ opinions are integrated in a compen-
satory way with each having the same weight. (If there is no abso-
lute majority, tallying corresponds to the plurality rule, which
adopts the decision of the relative majority.) Both strategies count
the number of cues (tallying) or members (majority) that favor one
alternative and go with the alternative that has the most cues or
votes, respectively. The WADD rule for individuals corresponds to
a weighted plurality strategy that requires groups to weight each
member’s vote according to their rank. The hierarchy of group
members can be defined, for instance, by expertise or seniority. The
two noncompensatory individual heuristics correspond to group
strategies that follow the opinion of one particular member. Take-
the-best is analogous to following the member with the highest
rank, as practiced in the military, or the most experienced member
(the best member rule). Minimalist corresponds to going with the
opinion of a randomly drawn member (which results in propor-
tional choice among the set of opinions), as used, for instance, by a
group of friends who repeatedly engage in common activities and
thus have to decide on a regular basis which of the various sugges-
tions to follow.
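This correspondence can be made vivid in code: the same count-and-take-the-most-frequent operation acts as tallying when applied to cue "votes" and as the majority/plurality rule when applied to member votes. A minimal illustration (names and tie-breaking are ours):

```python
import random
from collections import Counter

def plurality(votes):
    """Return the most frequent item; ties are broken at random."""
    counts = Counter(votes)
    top = max(counts.values())
    return random.choice([item for item, n in counts.items() if n == top])

# Tallying: each cue "votes" for the alternative it favors ...
individual_choice = plurality(["A", "A", "B"])

# ... and the majority/plurality rule treats each member's choice the same way.
group_decision = plurality(["A", "B", "A", "A"])
```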
These formal similarities between the individuals’ strategies
and social combination rules allow us to extrapolate some of the
lessons from the ecological rationality of decision strategies for
individuals to the ecological rationality of social combination
rules. For instance, in a situation in which the variability of group
members with respect to expertise, knowledge, or decision accu-
racy is large, adopting the choice of the “best” member pays off
(also see Hastie & Kameda, 2005). Conversely, in a situation in
which members have, by and large, the same level of expertise, the
majority rule may yield better decisions than the best member rule.
Finally, in a situation in which all members have very high exper-
tise, a group would easily reach unanimity, and accuracy would be
high irrespective of which social combination rule the group uses.
Therefore, it may be better to save resources and to ask any one
individual to decide on this issue.1

1. Alternatively, results for the 100% condition in Table 13-1 (numbers
in parentheses), which refer to the accuracy of individual group members
with all available information, can also be interpreted as the accuracies
of a committee of 20 members that integrates the individual opinions on
the basis of a majority/plurality rule (which corresponds to tallying),
the weighted member rule (which corresponds to WADD), the best member
rule (which corresponds to take-the-best), or the random member rule
(which corresponds to minimalist). In this interpretation, the four
distributions of cue validities refer to group members’ expertise, and
members do not have to decide in favor of one single alternative but are
allowed to do an approval vote (i.e., each member approves or rejects
each of the three candidates of a triplet). As is indicated by the
respective results, the best member rule has a much higher accuracy than
the majority rule (18 percentage points difference) when the distribution
of group members’ expertise is (steeply) J-shaped.

How often does each of these situations appear in everyday life?
Given the high prevalence of J-shaped distributions, one may spec-
ulate that the distributions of expertise in real groups also often
follow such a distribution. When evaluating social combination
rules, however, it is important to consider that decision accuracy
is only one criterion on which group decisions can be evaluated.
Vroom (1969), for example, stressed that the time a group needs to
come to a decision is another important factor. Furthermore, despite
the formal similarities between environmental structures and the
performance of aggregation mechanisms at the individual and
group levels, there are also some basic differences between these
levels that may be psychologically meaningful. For example, mem-
bers’ commitment to a final decision may be higher in groups
who reach their decision on the basis of the majority rule than with
the best member rule. Also, members of a committee may have a
good understanding of what the most valid cues are so that they
can each use take-the-best when forming their individual decisions,
but they may not know how the expertise is distributed within
their group, so that they cannot together apply the best member
rule. In such a situation, it could be reasonable to apply a noncom-
pensatory heuristic at the individual level but to combine mem-
bers’ opinions at the group level with a compensatory social
combination rule such as the majority rule.

The Relevance of Simple Heuristics for Real Groups


We started out with the question: Are there efficient and effective
group decision strategies? The answer is that it depends on the
environment, which means (among other things), how cue validi-
ties are distributed and how knowledge across group members is
distributed. This can help in selecting appropriate strategies from
the adaptive toolbox. Our simulations have shown that in environ-
ments in which cue validities follow a J-shaped distribution, a
lexicographic strategy provides a powerful tool for making fast

and frugal individual decisions that may foster good decisions at
the group level, as well, and thus deserves further attention from a
prescriptive point of view. Research on group decision making
has revealed that groups often mainly discuss what is already
known by all members at the outset of the decision process, whereas
unique, unshared information is less likely to be mentioned during
discussions (see Wittenbaum & Stasser, 1996, for an overview).
As a consequence, recent research has focused on variables that
moderate this excess sampling of shared information and on inter-
ventions that may instigate the exchange of unshared information.
These studies are based on the assumption that more information
yields better decisions. But most of these interventions have been
found to have only marginal effects (Larson et al., 1994; Mennecke,
1997; Stasser et al., 1989, 1995; Stewart et al., 1998; for effective
interventions see Hollingshead, 1996; Schittekatte & van Hiel,
1996). Therefore, we must ask more generally whether this goal
of more information is appropriate for discovering methods that
support and improve group performance.
The findings for individual decision making described else-
where in this book as well as the results on group decision making
in this chapter point in the opposite direction, showing that fast
and frugal heuristics can compete well in some environments with
compensatory strategies that combine all available information.
This is particularly so in environments where cue validities follow
a J-shaped distribution, which, as indicated by our reanalysis of
real-world environments, may be common (see Figure 13-3). The
difficulty that groups have in pooling information (Stasser, 1992)
and the time-consuming nature of communicating many pieces of
information also point to the potential advantages of using such
heuristics.
This conclusion is also supported by our findings that the
consideration of more information does not necessarily increase
group performance. Indeed, the effect of mere quantity of cue values
known in common by simulated committee members was negligible
compared to the environmental effect of different distributions of
cue validities. For example, in the J-flat condition (Table 13-1),
groups whose individuals used take-the-best outperformed groups
using tallying even when the simulated committee members in the
former did not share a single cue value (nonetheless achieving 66%
accuracy) but committee members in the latter shared all cue values
(55% accuracy). Thus, having more information was trumped by the
interaction of appropriate heuristics and environment structure.
Given the attraction that the frugal, simple, noncompensatory
heuristics may have for decision-making groups, more empirical
studies are needed to determine the extent to which groups do (or
can) use heuristics when forming a decision (e.g., chapter 7).
Furthermore, we also need to explore how sensitive groups and
their individual members are to particular structures in their envi-
ronment, such as the characteristics of the cues (e.g., Reimer,
Kuendig, et al., 2007). By adjusting the way information is distrib-
uted among members and the individual heuristics that members
use to process their information, groups may be able to achieve a
better match to their environment structure and enhanced ecologi-
cal rationality in their decisions.
14
Naïve, Fast, and Frugal Trees
for Classification
Laura F. Martignon
Konstantinos V. Katsikopoulos
Jan K. Woike

It is tempting to say that tree diagrams seem so natural, so
inevitable, that they are a part of our birthright, part of
our inborn mental architecture.
Ian Hacking

Trees for classification were proposed—at least implicitly—at
an early stage of our human intellectual development: Aristotle
classified items by a method later called “genus et differentiam”
(i.e., by placing together items of the same type and differentiating
according to features) that, although not represented by a tree, used
an underlying process corresponding to tree-based classification.
Trees became systematic graphical visualizations in the 13th cen-
tury, as, for example, in the works of Boethius1 and the Byzantine
monk Nicephorus Blymmides (see Hacking, 2003).
The 20th and 21st centuries have seen a proliferation of tree
models for classification. Some, like CART (Classification and
Regression Trees proposed by Breiman, Friedman, Olshen, & Stone,
1993), are extremely simple in their execution but nevertheless
computationally challenging in their construction. Trees stemming
from statistics, machine learning, and artificial intelligence have
been proposed by and for practitioners. Yet, most such trees are far
more complex than those people commonly use when in a hurry
and when information search is costly. Here, we will concentrate
on strikingly simple classification trees—in fact, trees that cannot
be simplified any further if the number of cues is fixed.

1. Boethius used a “drawn” tree to represent the “Arbor Porphyrius,”
the tree process developed by Porphyry (235–305 A.D.) for describing
Aristotelian categorization.

The trees we will introduce are naïve, fast, and frugal. By “naïve”
we mean that the trees ignore conditional dependencies between
cues when ordering the cues. “Frugal” means that the trees do
not necessarily use all cues; as a result, these trees are also “fast”
(a more precise definition of these terms will be given later). The
trees implement one-reason classification in analogy with heuris-
tics for one-reason decision making (Gigerenzer & Goldstein,
1999). In sum, the trees are naïve in construction and fast and
frugal in execution, and we will show that their accuracy for clas-
sification is surprisingly high in both fitting and predictive general-
ization, when compared to far more complex standard models.
In this chapter, we will first discuss the problem of classification
and how trees can be used to solve it. We will then discuss
how naïve fast and frugal trees can be constructed and we will com-
pare them analytically with more complex classification models.
Finally, we will test the ecological rationality of different types of
trees in a simulation across a wide range of environments, to see
when each type works well.

Fast and Frugal Classification in the Real World

When several cues have to be processed, full Bayesian classifica-
tion becomes infeasible even for computers, making models that
prune complexity away and capture essential features of the cues
indispensable. During the second half of the 20th century, the
Bayesian community produced a flurry of subtle models that
reduce complexity while remaining good approximations of the
Bayesian ideal. Among the most successful are Bayesian networks,
radical simplifications of the full Bayesian procedure that are able
to detect and ignore spurious conditional dependencies between
cues and thereby achieve high predictive accuracy. Yet, construct-
ing Bayesian networks can also be computationally intractable
(Cooper, 1990).
There has been a lot of work on human classification for per-
ceptual tasks in the laboratory, such as classifying an object based
on its color and shape (Ashby, 1992; Nosofsky, 1984). But it may
well be that perceptual categorization is different from classifica-
tion that aims to support real-world decision making, such as
classifying patients in order to decide their medical treatment
(Berretty, Todd, & Martignon, 1999; Martignon, Katsikopoulos, &
Woike, 2008).
How do doctors classify a patient with severe chest pain as one
who has a high risk of heart disease and should thus be treated in
the coronary care unit (CCU)? In what follows, we report how
this question was investigated by Green and Mehr (1997), and this
will serve as a focal example of our exploration of fast and frugal
trees. In a Michigan hospital, doctors used to rely on their clinical
judgment to decide whether a patient with intense chest pain had a
high or a low risk of heart disease. The doctors sent about 90%
of the patients to the CCU. This is a “defensive” strategy that can
protect the doctors against the risk of lawsuit (Gigerenzer & Engel,
2006; Studdert et al., 2005). But it led to excessive costs (too
many people in the CCU), decreased the quality of care provided
(the unit was overcrowded), and introduced risks for patients who
should not have been in the unit (it is one of the most dangerous
places in a hospital due to the risk of potentially fatal secondary
infections).
To solve this problem, Green and Mehr (1997) trained physicians
to use the Heart Disease Predictive Instrument (HDPI; Pozen,
D’Agostino, Selker, Sytkowski, & Hood, 1984), which consists of a
chart with more than 50 probabilities (Figure 14-1) and a pocket
calculator with a logistic regression program. If the score from the
HDPI is higher than a critical value, then the patient is classified
as being at high risk and sent to the CCU, otherwise, not.
A quick glance at the chart makes it clear why physicians are not
happy using this and similar systems. Physicians do not readily
understand these calculations, which are not transparent and
appear to suggest that clinical judgment should depend on a pocket
calculator (often a touchy point for clinicians; see also
Katsikopoulos, Pachur, Machery, & Wallin, 2008).

Figure 14-1: The Heart Disease Predictive Instrument (HDPI; Pozen
et al., 1984; Green & Mehr, 1997). EKG is electrocardiogram; ST is a
particular electrocardiogram feature; MI is myocardial infarction;
and NTG is nitroglycerin used for chest pain. (Adapted from
Gigerenzer, 2007.)

A dilemma emerged: Should
patients in life-and-death situations be classified by physicians’
defensive judgments or by complex calculations that are alien to
them but possibly more accurate? To study this, Green and Mehr
(1997) employed an ABAB design in which they first let the
physicians make the decision by clinical judgment (condition A),
then gave them the HDPI (condition B), then withdrew the instru-
ment and left the physicians on their own once more (condition A),
and so on.
Green and Mehr (1997) had expected that the quality of decision
making would be relatively low in condition A and high in
condition B, and oscillate overall. In fact, quality first increased
from A to B, as expected, but then surprisingly stayed at this level
even when the physicians had no access to the chart and the pocket
calculator. The physicians seemed to be using some of the informa-
tion in the charts—such as which symptoms are diagnostic and of
those, which are more diagnostic than others—but ignoring the
quantitative aspects of the information in terms of exact proba-
bilities. This surprising observation suggested a new possibility,
namely, that the probabilities and computations of the logistic
regression may not matter much at all for achieving high accuracy.
Based on this observation and on work on the take-the-best
heuristic (Gigerenzer & Goldstein, 1996), Green and Mehr (1997)
constructed what we call a fast and frugal tree, which can be
described in terms of three building blocks: (a) ordered search,
(b) fast stopping rule, and (c) one-reason decision making. The
tree is shown in Figure 14-2. It asks only a few yes-or-no questions
and has an exit—that is, a final decision that could be reached—at
every level—that is, for each question asked. If a patient has a
certain anomaly in his electrocardiogram—the so-called ST seg-
ment is elevated—he is immediately classified as being at high
risk and is assigned to the CCU. No other information is searched
for. If that is not the case, a second question is asked: Is the patient’s
primary complaint chest pain? If this is not the case, he is classified
as low risk and put in a regular nursing bed. No further information is
considered. If the answer is yes, a final question, namely, whether
any of four remaining symptoms beyond chest pain is present, is asked
to classify the patient and decide on the appropriate treatment.
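The three questions of this tree translate directly into code; the sketch below follows the verbal description above, with variable names and the boolean encoding of the cues chosen by us.

```python
def green_mehr_tree(st_elevated, chest_pain_main, other_symptom):
    """Green and Mehr's (1997) fast and frugal tree: an exit at every level."""
    if st_elevated:
        return "high risk"   # first cue decides; no further information is searched
    if not chest_pain_main:
        return "low risk"    # second cue decides
    return "high risk" if other_symptom else "low risk"

print(green_mehr_tree(st_elevated=False, chest_pain_main=True, other_symptom=True))
# -> high risk
```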
Unlike the logistic regression used in the HDPI, the structure of
the fast and frugal tree is transparent both to emergency room
physicians and patients. In fact, because physicians know the
cues well, they can memorize the classification tree structure easily
and communicate clearly with their patients about the decision
process.

Figure 14-2: Green and Mehr's (1997) fast and frugal tree for
classifying patients as having a high or a low risk of heart
disease. The tree first asks whether the ST segment is elevated
(if yes: high risk); if not, whether chest pain is the main
symptom (if no: low risk); and finally whether any other symptom
is present (if yes: high risk; if no: low risk).

Transparency, however, is not the only criterion of perfor-
mance—time, money, and accuracy also matter in a hospital with
limited resources. Because the simple tree uses only part of the
information required by the HDPI, it does save time and costs.
But how accurate is this one-reason classification? Would you want
to be classified by a few yes-or-no questions in a situation with
such high stakes? Or would you rather be evaluated by the logistic
regression, or perhaps by a physician’s intuition?
There are two aspects to accuracy (Figure 14-3). On the y-axis we
graph the proportion of patients correctly assigned to the CCU—the
hit rate—measured by considering which patients subsequently
had a heart attack. On the x-axis we graph the proportion of patients
incorrectly assigned to the CCU—the false-positive rate. The diago-
nal line represents chance performance. In a perfectly predictable
world, a diagnostic method would reach a point in the upper left-
hand corner, but in our uncertain world, such perfection is more
of a fantasy, and heart attacks cannot be predicted with this preci-
sion. The physicians’ initial performance—marked by a triangle in
Figure 14-3—was at chance level, even slightly below it: They sent
about 90% of the patients to the CCU but could not discriminate
between those who should be there and those who should not
(though after being exposed to the HDPI, the false-positive rate
decreased greatly). Green and Mehr (1997) tested the HDPI—shown
by the squares in Figure 14-3—and found it to be substantially
better than chance and the physicians’ initial judgments. The reason
why there is more than one square is that researchers can adjust
the critical value for assigning people to the CCU and this leads
to trade-offs between the values on the two axes. The curve on
which the squares lie is called the receiver operating characteristic
(ROC) curve.

Figure 14-3: The performance of unaided physicians, the Heart
Disease Predictive Instrument (see Figure 14-1), and the fast and
frugal tree of Green and Mehr (1997; see Figure 14-2), for
classifying patients as having a high or a low risk of heart
disease, plotted as hit rate (proportion of patients correctly
assigned to the CCU) against false-positive rate (proportion
incorrectly assigned). The diagonal line represents chance
performance.
How did the fast and frugal tree perform? The surprising result
was that it was overall more accurate in classifying heart attack
patients than both the physicians’ defensive decisions and the
logistic regression. The fast and frugal tree—indicated by the star in
Figure 14-3—led to a very high proportion of patients correctly
assigned to the CCU. At the same time, it had a comparatively low
false-positive rate. The logistic regression drew on much more
information than the heuristic and could make use of sophisticated
statistical calculations, yet simplicity paid off.
Fast and frugal trees have also been proposed for other tasks,
such as deciding whether to prescribe antibiotic treatment to young
children suffering from community acquired pneumonia. Fischer
et al. (2002) presented a tree that uses at most two cues, whose
values can be easily obtained. The tree is shown in Figure 14-4.
One first checks whether the child has had fever for more than
two days. If the answer to this is “no,” it is immediately concluded
that macrolides (the appropriate form of antibiotics) should not be
prescribed. Only if the answer to the first question is “yes” is the
second level of the tree invoked, checking whether the child is
older than 3 years. If the answer to this second question is “no,” it
is concluded that macrolides should not be prescribed, and if "yes,"
then macrolides are recommended.

Figure 14-4: Fischer et al.'s (2002) fast and frugal tree for deciding
whether to prescribe antibiotic treatment to young children suffering
from community acquired pneumonia: Has the child had fever for more
than 2 days? If no, do not prescribe macrolides; if yes, is the child
older than 3 years? If yes, prescribe macrolides; if no, do not.

This tree can also be expressed
as a rule that is easy to memorize and quick to apply: “Prescribe
macrolides only if the child has had fever for more than 2 days and
is older than 3 years.” Additionally, the fast and frugal tree does
not sacrifice much in accuracy. When evaluated on real data, it
classified as “high risk of microstreptococal pneumonia infection”
72% of those children who actually were at high risk, while a scor-
ing system based on logistic regression identified 75% of them.
Let us look at another example from a nonmedical context.
Employment agencies need simple decision mechanisms for estab-
lishing whether job applicants should be assigned to vocational
retraining programs. There are a number of tests that can be admin-
istered for decision support. Some of them measure aspects of
cognitive achievement, including tests of basic arithmetic com-
petencies, tests of applied arithmetic competencies, and tests of
concentration ability. Wottawa and Hossiep (1987) reconstructed
the decision procedures of employment agents in terms of a fast
and frugal tree. They showed that this tree fit the decision data very
well (accuracy over .90) and also made good predictions in cross-
validation.

Basic Concepts of Fast and Frugal Trees

According to Hacking (2003), the first mathematical theorem
about trees was proven by Arthur Cayley in 1857. Cayley, who pub-
lished more than 900 papers spanning most fields of modern math-
ematics, wrote about trees while working as a lawyer. Hacking
emphasizes how Cayley avoided providing a definition of trees and
simply provided drawings instead. We present the theory of fast
and frugal trees by alluding, when possible, to the visual properties
of trees and avoiding the formalisms that we present elsewhere
(Martignon, Vitouch, Takezawa, & Forster, 2003; Martignon et al.,
2008). Although we restrict ourselves to the treatment of binary
classifications by means of binary cues, generalizations to the
case of more categories and n-valued cues are straightforward.
A little rigor will be required, however, for the definition given
below. First, we define an exit in a tree as a node that terminates
information search and specifies the action to be taken. Second, we
define a level by the presence of a question node. Next, let us
establish the conventions that, in the trees we consider, an exit
node reached by a “yes” answer hangs to the left side of the ques-
tion node immediately above it and that a “yes” answer for the ith
cue ci is denoted by ci = 1 (for “no” answers the exit hangs to the
right and ci = 0). For any object let us denote by x its cue profile,
so that xi is the value of x on the ith cue. The following definition
formalizes fast and frugal trees.

Definition 1: A classification tree is fast and frugal if and only
if it has at least one exit at each level.

The tree in Figure 14-2 is fast and frugal: First, the ST segment
is checked and if it is elevated a classification of the patient’s
condition is immediately made and the tree is exited with a specific
action (CCU) based on that cue’s value; but if the ST segment is not
elevated, next chest pain is checked, which again provides the
opportunity to make a classification and exit; and finally other
symptoms are checked only if necessary, and appropriate exit and
action is taken at that point. If a second question were asked for
patients with elevated ST segment, rather than exiting immediately
at that point, then the tree would not be fast and frugal.
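Definition 1 can also be checked mechanically. The sketch below uses a nested-dictionary representation of trees that is our own (leaves are strings, question nodes are dictionaries with "yes" and "no" subtrees) and verifies that the Green and Mehr tree has an exit directly beneath every question node.

```python
def is_fast_and_frugal(node):
    """A tree is fast and frugal iff every question node has at least one
    exit (leaf) hanging directly from it."""
    if isinstance(node, str):          # a leaf is an exit
        return True
    children = [node["yes"], node["no"]]
    has_exit = any(isinstance(child, str) for child in children)
    return has_exit and all(is_fast_and_frugal(child) for child in children)

green_mehr = {
    "question": "ST segment elevated?",
    "yes": "high risk",
    "no": {
        "question": "Chest pain main symptom?",
        "yes": {
            "question": "Any other symptom?",
            "yes": "high risk",
            "no": "low risk",
        },
        "no": "low risk",
    },
}
print(is_fast_and_frugal(green_mehr))  # -> True
```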
The labels fast and frugal have precise meanings (Gigerenzer &
Goldstein, 1996): The frugality of a tree for classifying a set of
objects is the mean number of cue values (across objects) it uses for
making a classification and hence reaching a decision—fewer cues
used means greater frugality. The speed of a tree is the mean number
of basic operations—arithmetic and logical—needed for making a
classification. It is clear that, with these definitions, replacing ques-
tion nodes with exit nodes makes a tree faster and more frugal. Fast
and frugal trees are also “minimal” among those constructed for a
given set of cues, with the fewest number of question nodes made
using the cues when tested one at a time.
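Frugality, as defined here, can be computed by counting lookups. The following sketch instruments the Green and Mehr tree to report how many cue values it inspects per patient and averages this number over a small set of made-up cue profiles; the profiles are placeholders, not data from the study.

```python
def classify_counting_lookups(profile):
    """Classify a cue profile (ST, CP, OC) with the Green and Mehr tree and
    return the classification together with the number of cue values used."""
    st, cp, oc = profile
    if st:
        return "high risk", 1
    if not cp:
        return "low risk", 2
    return ("high risk" if oc else "low risk"), 3

def frugality(profiles):
    """Mean number of cue values used across all classified objects."""
    lookups = [classify_counting_lookups(p)[1] for p in profiles]
    return sum(lookups) / len(lookups)

print(frugality([(1, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)]))  # -> 2.25
```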
Figure 14-5 illustrates how the fast and frugal tree of Green and
Mehr (1997) is related to the natural frequency tree (see also
Martignon et al., 2003) on a data set provided by Green and Mehr,2
which will allow us to compare fast and frugal trees with the
Bayesian approach to classification.

Figure 14-5: The natural frequency tree of the Green and Mehr
(1997) study. Shaded nodes correspond to people with infarctions
(and nonshaded nodes to people without). Bold branches outline the
fast and frugal tree of Figure 14-2. The bold vertical bar
corresponds to the split of cue profiles, according to the fast and
frugal tree, into those with high risk of heart disease (left of
bar) and those with low risk of heart disease (right of bar).

In this sample, 89 patients
were checked on three binary cues (ST segment elevation, abbrevi-
ated ST; chest pain or CP; and other conditions or OC). The fourth
level of the tree shows how many patients share each one of the
eight possible combinations of cue values. From now on, we refer
to a combination of all cue values as a cue profile. For example,
there are 17 patients with the cue profile [0, 1, 1] indicating
that they do not have an elevated ST segment (ST = 0), have severe
chest pain (CP = 1), and have other symptoms (OC = 1). The fifth
level of the natural frequency tree shows how many patients, in
each cue profile, did turn out to be sick (shaded nodes) and how
many turned out to be healthy (nonshaded nodes) as measured by
a subsequent infarction (heart attack).
2. Green and Mehr kindly provided us with a representative subsample
extracted from their much larger data set.

How would a Bayesian approach to classification work based on
the Green and Mehr data? For a new patient (not one of the 89
patients in the Green and Mehr study), a Bayesian physician would
first check each cue and determine the patient’s cue profile. Then,
the Bayesian physician would assess the probability of an infarc-
tion for the patient’s cue profile based on the information on the
fifth level of the natural frequency tree. For example, the probabil-
ity of infarction is 2/17 for a patient with a cue profile [0, 1, 1], that
is, with ST = 0, CP = 1, and OC = 1. Finally, a threshold can be used
to decide if the probability of infarction for the new patient is
high enough to classify the patient as having a high risk of heart
disease. For example, if the threshold is .1, then new patients with
cue profiles [1, 1, 1], [1, 1, 0], [1, 0, 1], or [0, 1, 1] would be classified
as having a high risk of heart disease.
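In code, this profile-based classification amounts to estimating a relative frequency for each cue profile and comparing it with a threshold. A minimal sketch, assuming the training data arrive as (cue profile, infarction) pairs; the function names and the toy counts are ours, not the Green and Mehr data.

```python
from collections import defaultdict

def natural_frequency_classifier(training, threshold=0.1):
    """Estimate P(infarction | cue profile) from counts and classify a new
    profile as high risk whenever that estimate exceeds the threshold."""
    sick = defaultdict(int)
    seen = defaultdict(int)
    for profile, infarction in training:
        seen[profile] += 1
        sick[profile] += int(infarction)

    def classify(profile):
        if seen[profile] == 0:
            return "no data for this profile"
        risk = sick[profile] / seen[profile]
        return "high risk" if risk > threshold else "low risk"

    return classify

# Toy data: profile (ST, CP, OC) paired with whether an infarction occurred
training = [((0, 1, 1), True), ((0, 1, 1), False), ((1, 0, 0), False)]
classify = natural_frequency_classifier(training)
print(classify((0, 1, 1)))  # -> high risk (estimated probability 0.5 > 0.1)
```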
In Figure 14-5, the fast and frugal tree of Figure 14-2 is outlined
by using bold font for some of the branches of the natural frequency
tree. In the fifth level of the natural frequency tree, the branches
“lead” to a bold vertical bar. In order to classify a new patient as
having a high risk of heart disease, the fast and frugal tree splits the
cue profiles in two parts: those that are to the left of this bar and
those that are to the right. The cue profiles to the left correspond to
patients who are classified as having a high risk of heart disease.
That is, new patients with cue profiles [1, 1, 1], [1, 1, 0], [1, 0, 1], [1,
0, 0], or [0, 1, 1] are classified as having high risk. Note that this
strategy handles the cue profile [1, 0, 0] differently from the natural
frequency tree. The cue profiles on the right of the bar correspond
to patients who are classified as having a low risk of heart disease.
What are the relative advantages of the natural frequency and
fast and frugal trees? One advantage of the natural frequency tree is
that the classification of a new patient is tailored to the patient’s
cue profile (by using all information on patients, with or without
infarction, who share this profile). There are downsides to this
procedure though. For one thing, it needs a lot of data—all ques-
tions on cue values have to be asked for each patient. And yet, the
available data may not be enough—for some profiles the decision
is based on a tiny number of cases that may make the decision
procedure quite brittle. For example, the decision for the profile
[1, 0, 0] has to be based on only two cases.
In contrast, the fast and frugal tree is clearly less savvy than the
natural frequency tree. In fact, a large number of patients are
assigned to a category after only one cue value has been checked.
This feature, though, allows it to be much faster. Furthermore, in
contrast to the Bayesian approach, some classifications are made
jointly for patients with different cue profiles. For example, the
classification for the profile [1, 0, 0] is made jointly with the profiles
[1, 1, 1], [1, 1, 0], and [1, 0, 1]. This practice, one may expect, con-
tributes to robustness, that is, good generalizations to new groups
of patients. Finally, because the fast and frugal tree is transparent
and easy to apply, it suits the hospital environment and may be
more appropriate for guiding behavior in emergencies.

Fast and Frugal Trees Are Lexicographic Classifiers

We now show that fast and frugal trees can be characterized as
models that classify objects lexicographically. Here again some
mathematical rigor is required. We will assume n binary cues and
two classes or categories, C1 and C0 (corresponding in the Green
and Mehr, 1997, study to the presence of heart disease and absence
of disease, respectively). Without loss of generality, we also assume
that cues are inspected in the order c1, c2, . . ., cn and that for i = 1,
2, . . ., n, cues are coded so that if an object x exits the tree at the ith
level, it is assigned to category C1 if xi = 1 and to C0 if xi = 0. We next
provide the definition of lexicographic comparison.

Definition 2: A cue profile x is lexicographically larger than a
cue profile y (x >l y) if and only if there exists 1 ≤ i ≤ n such
that xi = 1 and yi = 0 and xj = yj for all j < i. If neither x >l y nor
y >l x, then x and y coincide (x = y).

The following result establishes a characterization of fast and frugal
trees as lexicographic classifiers (Martignon et al., 2008).

Result 1: For every fast and frugal tree f there exists a unique
cue profile S(f)—called the tree’s splitting profile—so that
f assigns x to C1 if and only if x >l S(f). For every cue profile S
there exists a unique fast and frugal tree f, such that S(f) = S.

In Figure 14-5, let x1 = 1 if and only if the ST segment is elevated,
x2 = 1 if and only if chest pain is the main symptom, and x3 = 1 if
and only if any of the other four symptoms are present. Also let C1
represent high risk and C0 low risk. The splitting profile of this tree
is [0, 1, 0]. The bold vertical bar marks the position of the splitting
profile. In Figure 14-5, all cue profiles to the left of the bar are lexi-
cographically larger than the splitting profile. As result 1 says, these
cue profiles are assigned to the high risk category C1.
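To make the definition and result concrete, here is a minimal Python sketch (an illustration of ours, not code from the studies cited): it implements the lexicographic comparison of Definition 2 and classifies cue profiles against the splitting profile [0, 1, 0] of the Green and Mehr tree.

def lex_larger(x, y):
    """Return True if cue profile x is lexicographically larger than y (x >l y)."""
    for xi, yi in zip(x, y):
        if xi != yi:                      # first position where the profiles differ decides
            return xi == 1 and yi == 0
    return False                          # the profiles coincide

def classify(x, splitting_profile):
    """Assign x to C1 if and only if x is lexicographically larger than the splitting profile."""
    return "C1 (high risk)" if lex_larger(x, splitting_profile) else "C0 (low risk)"

splitting = [0, 1, 0]                     # Green and Mehr (1997) tree, cue order ST, CP, OC
for profile in [[1, 1, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0]]:
    print(profile, classify(profile, splitting))

Running the sketch reproduces the assignment described above: the profiles to the left of the bar in Figure 14-5 come out as high risk, all others as low risk.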
Result 1 says that fast and frugal trees implement one-reason
classification, in analogy with fast and frugal heuristics that imple-
ment one-reason decision making for paired comparisons
(Gigerenzer, Todd, & the ABC Research Group, 1999): Classifying
an object by using a fast and frugal tree reduces to comparing its
cue profile x with the tree’s splitting profile.3

3. Another model for classification, RULEX, has also been linked to one-
reason heuristics: “We find the parallels between RULEX and these one
reason decision-making algorithms to be striking. Both models suggest that
human observers may place primary reliance on information from single
dimensions” (Nosofsky & Palmeri, 1998, p. 366).

The splitting profile concept can help us intuit the possible
shapes of a fast and frugal tree. Note first that for any splitting pro-
file S, one of the following must be true:

Si = 1 for all i < n, or
Si = 0 for all i < n, or
Si ≠ Sj for some i, j < n.

These cases correspond to different tree shapes. Trees with a split-
ting profile of the first two types are called pectinates (meaning
combs) or rakes and are used in biology for species identification.
The tree for prescribing antibiotics proposed by Fischer et al. (2002)
in Figure 14-4 has such a short rake form. Dhami (2003) provides
evidence that the decisions of mock juries on whether to grant bail
can also be described by rakes. The “trunk” of a rake is a straight line.
Splitting profiles of the third type generate zig-zag trees. The Green
and Mehr (1997) tree in Figure 14-2 is a zig-zag tree. The “trunk” of
a zig-zag tree exhibits, as the name suggests, a series of turns.
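In code, the shape can be read directly off the splitting profile; the following small Python sketch (ours) applies the three cases above.

def tree_shape(splitting_profile):
    """Rake if all exits before the last cue point the same way, zig-zag otherwise."""
    head = splitting_profile[:-1]          # the last cue always carries both exits
    return "rake" if len(set(head)) <= 1 else "zig-zag"

print(tree_shape([1, 1, 0]))   # rake
print(tree_shape([0, 1, 0]))   # zig-zag (the Green and Mehr tree)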

Fast and Frugal Trees and Linear Models

Fast and frugal trees are also connected to linear models. In linear
models for classification, each cue ci has a weight wi > 0 and for
each cue profile x = [x1, x2, . . ., xn], the score R(x) = x1w1 + x2w2 + . . . + xnwn is
computed. A scalar threshold h > 0 defines the categories: x is assigned
to C1 if and only if R(x) > h. Tallying is a linear model with all weights
wi = 1. The relation between linear and lexicographic inference has
been analyzed previously for paired comparisons (Hogarth & Karelaia,
2005b, 2006b; Katsikopoulos & Fasolo, 2006; Katsikopoulos &
Martignon, 2006; Martignon & Hoffrage, 1999, 2002). Here we relate
linear and lexicographic inferences for classifications.

Result 2: For every fast and frugal tree f there exist h > 0 and
wi > 0 with wi > wi+1 + wi+2 + . . . + wn for i = 1, 2, …, n − 1, so
that f makes identical classifications with the linear model with
weights wi and threshold h. For every linear model with weights
wi > 0 so that wi > wi+1 + wi+2 + . . . + wn for i = 1, 2, …, n − 1
and a threshold h > 0, there exists a fast and frugal tree f that
makes identical categorizations.
For example, the Green and Mehr (1997) tree in Figure 14-2 makes
identical classifications with the linear model with R(x) = 4x1 + 2x2 +
x3 and h = 2 (they both assign [0, 0, 0], [0, 0, 1] and [0, 1, 0] to C0 and
all other cue profiles to C1).
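The equivalence can be checked mechanically; the following sketch (ours, in Python) verifies that the linear model with weights 4, 2, and 1 and threshold h = 2 agrees with the lexicographic classifier whose splitting profile is [0, 1, 0] on all eight cue profiles.

from itertools import product

def lex_larger(x, y):
    # the first position where the profiles differ decides the comparison
    for xi, yi in zip(x, y):
        if xi != yi:
            return xi == 1 and yi == 0
    return False

weights, h, splitting = [4, 2, 1], 2, (0, 1, 0)
for x in product([0, 1], repeat=3):
    linear = sum(w * xi for w, xi in zip(weights, x)) > h   # linear model: R(x) > h
    tree = lex_larger(x, splitting)                         # fast and frugal tree (result 1)
    assert linear == tree                                   # identical classifications
    print(x, "C1" if tree else "C0")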
Linear models with wi > wi+1 + wi+2 + . . . + wn are called
noncompensatory (Einhorn, 1970; Martignon & Hoffrage, 2002).
Result 2 says that fast and
frugal trees are equivalent to noncompensatory linear models in the
sense that the two make the same classifications. Note, however,
that result 2 does not imply that it is impossible to distinguish
empirically between fast and frugal trees and noncompensatory
linear models. The process predictions of fast and frugal trees,
including ordered and limited information search, are distinct from
those of linear models, which use all available information in no
specified order.
To summarize our results so far, fast and frugal trees are a simple,
heuristic way of implementing one-reason classification. They can
be represented as lexicographic classifiers with a fixed splitting
profile or as noncompensatory linear models for classification.
Also, the trees form a family of transparent models for performing
classification with limited information, time, and computation. But
how can we build accurate fast and frugal trees? Our next step is to
present construction rules for ordering cues in fast and frugal trees.

Constructing Fast and Frugal Trees With Naïve Rankings

Fast and frugal classification is lexicographic: Cues are looked up one
after the other and at each step one of the possible cue values will lead
to a classification (and exit the tree). The ranking (ordering) of cues
determines how accurate classifications will be. What are effective
procedures for determining good cue rankings? One can, of course,
seek the optimal ranking when fitting the trees to known data, that is,
the ranking that achieves the highest accuracy on the classification
task. But finding optimal rankings is NP-hard (Schmitt & Martignon,
2006). Are there simple procedures for ranking cues that provide
good lexicographic classifications? A first important observation is
that in the case of rake-shaped trees, the order of cues has no influence
on the classifications made (Martignon et al., 2003), which means
that we can focus on cue-ranking procedures for zig-zag trees.

Result 3: In a rake-shaped fast and frugal tree, all orderings of
cues produce exactly the same classifications.

Tackling the question of rankings in the general case requires
distinguishing between rankings that are learned online and those
established through some sort of batch learning. In online learning,
an agent can begin by classifying objects lexicographically with a
random cue ranking and then update her ranking by adjusting the
order of cues according to the success of her subsequent ongoing
classifications over time (Todd & Dieckmann, 2005; see chapter 11).
Here, the rankings of all cues are updated together and good cues
will rise through the ranks as decisions are made. In many realms
of individual classification such online learning of cue rankings
may be effective, for instance, where people can learn over time
what cues are best for classifying products as worth buying.
The contexts we consider here are of a different kind, where
practitioners have to classify items based on their values on cues
that are costlier to assess, such as test results or symptoms. In these
contexts, because of the costs involved, the predictive value of
one cue is typically assessed independently of other cues, and
hence in a batch mode. Following the treatment of Bayes classifiers
in the literature (Domingos & Pazzani, 1997), we call the attitude of
looking at cues separately, making use of just their individual pre-
dictive values to calculate the final cue order, naïve.
It is precisely this attitude that underlies the construction of
naïve fast and frugal trees. Batch learning of individual cue validi-
ties has been extensively investigated in the context of causality
and diagnostics (Pearl, 2000; Waldmann, Holyoak, & Fratianne,
1995; Waldmann & Martignon, 1998). Cognitive psychologists study-
ing human approaches to diagnostics and causality have also
shown empirically that humans tend to assess the effect of two
(or more) cues in a naïve fashion, ignoring conditional interdepen-
dencies. Participants appear to implicitly assume conditional
independence of cues (i.e., effects) for a given criterion4 (i.e., cause),
and they will only incorporate estimates of interactions after they
consistently receive strong negative feedback on their assessments
(Waldmann et al., 1995; Waldmann & Martignon, 1998). The naïve
cue rankings for fast and frugal trees presented here assume both
that the validities of cues are assessed for each cue individually by
some sort of batch learning on a training set of items, and that cues
are conditionally independent given the criterion.
How can we construct naïve rankings of cues? Assume that we
have a binary decision to make and we call one of the outcomes posi-
tive (e.g., absence of disease) and the other negative (e.g., presence

4. Conditional independence is relevant in situations where there is
a “common cause” as described by Suppes (1984)—for instance, breast
cancer is a common cause for a positive mammogram and a positive ultra-
sound test.

of disease). A cue may err in one of two ways—it may be positive
when the outcome is negative, or vice versa. We will therefore con-
sider both the positive and the negative cue validity. Both validities
are assessed by analyzing the performance of the cue in a sample of
cases. The validity values can be easily derived from natural fre-
quency trees or, equivalently, from a contingency table for each cue,
as shown in Table 14-1. Given the numbers of observations for each
one of the four combinations of cue and outcome values, a, b, c, and
d, shown in Table 14-1, the two cue validities are specified by two
simple ratios: The positive validity of a cue is the proportion of objects
with a positive outcome value (a) among all those that have a positive
cue value (a + c). Likewise, the negative validity of a cue refers to the
proportion of objects with a negative outcome value (d) among all
those that have a negative cue value (b + d).
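In code, the two validities are simple ratios of the counts in Table 14-1; the sketch below (ours, with hypothetical counts) computes them.

def cue_validities(a, b, c, d):
    """a: positive outcome & positive cue, b: positive outcome & negative cue,
    c: negative outcome & positive cue, d: negative outcome & negative cue."""
    positive_validity = a / (a + c)
    negative_validity = d / (b + d)
    return positive_validity, negative_validity

# hypothetical counts for one cue
print(cue_validities(a=30, b=10, c=20, d=40))   # (0.6, 0.8)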
Using these cue validities, positive and negative, we can now
specify two naïve tree construction rules. One way to proceed is by
what we call the maximum validity (in short, MaxVal) approach.
Cues are ranked according to the greater of their two validities
(i.e., their individual maximum validity) and then assigned to the
tree’s levels in this ranked order, with the following decisions
made at each level: If the positive validity of a cue is higher than
its negative validity, a positive exit (i.e., corresponding to a “yes”
answer) will be placed at its level; if its negative validity is higher,
then a negative exit (corresponding to a “no” answer) will be used.
If the validities are equal, a coin is tossed and one exit is chosen
accordingly. The very last cue will always lead to both types of
exits. The resulting tree is fast and frugal, having an exit on every
level. A tree built according to MaxVal uses each cue in the direc-
tion in which it is more valid. Because the validities used in its
construction were computed independently of each other, MaxVal
is also a naïve rule.
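A minimal sketch of the MaxVal ranking (ours; it assumes the positive and negative validities have already been computed, and it breaks validity ties toward a positive exit rather than by a coin toss):

def maxval_tree(validities):
    """validities: dict mapping cue name -> (positive_validity, negative_validity).
    Rank cues by their maximum validity and place each exit on the side of the
    higher validity; the last cue additionally receives the opposite exit."""
    ranked = sorted(validities, key=lambda cue: max(validities[cue]), reverse=True)
    levels = []
    for cue in ranked:
        pos, neg = validities[cue]
        exit_side = "positive" if pos >= neg else "negative"   # ties broken toward positive here
        levels.append((cue, exit_side))
    return levels

# validities of the Green and Mehr cues from Table 14-2
print(maxval_tree({"ST": (.39, .96), "CP": (.24, .92), "OC": (.22, .93)}))
# [('ST', 'negative'), ('OC', 'negative'), ('CP', 'negative')]

With the validities of Table 14-2 it places a negative exit on every level, so the resulting tree is a rake, as noted for the Green and Mehr data below.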

Table 14-1: Contingency Table for Computing Positive and
Negative Cue Validity From Number of Observations of
Combinations of Cue and Outcome Values

                           Cue value
Outcome             Positive       Negative
Positive                a              b
Negative                c              d

Positive cue validity = a/(a + c)        Negative cue validity = d/(b + d)

But MaxVal can run into problems: If for every cue the positive
validity is higher than the negative validity (or vice versa), then the
resulting tree will be a rake, and if in addition the number of
cues is high, this means that nearly all objects will be classified as
belonging to the same category. To avoid such possible extreme
cases, we also consider a construction rule that strikes a balance
between the categories that objects are classified into.
The alternative approach, called Zig-Zag, produces trees that
follow a zig-zag pattern—the direction of the exit nodes alternates
between positive and negative classifications, and correspondingly
the cue with the greatest positive or greatest negative validity will
be chosen at each step. This procedure is implemented starting
at the top level and proceeding downward until the last remaining
cue is assigned to the last level with exits for both categories. If
the distribution of objects according to the two criterion values is
more or less even, as in the Green and Mehr data, this procedure
seems both natural and reasonable. If the distribution of objects is
extremely uneven, a couple of extra steps may be incorporated by
the Zig-Zag rule to even out the asymmetries (or, in the jargon of
data mining, to “tame the distribution”; see Martignon et al., 2008,
for the technical details).
In sum, both the MaxVal and Zig-Zag ranking procedures
ignore conditional dependencies of cues given outcome values,
and both base their rankings simply on positive and negative cue
validities (and for the Zig-Zag method, possibly also on the relative
size of the object classes).
As an application, consider the Green and Mehr data in Figure 14-5.
Using the formulas in Table 14-1, the positive and negative validities
for the three cues, ST, CP, and OC, are as shown in Table 14-2. Given
these validities, MaxVal creates a rake, while Zig-Zag builds the
zig-zag tree proposed by Green and Mehr (1997; see Figure 14-2).

Table 14-2: Positive and Negative Validities of the Three Cues for
the Green and Mehr (1997) Dataset (Shown in Figure 14-5)

Cue                  ST      CP      OC
Positive validity    .39     .24     .22
Negative validity    .96     .92     .93

Note. ST: Elevated ST segment; CP: chest pain; OC: other conditions.

Comparing Trees and Other Classification Models

Fast and frugal lexicographic and one-reason decision heuristics do
very well in environments where they are ecologically rational, as
the other chapters in this book attest. How about fast and frugal
trees? In this section we use computer simulations to test their
accuracy and robustness in various environments, compared to
other classification models, which we describe next.
Logistic regression is a typical statistical regression model for
binomially distributed dependent variables. It is a generalized
linear model that classifies by means of comparing a weighted sum
of cue values with a fixed threshold. Logistic regression is exten-
sively applied in the medical and social sciences.
CART (Breiman et al., 1993) builds classification and regression
trees for predicting numerical dependent variables (regression) or
categorical dependent variables (classification). The shape of the
trees it constructs is determined by a collection of rules that are
selected based on how well they can differentiate observations, in
the sense of information gain. No further rules are applied when
CART establishes that no further information gain can be made.
CART shares common features with fast and frugal trees because
its strict rules for construction lead, in general, to trees that have
fewer nodes than the natural frequency tree. Yet, the construction
of CART trees is computationally intense because information gain
has to be assessed conditionally on previous rule applications.
We tested logistic regression and CART against our MaxVal and
Zig-Zag tree construction methods on 30 datasets, mostly from the
UC Irvine Machine Learning Repository (Asuncion & Newman,
2007). We included very different problem domains (from medi-
cine to sports to economics), with widely varying numbers of
objects (from 50 to 4,052) and cues (from 4 to 69). The accuracy of
each model was evaluated in four cases—fitting all the data, and
generalizing to new data (prediction) when trained on 15%, 50%,
or 90% of all objects (for estimating model parameters)—and tested
on the remaining objects (see Figure 14-6).
We also compared the models on a restricted set of just 11 of the
30 data sets from Figure 14-6 that were from medical domains,
which could better match the conditions discussed earlier for
batch learning situations. The performance of the naïve fast and
frugal trees on these medical data sets was markedly better than on
the 30 data sets overall, as shown in Figure 14-7.
As was to be expected, the more complex models (logistic regres-
sion and CART) were the best performers in fitting. But a good
classification strategy needs to generalize to predict unknown
cases, not (just) explain the past by hindsight. In prediction, the
simple trees built by Zig-Zag match or come close to the accuracy
of CART and logistic regression while MaxVal lags a few percentage
points behind. CART appears to overfit the data, losing 17 percent-
age points of accuracy from the fitting to the 15% training situation
in the 30 data sets of Figure 14-6. In the 11 medical data sets of
Figure 14-6: Average performance of four classification models
(CART = classification and regression trees, logistic regression, and
fast and frugal trees with cues ordered by MaxVal or Zig-Zag rules),
across 30 datasets, in fitting and three cases of prediction (general-
izing to new data based on training sets of 90%, 50%, or 15% of the
whole dataset).

Figure 14-7: Average performance of four classification models
(CART, logistic regression, and fast and frugal trees with cues
ordered by MaxVal or Zig-Zag rules), across the 11 medical data
sets out of the 30 data sets of Figure 14-6, in fitting and three cases
of prediction (generalizing to new data based on training sets of
90%, 50%, or 15% of the whole dataset).
Figure 14-7, Zig-Zag even outperforms CART in predictive accu-
racy. In sum, these simulations show that fast and frugal trees
can be very competitive when trained on small samples and do not
fall too far behind more complex models from machine learning
and statistics when training sets grow larger.
Of course, we would like to be able to say more about when
and where fast and frugal trees will perform so well—what kinds of
environments allow them to be ecologically rational? The ecologi-
cal rationality of lexicographic heuristics for paired comparison,
such as take-the-best, has been studied analytically (Baucells,
Carrasco, & Hogarth, 2008; Hogarth & Karelaia, 2007; Katsikopoulos
& Martignon, 2006; Martignon & Hoffrage, 2002; see also chapter 3).
Because of result 1 presented earlier, which states that classification
by a fast and frugal tree is formally equivalent to that of a lexico-
graphic classifier, one would expect that previous formal work
on the ecological rationality of take-the-best could be translated,
mutatis mutandis, into the framework of fast and frugal trees. This
is essentially true, and carrying out this translation is a central task for future work.

Conclusion

Fast and frugal trees are a simple alternative to classification
methods that rely on copious information and heavy computation.
The trees are naïve in construction and fast and frugal in execution.
In this chapter, we showed that there is little price to be paid by
using these trees: In a number of classification environments, they
performed about as well as CART and logistic regression, even
besting them in predictive accuracy in some cases when generaliz-
ing from small amounts of information. The ecological rationality
of fast and frugal trees for classification can be studied by exploit-
ing their formal equivalence to lexicographic heuristics for paired
comparison, such as take-the-best. Beyond being fast and frugal, a
major advantage of the trees is their transparency, making them
appealing to practitioners. Simplicity, accuracy, and transparency
thus combine, rather than trade off, to make fast and frugal trees a
potent part of the adaptive toolbox.
15
How Estimation Can Benefit From
an Imbalanced World
Ralph Hertwig
Ulrich Hoffrage
Rüdiger Sparr

Both organism and environment will have to be seen
as systems, each with properties of its own, yet both
hewn from basically the same block.
Egon Brunswik

Much of the world is in a state of predictable imbalance. This is
a notion that is commonly attributed to the Italian economist
Vilfredo Pareto, who was a professor of political economy at the
University of Lausanne in Switzerland in the 1890s. He first intro-
duced what is now known as the Pareto law of income distribu-
tion in his Cours d’Économie Politique (Pareto, 1897) where he
described the finding that income and wealth distributions exhibit
a common and specific pattern of imbalance across times and
countries. In qualitative terms, the predictable imbalance in income
and wealth distributions is that a relatively small share of the
population holds a relatively large share of the wealth.
For an illustration, let us turn to the exclusive circle of the global
rich. Each year, Forbes magazine publishes its famous annual rank-
ing of the wealthiest people around the globe. The 2008 listing
included a total of 1,125 billionaires, among them not only the
“usual suspects” such as Bill Gates and Warren Buffett, but also
newcomers such as Mark Zuckerberg, founder of the social net-
working site Facebook, and at age 23 years possibly the youngest
self-made billionaire ever (Kroll, 2008). Even in this highly selec-
tive group of the world’s super-rich, the distribution of wealth is
highly unbalanced. One measure of this imbalance is the share of
the collective net worth of these wealthiest people that goes to the
top 1% of them. In 2008, the 11 richest billionaires’ collective for-
tune amounted to as much as that of the 357 “poorest” billionaires.


One consequence of this predictable imbalance is that if somebody
were to estimate the net worth of a billionaire, say, Donald Trump,
a good starting point would be to assume that the fortune in ques-
tion is modest. Why? Because most billion-dollar fortunes in this
skewed world of incomes and wealth are small.
The goal of this chapter is to analyze how valuable the assump-
tion of systematic environment imbalance is for performing rough
estimates. By such estimates, we mean the routine assessment of
quantities (e.g., frequencies, sizes, amounts) in which people regu-
larly engage when they infer the quantitative value of an object
(such as its frequency, size, value, or quality). To this end, we first
outline how systematic environment imbalance can be described
using the framework of power laws. Then, we investigate to what
extent power-law characteristics as well as other statistical proper-
ties of real-world environments can be allies of simple heuristics in
performing rough-and-ready estimates, thereby leading to ecologi-
cal rationality. For this purpose we will introduce two heuristics:
The first, QuickEst, uses simple building blocks for ordered cue
search and stopping and is particularly suited for skewed environ-
ments. The second, the mapping model or mapping heuristic, is
built on the simplifying decision mechanism of tallying and can be
applied to a broader range of distributions.

The Ubiquity of Power-Law Regularities

The Pareto law belongs to the family of power laws. A power-law
distribution of the sizes of objects (on some dimension) implies a
specific relationship between the rank of an object and its size.
Let us illustrate this relationship with a graph (adopting Levy &
Solomon’s, 1997, approach to analyze power-law distribution of
wealth). Suppose one takes all the billionaires in the Forbes 2008
(Kroll, 2008) listing, ranks them by their wealth, and then plots the
billionaires’ wealth against their rankings. Figure 15-1a shows the
resulting J-shaped distribution (where the “J” is rotated clockwise
by 90 degrees), which reveals that a great many billionaires have
“small” fortunes, and only very few have resources much greater
than those small fortunes. This picture becomes even more inter-
esting if it is redrawn with logarithmic horizontal and vertical
axes. As Figure 15-1b shows, the resulting rank–size distribution
(Brakman, Garretsen, Van Marrewijk, & van den Berg, 1999) on a
log–log scale is quite close to a straight line.1 This inverse linear

1. Of course, this line is by definition downward sloping (because
the rank variable represents a transformation of the fortune variable that
entails a negative correlation between the two variables). The fact that one
observes a straight line, however, is not trivial because there is no tautol-
ogy causing the data to automatically follow a straight line. As Newman
(2005) pointed out, few real-world distributions follow a power law over
their entire range. This is particularly true for smaller values of the variable
being measured or for very large values. In the distribution of city sizes, for
instance, the political capitals, say Paris or London, are much larger than the
line drawn through the respective distribution of cities would lead one to
expect—they are “essentially different creatures from the rest of the urban
sample” (Krugman, 1996). In Figure 15-1b, the 30 richest billionaires’
wealth deviates from the fitted straight line: Their wealth is less large than
theoretically expected.
Figure 15-1: The world’s 1,125 billionaires in 2008 rank ordered by
fortune. (a) Absolute data. (b) Same data but with the logarithmic
values (base 10) of ranks and fortune. The solid line corresponds
to the least-square fit and has a slope of −.78. The approximate
straight-line form implies that the distribution follows a power law
(see also Levy & Solomon, 1997). Data from the Forbes magazine
2008 survey (Kroll, 2008).

relationship between the log of the magnitude of a billionaire’s
fortune and the person’s logarithmic rank suggests that the wealth
distribution in the Forbes list follows a power-law distribution
(Levy & Solomon, 1997).
Perhaps the most well-known instance of a power-law distribu-
tion in the social sciences is Zipf’s law. In his book Human Behavior
and the Principle of Least Effort, George Kingsley Zipf (1949)
observed that rank–size distributions in domains as diverse as city
sizes and word frequencies can be described by a straight line in a
log–log plot, whose slope q equals −1. In the context of city sizes,
this slope means that the population of a city is inversely pro-
portional to its rank: Consequently, the second-ranked city in a
country has half the population of the biggest city, the third-ranked
city one-third that population, and so on. The rank–city size
distributions for cities within one country appear to fit Zipf’s law
remarkably well.2 In terms of a probability distribution, this means
that the probability that the size of a city (or any other object) is
greater than some S is proportional to 1/S: P(Size > S) ∝ S^q, with
q ≈ −1 (Gabaix, 1999).
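A tiny numerical illustration (ours) of this rank–size relationship with q = −1 and a hypothetical largest city of 8 million inhabitants:

largest = 8_000_000                                 # hypothetical population of the largest city
sizes = [largest / rank for rank in range(1, 6)]    # size proportional to 1/rank when q = -1
print([round(s) for s in sizes])                    # [8000000, 4000000, 2666667, 2000000, 1600000]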
Power-law distributions occur in an extraordinarily diverse
range of domains, for instance, the sizes of earthquakes, firms,
meteorites hitting the earth, moon craters, solar flares, and com-
puter files; the intensity of wars; the frequency of use of words in
any human language or of occurrence of personal names in most
cultures; the numbers of papers that scientists write, of citations
received by papers, of hits received by websites, of telephone calls
made; the sales of books and music recordings; the number of species

observes a straight line, however, is not trivial because there is no tautol-


ogy causing the data to automatically follow a straight line. As Newman
(2005) pointed out, few real-world distributions follow a power law over
their entire range. This is particularly true for smaller values of the variable
being measured or for very large values. In the distribution of city sizes, for
instance, the political capitals, say Paris or London, are much larger than the
line drawn through the respective distribution of cities would lead one to
expect—they are “essentially different creatures from the rest of the urban
sample” (Krugman, 1996). In Figure 15-1b, the 30 richest billionaires’
wealth deviates from the fitted straight line: Their wealth is less large than
theoretically expected.
2. Zipf’s law and the Pareto distribution differ in several respects (see
Newman, 2005). Pareto was interested in the distribution of income and
asked how many people have an income greater than x. The Pareto law is
given in terms of the cumulative distribution function; that is, the number of
events larger than x is an inverse power of x: P(X > x) ∝ x−k. In contrast, Zipf’s
law usually refers to the size y of an occurrence of an event (e.g., the size of a
city or the frequency of use of a word). Another difference is the way the dis-
tributions were plotted: Whereas Zipf made his plots with rank on the hori-
zontal axis and size on the vertical axis, Pareto did it the other way round.
HOW ESTIMATION CAN BENEFIT FROM AN IMBALANCED WORLD 383

in biological taxa; and the likelihood that a record in memory will be


needed (see Bak, 1997; Buchanan, 1997; Krugman, 1996; Lehman,
Jackson, & Lautrup, 2006; Newman, 2005; Schroeder, 1991).
Although Pareto’s notion of “predictable imbalance” originally
referred to income distributions, we use it here to describe the phe-
nomenon of pronounced environmental skewness that is character-
istic of power-law distributions: Few objects take on very large
values (e.g., frequency, intensity, size) and most take on medium to
small values. In high-energy physics, for instance, about half of all
papers receive two or fewer citations, and the top 4.3% of papers
produces 50% of all citations, whereas the bottom 50% of papers
yields just 2.1% of all citations (Lehman et al., 2006). Income
inequality is not just a phenomenon found in the exclusive circle of
billionaires but also among street gangs. In one analysis of a Chicago
street gang, the Black Disciples, the top 120 men—representing just
2.2% of the gang membership—took home well more than half the
money the gang accrued (Levitt & Dubner, 2005, p. 103). Environment
imbalance is also ubiquitous in consumer markets. Take, for exam-
ple, the success of Hollywood movies measured in terms of their
box office gross. According to Anderson (2006), an estimated 13,000
feature films are shown in film festivals each year in the United
States alone. They can be arranged into three groups. The first
includes the 100 movies with the highest revenue, the ones that
knocked out audiences. The second group of movies, those of rank
101 to 500, make low but not quite zero revenues, and the sorry
remainder, rank 501 to 13,000, have no box office gross (mostly
because they did not even garner mainstream commercial distribu-
tion). Anderson referred to such a distribution as “the Long Tail”
(adapting the notion of long-tailed distributions from statistics),
and he saw them everywhere in markets.
The question that concerns us here is this: Given that predictable
imbalance is such a ubiquitous environmental structure, could it
be that particular human cognitive strategies have evolved or been
learned to exploit it?

QuickEst: A Fast and Frugal Estimation Heuristic in a World Full
of Power-Law Regularities

Enrico Fermi, the world-renowned physicist and one of the leaders
of the team of physicists on the Manhattan Project that eventually
led to the development of the atomic bomb, had a talent for
quick but reliable estimates of quantities. Legend has it that in the
Alamogordo Desert in the state of New Mexico, while banks of
spectrograph and ionization chambers waited to be triggered into
action to assimilate the complex signals of the first atomic explosion,
Fermi was awaiting the same detonation from a few thousand yards
away. As he sheltered behind a low blast-wall, he tore up sheets of
paper into little pieces, which he tossed into the air when he saw
the flash. After the shock wave passed, he paced off the distance
traveled by the paper shreds, performed a quick back-of-the-enve-
lope calculation, and arrived at an approximately accurate figure
for the explosive yield of the bomb (Logan, 1996). For Fermi, one of
the most important skills a physicist ought to have is the ability
to quickly derive estimates of diverse quantities. He was so con-
vinced of its importance that he used to challenge his students with
problems requiring such estimates—the fabled canonical Fermi
problem was the question: “How many piano tuners are there in
Chicago?”
Being able to make a rough estimate quickly is important not
only for solving odd Fermi problems. There is ample opportunity
and need for people to rely on quick and easy estimates while
navigating through daily life (e.g., how long will it take to get
through this checkout line?). How do people arrive at quick quanti-
tative estimates? For instance, how do they swiftly estimate the
population size of Chicago—a likely first step toward an estimate of
the number of piano tuners in Chicago? Previously, we have argued
that cognitive estimation strategies, specifically, the QuickEst heu-
ristic, may have evolved to exploit the predictable imbalance of
real-world domains so as to reduce the computational effort and
informational demands needed to come up with competitively
accurate estimates (Hertwig, Hoffrage, & Martignon, 1999). In this
chapter, we analyze the ecological rationality of this heuristic in
more precise terms: First, we quantify the degree of imbalance
across a total of 20 real-world domains using the parameter q, the
slope of the straight line fitting the log–log rank–size distribution.
Second, we analyze to what extent this degree of imbalance and
other statistical properties of those environments hinder or foster
the accuracy of the QuickEst heuristic. Before we turn to this analy-
sis, we describe QuickEst in more detail.
The QuickEst heuristic is a model of quantitative inferences
from memory (Gigerenzer & Goldstein, 1996; Gigerenzer, Hoffrage,
& Goldstein, 2008), that is, inferences based on cue information
retrieved from memory. It estimates quantities, such as the size
of Chicago or the number of medals that Russia won at the most
recent Olympic summer games. In general, it estimates the value of
an item a, an element of a set of N alternatives (e.g., objects, people,
events), on a quantitative criterion dimension (e.g., size, age,
frequency). The heuristic’s estimates are based on M binary cues
(1, 2, . . ., i, . . ., M), where the cue values are coded such that 0
and 1 tend to indicate lower and higher criterion values, respec-
tively. As an illustration, consider the reasoning of a job candidate
who is subjected to a brainteaser interview by a company recruiter.
One task in the interview is to quickly estimate the net worth
of, say, Donald Trump. To infer an answer the candidate may rely
on cues such as: “Did the person make the fortune in the computer
industry?”
To operate, QuickEst needs a set of cues put into an appropriate
order. This order is based on the following measure: For any binary
cue i, one can calculate the average size si– of those objects that do
not have the property that cue i represents. For instance, one can
calculate the average net worth of all billionaires who are not
entrepreneurs in the computer industry. The QuickEst heuristic
assumes that cues are ranked according to the sizes of the values s–,
with the smallest value first.
In addition to the search rule, QuickEst also includes stopping
and decision rules. The complete steps that the heuristic takes to
estimate the criterion for object a are as follows:

Step 1: Search rule. Search through cues in the order of the
sizes of the value s–, starting with the smallest value.
Step 2: Stopping rule. If the object a has the value 0 on the
current cue (indicating a low value on the criterion),
stop searching and proceed to step 3. Otherwise (if the
object has cue value 1 or the value is unknown), go
back to step 1 and look up the cue with the next small-
est si–. If no cue is left, put the object into the catchall
category.3
Step 3: Decision rule. Estimate the size of the object as the si–
of the cue i that stopped search, or as the size of the
catchall category (see Hertwig et al., 1999, p. 225).
Estimates are finally rounded to the nearest spontane-
ous number.4
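The following Python sketch (our illustration, not the simulation code used below) implements these steps for known 0/1 cue values; it uses the maximum observed criterion value as a rough stand-in for the catchall estimate and omits both the rounding to spontaneous numbers and the catchall construction described in footnote 3.

def quickest_order(objects, criterion, n_cues):
    """Rank cues by the mean criterion value of the objects lacking each cue
    (the s– values in the text), smallest value first."""
    s_minus = []
    for i in range(n_cues):
        absent = [crit for obj, crit in zip(objects, criterion) if obj[i] == 0]
        s_minus.append(sum(absent) / len(absent) if absent else float("inf"))
    order = sorted(range(n_cues), key=lambda i: s_minus[i])
    return order, s_minus

def quickest_estimate(x, order, s_minus, catchall):
    """Steps 1-3: search cues in order, stop at the first value of 0, estimate s– of that cue."""
    for i in order:
        if x[i] == 0:                 # stopping rule: an absent cue indicates a small object
            return s_minus[i]         # decision rule
    return catchall                   # no cue stopped search: use the catchall estimate

# hypothetical training data: four binary cues, criterion = city size in thousands
objects   = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
criterion = [3400, 1700, 650, 300, 120]
order, s_minus = quickest_order(objects, criterion, n_cues=4)
print(quickest_estimate([1, 0, 1, 0], order, s_minus, catchall=max(criterion)))
# ≈ 356.7, the s– of the cue that stopped search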

QuickEst’s structure maps onto the predictable imbalance of
many real-world J-shaped environments (as in Figure 15-1). First,
its asymmetric stopping rule—stop when a cue value of zero is
found for the object—limits search most strongly in environments
in which zero (or absent) cue values are plentiful (cf. chapter 10).
Second, by also first looking up the “small” cues—those cues
i whose absence is associated with small criterion values s–—
QuickEst has an in-built bias to estimate any given object as rela-
tively small. This is appropriate in the many J-shaped environments
in which most objects have small values on the criterion, and only
a few objects have (very) large values. Finally, QuickEst’s cue order
also enables it to estimate small objects (with predominantly zero
values on the cues) by looking up only one or a few (known) cues
before providing an estimate—making it fast and frugal.

3. When the heuristic is initially set up, only as many cues (of all those
available) will be used in the cue order as are necessary to estimate the cri-
terion of four-fifths of the objects in the training set. The remaining one-fifth
of the objects will be put in the catchall category.

4. By building in spontaneous numbers, the heuristic models the obser-
vation that when asked for quantitative estimates (e.g., the number of wind-
mills in Germany), people provide relatively coarse-grained estimates (e.g.,
30,000, i.e., 3 × 10^4, rather than 27,634). Albers (2001) defined spontaneous
numbers as numbers of the form a × 10^i, where a ∈ {1, 1.5, 2, 3, 5, 7} and
i is a natural number.

How Accurate Is QuickEst?

Can such a simple and fast estimation strategy nonetheless arrive
at competitively accurate inferences? We compared QuickEst to
two other estimation strategies, namely, multiple regression and
an estimation tree that we designed (see Hertwig et al., 1999, for
a detailed description of the estimation tree). Briefly characterized,
multiple regression is a computationally powerful competitor
insofar as it calculates weights that minimize least-squares error,
and consequently it reflects the correlations between cues and
criterion and the covariance between cues. The estimation tree
arrives at estimates by collapsing objects, say cities, with the same
cue profile (i.e., the same cue value on each of the available cues)
into one class (for more on tree-based procedures, see Breiman,
Friedman, Olshen, & Stone, 1993). The estimated size for each city
equals the average size of all cities in that class (the estimate for a
city with a unique cue profile is just its actual size). When the tree
encounters a new, previously unseen city whose cue profile matches
that of one or more previously seen cities, its estimated size is the
average size of those cities. If a new city has an entirely new
cue profile, then this profile is matched to the profile most similar
to it. The estimation tree is an exemplar-based model that keeps
track of all exemplars presented during learning as well as their cue
values and sizes. As long as the test set and training set are identi-
cal, this algorithm is optimal. Yet, when the training set is large,
it requires vast memory resources (for the pros and cons of exem-
plar-based models, see Nosofsky, Palmeri, & McKinley, 1994).
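A rough sketch of such an exemplar-based estimation tree (ours; the original implementation may differ, for instance in how similarity between cue profiles is measured—here it is simply the number of matching cue values):

def estimation_tree(train_profiles, train_sizes):
    """Collapse training objects with identical cue profiles into classes and
    store each class's mean size."""
    classes = {}
    for profile, size in zip(train_profiles, train_sizes):
        classes.setdefault(tuple(profile), []).append(size)
    return {profile: sum(sizes) / len(sizes) for profile, sizes in classes.items()}

def estimate(classes, profile):
    """Use the stored class mean; for an unseen profile, fall back on the most similar one."""
    profile = tuple(profile)
    if profile in classes:
        return classes[profile]
    most_similar = max(classes, key=lambda p: sum(a == b for a, b in zip(p, profile)))
    return classes[most_similar]

classes = estimation_tree([[1, 1, 0], [1, 1, 0], [0, 0, 1]], [900, 700, 120])
print(estimate(classes, [1, 1, 0]))   # 800.0 (mean of the matching class)
print(estimate(classes, [1, 0, 0]))   # falls back on the most similar stored profile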
All three strategies were tested in the environment of 82 German
cities with more than 100,000 residents (excluding Berlin). The task
was to predict the cities’ number of residents. This demographic
target criterion follows a power law, thus exhibiting the property
of predictable imbalance (remember that city size distributions
were one of the classic domains in which Zipf, 1949, observed his
law). To examine the strategies’ robustness, that is, their ability
to predict new data (here, cities), Hertwig et al. (1999) distinguished
between two sets of objects: the training set and the test set. The
strategies learned their parameters (e.g., si– or beta weights) on
the basis of the training set. The test set, in turn, provided the
test bed for the strategies’ robustness. The training samples con-
sisted of 10%, 20%, . . ., 90%, and 100% of the 82 cities, comprising
their population sizes and their values on eight cues indicative of
population size. The test set encompassed the complete environ-
ment of 82 cities. That is, the test set included all cities in the
respective training set, thereby providing an even harder test for
QuickEst, because parameter-fitting models like multiple regres-
sion are likely to do relatively better when tested on objects they
were fitted to.
In the environment of German cities, QuickEst, on average, con-
sidered only 2.3 cues per estimate as opposed to 7.3 cues used by
multiple regression and 7.1 (out of 8) used by the estimation tree.
Despite relying on only about a third of the cues used by the other
strategies, QuickEst nonetheless exceeded the performance of
multiple regression and the estimation tree when the strategies
had to rely on quite limited knowledge, with training sets ranging
between 10% and 40%. The 10% training set exemplified the
most pronounced scarcity of information. Faced with such dire
conditions, QuickEst’s estimates in the test set were off by an aver-
age of about 132,000 inhabitants, about half the size of the average
German city in the constructed environment. Multiple regression
and the estimation tree, in contrast, erred on average by about
303,000 and 153,000 inhabitants, respectively.
When 50% or more of the cities were first learned by the strate-
gies, multiple regression began to outperform QuickEst. The edge in
performance, however, was small. To illustrate, when all cities were
known, the estimation errors of multiple regression and QuickEst
were 93,000 and 103,000 respectively, whereas the estimation tree
did considerably better (65,000).5 Based on these results, Hertwig
et al. (1999) concluded that QuickEst is a psychologically plausible
estimation heuristic, achieving a high level of performance under
the realistic circumstances of limited learning and cue use.

How Robust Is QuickEst’s Performance Across Diverse Environments?

Although QuickEst competitively predicted demographic quantities,
we did not know how well its competitiveness would generalize
to other environments—in particular, to environments that exhibit

5. In fact, when the training set (100%) equals the generalization set, the
estimation tree achieves the optimal performance. Specifically, the optimal
solution is to memorize all cue profiles and collapse cities with the same
profile into the same size category. In statistics, this optimal solution is
known as true regression. Under the circumstances of complete knowl-
edge, the estimation tree is tantamount to true regression.
different degrees of predictable imbalance. Our first goal in this
chapter is to investigate this issue. To this end, we test QuickEst,
multiple regression, and the estimation tree with a collection of
20 different real-world environments. As previously, we take from
each environment increasingly larger portions from which the
strategies can learn. This emphasis on learning reflects the typical
situation of human decision making, an issue to which we return
shortly. Again, the training sets consist of 10%, 20%, . . ., 90%,
and 100% of each environment. To arrive at psychologically plau-
sible sets of limited object knowledge, we also assume that the
probability that an object belongs to the training set is proportional
to its size (thus capturing the fact that people are more likely to
know about larger objects than smaller ones). The predictive accu-
racy of the strategies is tested on the complete environment (i.e.,
the test set; as in Hertwig et al., 1999, the training set is a subset of
the test set). To obtain reliable results, 1,000 random samples are
drawn for 9 of the 10 sizes of the training set (in the 100% set, train-
ing set equals test set, and thus sampling error is of no concern).
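The size-proportional sampling of training objects can be sketched as follows (our illustration; the exact sampling procedure used in the simulations is an assumption here):

import numpy as np

rng = np.random.default_rng(0)

def sample_training_set(sizes, fraction):
    """Draw a training set (without replacement) in which the probability of
    including an object is proportional to its criterion value (its size)."""
    n_train = max(1, round(fraction * len(sizes)))
    probs = np.asarray(sizes, dtype=float)
    probs /= probs.sum()
    return rng.choice(len(sizes), size=n_train, replace=False, p=probs)

sizes = [3400, 1700, 650, 300, 120, 100]            # hypothetical city sizes
print(sample_training_set(sizes, fraction=0.5))      # indices of the sampled objects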
For the environments, we make use of the collection of real-
world data sets that Czerlinski, Gigerenzer, and Goldstein (1999)
compiled to test the performance of fast and frugal choice strate-
gies. This collection includes such disparate domains as the num-
ber of car accidents on a stretch of highway, the homelessness
rate in U.S. cities, and the dropout rates of Chicago public high
schools. The environments ranged in size from 11 objects (ozone
levels in San Francisco measured on 11 occasions) to 395 objects
(fertility of 395 fish), and included 3 to 18 cues. All cues were
binary or were made binary by dichotomizing them at the median.
One particularly attractive aspect of this collection of environments
is that Czerlinski et al. did not select them to match any specific
distribution of the criterion, with many of these environments
taken from textbook examples of the application of multiple regres-
sion. On average, these environments were not as skewed as, for
instance, the myriad real-world environments from which Zipf
(1949) derived his eponymous law. The median q in this set of envi-
ronments is −0.54, and thus substantially smaller in magnitude
than the q ≈ −1 that Zipf observed (see also Newman, 2005, who
found a median exponent of −2.25 in his broad set of distributions
of quantities measured in physical, biological, technological, and
social systems).

How Frugal Are the Strategies?


QuickEst is designed to make estimates quickly, using few cues. This
ability became manifest in the present simulations. Figure 15-2
shows the number of cues that QuickEst considered as a function
Figure 15-2: The frugality of the strategies as a function of size of
training set, averaged across 20 environments. Frugality is mea-
sured as the number of cue values looked up to make an estimate.

of the size of the training set. Across all environments, 7.7 cues,
on average, are available. QuickEst considers, on average, only
two cues (i.e., 26%) per estimate—a figure that remains relatively
stable across various sizes of training set size. In contrast, multiple
regression (which here uses only those cues whose beta weights
are significantly different from zero) and the estimation tree use
more and more cues with increasing training sets. Across all train-
ing set sizes, they use an average of 5.1 (67%) and 5.9 (77%) of all
available cues, respectively.

How Robust Are the Strategies?


What price does QuickEst pay for betting on J-shaped environment
structures, and for considering substantially fewer cues than its
competitor strategies? The first benchmark we use to answer this
question is robustness. Robustness describes the strategies’ ability
to generalize from small training sets to the test set. We first calcu-
late the strategies’ absolute errors (i.e., absolute deviation between
actual and estimated size) separately for each environment and
training set. Then, we define each strategy’s performance in the
100% training set as the strategy’s maximum performance and
express the absolute errors observed in all other training sets as a
percentage of this maximum-performance benchmark (e.g., if a
strategy makes errors of 60,000 with the 100% training set and
90,000 with the 40% training set, then for the latter it would have
a normalized error of 150%). Finally, we average these normal-
ized estimation errors (which must by definition be above 100%)
across all environments, separately for each strategy and each
training set size.
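In code, the normalization amounts to the following (a sketch of ours; errors maps training-set percentage to a strategy’s mean absolute error in one environment):

def normalized_errors(errors):
    """Express each training-set size's absolute error as a percentage of the
    error obtained with the 100% training set (the maximum-performance benchmark)."""
    benchmark = errors[100]
    return {size: 100 * err / benchmark for size, err in errors.items()}

print(normalized_errors({100: 60_000, 40: 90_000, 10: 120_000}))
# {100: 100.0, 40: 150.0, 10: 200.0}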
Based on this mean, we can define robustness as the resistance
to relative decline in performance as training sets become smaller.
Figure 15-3 shows the normalized estimation error (averaged
across the 20 environments). QuickEst proves to be a robust
strategy. When only 40% of the environments’ objects are learned,
QuickEst still performs about as well as when all objects are
known. Moreover, when QuickEst is required to rely on a very thin
slice of the environments, as exemplified by the 10% training set,
its error is only about 1.5 times the magnitude of its maximum-
performance error. Multiple regression and the estimation tree, in
contrast, are less robust. When 50% of the objects are known, for
example, their respective errors are about 1.5 and 3 times the mag-
nitude of their maximum-performance error. Their relative lack
of robustness becomes most pronounced under extreme scarcity
of information. In the 10% training set, their error is more than
2 times (multiple regression) and 6 times (estimation tree) the size
of their maximum-performance errors.
In generalizing to unknown territory, QuickEst thus suffers less
than do some computationally and informationally more expensive

Figure 15-3: The estimation error (standardized within each strat-
egy) as a function of size of training set, averaged across 20 environ-
ments. For each strategy, we standardized its accuracy by expressing
its error per training set relative to its estimation error made in the
100% training set (i.e., the error of each strategy under complete
knowledge was assumed to be 100%).
strategies. The ability to generalize to new data appears to be a
key property of efficient human decision making. In most real-
world environments people cannot help but act on the basis of
scarce knowledge.6 In fact, scarcity of knowledge is a crucial human
condition, as is suggested by, for instance, Landauer’s (1986) analy-
sis of how much information is accumulated in a single human’s
memory over the course of a normal lifetime. Basing his calcula-
tions on various bold assumptions (e.g., about the rate at which
people can take in information), he estimated that the “functional
learned memory content” is “around a billion bits for a mature
person” (p. 491). In comparison, an institutional memory of human
knowledge such as the Library of Congress with 17 million books
is estimated to contain about 136 terabytes—about 1,088 trillion
bits, more than one million times the estimated magnitude of
human memory (Lyman & Varian, 2003). Although Landauer’s
figure is an audacious (if scientifically informed) estimate, it sup-
ports the notion that most of human decision making occurs under
conditions of scarcity of information and knowledge. Upon these
terms, frugality and robustness appear to be key properties of com-
petitive cognitive strategies.

How Accurate Are the Strategies?


Although the previous analysis demonstrates QuickEst’s robust-
ness, measured in terms of how little its performance deteriorates
with smaller and smaller training sets, it says nothing about the
heuristic’s accuracy relative to its competitors. In fact, if we equate
needing less information with involving less effort, the well-known
effort–accuracy tradeoff (Payne, Bettman, & Johnson, 1993) would
predict that this decreased effort goes along with decreased accu-
racy. So does QuickEst’s robustness come at the price of lower
accuracy compared to its more effortful competitors? To test for
this possibility, we next compare QuickEst’s estimation accuracy
with that of its rivals. To this end, we now treat QuickEst’s
maximum performance (with the 100% training set) as the bench-
mark and express its own performance and that of its competitors
relative to this benchmark set at 100%. Figure 15-4 shows the

6. There are different definitions of scarcity of information. In the
present analysis, we define scarcity in terms of the number of objects
on which a strategy is trained compared to the total number of objects
in an environment (on which the strategy can be tested). Martignon
and Hoffrage (1999, 2002) defined information scarcity in terms of
the ratio of the number of binary cues to the number of objects in an
environment.
Figure 15-4: The estimation error (standardized with respect
to QuickEst’s performance) as a function of size of training set,
measured across 20 environments. For each strategy, we standard-
ized its accuracy by expressing its error per training set relative
to QuickEst’s estimation error made in the 100% training set
(i.e., QuickEst’s error under complete knowledge was assigned to
be 100%).

strategies’ relative estimation error as a function of the training set
size (the line for QuickEst being the same as in Figure 15-3).
Several results are noteworthy: QuickEst’s performance under
scarcity of knowledge is not inferior to that of its competitors. On
the contrary, it is here that QuickEst outperforms the other strate-
gies. In the 10% training set, for instance, QuickEst’s error amounts
to 1.45 times the size of the error it produced with the 100%
training set. In contrast, errors with multiple regression and the
estimation tree in the 10% training set are 1.6 and 1.7 times higher
than for the 100% training set, respectively. Moreover, as long as the
training set encompasses less than 50% of the environment,
QuickEst either outperforms its competitors or matches their per-
formance. Only when the training set is 50% and larger does
QuickEst fall behind. In fact, under the circumstances of com-
plete knowledge (100% training set), QuickEst is clearly behind
multiple regression and the estimation tree: The magnitude of
their error is about 0.7 and 0.4 times the size of QuickEst’s error,
respectively.
In sum, QuickEst outperforms multiple regression and the esti-
mation tree when knowledge is scarce. In the psychologically
less plausible situation of abundant knowledge (i.e., 50% or more of
the environments’ objects are known) QuickEst, however, clearly
falls behind the performance of its competitors. All these results
are based on the strategies’ performance averaged across 20 quite
different environments. Now, we turn to our next question: Which
statistical properties of the environments predict differences in
performance between QuickEst and the other strategies?

Which Environment Properties Determine QuickEst’s Performance?

We focus on three important properties of environments: variabil-
ity, skewness, and object-to-cue ratio (see chapter 4 for a discussion
of the first two). Variability refers to how greatly the objects in an
environment vary from the mean value of that set of data. We quan-
tify this property by calculating each environment’s coefficient of
variation (CV):
SD
CV =
mean

which is the ratio of the standard deviation (SD) of the set of object
criterion values to its mean value.
The next property, skewness, captures how asymmetric or
imbalanced a distribution is, for instance, how much of a “tail” it
has to one side or the other. Skewness can be measured in terms of
the parameter q, estimated with the following method (Levy &
Solomon, 1997): We sort and rank the objects in each environment
according to their criterion values, and fit a straight line to each
rank–size distribution (plotted on log–log axes). We then use the
slope q of this fitted regression line as an estimate of the environ-
ment’s skewness.
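Both properties can be computed directly from an environment’s criterion values; a short sketch (ours, using numpy):

import numpy as np

def cv_and_q(values):
    """Coefficient of variation (SD/mean) and skewness parameter q, the slope of the
    least-squares line fitted to the log-log rank-size distribution."""
    values = np.sort(np.asarray(values, dtype=float))[::-1]   # largest object gets rank 1
    cv = values.std() / values.mean()
    ranks = np.arange(1, len(values) + 1)
    q, _intercept = np.polyfit(np.log10(ranks), np.log10(values), deg=1)
    return cv, q

print(cv_and_q([3400, 1700, 650, 300, 120, 100]))   # hypothetical criterion values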
The final property in our analysis is the object-to-cue ratio (i.e.,
the ratio between the number of objects and number of cues in an
environment), which has been found to be important in the analy-
sis of inferential heuristics such as take-the-best (see Czerlinski
et al., 1999; Hogarth & Karelaia, 2005a). To assess the relationship
between the statistical properties of the environments and the
differences in the strategies’ performance, we first describe the
results regarding skewness for two environments in detail, before
considering all 20 environments.

Two Distinct Environments: U.S. Fuel Consumption and Oxygen in Dairy Waste
Does an environment that exhibits predictable imbalance, or
skew, such that few objects have large criterion values and most
objects take on small to medium values, foster the performance of
QuickEst? And, vice versa, does a more balanced, that is, less
skewed environment impair QuickEst’s performance? The most
imbalanced environment in our set of 20 is the oxygen environment
(q = −1.69; with a fit of the regression line of R2 = .98). Here, the
task is to predict the amount of oxygen absorbed by dairy wastes
from cues such as the oxygen required by aerobic micro-organisms
to decompose organic matter. The fuel consumption environment,
in contrast, is relatively balanced, with a q parameter that is about
eight times smaller (q = −0.2; R² = .87). Here, the task is to predict
the average motor fuel consumption per person for each of the
48 contiguous U.S. states from cues such as state fuel tax and per
capita income. The environments’ markedly different degree of
imbalance is illustrated in Figure 15-5. The rank–size distributions
(in logarithmic scales) yield the characteristic negative-sloping
linear relationship, thus suggesting that the power law provides a
good model for both environments.
Is the difference in environmental skewness predictive of the
strategies’ performance? Figure 15-6 shows the strategies’ relative
error as a function of the training set and the two environments.
Figure 15-6a plots the results for the highly skewed oxygen envi-
ronment. QuickEst’s performance is strongly competitive: Across
all training set sizes, QuickEst consistently outperforms multiple

Figure 15-5: Log–log scale plot of the distribution of dairy wastes
rank ordered by their amount of oxygen absorbed (oxygen), and the
distribution of 48 U.S. states rank ordered by their average motor
fuel consumption (fuel consumption). Each plot also shows the
regression line fitted to the data.

Figure 15-6: The strategies’ relative estimation error as a function
of size of training set in the (a) oxygen and (b) fuel consumption
environments. For each strategy, we standardized its accuracy by
expressing its error per training set relative to QuickEst’s estimation
error made in the 100% training set (i.e., QuickEst’s error under
complete knowledge was assumed to be 100%).

regression. In addition, the estimation tree can only outperform
QuickEst (and by a small margin) when it learns about 70% or
more of the objects in the environment. Finally, under the psycho-
logically unlikely circumstance of complete knowledge (100%
training set), QuickEst’s performance is only six percentage points
below the estimation tree’s performance. The picture looks strik-
ingly different in the far less imbalanced fuel consumption envi-
ronment (Figure 15-6b). Except for the 10% training set, multiple
regression and the estimation tree consistently outperform QuickEst.
This contrast between the two environments suggests that
QuickEst’s performance, relative to that of its competitors, hinges
on environmental skewness. We shall now see to what extent this
observation generalizes across all environments.

Can Environmental Skewness and Variability Explain QuickEst’s Failures and Successes?
The environmental parameter q is a measure of the amount of
skewness in the criterion distribution: The smaller the magnitude of q,
the flatter the distribution, and vice versa. In our set of 20 environments,
skewness varies widely, ranging from −0.02 to −1.69, with a median
of −0.54. Does greater skewness in the criterion distribution con-
tribute to better QuickEst performance, relative to its competitors?
Figure 15-7 shows that QuickEst’s performance indeed depends
on the environments’ skewness: Its advantage over multiple
regression (measured in terms of QuickEst’s relative error minus
multiple regression’s relative error) is most pronounced in environ-
ments with large (negative) q. Relatedly, multiple regression tends
to outperform QuickEst in environments with small q. The cor-
relation between the difference in the strategies’ errors and the
magnitude of q is .86. For illustration, the largest magnitudes of
q and hence greatest skewness occur in the oxygen (q = −1.69), bio-
diversity (q = −1.6), and mammals’ sleep environments (q = −1.14).
It is in these environments that the largest advantage of QuickEst
over multiple regression can also be observed. In contrast, the
largest advantages of multiple regression over QuickEst coincide
with q values that are an order of magnitude smaller than those
observed in the most skewed environments (obesity environment:
q = −0.08; body fat environment: q = −0.02). This pattern also gen-
eralizes to the comparison of QuickEst and the estimation tree (not
shown): Here, the correlation between the difference in the strate-
gies’ relative errors and q amounts to .8.
Environmental skewness implies variability in the criterion dis-
tribution, but variability does not necessarily imply skewness.
Therefore, variability, independent of skewness, may be predictive
of QuickEst’s performance. In our set of environments, the coefficient
Figure 15-7: QuickEst’s performance relative to multiple regres-
sion (in terms of the relative estimation error for QuickEst minus
that for multiple regression; see Figure 15-4), plotted against skew-
ness parameter q (the slope of the straight line fitted to the rank-
size distributions of the current collection of environments) for the
20 environments. Negative values on the y-axis indicate an advan-
tage of QuickEst over multiple regression; positive values indicate
a disadvantage.

of variation varies widely, ranging from the oxygen environment, in
which the standard deviation is twice as large as the mean (CV = 2),
to the body fat environment, in which the standard deviation is a
tiny fraction of the mean (CV = 0.019). We found that QuickEst has
a clear advantage over multiple regression in environments with
high variance (with advantage again measured in terms of the dif-
ference between QuickEst’s relative error and that of multiple
regression). Across all environments, the correlation between the
difference in the two strategies’ relative errors and the CV is .87
(for the comparison with the estimation tree the correlation amounts
to .8). In the current collection of environments, however, CV does
not explain more regarding QuickEst’s performance than does envi-
ronmental skewness. This is not too surprising given that across
environments, the Pearson correlation between parameter q and
the coefficient of variation is −.96.

Is the Ratio of Objects to Cues Indicative of QuickEst’s Performance?


When multiple regression is used as a strategy to model choice
between two objects, it typically estimates first the criterion value (e.g.,
salary) separately for each object and then compares the objects.
Thus used, estimation is a precursor to choices. In the context of
choices, in turn, it has been shown that multiple regression can be
outperformed by simpler strategies (with unit weights) when the
ratio between objects and cues becomes too small (Dawes, 1979;
Einhorn & Hogarth, 1975; Schmidt, 1971; see also chapter 3). A
statistician’s rule of thumb is that unit weights will outperform
regression weights if the latter are based on fewer than 10 objects
per cue. The reason is that multiple regression is likely to grossly
overfit the data when there are too few objects for the number of
cues (see also Czerlinski et al., 1999).
Is the object-to-cue ratio also indicative of performance in the
present context in which the task is to estimate the quantitative
value of an individual object? Across the 20 environments, there is
no substantial correlation (.08) between the object-to-cue ratio
and the difference in relative errors between multiple regression
and QuickEst. The correlation, however, increases (to .42) if one
excludes the fish fertility environment, in which the object-to-cue
ratio is extreme with 395 objects and three cues. This higher
correlation suggests that QuickEst (like unit-weight decision heu-
ristics) tends to have an advantage over multiple regression when
there are fewer objects per cue.7 Yet, compared with the impact
of skewness and variance, the object-to-cue ratio is a mediocre pre-
dictor of QuickEst’s performance.
In sum, we examined several properties of ecological structures
and found one that proved outstanding in its ability to predict
QuickEst’s performance (see also von Helversen & Rieskamp, 2008):
The more skewed (and in the set we evaluated, the more variable) an
environment, the better QuickEst performs in relation to its competi-
tors. The correlation between the skewness q and the performance of
QuickEst relative to that of multiple regression was .86; the correla-
tion for QuickEst relative to the estimation tree was .8.

How Can People Tell When to Use QuickEst?

A heuristic is not good or bad, not rational or irrational, in itself,
but only relative to an environment. Heuristics can exploit regu-
larities in the world, yielding ecological rationality. QuickEst
wagers that the criterion dimension is distributed such that few
objects are very large, and most objects are relatively small (Hertwig
et al., 1999). If QuickEst’s wager on the environment structure
matches the actual structure of the environment, it can perform

7. The number of objects per cue is a poor predictor of QuickEst’s perfor-
mance in relation to that of the estimation tree (regardless of whether the fish
fertility environment is included in the analysis).
well. If QuickEst mismatches the environment structure, it will
have to foot the bill for its bet.
Looking at the characteristics of particular environments in
which the different estimation strategies excel, we found that
QuickEst outperforms—even under conditions of abundant knowl-
edge—multiple regression and estimation trees in environments
with pronounced skewness and variability: The more skewed and
variable the criterion value distribution in an environment, the
better QuickEst’s performance was relative to its competitors.
Given their fit to particular environment structures, using fast
and frugal heuristics successfully means using them in the proper
domains. But how can people tell what is a proper domain for a
particular strategy, and what is improper? We suggest that the task
of strategy selection may not be as taxing as it is often conceived.
Let us distinguish between two kinds of “proper” environments.
One is the class of environments in which people can muster
little to medium knowledge. As the current simulations and those
involving other fast and frugal strategies (Gigerenzer, Czerlinski, &
Martignon, 1999) have shown time and again, the more limited
the knowledge about an environment is, the more competitive
simple strategies are. Their simplicity renders the heuristics robust
and successful relative to more complex information-demanding
strategies—even if the heuristics’ match to the environment is not
perfect.
A second class of “proper” environments is one in which users
of, for instance, QuickEst can intuit that the structure of the envi-
ronment maps onto the structure of the heuristic. To be able to do
so, however, does not mean that people need to fit a power-law
model to their knowledge, thus estimating the skewness of the
environment. There are simple shortcuts instead that can gauge
skewness. For instance, in environments with a very pronounced
level of predictable imbalance, most objects one knows will have
criterion values below the average (see the example of above-average
drivers in chapter 4). Thus we propose that a mean value that sub-
stantially exceeds the median value may trigger the use of QuickEst.
For instance, if a decision maker applied QuickEst in only those
environments in which the mean value is, say, at least 50% greater
than the magnitude of the median value, then in the current collec-
tion of 20 environments (and averaged across all training sets),
QuickEst would be employed in four environments. In all of those
QuickEst outperforms multiple regression, whereas multiple regres-
sion outperforms QuickEst in 13 of the remaining 16 environments.
Thus, the ratio mean-to-median is a good proxy for the relative
performance of the two strategies. This is consistent with our previ-
ous analysis, according to which skewness and the coefficient
of variation proved to be good predictors of QuickEst’s relative
performance—the ratio of mean-to-median correlates highly with
both environmental properties (−.81 and .92, respectively).
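
As a rough illustration of this trigger, the following sketch (our own; only the 50% threshold is taken from the text) checks whether the criterion values a decision maker happens to know are skewed enough to favor QuickEst:

    import numpy as np

    def quickest_seems_appropriate(known_criterion_values, threshold=1.5):
        # Proxy for pronounced skew: the mean of the known criterion values
        # exceeds their median by at least the given factor (here 50%)
        values = np.asarray(known_criterion_values, dtype=float)
        return values.mean() >= threshold * np.median(values)
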
On the basis of these two classes of “proper” environments, one
can also deduce a class of environments that is “improper” for
simple heuristics. It encompasses those environments in which
people possess much knowledge and in which the structure of the
heuristic mismatches that of the environment (e.g., for QuickEst
this would mean that there is little skew in the distribution of
criterion values). But the chance of erroneously applying a fast and
frugal strategy like QuickEst in such improper environments may
be slim, because having abundant knowledge should make it more
likely that people have a sense of the environment’s structure.
However, do people always rely on QuickEst if the environment is
skewed? And what strategies are used in environments that are
not skewed? Next, we introduce another tool of the adaptive tool-
box, the mapping heuristic (von Helversen & Rieskamp, 2008),
which can be successfully employed in environments with differ-
ent types of structure.

The Mapping Heuristic: A Tallying Approach to Estimation

Like QuickEst, the mapping heuristic is a simple strategy for making
quantitative estimations from multiple cues, and it, too, relies on
binary cue information.8 The estimation process is split into a
categorization phase and an estimation phase. First, an object is
categorized by counting all the positive cue values it has. Then, the
mapping heuristic estimates the object’s size to be the typical
(median) size of all previously seen objects in its category, that is,
with the same number of positive cues. This estimation strategy
implies that all cues are treated as being equally important. Thus,
in contrast to QuickEst, which considers cues sequentially, the
mapping heuristic takes a tallying approach. It includes all rele-
vant cues but weights each cue the same, ignoring the different
predictive values of the cues. The two heuristics represent different
approaches to simplifying the estimation process—ordered and
limited cue search (see chapter 10) versus equal-weight tallying
of all cues. How do the two approaches compare in terms of their
performance in different environments?
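
A minimal sketch of this process, assuming cue values coded as 0 and 1 and, as our own addition, a fallback to the overall median for categories never seen during training (a case the description above does not cover), might look as follows:

    import numpy as np

    class MappingHeuristic:
        # Tallying estimation: categorize an object by its number of positive
        # cue values, then estimate the median criterion value of the training
        # objects that share that category.

        def fit(self, cue_matrix, criterion):
            counts = np.asarray(cue_matrix).sum(axis=1)
            criterion = np.asarray(criterion, dtype=float)
            self.category_medians = {int(c): float(np.median(criterion[counts == c]))
                                     for c in np.unique(counts)}
            self.fallback = float(np.median(criterion))  # our assumption for unseen categories
            return self

        def estimate(self, cue_values):
            category = int(np.asarray(cue_values).sum())
            return self.category_medians.get(category, self.fallback)
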
To test when QuickEst and the mapping heuristic perform well
and how much their performance depends on the structure of
the environment (in terms of the distribution of the criterion), von

8. We are grateful to Bettina von Helversen and Jörg Rieskamp for their
valuable input on the following sections.
Helversen and Rieskamp (2008) conducted a simulation study.
Two types of environment were used, one with a skewed criterion
(based on a power function y = bx^a, with a = −1, b = 100) and one
involving a uniformly distributed criterion (based on a linear
function, y = bx + c, with b = −2 and c = 102). For each distribution,
several instances of the corresponding environments were gener-
ated, systematically varying the average correlation of the cues with
the criterion and the number of positive cue values. Each environ-
ment consisted of 50 objects and five binary cues.
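
A bare-bones sketch of how such environments could be generated follows; as a simplification of our own, the binary cues are drawn at random rather than with a controlled cue-criterion correlation:

    import numpy as np

    rng = np.random.default_rng(0)
    n_objects, n_cues = 50, 5
    x = np.arange(1, n_objects + 1)

    skewed_criterion = 100.0 * x ** -1.0    # power function y = b * x^a with a = -1, b = 100
    uniform_criterion = -2.0 * x + 102.0    # linear function y = b * x + c with b = -2, c = 102

    # Five binary cues per object; the original study systematically varied
    # their average correlation with the criterion, which this sketch omits.
    cues = rng.integers(0, 2, size=(n_objects, n_cues))
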
In addition to evaluating QuickEst and the mapping heuristic,
the simulations also compared the estimation performance of
multiple linear regression and an exemplar-based model (Juslin,
Olsson, & Olsson, 2003) similar to the estimation tree. The accu-
racy of the models was determined by using a split-half cross-vali-
dation procedure, with each data set split 100 times in two halves.
The models were fitted to the first half, the training set, to deter-
mine the values of the models’ parameters. With these parameters
the models made predictions for the second half of the data, the
test set. The accuracy of these predictions was evaluated by deter-
mining the root mean square deviation (RMSD) between them
and the actual criterion values, averaged separately across all
skewed and uniform environments.
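
Using the fit/estimate interface sketched above for the mapping heuristic, and assuming the cues and criterion are NumPy arrays, the evaluation procedure could be approximated as follows (a sketch, not the authors’ code):

    import numpy as np

    def split_half_rmsd(model, cues, criterion, n_splits=100, seed=0):
        # Repeated split-half cross-validation: fit on one half (training set),
        # predict the other half (test set), and average the RMSD across splits.
        rng = np.random.default_rng(seed)
        n = len(criterion)
        rmsds = []
        for _ in range(n_splits):
            order = rng.permutation(n)
            train, test = order[:n // 2], order[n // 2:]
            model.fit(cues[train], criterion[train])
            predictions = np.array([model.estimate(c) for c in cues[test]])
            rmsds.append(np.sqrt(np.mean((predictions - criterion[test]) ** 2)))
        return float(np.mean(rmsds))
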
As expected, the more complex models, multiple linear regres-
sion and the exemplar model, achieved a better fit than the simpler
QuickEst and the mapping heuristic on the training sets in both
types of environments (Table 15-1). However, when generalizing
to predictions in the test set, both heuristics outperformed the
complex models. Von Helversen and Rieskamp found that, consis-
tent with the results of the simulations reported earlier in this chap-
ter, QuickEst predicted best in the skewed environments, whereas

Table 15-1: Average Model Accuracy (RMSD) for Different Environment
Structures (as Criterion Distributions)

                    Skewed environment           Uniform environment
                 Training set    Test set     Training set    Test set
Model              M      SD     M      SD     M      SD      M      SD
QuickEst          14.8    1.7   14.9    1.1   24.8    3.5    28.3    3.5
Mapping           14.3    3.5   15.3    1.6   21.6    5.1    25.9    6.4
Regression        14.0    2.4   16.5    1.2   20.9    4.7    27.7    6.3
Exemplar          12.0    3.5   15.8    1.7   17.5    4.9    27.2    6.2

Note. Lower values denote better performance.

the mapping heuristic predicted best when the criterion was uni-
formly distributed. In addition, the mapping heuristic performed
better than the regression model in both types of environments
and thus was less dependent on the distribution of the criterion
than QuickEst.

Which Strategy to Select From the Adaptive Toolbox?

When should people use QuickEst or the mapping heuristic? Which
heuristic people apply should depend on the characteristics of the
environment they are facing. This suggests that QuickEst should
be chosen in skewed criterion distributions and the mapping heu-
ristic should be recruited in uniform or less skewed distributions.
In addition, we would like to introduce a second environmental
structure that could influence the choice between QuickEst and
the mapping heuristic: the dispersion of the cues. For inference
strategies, it has been shown that a lexicographic heuristic like
take-the-best, for instance, performs especially well when the cues
have diverse validities and when the intercorrelations between
the cues are high. In contrast, in situations with equally valid cues
and low intercorrelation, a tallying heuristic that integrates the
information of all available cues performs well (Dieckmann &
Rieskamp, 2007; Hogarth & Karelaia, 2007; Martignon & Hoffrage,
2002; see also chapters 3, 8, and 13). Analogously, the cognitive
processes that take place when people make estimations may
depend on environmental features similar to those used in the
selection of take-the-best or tallying. Thus, QuickEst could be par-
ticularly suited for skewed distributions with highly dispersed cue
validities, whereas the mapping heuristic might be most suited
when the cues have similar validities.

Do People Use Heuristics for Estimation?

Given these predictions about when each estimation strategy
should be used to achieve ecological rationality, we can next ask
whether people actually do use QuickEst and the mapping heuris-
tic in particular appropriate environments. First, three recent
experiments have looked at how well QuickEst describes the mem-
ory-based estimates that people make (as opposed to inferences
from givens9). Woike, Hertwig, and Hoffrage (2009) asked people to

9. Inferences from givens (i.e., using displayed information) are an unsuit-
able test-bed for memory-based heuristics like QuickEst. Inferences from
estimate the population sizes of all 54 countries in Africa and, in
addition, probed their knowledge of numerous cues and cue values
indicative of population size (e.g., membership in the Organization
of the Petroleum Exporting Countries, location in the Sahel zone,
etc.). People’s actual estimates of the countries’ population sizes
were then compared to predictions from three distinct strategies,
made using each individual’s often very limited cue knowledge.
The strategies were QuickEst, multiple regression, and Probex, an
exemplar-based strategy that has been found to successfully model
people’s estimates of quantities such as city sizes (Juslin & Persson,
2002). The psychological models, QuickEst and Probex, both pre-
dicted people’s estimates better than the statistical model, multiple
regression. More specifically, QuickEst better predicted actual
estimates of about three-fourths of the participants, whereas Probex
proved to be the better model for the remaining quarter. In their
second study using the same methodology, Woike et al. (2009)
asked participants to estimate either African countries’ population
size (a J-shaped distribution) or their respective rate of illiteracy
(a uniform distribution). In addition, participants indicated their
knowledge of six cues related to either population size or illiteracy
rate. As expected, QuickEst fared better than Probex in capturing
people’s estimates in the J-shaped environment, whereas Probex
scored better in the uniform environment.
In another experiment asking participants to estimate city popu-
lation sizes, Hausmann, Läge, Pohl, and Bröder (2007, Experiment
1) found no correlation between how long people took to arrive at
an estimate of the size of a city and its estimated size. They took
this to be evidence against the use of QuickEst, which they con-
jectured would predict a positive correlation because the heuris-
tic’s cue search should stop earlier for smaller than for larger cities.
The correlation between size of cities and response time, however,
is likely to be moderated by at least one factor, the retrieval speed
of cue values. In fact, using a set of 20 German cities, Gaissmaier
(2008) analyzed the retrieval speed of cue values as a function of
city size. He found that the larger a city, the faster the retrieval of
its cue values (regardless of whether the cues indicated absence or
presence of a property), and that it takes longer to retrieve the
absence of a property (e.g., has no airport) for a small city than to

givens do not invoke the costs associated with search in memory—
including cognitive effort, time, and opportunity costs—which are likely to
be key triggers for the use of QuickEst and other heuristics (e.g., Bröder &
Schiffer, 2003b; see also chapter 9). Hausmann and colleagues (2007;
Experiment 2) and von Helversen and Rieskamp (2008) tested QuickEst in
the unsuitable context of inferences from givens.
retrieve the presence of a property (e.g., has an airport) for a large
city. These links between retrieval speed of cue values and size of
objects can be understood within Anderson’s ACT-R framework
(Adaptive Control of Thought–Rational—see Anderson & Lebiere,
1998; Hertwig, Herzog, Schooler, & Reimer, 2008; see also chapter
6). Based on these observations, one can predict that the time one
saves from the heuristic’s frugality for small cities may be consumed
by the longer retrieval times of small cities’ cue values, relative to
those for large cities. Counterintuitively—but consistent with the
data of Hausmann et al.—QuickEst may therefore take equally long
to arrive at estimates for small and large cities.
Two other experiments looked at how well the mapping heuris-
tic predicted people’s estimates (von Helversen & Rieskamp, 2008).
These experiments involved inferences from givens rather than
from memory, and participants used the given cues to make esti-
mates in a task with either a skewed or a uniform criterion distribu-
tion. The mapping heuristic’s prediction ability was then compared
with two other estimation strategies: multiple regression and an
exemplar-based model similar to Probex (Juslin et al., 2003). In both
criterion distributions, von Helversen and Rieskamp found that
the mapping heuristic, on average, predicted the estimates as well
as or better than its two competitor models. Thus, the experi-
mental evidence so far indicates that in both situations of inference
from memory and inference from givens, simple fast and frugal
mechanisms—whether QuickEst or the mapping heuristic—are
often better at accounting for the estimates that people make than
are more complex strategies.

How Does Predictable Environment Imbalance Emerge?

We used Pareto’s notion of “predictable imbalance” to refer to the
ubiquitous phenomenon of environmental skewness characteristic
of power-law distributions: In many domains, few objects take on
very large values (e.g., in frequency, intensity, size) and most take
on medium to small values. What is the origin of such distribu-
tions? This is a hotly debated question, and the explanations of
how such power-law distributions might arise in natural and man-
made systems range from domain-general explanations such as
“self-organized criticality” (e.g., Bak, 1997) to domain-specific
explanations such as models of urban growth (e.g., Simon, 1955b)
or the reasons for the rarity of large fierce animals (Colinvaux, 1978;
see Newman, 2005, for a review of various explanations). In what
follows, we briefly describe these two domain-specific accounts of
predictable imbalance.
Simon’s (1955b) model of urban growth aims to explain why
rank–size distributions of city populations are often but not always
nicely approximated by a straight line with a slope q = −1 (for
examples see Brakman et al., 1999). It is assumed that new migrants
to and from cities of particular regions arrive during each time
period; with probability π a migrant will form a new city, and with
probability 1 − π will settle in a city that already exists
(for an exposition of Simon’s model, see Krugman, 1996). The prob-
ability with which any given city attracts new residents is propor-
tional to its size. If so, this model will generate a power law, with
exponent q = −1/(1–π), as long as π is very close to 0. In other words,
if new migrants almost always join existing cities, then
q will converge toward −1. This elegant explanation of Zipf’s law
for city-size distribution has, however, a number of drawbacks
that various authors have pointed out (e.g., Krugman, 1996; Brakman
et al., 1999).
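
A toy implementation of this growth process (parameter values are ours) makes the mechanism concrete:

    import numpy as np

    def simon_city_growth(n_migrants=50_000, pi=0.05, seed=0):
        # Each migrant founds a new city with probability pi; otherwise the
        # migrant joins an existing city chosen with probability proportional
        # to its size (implemented by picking a random existing resident).
        rng = np.random.default_rng(seed)
        sizes = [1]        # city sizes, starting from a single one-person city
        residents = [0]    # city index of every resident so far
        for _ in range(n_migrants):
            if rng.random() < pi:
                residents.append(len(sizes))
                sizes.append(1)
            else:
                city = residents[int(rng.integers(len(residents)))]
                sizes[city] += 1
                residents.append(city)
        return sorted(sizes, reverse=True)

With π close to 0, the slope of the rank-size line fitted to the resulting city sizes (the q of this chapter) should approach −1/(1 − π), that is, roughly −1.
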
In his book Why Big Fierce Animals Are Rare, the ecologist
Paul Colinvaux (1978) concluded that body mass and metabolic
demands of large animals set limits to their frequency. Indeed, as
Carbone and Gittleman (2002) have shown, the relationship between
the number of carnivores per 10,000 kg of prey and carnivore
body mass itself follows a power function, with an exponent
of −1. For illustration, 10,000 kg of prey biomass cannot even sup-
port in perpetuity one polar bear whose average body mass amounts
to 310 kg, whereas it supports 146 Channel Island foxes, which
have an average mass of about 2 kg. An adult male killer whale,
with a daily caloric demand of 287,331 calories, must guzzle down
five male or seven female sea otters per day, thus a single pod of
killer whales (composed of one male and four females) could ingest
over 8,500 sea otters per year (Williams, Estes, Doak, & Springer,
2004). Clearly, high caloric demands require a large intake of
prey, and the question of why big fierce animals are rare comes
down to whether these animals can find as much food as they need
to survive.
Both domain-specific and domain-general scientific explana-
tions have been proposed for ubiquitous types of statistical dis-
tributions, whether they be, for instance, power-law or Gaussian
distributions. Assuming the human mind contains an adaptive
toolbox of simple cognitive strategies (Gigerenzer, Czerlinski, et al.,
1999), one unexplored issue is whether people have intuitive
theories about the emergence of specific distributions—for exam-
ple, “there need to be many, many more small animals than big
animals, because any big one preys on many small ones”—and to
what extent such theories play a role in triggering cognitive strate-
gies that bet on specific types of distributions.

Conclusion

Power-law distributions face us from all sides. Chater and Brown
(1999) pointed out their ubiquity in environmental features that we
perceive. Based on this, they argued that many psychological laws
governing perception and action across domains and species
(e.g., Weber’s law, Stevens’s law) reflect accommodation of the per-
ceptuo-motor system to the skewed world. The same type of rela-
tionship to J-shaped environments has also been argued for the
structure of memory (Anderson & Schooler, 1991; Schooler &
Hertwig, 2005; see also chapter 6). Similarly, we take as a starting
point the observation that power-law regularities hold across a
wide range of physical, social, and economic contexts. Assuming
not only that the perceptuo-motor and memory systems are built to
represent the statistical structure of imbalanced environments
(Anderson, 1990; Shepard, 1994/2001) but also that the cognitive
system has been similarly constructed, we have proposed QuickEst,
a fast and frugal heuristic for making estimations. Its architecture
exploits the world’s frequent predictable imbalance. In the study of
mental tools (including heuristics) as well as mental structures
(including perception and memory) we begin to discern that the mind
looks very much matched to key structures of the world.
Part VI
DESIGNING THE WORLD
16
Designed to Fit Minds
Institutions and Ecological Rationality

Will M. Bennis
Konstantinos V. Katsikopoulos
Daniel G. Goldstein
Anja Dieckmann
Nathan Berg

Reform the environment, stop trying to reform the people.
They will reform themselves if the environment is right.
Buckminster Fuller

Only about 12% of Germans have given legal consent to donate
their organs when they die. In contrast, in the neighboring country
of Austria more than 99% are potential donors. To explain this large
difference in consent rates for organ donation, social scientists
using the standard decision-making model in economics have
looked to differences in expected benefits and costs while con-
trolling for income, education, and religion (Gimbel, Strosberg,
Lehrman, Gefenas, & Taft, 2003). Regression models based on the
benefit–cost theory, however, show little evidence that large differ-
ences in actual organ-donor consent rates are statistically or caus-
ally linked to perceived benefits and costs. Critics of the economic
model have attempted to explain cross-country behavioral differ-
ences in terms of culture, social norms, and history. But the mostly
small differences between Austria and Germany on these dimen-
sions seem unlikely candidates for explaining the large gap in their
donor consent rates.
Johnson and Goldstein (2003) did, however, identify an impor-
tant institutional difference between Austria and Germany that
seems to explain differential consent rates much better than eco-
nomic, sociological, and historical approaches: different defaults
written into law regarding organ donation consent status. In presumed

consent countries such as Austria, individuals are from birth
considered to be potential organ donors, which means there is
effective legal consent for their organs to be harvested upon death
for transplant to the living. Explicit consent countries such as
Germany, on the other hand, use the opposite default: No organs
can be legally harvested from the dead unless individuals opt in to
organ-donor status by giving their explicit consent.
Switching away from either default is not especially costly in
terms of time or effort. In Germany, according to current law, one
can switch from the nondonor default to donor status by submit-
ting this wish in writing.1 In Austria, opting out of consent status
requires a bit more effort and physical resources, but not much
more: submitting an official form to the Austrian Federal Health
Institute via post or fax, requiring approximately 5 minutes and
perhaps a stamp. The main implication of these small switching
costs is that, according to the stable preference assumption of
standard economic theory, defaults should not influence behavior.
For example, someone who has stable preferences that rank donor
over nondonor status—and whose difference in payoffs across
these two states more than offsets the cost of switching away from
the default—should choose to be an organ donor regardless of how
defaults are set. Yet, contrary to economic theory, defaults are
strongly correlated with actual consent rates. Figure 16-1 shows
consent rates for a range of countries, making clear the large differ-
ence in potential organ donation rates between presumed consent
countries and explicit consent ones.
Johnson and Goldstein (2003) suggested a simple heuristic model
of individual behavior that fits the data in Figure 16-1 much better
than rival explanations investigated elsewhere in the literature.
Their default heuristic consists of the following procedure: When
faced with a choice between options where one of them is a default,
follow the default. This heuristic—in contrast to other explana-
tions—does not rely on inherent differences inside the minds of
decision makers in different countries: It predicts distinct behavior
on the part of Austrians and Germans because it depends on an
institutional variable set to different values in those countries,
namely, defaults regarding consent. The heuristic model does not
rely on a theory of inherent preferences, and it attributes none of

1. The law is the Gesetz über die Spende, Entnahme und Übertragung
von Organen, BGBI 1997, Article 2631. A German government website
(www.organspende-kampagne.de/) provides an official form that one
can use for the purpose of changing donor status. The official form is not
required, however, nor any formal registration. In some cases where rela-
tives have been clearly informed of the individual’s wish to become an
organ donor should the occasion arise, verbal consent may even substitute
for written consent.

[Bar chart of potential organ donors (%) by country: Denmark, Germany,
Netherlands, and United Kingdom (explicit consent) versus Austria, Belgium,
France, Hungary, Poland, Portugal, and Sweden (presumed consent).]

Figure 16-1: Population rates of potential organ donors by country.
The first four bars indicate explicit consent countries, where indi-
viduals are assumed not to be organ donors but can take action to
opt in to organ donor status. The remaining bars indicate presumed
consent countries, where the default presumes that individuals
are organ donors while allowing them to opt out if they choose.
(Adapted from Johnson & Goldstein, 2003.)

the observed differences in behavior to essentialist concepts resid-
ing solely within individuals or exclusively outside. In this
chapter, we explore cases such as this where ecological rationality
can emerge—or be obscured—through interactions between the
decision heuristics of individuals and the choice environments
they face, which in turn have been structured by institutions with
incentives that may or may not match those of the individual. (See
chapter 17 for further examples of this interaction in health care.)
The institutional environment structures that shape people’s
behavior can be surprisingly subtle. To show this in the case of
organ donation decisions, Johnson and Goldstein (2003) ran the
following experiment. Participants were randomly assigned to two
groups. One group saw the following opt-in cover story:

Imagine that you just moved to a new state and must get a new
driver’s license. As you complete the application, you come
across the following. Please read and respond as you would if
you were actually presented this choice today. We are inter-
ested in your honest response: In this state every person is
considered not to be an organ donor unless they choose to be.
You are therefore currently not a potential donor. If this is
acceptable, click here. If you wish to change your status,
click here.

The second group saw the same message changed to an opt-out
scenario with the script modified to read: “In this state every person
is considered to be an organ donor unless they choose not to be.
You are therefore currently a potential donor. . . .” The default has
simply been changed. How much difference will this make for
choices between the same two important outcomes?
In this environment constructed in the laboratory, 82% of par-
ticipants in the opt-out scenario chose to be potential donors, while
only 42% in the opt-in scenario did. This large gap between exper-
imental consent rates mirrors the differences between European
countries seen in Figure 16-1.
This experiment shows that the small change of adding or
removing the word “not” on the organ donation form, thereby
changing the default, has a large impact on the aggregate outcome
as measured by consent rates. Similarly drawing on heuristic
models of behavior, researchers have achieved large changes in
aggregate behavior by modifying default settings of institutional
parameters in other domains, such as personal savings (Thaler &
Benartzi, 2004; Thaler & Sunstein, 2008). Additionally, using two
natural experiments and two laboratory studies, Pichert and
Katsikopoulos (2008) showed that defaults have a dramatic influ-
ence on whether people in Germany subscribe to a “green” electric-
ity provider. On the other hand, large campaigns hoping to increase
donation rates by providing information about costs and benefits,
but without changing defaults, do not seem to work.2 Such failed
attempts to influence the public’s behavior implicitly draw on the
standard economic model of individual decision making as the
rationale for intervention, which assumes that individual decisions
result from systematic weighing of costs and benefits and so are
best influenced by changing individuals’ benefit and cost parame-
ters. Following this economic model, for example, the Netherlands

2. This is not to say that educational campaigns and increased knowl-
edge about the issues cannot make a difference or that a default heuristic
explains everything. Many people do not know they face an organ dona-
tion decision at all (including one author of this paper who thought he was
a donor but discovered he needed to send in a letter in addition to marking
his preference on his driver’s license application). But for those (many)
who do know they have a choice, most go with the default. If people assume
that defaults were designed to represent the average person’s preference or
the greater good, and if this assumption is generally correct, then following
the default heuristic would be appropriate.
undertook a broad educational campaign that included sending out
a mass mailing to more than 12 million people asking them to reg-
ister their organ donation preference. The result: Donation consent
rates did not improve (Oz et al., 2003). Consequently, calls are
increasing to adopt the simpler and more effective path of follow-
ing psychology and changing defaults as one way to overhaul ailing
health care systems (e.g., in the U.S., as heralded in the New York
Times—see Rose, 2009) and address other policy issues (Goldstein,
Johnson, Herrmann, & Heitmann, 2008).

Heuristics Versus Standard Economic Approaches to Decision Making

In evolutionary game theory, strategies or behavioral rules that
yield suboptimal payoffs are usually assumed to die out under
competitive pressure from agents using strategies with higher aver-
age payoffs. Thus, decision processes such as the default heuristic,
which are not derived as solutions to optimization problems, are
often considered uninteresting. The logic behind this dismissive
attitude is that heuristic behavior is unstable because it is
likely to be supplanted by superior decision strategies, and there-
fore it need not be studied, since one would not expect to observe
what is unstable for long. This exclusive focus on stable out-
comes in standard economic theory has attracted its share of critics
(e.g., Hayek, 1945; Schumpeter, 1942) yet remains a core tenet
of economics as it is taught and practiced throughout most of
the world.
Those who study heuristics as an alternative to the standard eco-
nomic model must acknowledge that the viewpoint of economic
theory poses a fair question: Why would someone use heuristics?
In the case of the default heuristic, it is easy to see that it is
well adapted to environments where institutional designers (i.e.,
those in charge of choosing defaults) have the interests of default
users in mind and communicate their recommendations through
their choice of available defaults. Of course, this confluence of
interests will not always be the case, as in countries such as Germany
and the United States, where 70–80% of those surveyed say they
want to be an organ donor and yet consent defaults are not set to
match this majority preference (Gallup Organization, 1993). Social
preferences may also play a role in explaining why people follow
defaults, for example, if people perceive social value in matching
the action taken by the majority, or if they fear negative social
consequences from behaving out of line with the majority (Ariely
& Levav, 2000). Defaults may codify social norms or provide a
coordination mechanism by which users of the default heuristic
successfully wind up in the majority. The default heuristic also
greatly reduces decision costs of time and deliberation, which are
common benefits of fast and frugal decision making (Gigerenzer &
Todd, 1999). Finally, the case of organ donation also raises the pos-
sibility that deliberating over some choice sets is inherently dis-
tasteful, forcing individuals to consider unpleasant contingencies
such as one’s own death, which may be substantially avoided by
ignoring the full choice set and accepting defaults.
In this chapter we take up the theme of institutional design
through the lens of ecological rationality instead of standard eco-
nomic theory. Heuristics are models of individual behavior based
on psychological plausibility and ecological effectiveness rather
than axioms of logical consistency from economic theory. As the
examples in this chapter are intended to show, the study of heuris-
tics allows us to analyze institutions that economic theory would
never predict and provides new explanations for the functioning
of existing institutions according to institutional objectives, such
as simplicity and transparency, that are difficult to motivate using
standard informational assumptions of economic theory.
As critics (e.g., Hayek, 1945; Simon, 1955a) and defenders (e.g.,
Becker, 1978) have both pointed out, neoclassical economics and
game theory are based on a well-defined, singular model of human
behavior. This benefit–cost model assumes that choice sets are
searched exhaustively, alternative choices are scored in terms of
benefits and costs, and finally these scores are integrated to deter-
mine an optimal action or decision (for foundational examples,
see Savage, 1954; von Neumann & Morgenstern, 1947). One key
implication of the economic model is that behavior, which is taken
to result from the process of optimization just described, should
depend systematically on perceived benefits and costs. A second
important implication that follows from this is that institutional
modifications that leave choice sets and their net benefits unaltered,
as do default rules for organ donation consent (apart from the
costs of switching away from the default), should have no effect on
observed behavior. Similarly, logically equivalent representations
of a given set of information should not, according to the economic
model, influence behavior (see chapter 17).
But once one considers the possibilities for designing institu-
tions to fit actual human minds and the processes they follow
rather than fictitious agents pursuing the economic model of opti-
mization, new challenges and new possibilities arise. Some institu-
tions that would not work in a world populated by economic agents
work surprisingly well in the real world. For example, economists
consider it something of a puzzle why voluntary compliance with
income tax laws is so high, and why littering in some very clean
public parks is not more of a problem, given that governments
invest so little in enforcement. In other cases, institutions that
assume forward-looking behavior, full information, and costless
information processing encounter obvious problems when con-
fronted with the human realities of limited information and
cognition, as demonstrated by the case of organ donations and by
numerous instances of well-intentioned institutions incorrectly
assuming that complete information and unhindered choice is
the best way to help people make good decisions (Thaler & Sunstein,
2008). The examples that follow illustrate a range of real-world
institutions that one would never expect to be designed in the
way that they are if the hypotheses built into the economic model
of human behavior were universally valid. Our analysis pro-
vides initial steps toward an ecological rationality perspective on
institutional design, exploring how the structure of institutions can
fit or exploit the structure of tools in the mind’s adaptive toolbox.

Transparency Without Trade-offs in Traffic and Soccer

When making a decision based on a list of factors, perhaps the most
common recommendation in the decision sciences is to weigh
many factors. The decision maker is supposed to apply implicit
weights to various factors and trade off the relative value of one
factor against another. Weighing many factors embodies the essence
of oft-repeated adages about good decision making that insist on
considering all the evidence, carefully analyzing trade-offs, not
rushing to make snap decisions, and so on.
In this section, we examine two institutions that help agents to
make transparent decisions without weighing many factors.
Decision rules that require no trade-offs are referred to as noncom-
pensatory, because decision factors have a fixed ranking of impor-
tance, and factors that are less important cannot overrule, or
compensate for, higher ranking factors. The way we alphabetize
words in the dictionary provides a good example of a particular
type of noncompensatory decision strategy called a lexicographic
rule, with the letters in each word representing the potential factors
that contribute to the decision of which word is ordered first. In
ordering the words azimuth and babble, for example, the first letter,
or factor, by itself leads to an unequivocal decision: azimuth comes
before babble because the first letter of the former comes before the
first letter of the latter—the subsequent letters do not matter, even if
they point in the “opposite” direction (e.g., “z” comes after “a”).
This is precisely what allows us to alphabetize words quickly, with-
out comparing all their letters.
Lexicographic rules have proven successful in the design of
institutions in environments where decisions must be fast and at
the same time transparent, that is, readily predictable by others so
as to minimize uncertainty and misunderstanding in interactions.
Speed and transparency are especially valuable when smooth tem-
poral coordination between individual actors is required, as in the
following brief analysis of traffic rules.

Determining Right-of-Way
Ancient Rome was a city of perhaps a million people, but it lacked
traffic signs (let alone stoplights) to guide the many pedestrians,
horse riders, and chariots on its roads. Right-of-way was deter-
mined by wealth, political status, and reputation. In case of ambi-
guity about which of these cues was more important, the issue was
decided by how loudly accompanying slaves could yell, or by
physical force. This led to much confusion and conflict on the
roads of Rome. Historian Michael Grant even controversially
hypothesized that traffic chaos pushed Nero over the edge, leading
him to burn the city in the year 64 A.D. with hopes of subse-
quently building a more efficient road system (Gartner, 2004).
In contrast to the compensatory system of Nero’s time that required
simultaneous consideration of multiple factors, right-of-way through-
out most of the world is now governed by noncompensatory lexico-
graphic rules that leave far less room for ambiguity, although the
details differ between countries. In Germany, for example, the right-
of-way rules for deciding which of two cars approaching an inter-
section gets to go through first include the following hierarchy:

If you come to an intersection with a police officer regulating
traffic, follow the officer’s directions and ignore everything
else.
Otherwise, if there is a traffic light or stop sign, follow it and
ignore everything else.
Otherwise, if there is a yellow-diamond right-of-way sign,
proceed.
Otherwise, if there is a car approaching from the right, yield to
it.
Otherwise, proceed.

So, for example, the stopping gesture of a police officer cannot be
overruled by any combination of lesser priority cues suggesting that
one may drive through an intersection, including a green light,
right-of-way sign, and being to the right of other approaching cars.
This is the hallmark of a lexicographic system.
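
A sketch of such a rule in code shows why no weighing of cues is involved; the dictionary keys and return values below are our own labels for the cues listed above, not an official encoding:

    def right_of_way_action(situation):
        # Lexicographic processing: the first cue that applies decides,
        # and no combination of lower-ranked cues can overrule it.
        if situation.get("officer_signal"):            # e.g., "go" or "stop"
            return situation["officer_signal"]
        if situation.get("light_or_stop_sign"):        # e.g., "go" or "stop"
            return situation["light_or_stop_sign"]
        if situation.get("right_of_way_sign"):
            return "proceed"
        if situation.get("car_from_right"):
            return "yield"
        return "proceed"

    # A stopping police officer overrules a green light and a right-of-way sign:
    right_of_way_action({"officer_signal": "stop",
                         "light_or_stop_sign": "go",
                         "right_of_way_sign": True})   # returns "stop"
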
If drivers had to apply weights to various factors or cues and
compute weighted sums to decide whether to drive through any given
intersection, disastrous consequences would surely follow. Individ-
ual decision processes would slow down as more information
would need to be looked up and processed. The possibility of over-
looking information, computational errors, and individual varia-
tion in weights assigned to cues would make it almost impossible
to anticipate how other drivers might act. Processing cues in a
simple lexicographic fashion, and relying on other drivers to do so
as well, frees cognitive resources for other important driving tasks
and makes the roads safer. Noncompensatory rules also help settle
arguments about fault quickly when accidents do occur. These
benefits of the transparency of noncompensatory regulation can
also be found in a variety of other institutions—for example, decid-
ing outcomes in sports.

Making It to the Next Round


The International Federation of Football Associations (FIFA) is the
governing body of the soccer world. It manages a number of major
soccer competitions, including the World Cup, which attracts more
than a billion television viewers around the world. Economists
have studied the design of sports tournaments, focusing on designs
that maximize profits (Knowles, Sherony, & Haupert, 1992), or
whether tournament rules satisfy certain axioms (Rubinstein, 1980).
As it turns out, FIFA also employs lexicographic rules to increase
transparency and minimize controversy.
World Cup tournaments involve a group and a knock-out stage. In
the latter knock-out stage, teams are eliminated with a single loss. In
the group stage, however, teams are usually arranged in groups of
four, where each team plays all others in the group, and a single loss
is not necessarily fatal. To determine which team advances to the
next stage, FIFA uses a point system (with points being distinct from
goals). The winner of each match is awarded three points, regardless
of the final score, and the loser receives zero points. If a match’s final
score is a tie, then each team gets one point. After all group-stage
matches are played, teams in each group are ranked according to
points to determine who advances to the knock-out stage.
Because ties in these point totals can occur at the group stage,
FIFA had to develop a system to produce an unambiguous ranking
when a tie arose. FIFA considers multiple cues for ranking teams
at the group stage. Following a lexicographic rule similar to take-
the-best (Gigerenzer & Goldstein, 1996, 1999), a team is ranked
above its competitor when it is favored by one of the following
cues, considered in the listed order (starting with the point totals),
taken from the FIFA 2010 tournament regulations (Regulations,
2010, pp. 47–48):

1. More points earned in all group matches;
2. Larger goal differential in all group matches;
3. More goals scored in all group matches;
4. More points earned in group matches against teams with
the same values on cues 1, 2, and 3;
5. Larger goal differential in group matches against teams
with the same values on cues 1, 2, and 3;
6. More goals scored in group matches against teams with the
same values on cues 1, 2, and 3;
7. Random tie-breaker: If two or more teams tie according to
the first six cues, then the ranking is made at random by
drawing lots.
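
A sketch of this pair-wise comparison in code follows; the team records are plain dictionaries with simplified keys of our own, and cues 4 to 6 are treated as precomputed head-to-head values:

    CUES = ["points", "goal_diff", "goals",
            "points_h2h", "goal_diff_h2h", "goals_h2h"]

    def rank_pair(team_a, team_b):
        # Lexicographic comparison: look up the cues in order and stop at the
        # first one on which the two teams differ.  Returns the higher-ranked
        # team and the number of cues looked up; None means that cue 7
        # (drawing lots) would be needed.
        for looked_up, cue in enumerate(CUES, start=1):
            if team_a[cue] != team_b[cue]:
                winner = team_a if team_a[cue] > team_b[cue] else team_b
                return winner, looked_up
        return None, len(CUES)

    # Illustrative, made-up records: the first cue alone decides, as it does
    # in most historical pair-wise rankings (see Figure 16-2 below).
    team_x = {"points": 6, "goal_diff": 3, "goals": 5,
              "points_h2h": 0, "goal_diff_h2h": 0, "goals_h2h": 0}
    team_y = {"points": 4, "goal_diff": 3, "goals": 5,
              "points_h2h": 0, "goal_diff_h2h": 0, "goals_h2h": 0}
    winner, cues_used = rank_pair(team_x, team_y)      # team_x ranks higher; cues_used == 1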

A similar set of cues was employed in the lexicographic rule
used to decide the notorious “Shame of Gijón” group ranking in the
1982 World Cup in Spain, comprising teams from Algeria, Austria,
Chile, and Germany. Only two teams were to advance to the
next stage, but according to FIFA’s group-stage point system,
Germany, Algeria, and Austria all had four overall points, while
Chile had zero.3 Further cues were applied in order and determined
that Austria and Germany would advance to the next round. But
this result led to widespread suspicion and criticism, because
the group-stage game between these two neighbors took place
after the first five group-stage matches were finished. Germany and
Austria knew, even before their match began, that a 1:0 result for
Germany would allow both to advance. Many fans suspected that
the teams somehow colluded to ensure their joint success over
Algeria. After this incident, FIFA redesigned the timing of matches
so that its ranking rule could not be exploited. The last two group-
stage games now take place simultaneously.
Why does FIFA use a lexicographic rule to produce group-stage
rankings rather than weighting and adding all the cues? Unlike
the right-of-way example in fast-moving traffic, plenty of time and
computing resources are available to process the final group-stage
scores and arrive at rankings using more complex, compensatory
ranking schemes. One reason a more complex method is not used,
though, appears to be transparency. The hypothesis is that when
stakeholders in any ranking scheme clearly understand the process
by which results are obtained, they accept those rankings—or, as
in the Shame of Gijón, are able to spot problems with them—more
readily than they do when complex algorithms are employed. This
is based on the idea that rankings, like tax schemes and constitu-
tions in democracies (Berg, 2006), require a large degree of shared
belief in their legitimacy in order to coordinate action effectively.

3. In 1982 the winner of a game was allocated two points (not three as is
the case at the time of this writing).
The basic principles behind FIFA group-stage rankings are easy to
understand: Points earned are more important than goal differen-
tials, goal differentials are more important than goals scored, and
all arguments about how much more important one cue is than
the next are moot.
One way to measure the simplicity of a ranking device is via its
informational requirements. A ranking device based on a regres-
sion model with the cues described above would rely on all avail-
able information to make any pair-wise comparison: Plug in cue
values for two teams, apply beta weights from the regression model,
and rank the team with the higher score ahead of the other. In con-
trast, the lexicographic ranking rule that FIFA uses operates much
more frugally, in the sense that most pairs of teams can be ranked
based on a single reason, without looking up each team’s values
for all cues. This reliance on typically little information also makes
the application of the rule more transparent.
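
To make the comparison concrete, the following is a minimal Python sketch of such a lexicographic pair-wise comparison, using hypothetical group records and covering only cues 1 through 3 (the head-to-head cues 4 through 6 and the drawing of lots are omitted for brevity); it also counts how many cues a comparison needs, the quantity examined next.

```python
# Minimal sketch of FIFA's lexicographic pair-wise comparison, assuming each
# team's group record is summarized as (points, goal difference, goals scored).
# Cues 4-6 (head-to-head records) and the final drawing of lots are omitted.

def rank_pair(record_a, record_b):
    """Return (better_record, cues_looked_up); None if tied on these cues."""
    cues_looked_up = 0
    for cue_a, cue_b in zip(record_a, record_b):
        cues_looked_up += 1
        if cue_a != cue_b:                 # the first discriminating cue decides
            better = record_a if cue_a > cue_b else record_b
            return better, cues_looked_up
    return None, cues_looked_up            # still tied after cues 1-3

# Hypothetical records: equal points and goal difference, different goals scored.
team_x = (6, 4, 5)
team_y = (6, 4, 3)
print(rank_pair(team_x, team_y))           # ((6, 4, 5), 3): cue 3 settled it
```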
To determine how informationally frugal the FIFA strategy is, we
calculated an empirical frequency distribution of how many cues
in the list given above would have been needed historically
to determine pair-wise team rankings.4 In the 18 World Cups played
before 2010, there were 88 groups and a total of 529 pair-wise
rankings. For each of these 529 cases, we determined how many
of the seven cues in that order would need to be looked up to
specify the ranking. As can be seen in Figure 16-2, most of the time
(471 out of 529 cases), the first cue alone (overall points earned)
sufficed to specify the ranking. One ranking was decided by chance
(i.e., cue 7, after no other cues were decisive). The average num-
ber of cues looked up was 1.2, indicating a high degree of infor-
mational frugality. This was due in large part to the high
discrimination rate of the first cue (which was so high because the
cue is nonbinary), allowing it to determine most of the ranking
decisions.
Transparency is chief among the virtues of FIFA’s lexicographic
ranking rule. On the other hand, many organizations, such as casi-
nos, are strategically designed for nontransparency—so that their
customers, such as gamblers, cannot easily see how they operate.
We next investigate the nontransparency of casinos and show how
their strategies can be understood in terms of heuristic models of
behavior that depart from the standard economic model.

4. Again, note that this set of cues is not exactly the same as that used in
some of the World Cups we analyzed.

Figure 16-2: Frequency distribution of the number of cues looked
up to determine pair-wise rankings in the group stage of World
Cup tournaments 1930–2006. Counts for 1 through 7 cues looked up:
471, 38, 16, 3, 0, 0, and 1.

Beliefs About Winning on Slot Machines: It’s Not All in the Players’ Heads

In 2007, Americans spent $34 billion gambling in commercial casi-
nos (American Gaming Association, 2008), perhaps half what they
spent across all forms of institutionalized gambling (Christiansen,
2006). This figure is on the same scale as the entire fast food indus-
try ($150 billion) and greatly exceeds the value of another
entertainment industry, the $600 million worth of movie tickets
purchased (American Gaming Association, 2008). To make a profit,
gambling institutions are designed so that the average gambler loses
money. Because gamblers can expect this loss, the fact that so many
people who turn out to be risk averse in other decision domains
still choose to gamble presents a perplexing challenge to the eco-
nomic model of individual decision making (Eadington, 1988;
Wagenaar, 1988; Walker, 1992b).
Nonetheless, many economists could see this paradoxical gam-
bling behavior as readily explained by the standard economic
model by pointing to nonmonetary utility as compensation for
monetary losses (Becker, 1978). When people choose to gamble, this
reasoning goes, they willingly forgo a sum of money (the expected
monetary loss from gambling) as the purchase price for their enter-
taining or exciting experience (Eadington, 1988). Indeed, empirical
research supports the view that the utility of gambling stems from
many nonmonetary sources along with the obvious monetary one
(Bennis, 2004; Smith & Preston, 1984; Wagenaar, Keren, & Pleit-
Kuiper, 1984; Zola, 1963). Nonetheless, although other sources of
utility besides expected winnings are undoubtedly part of what
motivates gamblers, there is abundant evidence that many people
gamble because they have false beliefs about their ability to win.
Often this is a belief that they have an advantage over the casino,
but casino gamblers also systematically overestimate their chances
of winning, overestimate the role of skill in games that are largely
determined by chance, and use gambling strategies that do not work
(Ladouceur, 1993; Lambos & Delfabbro, 2007; Miller & Currie, 2008;
Sundali & Croson, 2006; Wagenaar, 1988; Walker, 1992b). Thus, at
least part of why people gamble seems to stem from a systematic
failure to estimate their expected payoffs correctly.
Theories attempting to account for this faulty payoff estimation
fall into two broad categories. The first, and far more common, type
of theory identifies the source of the problem as originating inside
gamblers’ minds. According to such theories, people gamble
because of shortcomings in how they think and reason, including,
among other things, a failure to understand the nature of probabil-
ity and randomness (Gaboury & Ladouceur, 1988, 1989; Ladouceur
& Dubé, 1997; Ladouceur, Dubé, Giroux, Legendre, & Gaudet, 1995;
Lambos & Delfabbro, 2007; Metzger, 1985; Steenbergh, Meyers,
May, & Whelan, 2002; Sundali & Croson, 2006; Wagenaar, 1988;
Walker, 1990, 1992a).
The second type of explanation, to which we subscribe, focuses
on factors in the external environment: While acknowledging that
gamblers may sometimes have false beliefs about their chances of
winning and use the wrong heuristics, we argue that the source of
these shortcomings lies not so much in biased or irrational think-
ing, but rather in the gamblers’ environment and their interactions
with it (see, e.g., Bennis, 2004; Dickerson, 1977; Griffiths & Parke,
2003a; Harrigan, 2007, 2008; Parke & Griffiths, 2006). Specifically,
there is a mismatch between the (otherwise usually adaptive) heu-
ristics used by gamblers on the one hand, and the structure of the
casino environment on the other—the opposite of the ecologically
rational match between heuristics and environments explored
extensively elsewhere in this book.
Why does this mismatch come about? Because it is in the casi-
nos’ interest for this mismatch to exist, and they construct the
gamblers’ environment so that it does. The degree to which casinos
intentionally design games to exploit normally adaptive heuristics,
or alternatively simply select the games that end up garnering
the greatest profits and which turn out to be the ones that promote
this mismatch, is an open question. But the result is a wide range
of casino games exquisitely designed to exploit otherwise adaptive
heuristics to the casinos’ advantage. They produce representations
in the environment that provide the cues that the gamblers’ heuris-
tics rely on; as we will see, these cues are about the success and
failure of gambling heuristics and about the ways machines
operate. (This is similar to how companies exploit the often-
adaptive use of recognition to lead people to buy the products that
they recognize through advertisement—see Goldstein & Gigerenzer,
1999, 2002.) Unlike the organ-donor example, in which some envi-
ronments were inadvertently designed in a way that discouraged
organ donation, the casino industry has a powerful incentive to
design environments that contribute to false beliefs and a corre-
sponding maladaptive application of heuristics, since their eco-
nomic success stems from their ability to get and keep people
gambling.
We focus here on slot machine environments constructed by
Las Vegas resort casinos to encourage use of misleading cues
(Bennis, 2004). In the standard economic model, logically equiva-
lent representations of information are irrelevant, because deduc-
tive logic, which is equally capable of utilizing information in any
format, is assumed to underlie behavior. But psychologically, dif-
ferent representations of the same information can have a large
impact on how people use it to reach decisions (see, e.g., chapter 17
on the impact of different representations of medical information).
Thus, the casinos’ ability to influence gambling through the strate-
gic representation of information becomes understandable only
when the economic model is revised to incorporate psychologically
realistic theories of cognition.

Representing the Experience of Winning


Major hotel-casino resorts in Las Vegas have one or more casino
floors where hundreds, sometimes thousands, of slot machines are
arranged in aisles with lines of machines on both sides, back to
back against other lines of machines. During play, contemporary
slot machines generate an abundance of audio and visual cues that
are difficult to miss or ignore.
When slot machine players cash out winnings, metal tokens
typically drop several inches onto a metal tray, generating loud
clanking sounds that can be heard almost constantly and from vir-
tually every direction in busy casinos.5 Many machines amplify the
clanking of coins, which makes winning a very public and familiar
(if vicarious) event to those who spend time in a casino. If slot

5. Coin and token payouts are rapidly being replaced with paper
vouchers such that this method of manipulating subjective experience
may soon be a thing of the past.

players do not immediately collect their tokens, wins are announced
with escalating beeping music, marking the increasing credits
that players can cash out in the future. In this case, the amplified
sound of growing credits often accrues at a faster pace than the
credits themselves, contributing to a subjective perception that
players have won more than they actually have.
In addition to audio, slot machines can generate visual cues that
can be seen by others from a distance. For example, most slot
machines in Las Vegas are equipped with a spinning siren light on
top, which flashes whenever a major jackpot has been hit. Larger
jackpots need to be paid by hand, and during the time it takes for
slot machine attendants to walk to the winner and deliver their
money, the winning machine continues to flash and blare, some-
times for more than half an hour. Slot machine players regularly
complain about how slow attendants are to pay off major jackpots.
These long waits serve to advertise large jackpots in a manner that
makes their occurrence appear more frequent than it is. On busy
nights, many large-jackpot winners can be observed, often at the
same time, due in part to extended payoff wait times. Some casinos
prominently display posters of past winners of major jackpots, pho-
tographed while being paid with over-sized publicity checks.
While winnings are emphasized and communicated through a
wide variety of cues in the casino environment, losses are hardly
signaled at all. This raises questions about gamblers’ perceptions of
win and loss probabilities: Where environments have been con-
structed to highlight winnings and hide losses, can we expect indi-
viduals to see through the selectively represented cues and formulate
hard-nosed expectations based on the logic that casinos must profit
to stay in business, that gambling is a zero-sum game, and therefore
that they should expect to suffer losses? Or might gamblers too
often expect to win because instances of winning are almost always
visible in the casino?
Heuristics designed to adaptively guide foraging behavior by
following the observed successes of others, such as an “imitate
the successful” rule (Boyd & Richerson, 1985), run into problems in
the casino environment. To the extent that frequencies of success
are processed unconsciously by observing other gamblers in a
casino, the casinos’ nonrepresentative construction of cues, which
include uninformative or misleading signals from sirens and flash-
ing lights, may significantly promote gambling behavior, to the det-
riment of most gamblers.

Representing How Slot Machines Work


Another way that nonrepresentative cues distort gamblers’ per-
ceptions of the constructed casino environment revolves around
the inner workings of slot machines. Until the 1960s, slot machines
worked much as their exterior design suggests. A machine had
three reels covered with symbols, each with around 20 possible
stop positions where the reel could come to rest showing one of the
symbols, and each stop had an equal probability of occurring
(Cardoza, 1998; Kiso, 2004; Nestor, 1999). Given this design, there
would be 20³ (i.e., 8,000) possible outcomes, and a jackpot requir-
ing a unique combination of three symbols would occur with
probability 1 in 8,000, or .000125. After observing the pay line (i.e.,
the payoff-determining three symbols shown when the reels stop
spinning) on several spins on an old machine, along with a view
of the symbols above and below the pay line, savvy players could
estimate the actual number of stops and the frequency of each
symbol on each reel. They could then compare this assessment
with the payout chart for winning combinations to determine the
expected value of playing a particular machine.
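
As a rough illustration of that calculation, the sketch below estimates a player's expected return per coin on an old-style machine with equally likely stops; the symbol counts and payout chart are made up for the example, not those of any actual machine.

```python
from itertools import product

# Hypothetical reel composition of an old-style machine: each of the three
# reels has 20 equally likely stops, described here as counts per symbol.
reel = {"cherry": 7, "bar": 6, "bell": 4, "lemon": 2, "seven": 1}
stops = sum(reel.values())                       # 20 stops per reel

# Hypothetical payout chart: coins returned for three-of-a-kind on the pay line.
payout = {"seven": 500, "bell": 20, "bar": 10, "cherry": 5}

expected_return = 0.0
for s1, s2, s3 in product(reel, repeat=3):       # all symbol combinations
    prob = (reel[s1] / stops) * (reel[s2] / stops) * (reel[s3] / stops)
    if s1 == s2 == s3:
        expected_return += prob * payout.get(s1, 0)

print(f"Expected return per coin: {expected_return:.3f}")             # below 1
print(f"Chance of three sevens: {(reel['seven'] / stops) ** 3:.6f}")   # 1/8,000
```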
Figure 16-3 shows an old and a new slot machine side by side.
On the surface, new slot machines look very much like older
machines, but their internal mechanics are entirely different. New
slot machines use digital random number generators rather than
physically spinning reels to determine wins and misses. Never-
theless, contemporary machines continue to display spinning reels,
providing nonrepresentative cues meant to distort the true payoff-
generating process.

Figure 16-3: Left: The “Liberty Bell,” the father of the contempo-
rary slot machine (image courtesy of Marshall Fey), released to the
public in 1899 (Legato, 2004). Right: A contemporary 25¢ banking
slot machine with a siren light on top (image courtesy of Paul and
Sarah Gorman).

If, for example, the largest jackpot requires
three red sevens, it would be possible for the microchip designers
to assign a 1 in 1 billion chance of this outcome, even while the
machine’s external design falsely suggests a 1 in 8,000 chance of
winning, as would have been the case on older machines. Similarly,
inflated frequencies of hope-inspiring near-jackpot misses can also
be created. Such strategically nonrepresentative design is standard
practice in the casino environment (Griffiths & Parke, 2003b;
Harrigan, 2008; Turner & Horbay, 2004). Institutional designers go
to great lengths to represent information in ways that should not
matter in the standard economic model (e.g., rational Bayesian
updaters making inferences about winning probabilities should
not be influenced by sirens, flashing lights, and uninformative
spinning wheels). But this strategy works for the interests of the
casinos because gamblers use decision processes built on psycho-
logical mechanisms that are sensitive to the structure of their
environment and which can thus be subverted by situations con-
structed to provide misleading and irrelevant cues.

Ecological Rationality in Institutional Design

Unlike the axiomatic definitions of rationality that economic
models draw upon, ecological rationality implies that evaluations
of decision processes cannot be undertaken in isolation, strictly at
the level of one individual’s internal logical consistency. Rather,
decision processes should be evaluated contextually according
to how well they match the environments in which they are used.
These distinct notions of rationality have important implications
for the analysis of institutions.
According to the standard economic model, there is no need
to study or analyze strategic interactions between institutional
designers and nonoptimizing heuristic users, because people
would eventually abandon such heuristics in favor of optimal
behavior. The space of problems to which the economic model is
applicable is therefore rather narrowly circumscribed because of
its stringent behavioral assumptions, such as exhaustive search for
information and options, optimal weighing of costs and benefits,
and adherence to logical norms of probabilistic reasoning. These
assumptions rule out consideration of institutions that are built to
work with populations of real humans using heuristics.
The organ-donor example shows how the standard economic
model misses an important institutional determinant of real-world
behavior: the setting of defaults that do not change feasible choice
sets yet influence heuristic-based decision making nevertheless.
In the same way, psychological theories that try to understand
behavior solely in terms of knowledge and beliefs also miss the
importance of heuristics interacting with institutions. The exam-
ples of noncompensatory rules regulating traffic and professional
soccer rankings highlight psychologically important objectives that
are difficult to motivate using the standard economic model: deci-
sion simplicity and transparency. These factors are critical for
many institutional designs, and designers can achieve them not by
trying to manipulate economic models of behavior but by creating
systems that fit human lexicographic decision strategies.
The last example of the casino environment shows how institu-
tions can be designed to exploit vulnerable heuristics that rely on
transparent information structure to produce adaptive choices
in other domains. People typically expect transparency and use
simple rules exploiting straightforward relationships between cues
and outcomes, such as “where I’ve seen success (or near success)
up to now, I will expect success in the future.” Casinos can exploit
this by subverting the cue–outcome relationship and leading
gamblers to think mistakenly that they are on the path to likely
success. Such conflict of interest between institutional designers
and agents who interact with those institutions is also commonly
analyzed within the standard economic model framework. However,
the ongoing systematic exploitation of gamblers by casinos is under-
stood much more easily using the concept of designed mismatch
between heuristics and decision environments than through
complicated rationalizations of gambling as a positive-surplus-
yielding activity where intrinsic, nonpecuniary gains outweigh
monetary losses.
In the book Simple Rules for a Complex World, Richard Epstein
(1995) similarly builds a case for the benefits of designing institu-
tions with simple transparent rules and the dangers of going in the
opposite direction. He argues that in the United States, the law has
become excessively complex and nontransparent, resulting in
an overly litigious environment where complexity is exploited by
lawyers. According to his view, complexity in the legal code makes
outcomes more malleable to intervention by skilled legal crafts-
manship and, thus, more volatile and less robust. The result has
been a kind of arms race where more and more lawyers are
necessary to protect individual and corporate interests against the
claims of others, with the outcome depending on who has the
money to hire the best team of lawyers rather than on more ideal
standards of justice. Epstein advocates that we reduce our complex
avalanche of laws to just six simple mandates, such as property
protection. This, he argues, would save on legal costs and, more importantly,
reduce uncertainty through greater transparency, thereby increas-
ing public trust in government institutions, and as a consequence,
compliance with the law. (For an extensive investigation of the
general question of how legal institutions shape heuristics and vice
versa, see Gigerenzer & Engel, 2006.)
The central point is that environmental structure is not simply
an independent variable on which decision processes and their
performance depend. Environments themselves can be, and often
are, actively structured, selected, and intentionally designed
(both by humans and by other animals—see Hansell, 2005; Odling-
Smee, Laland, & Feldman, 2003). A crucial ingredient for success-
fully analyzing the institutional dynamics in which environments
and behavior co-evolve is understanding the decision heuristics
that are actually used by the population under consideration (see,
e.g., Todd & Heuvelink, 2007, and chapter 18), not unrealistic opti-
mizing strategies derived from the standard economic model. The
descriptive question of how well, or poorly, people make decisions
in particular environments is thus also, fundamentally, a question
about how well environments are tuned to particular decision tasks.
From the standpoint of ecological rationality, the normative ques-
tion is not simply how our reasoning processes can be improved,
but also how to design environments to better match the ingenious
human cognitive hardware that comes for free.
17
Designing Risk Communication in Health
Stephanie Kurzenhäuser
Ulrich Hoffrage

Seven cardinal rules of risk communication, rule no. 7:


Speak clearly and with compassion.
Vincent T. Covello and Frederick W. Allen

In October 1995, British women were confronted with bad news.
The U.K. Committee on Safety of Medicines alerted the public
that “combined oral contraceptives containing desogestrel and
gestodene are associated with around a two-fold increase in the risk
of thromboembolism” (Jain, McQuay, & Moore, 1998). In more pop-
ular terms, the third generation oral contraceptives doubled the
risk of getting potentially life-threatening blood clots in the lungs
or legs, that is, increased the risk by 100%. Not surprisingly, this
warning caused great concern among women and their physicians.
Many women stopped taking the contraceptive pill, resulting in
an estimated increase of up to 10% in unwanted pregnancies and
abortions (Dillner, 1996; Furedi, 1999). Ironically, abortions and
pregnancies increase the risk of thrombosis more than the third-
generation pill does.
If the same information about thromboembolism had been
expressed in absolute terms, it would have been clear how infre-
quent this dangerous side effect actually is. The original medical
study had shown that one out of every 7,000 women who took the
second-generation pill had thromboembolism, whereas for every
7,000 women who take the third-generation pill, this number is
two. In terms of absolute risk, the chance of thromboembolism
thus increases from one to two in 7,000 women, which corresponds
to an increase of 0.014 percentage points (Jain et al., 1998). Both
numbers that quantify the increase of risk—by 100% and by 0.014
percentage points—are derived from the same sample of women,
thus they cannot and do not contradict each other. Still, using one
or the other number to communicate the same risk makes a huge
difference psychologically (Gigerenzer, Gaissmaier, Kurz-Milcke,
Schwartz, & Woloshin, 2007).
The pill scare offers an important insight and raises an intriguing
question. The insight: The way the information was communicated
affected the way journalists, physicians, and women understood
and perceived the risks, which, in turn, affected their behavior.
More generally, aggregate statistical information can be represented
in several ways, and different representations of health-related
information can lead to different understandings of risks and differ-
ent decisions about whether to undergo a certain diagnostic proce-
dure or which treatment to accept. The question: Given that the
understanding of the risks and, in turn, people’s behavior depends
on the representation of information—what determines the choice
of a particular representation? This question is particularly relevant
when people misunderstand risks and when they would have opted
for another course of action had the same information been repre-
sented differently. The insight is about cognition, and the question
is about interests and goals in risk communication. Both are about
ecological rationality: The former focuses on the way information
is represented in the environment of the receivers of risk messages,
the latter on features of the environment in which risk communica-
tors are operating.
In this chapter, we explore how the representation of statistical
information affects the understanding of risks and uncertainties in
medical contexts. We argue that problems in understanding and
dealing with numbers, sometimes referred to as innumeracy, are
often due to poorly designed information environments, rather than
to internal problems of the human mind (Gigerenzer, Mata, & Frank,
2008; Lipkus, 2007; see also Galesic, Gigerenzer, & Straubinger,
2009; Peters et al., 2006). Thus, the first aspect of ecological ratio-
nality with which we are concerned here—the degree of fit between
mind and environment—emerges in the interaction between
patients’ decision mechanisms and the medical information envi-
ronments they face (see chapter 16 for beneficially and detrimen-
tally designed environments). Throughout the chapter, we use
mammography screening as our prime example, though our general
conclusions have far wider implications. We begin with a short
introduction regarding the necessity of informing women about
the benefits and the risks of screening tests. We then summarize
the literature on so-called format effects in statistical reasoning
by looking at three types of statistical information that physicians
and patients often encounter: conditional probabilities, single-
event probabilities, and relative risks. For each type of information,
we propose a representation that facilitates understanding. We then
analyze what representations are actually used in published mate-
rials about mammography screening, that is, what the actual infor-
mation environment looks like. Based on this analysis, we turn to
the second aspect of ecological rationality: We identify factors
in the environment that can contribute to innumeracy and address
the question of why risks are not always communicated in a trans-
parent manner. We conclude with some recommendations for
changes, both in the information environment and in the institu-
tional and legal environments, that could help to foster statistical
thinking and informed decisions about mammography and other
medical screening (for more recommendations, see Gigerenzer &
Gray, 2011).

Informed Decision Making About Mammography Screening

Breast cancer is responsible for the highest (Europe) or second-
highest (United States) death toll due to cancer among women
(Boyle & Ferlay, 2005; Jemal et al., 2006). It therefore comes as no
surprise that breast cancer is also one of the most frequently cov-
ered diseases in the print media. Most American and European
women’s magazines have a special feature on breast cancer at least
once a year, and events and campaigns (e.g., pink ribbon parties,
breast cancer awareness months) try to raise awareness about the
disease and educate women about ways to detect it early (Hann,
1999). These efforts have been quite successful in what they set out
to do: In 2003, 70% of women in the United States aged 40 or older
had had a screening mammogram within the past 2 years (National
Cancer Institute, 2005).1 In view of this large compliance rate, do
most women also understand what the risks and benefits of mam-
mography screening are? The answer is clearly “no”: Women have
been found repeatedly to overestimate the benefits by orders of
magnitude while underestimating the risks (e.g., Black, Nease, &
Tosteson, 1995; Domenighetti et al., 2003; Schwartz, Woloshin,
Sox, Fischhoff, & Welch, 2000), and the same pattern was also
found for men and their perceptions of prostate cancer screening
(Gigerenzer, Mata, et al. 2008). More generally, laypeople appear to
lack sufficient knowledge about typical signs and risk factors for
relevant clinical conditions such as myocardial infarction or stroke
(Bachmann et al., 2007).

1. If there were symptoms such as a palpable lump in the breast, mam-
mography would also be used, but this would be clinical or diagnostic
mammography, not screening mammography. We will only consider
screening mammography here.

In a paternalistic medicine, where patients are told what to do,
this systematic ignorance would not matter. Yet, if patients should
be allowed to evaluate screening participation according to their
personal values and ultimately decide themselves what they want,
they need to have sufficient and understandable information about
risks and benefits (Charles, Gafni, & Whelan, 1999; Coulter, 1997).
Both legal and ethical principles imply that not just consent but
“informed consent” should be obtained for this and other medical
procedures: Ideally, patients should be informed about both bene-
fits and risks of a diagnostic test or a medical treatment and its
alternatives before a decision about participation is made (Doyal,
2001; General Medical Council, 1998; Ubel & Loewenstein, 1997).
Because screening tests are performed on individuals without
symptoms, the obligation for physicians to inform potential par-
ticipants thoroughly about benefits and risks is seen to be even
stronger than it is for tests and treatments that are performed on
people showing symptoms of illness (Marshall, 1996; McQueen,
2002). The reason is that in a screening, the number of participants
who benefit from the test (those who have an early stage of the
disease and would profit from early treatment) is rather small,
whereas the side effects of the test (e.g., exposure to x-rays during
mammography) affect all participants. Consequently, women
should be explicitly informed about the benefits, risks, and accu-
racy of mammography screening before they decide to participate
in it (Gigerenzer, 2002; Gigerenzer, Mata, et al., 2008; Marshall,
1996; Mühlhauser & Höldke, 1999). For physicians, this implies
that they have to provide women with the essential facts. Some
of the most important facts (the risk of false-positive and false-
negative results, the predictive value of the mammogram, and the
benefit of the screening) are presented in the following section.

Three Types of Statistical Information and Different Ways of Representing Them

Much of the information that physicians and patients deal with—be
it the meaning of test results, the likelihood of benefits, or the likeli-
hood of harms—comes as a statistic. However, each piece of statis-
tical information can be represented in different formats, and some
of these are likely to foster misunderstandings, while others foster
transparent insight. In this section, we will review the literature on
how representation formats affect understanding of medical infor-
mation and we will propose representations that facilitate under-
standing. We will do so separately for three types of statistical
information: conditional probabilities, single-event probabilities,
and relative risk reduction.

Conditional Probabilities: What Does a Positive Test Mean?


Even with prior experience, patients often have only little knowl-
edge about the accuracy of diagnostic tests (Hamm & Smith, 1998).
Limited knowledge, in turn, often goes along with the belief that
diagnostic tests are more accurate and more predictive than they
actually are (Black et al., 1995), and sometimes even that the tests
are infallible (Barratt, Cockburn, Furnival, McBride, & Mallon,
1999). When attempting to comprehend results of diagnostic tests,
people need to understand that tests may, in fact, be fallible. Tests
produce incorrect as well as correct results, and the chance of each
type of outcome is typically communicated in terms of conditional
probabilities, such as the error rates and the predictive values of the
test. In terms of mammography screening, these values can be
understood as follows:

Error Rates A test can err in one of two ways. It can produce a
“healthy” result when there is in fact a disease (a false-negative
result or “miss”), and it can indicate a disease where there is none
(a false-positive result or “false alarm”). If the proportion of false-
negative test results is low, a test is said to have a high sensitivity
(false-negative rate and sensitivity add up to 1), and if the propor-
tion of false-positive results is low, it has a high specificity (false-
positive rate and specificity add up to 1). It is not possible to increase
sensitivity and specificity of a given test at the same time. If, for
instance, the critical value that determines whether a specific test
value on a continuous scale is classified as a positive or negative test
result is changed such that this test becomes more sensitive (to reduce
the number of misses), its rate of false-positive results necessarily
goes up.
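
This trade-off can be illustrated with a small sketch using hypothetical test scores (the numbers are invented for the example, not taken from any mammography study): lowering the classification threshold raises sensitivity but necessarily lowers specificity.

```python
# Hypothetical continuous test scores for women with and without the disease;
# the two distributions overlap, so no threshold separates them perfectly.
scores_diseased = [4.2, 5.1, 5.8, 6.0, 6.7, 7.3, 8.1, 8.8]
scores_healthy  = [1.2, 2.0, 2.8, 3.1, 3.9, 4.5, 5.0, 5.6]

def sensitivity_specificity(threshold):
    """Classify scores at or above the threshold as positive test results."""
    sens = sum(s >= threshold for s in scores_diseased) / len(scores_diseased)
    spec = sum(s < threshold for s in scores_healthy) / len(scores_healthy)
    return sens, spec

for threshold in (5.5, 4.0):            # lowering the critical value...
    sens, spec = sensitivity_specificity(threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# ...raises sensitivity (fewer misses) but lowers specificity (more false alarms)
```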
In a large American study with more than 26,000 women between
the ages of 30 and 79 years who participated in a first mammogra-
phy screening, the sensitivity was 90% and the specificity was
93.5% (Kerlikowske, Grady, Barclay, Sickles, & Ernster, 1996). A
meta-analysis over several systematic screening programs found—
over all age groups and for a 1-year interval—sensitivities between
83% and 95% and specificities between 94% and 99% (Mushlin,
Kouides, & Shapiro, 1998). The statistical properties of a test, espe-
cially sensitivity, depend on the age of the women, due to changes
in breast tissue (higher sensitivity in older women), but also on the
radiological criteria being used and on the training and experience
of the radiologists (Mühlhauser & Höldke, 1999).

Predictive Values The probability with which a positive test result cor-
rectly predicts the presence of a disease is called the positive pre-
dictive value of a test. Accordingly, the negative predictive value is
defined as the probability with which a negative test result cor-
rectly predicts that the disease is not present. For a woman who
underwent screening and who is wondering about the implications
of her test result, the positive and the negative predictive value are
more useful than the test’s error rates. This is simply because these
values are conditioned on the test result (which is communicated
to the woman), whereas the error rates are conditioned on her
health status (which she does not know; otherwise there would be
no reason to obtain a mammogram). However, as we will show
below, the positive and negative predictive values depend on the
error rates—and also on the prevalence of breast cancer. In the
aforementioned American study, the positive predictive value was
10% and the negative predictive value was 99.9%.
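
How the predictive values follow from the error rates and the prevalence can be seen in a short Bayes'-rule sketch. The sensitivity and specificity below are those reported in the American study; the prevalence of 0.8% is an assumed, illustrative figure (not one reported in the study), chosen because it reproduces predictive values of roughly 10% and 99.9%.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Positive and negative predictive values via Bayes' rule."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity
    p_positive = (prevalence * p_pos_given_disease
                  + (1 - prevalence) * p_pos_given_healthy)
    ppv = prevalence * p_pos_given_disease / p_positive
    npv = (1 - prevalence) * specificity / (1 - p_positive)
    return ppv, npv

# Sensitivity and specificity from Kerlikowske et al. (1996); the prevalence of
# 0.8% is an assumed illustrative value for a first screening, not a study figure.
ppv, npv = predictive_values(prevalence=0.008, sensitivity=0.90, specificity=0.935)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")       # roughly 10% and 99.9%
```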

Understanding the Error Rates and Predictive Value of Mammograms


Conditional probabilities such as sensitivity and specificity are
useful for women to know, but they are easily misunderstood.
In particular, the conditional probability of a positive test result
given a disease (a test’s sensitivity) is often confused with the
inverse probability of a disease given a positive test result (a test’s
positive predictive value). Such confusion can have a large impact
on understanding, especially because the two values can differ
greatly.
Experts are not immune to this confusion. In one study, 24 expe-
rienced doctors read the following information about mammogra-
phy screening (Hoffrage & Gigerenzer, 1998):

Problem 1: The probability of breast cancer is 1% for a woman
at age 40 who participates in routine screening. If a woman
has breast cancer, the probability is 80% that she will have a
positive mammogram. If a woman does not have breast cancer,
the probability is 10% that she will also have a positive mam-
mogram.

The doctors were then asked to estimate the probability that a
woman from this group who tests positive actually does have breast
cancer—that is, the positive predictive value of this test. The for-
mula needed to solve this diagnostic inference problem is known
as Bayes’s rule (see below), and the correct estimate for the test’s
positive predictive value is 7.5%. Yet, the doctors’ answers ranged
from 0.7% to 90%, covering almost the entire scale of possible
answers, and the most frequent estimate was 90% (reached by six
doctors who added up the two error rates, and two who took the
complement of the false-positive rate). This is a significant mistake to
make, because it could mean the difference between a doctor telling
a woman with a positive test not to worry and just to have a follow-
up test, or to start thinking about treatment and life with the dis-
ease.
The difficulties that people have in reasoning with conditional
probabilities are often presented as if they were the natural conse-
quence of flawed mental software (e.g., Bar-Hillel, 1980). This view,
however, overlooks the fundamental fact that the human mind
processes information through external representations, and that
using particular representations can improve or impair our ability
to draw correct conclusions based on statistical information. How
can the different perspective of ecological rationality help us to
construct the information environment in a way that fits human
decision mechanisms?

Natural Frequencies
Studies that previously found that physicians (Berwick, Fineberg,
& Weinstein, 1981) and laypeople (see Koehler, 1996b) have great
difficulties in understanding the predictive value of test results
typically presented information in terms of probabilities and per-
centages, as in Problem 1 above. Now consider the following alter-
native representation:

Problem 2: Ten out of every 1,000 women at age 40 who par-
ticipate in routine screening have breast cancer. Of these 10
women with breast cancer, 8 will have a positive mammo-
gram. Of the remaining 990 women without breast cancer, 99
will still have a positive mammogram.

After having read this information, physicians (a different set
from those who saw Problem 1) were asked to imagine a sample of
women in this age group who had a positive mammogram in a rou-
tine screening and to estimate how many of these women actually
do have breast cancer. The correct answer is 8 out of 107, or 7.5%,
as before. In responding to this natural frequency representation,
16 out of 24 physicians gave exactly this answer. In contrast, only 1
of 24 physicians could give the correct answer in Problem 1 when
the statistical information was expressed as probabilities (Hoffrage
& Gigerenzer, 1998). Similar beneficial effects of the natural fre-
quency format were observed in two other studies with medical
students (on average in their fifth year of training) and laypeople
(i.e., psychology students), as summarized in Figure 17-1. All three
of these studies found that when the information was presented in
natural frequencies rather than in probabilities, the proportion of
correct responses according to Bayes’s rule increased systematically
for each of the problems.

Figure 17-1: The effect of information representation (probabilities
vs. natural frequencies) on statistical reasoning in laypeople, medi-
cal students, and physicians, shown in terms of the participants’
percentage of correct inferences according to Bayes’s rule. Results
were obtained for 15 Bayesian inference tasks given to laypeople
(Gigerenzer & Hoffrage, 1995), four tasks given to medical students
(Hoffrage, Lindsey, Hertwig, & Gigerenzer, 2000), and the same four
tasks given to physicians (Hoffrage & Gigerenzer, 1998).

The average proportions of these Bayesian
responses ranged from 10% to 18% for probabilities, and 46%
to 57% for natural frequencies. Other studies arrived at the same
conclusion, namely, that natural frequencies improve Bayesian rea-
soning compared to probabilities and percentages, although the
absolute levels of performance differ considerably between studies
(Brase, 2002; Girotto & Gonzalez, 2001; Kurzenhäuser & Lücking,
2004; Lindsey, Hertwig, & Gigerenzer, 2003; Mellers & McGraw,
1999).

Why Do Natural Frequencies Facilitate Statistical Reasoning?


There are two related arguments for why representations in natural
frequencies can aid understanding of statistical information. The
first is computational: Representation matters because the statisti-
cal reasoning that is required, such as to calculate the positive
predictive value of a test (i.e., Bayesian reasoning), is relatively
simple with natural frequencies, but becomes cumbersome the
moment conditional probabilities are used. For instance, when the
information concerning breast cancer is represented in probabilities
as in Problem 1, above, applying a cognitive algorithm to compute
the positive predictive value amounts to performing the following
computation:

\[
p(\text{BC} \mid \text{positive M}) =
\frac{p(\text{BC})\, p(\text{positive M} \mid \text{BC})}
     {p(\text{BC})\, p(\text{positive M} \mid \text{BC}) + p(\text{no BC})\, p(\text{positive M} \mid \text{no BC})}
= \frac{.01 \times .8}{.01 \times .8 + .99 \times .1}
\tag{1}
\]

where BC stands for breast cancer and M for mammogram. The
result is .075 (the 7.5% from above). Equation 1 is Bayes’s rule for
binary hypotheses (here: BC or no BC) and data (here: M positive or
negative).
When the information is presented in natural frequencies, as in
Problem 2, the computation is much simpler:

\[
p(\text{BC} \mid \text{positive M}) =
\frac{\text{(BC \& positive M)}}
     {\text{(BC \& positive M)} + \text{(no BC \& positive M)}}
= \frac{8}{8 + 99}
\tag{2}
\]

Equation 2 is Bayes’s rule for natural frequencies, where (BC &
positive M) is the number of cases with breast cancer and a positive
mammogram, and (no BC & positive M) is the number of cases with-
out breast cancer but with a positive mammogram. The numerical
answer is the same as for Equation 1, but the computation is
simpler (see Figure 17-2, left panel, for a visual version of this
representation).
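
The translation from one format to the other is mechanical, as the following sketch illustrates (the reference population of 1,000 women is simply a convenient round number); the probability route of Equation 1 and the natural frequency route of Equation 2 give the same 7.5%.

```python
# Translate the probability format of Problem 1 into natural frequencies for
# a reference population of 1,000 women (an assumed round number).
population = 1000
base_rate, sensitivity, false_positive_rate = 0.01, 0.80, 0.10

with_cancer = round(population * base_rate)                        # 10
without_cancer = population - with_cancer                          # 990
true_positives = round(with_cancer * sensitivity)                  # 8
false_positives = round(without_cancer * false_positive_rate)      # 99

# Bayes' rule with natural frequencies (Equation 2): one division suffices.
ppv_frequencies = true_positives / (true_positives + false_positives)

# Bayes' rule with probabilities (Equation 1) gives the same answer.
ppv_probabilities = (base_rate * sensitivity) / (
    base_rate * sensitivity + (1 - base_rate) * false_positive_rate)

print(ppv_frequencies, ppv_probabilities)        # both about 0.075, i.e., 7.5%
```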
Note that it is not just the fact that natural frequencies are whole
numbers that makes them easier to understand. Natural frequencies
are counts of occurrences that have not been normalized with
respect to the base rates of disease and no-disease (Gigerenzer &
Hoffrage, 1995, 1999). Thus, natural frequencies are to be distin-
guished from relative frequencies, conditional probabilities, and
other representations where the underlying counts of occurrences
have been normalized with respect to these base rates. As an exam-
ple for such a normalization, consider the following set of frequen-
cies, equivalent to the values presented already: a base rate of 10
out of 1,000, a sensitivity of 800 out of 1,000, and a false-positive
rate of 100 out of 1,000 (right-hand side of Figure 17-2; see also
Kurzenhäuser & Lücking, 2004).2 These relative frequencies are
normalized frequencies that confuse people as much as conditional
probabilities do (Gigerenzer & Hoffrage, 1995, Experiment 2).

Figure 17-2: Natural frequencies versus normalized frequencies.
Natural frequencies (left) refer to the same reference population,
while normalized frequencies (right) refer to subgroups. BC = breast
cancer, M = mammogram. (Figure adapted from Kurzenhäuser &
Lücking, 2004.) [Tree values, left: 1,000 women split into 10 BC
(8 positive M, 2 negative M) and 990 no BC (99 positive M, 891
negative M); right, after normalization: per 1,000 women with BC,
800 positive M and 200 negative M; per 1,000 women without BC,
100 positive M and 900 negative M.]

There is a second explanation for the facilitative effect of natural
frequencies that brings in an evolutionary perspective. Gigerenzer
and Hoffrage (1995; see also Gigerenzer, 1998a) argued that the
human mind appears to be “tuned” to making inferences from nat-
ural frequencies rather than from the modern inventions of proba-
bilities and percentages, because for most of human existence,
individuals have made inferences from information they encode
sequentially through direct experience. Natural frequencies are
seen as the resulting tally of such a sequential sampling process
(hence the term “natural” frequencies; see Cosmides & Tooby, 1996;
Kleiter, 1994). In contrast, mathematical probability did not emerge
until the mid-17th century; in other words, probabilities and per-
centages are a much more “recent” way to represent statistical infor-
mation. Therefore, Gigerenzer and Hoffrage (1995) posited that the
evolved human mind is adapted to deal with natural frequencies.

2. These are the numbers used in the studies cited in Figure 17-1. As
we mentioned above (Equation 1), the positive predictive value resulting
from this input is 7.5%. With more recent estimates for the prevalence
(0.6%), sensitivity (90%), and false-positive rate (6%), the positive predic-
tive value would be 8.3%.

These findings and explanations have sparked considerable
debate. For instance, it has been argued that the facilitating effect of
natural frequencies is not due to frequencies per se, but to the
nested set structure that natural frequencies entail (Barbey &
Sloman, 2007; Girotto & Gonzalez, 2001; Sloman, Over, Slovak, &
Stibel, 2003). This argument overlooks that “nested sets” is not a
different explanation for the facilitating effect of natural frequen-
cies, but rather a defining feature, and that the claim has never been
made that “frequencies per se” provide a computational facilitation
(Gigerenzer & Hoffrage, 1999, 2007; Hoffrage, Gigerenzer, Krauss, &
Martignon, 2002).
While discussion continues as to the reasons for and extent of
the facilitating effect of natural frequencies in Bayesian inferences
(Brase, 2008), there is consensus about its existence. The effect has
also been observed in more complex diagnostic problems that pro-
vide data from more than one cue for evaluating a hypothesis, for
instance, two medical tests in a row (Krauss, Martignon, & Hoffrage,
1999). Moreover, there is evidence that frequency representations
(not only natural frequencies) can reduce or eliminate other
well-known “cognitive illusions” such as the conjunction fallacy
(Hertwig & Gigerenzer, 1999) or the overconfidence bias (Gigerenzer,
Hoffrage, & Kleinbölting, 1991).
Thus, taken together, the evidence suggests that the efficiency
of medical tests (i.e., their error rates and predictive values) should
be communicated to patients in terms of natural frequencies, rather
than conditional probabilities or normalized frequencies, in order
to foster understanding. This information is relevant not only
for the evaluation of the quality of the diagnostic test itself (and the
decision to participate in it), but also for interpreting the test results
(General Medical Council, 1998; Gigerenzer, 2002; Mühlhauser &
Höldke, 1999; Slaytor & Ward, 1998). Although a positive mammo-
gram is a stressful event for any woman, the interpretation of the
meaning of a positive mammogram could greatly influence its per-
ceived threat: Women who know that 9 out of 10 positive results
later prove to be false positives are likely to be less shaken by a
positive mammogram than women who believe that a positive
result indicates breast cancer with very high certainty (Gigerenzer,
2002; see also Marteau, 1995).

Single-Event Probabilities: What Does a 30% Chance of a Side Effect Mean?


The second commonly confusing type of statistical information
that we consider is single-event probabilities. To communicate the
risk that a particular event will happen in the form of a single-event
probability means to make a statement such as the following: “The
probability that this event will happen is x%.” Such a statement
can be confusing, particularly when it is made without specifying
the class of events to which the probability refers.
The following example illustrates the ambiguity that results from
this omission (Gigerenzer, 2002): A psychiatrist who prescribed
Prozac® to his mildly depressed patients used to inform them that
“you have a 30% to 50% chance of developing a sexual problem”
such as impotence or loss of sexual interest. Hearing this, patients
were concerned and anxious, but the majority of them did not ask
further questions. After learning about the ambiguity of single-
event probabilities, the psychiatrist changed his way of communi-
cating risks and chose an alternative, mathematically equivalent
format. He told patients that out of every 10 people to whom he
prescribed Prozac, 3 to 5 would experience a sexual problem.
Psychologically, this way of communicating the risk of side effects
made a difference. It seemed to put patients more at ease, and they
asked questions such as what to do if they were among the 3 to 5
people. The psychiatrist realized that he had never checked how
his patients understood what “a 30% to 50% chance of developing
a sexual problem” meant. It turned out that many had erroneously
thought that something would go wrong in 30% to 50% of their
sexual encounters.
The important insight from this doctor–patient interaction is
that the psychiatrist’s initial approach to risk communication left
the reference class unclear. A reference class answers the question:
percent OF WHAT? Did the 30% to 50% refer to a class of people
(patients who take Prozac), to a class of events (a given patient’s
sexual encounters), or to some other class? Whereas the psychia-
trist’s default reference class was all his patients taking Prozac, his
patients’ default reference class was their own sexual encounters.
(Such misunderstandings regarding the reference class may even
affect the evaluation of theories—see Hoffrage & Hertwig, 2006.)
When risks are solely communicated in terms of single-event
probabilities, people have little choice but to construct a class spon-
taneously, and different people may do this in different ways,
thereby further adding to the confusion and misunderstandings.
This was demonstrated in a study that asked pedestrians in New
York, Amsterdam, Berlin, Milan, and Athens about their under-
standing of a probabilistic weather forecast, such as “there is a 30%
chance of rain tomorrow” (Gigerenzer, Hertwig, van den Broek,
Fasolo, & Katsikopoulos, 2005). Only in New York did a majority of
participants provide the standard meteorological interpretation,
namely, that when the weather conditions are like today, in 3 out of
10 cases there will be (at least a trace of) rain the next day. In each
of the four European cities, this interpretation was judged as the
least appropriate. The preferred interpretation in Europe was that
it will rain tomorrow “30% of the time” (i.e., for about 8 hours),
followed by “in 30% of the area” covered by the forecast. In other
words, numerical probabilities can be interpreted by members of
the public in multiple, possibly even mutually contradictory ways,
making the task of designing information environments to reduce
confusion all the more important.
The ambiguity of a single-event probability and the resulting
misunderstandings are not limited to the risks of side effects and
precipitation. Single-event probabilities can also have far-reaching
consequences when they are used, for instance, by expert witnesses
to explain DNA evidence in court (Koehler, 1996a), by clinical psy-
chologists and psychiatrists to predict the possibility that a patient
with a mental disorder will commit violent acts (Slovic, Monahan,
& MacGregor, 2000), or by medical organizations to communicate
the benefits and risks of treatments (Gigerenzer, 2002).
There is a straightforward way to reduce confusion about what
single-event probabilities mean: Always communicate the refer-
ence class to which the single-event probabilities pertain. For
instance, people should be told that “30% probability of rain tomor-
row” does not refer to how long, in what area, or how much it will
rain—it means that 3 out of 10 times when meteorologists make
this prediction with this probability, there will be at least a trace
of rain in the area during the next day, no matter where exactly,
when exactly, and for how long exactly. This example further
shows that confusion can be avoided by replacing ambiguous
single-event statements with frequency statements—3 out of 10
instead of 30%. Similarly, the psychiatrist could simply explain to
patients that 3 out of every 10 patients have a side effect from this
drug (Gigerenzer & Edwards, 2003).
The risks and benefits of mammography screening should also
be communicated in terms of frequencies within well-defined
reference classes. One risk of mammography screening is obtaining
a false-positive result. For women who undergo mammography
screening for the first time, 9 out of 10 positive mammograms prove
to be false positives, as mentioned before (Mühlhauser & Höldke,
1999; Rosenberg et al., 2006); and about 1 in 2 women who have
10 annual or biannual mammograms will receive at least one false-
positive result (Elmore et al., 1998). Almost all women with false-
positive results have to undergo an additional mammogram or an
ultrasound scan. About 1 in 5 women with a false-positive result
undergoes a biopsy (Elmore et al., 1998; Mühlhauser & Höldke,
1999) that, as an invasive diagnostic procedure, implies scarring
and also bears its own risks such as wound infections. Moreover,
false-positive results can have psychological costs. Women experi-
ence a considerable amount of stress and anxiety in the weeks
between the (false) positive mammogram and the negative result
of the biopsy. While some are simply relieved afterward and
go back to normal life (Scaf-Klomp, Sandermann, van de Weil,
Otter, & van den Heuvel, 1997), others experience anxiety about
breast cancer and mood impairment that can persist for months
(Gøtzsche & Nielsen, 2006; Lerman et al., 1991). Women with false-
positive results must undergo these additional examinations even
though they do not benefit from them and may face new risks and
stress. Of course, it can only be determined post hoc whether the
first (positive) test was a true or a false positive, and women might
be willing to accept such additional examinations “just to make
sure” (however, even a biopsy can produce errors). Nevertheless,
the potential consequences of receiving a positive mammogram
should be made as clear as possible via properly constructed infor-
mation before women decide to participate in mammography
screening.

Relative Risk Reduction: What Does a 25% Chance of a Treatment Benefit Mean?
In addition to single-event probabilities and conditional probabili-
ties, there is a third type of statistical information that frequently
leads to misunderstandings in communicating risk: relative risk
reduction. What is the benefit of mammography screening with
respect to the risk of dying from breast cancer? Women who ask
this question often hear the following answer: By undergoing rou-
tine mammography screening, women over 40 years of age reduce
their risk of dying from breast cancer by 25%. This number is a
relative risk reduction, which is the relative decrease in the number
of breast cancer deaths among women who participate in mammog-
raphy screening compared to the number of breast cancer deaths
among women who do not participate. As a relative value (more
precisely, as a ratio of two ratios), this number is mute about the
underlying absolute frequencies. One source for estimating these
absolute frequencies are four Swedish randomized control trials
that included women between 40 and 74 years of age (Nystroem
et al., 1996). It was found that out of 1,000 women who did not
participate in mammography screening, 4 died of breast cancer,
while out of 1,000 women who did participate in mammography
screening, there were 3 who died of breast cancer. Screening thus
saved the life of 1 out of 4 women who would otherwise have died
from breast cancer, which is a reduction of 25%.3
Relative risk reduction is not the only way to represent the ben-
efits of mammography. Alternatively, its benefits can be framed in

3. A recent meta-analysis comes to the conclusion that a more realistic
estimate of the effect of mammography screening would be a 15% reduc-
tion in breast cancer mortality, which corresponds to 1 in 2,000 fewer
breast cancer deaths (Gøtzsche & Nielsen, 2006).

terms of absolute risk reduction, namely, the proportion of women
who die from breast cancer without undergoing mammography
screening minus the proportion of those who die from breast cancer
despite being screened. With screening, the proportion of women
who die from breast cancer is reduced from 4 in 1,000 to 3 in 1,000.
That is, the absolute risk reduction is 1 in 1,000 (i.e., 0.1%). Still
another representation of the same information is the number
needed to treat (or screen). This is the number of people who must
participate in the screening to result in one less death from breast
cancer, which is the inverse of the absolute risk reduction. In the
present example, it amounts to 1,000 because with screening there
is 1 less breast cancer death in 1,000 screening participants.
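To make the relationship between these three representations concrete, the following short Python sketch (our addition, not part of the original text; the 4-in-1,000 and 3-in-1,000 figures are the ones from the Swedish trials cited above) derives the relative risk reduction, the absolute risk reduction, and the number needed to screen from the same raw frequencies.

# Sketch: three representations of the same screening benefit,
# computed from raw frequencies (figures taken from the text above).

deaths_without_screening = 4   # breast cancer deaths per 1,000 unscreened women
deaths_with_screening = 3      # breast cancer deaths per 1,000 screened women
n = 1000                       # size of the reference class

risk_without = deaths_without_screening / n   # 0.004
risk_with = deaths_with_screening / n         # 0.003

relative_risk_reduction = (risk_without - risk_with) / risk_without  # 0.25, i.e., 25%
absolute_risk_reduction = risk_without - risk_with                   # 0.001, i.e., 1 in 1,000
number_needed_to_screen = 1 / absolute_risk_reduction                # 1,000

print(f"Relative risk reduction: {relative_risk_reduction:.0%}")
print(f"Absolute risk reduction: {absolute_risk_reduction:.1%} (1 in {n})")
print(f"Number needed to screen: {number_needed_to_screen:.0f}")

The same two raw frequencies thus yield "25%," "0.1%," and "1,000," depending on the representation chosen.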
The relative risk reduction is a bigger number and so looks more
impressive than the absolute risk reduction. Health organizations
inform patients about the benefits of mammography screening
almost exclusively in terms of the relative risk reduction, and, per-
haps not surprisingly, people are more likely to prefer an interven-
tion if it is advertised in terms of relative risk rather than absolute
risk reduction (Bucher, Weinbacher, & Gyr, 1994; Gigerenzer, 2002;
Heller, Sandars, Patterson, & McElduff, 2004; Sarfati, Howden-
Chapman, Woodward, & Salmond, 1998; for a review, see Ghosh &
Ghosh, 2005). This suggests that people's decisions depend on their
understanding of numbers, which in turn depends on how those
numbers are externally represented. Again, misunderstandings may
result from confusion about the reference class: Whereas relative
risk reduction refers to women dying of breast cancer, absolute risk
reduction and number needed to treat refer to all women in the
relevant age group who participate in screening. Indeed, the fact
that people frequently overestimate the benefits of screening pro-
grams (Black et al., 1995; Domenighetti et al., 2003; Gigerenzer,
Mata, et al., 2008) is consistent with the possibility that they assume
that the relative risk reduction (e.g., 25%) applies to all those who
participate in screenings, when in fact it refers to the people who
die of the disease without having been screened. Similar problems
occur when not a reduction, but an increase in risk is expressed in
relative terms: As the introductory “pill scare” example showed, a
relative risk increase can sound much more alarming than an abso-
lute risk increase.
Both absolute and relative representations of the raw frequencies
are mathematically correct. Yet, they suggest different amounts of
benefit or harm, are likely to elicit different expectations, and may
ultimately even lead to different decisions. We propose that risks
should be communicated in absolute rather than relative terms, to
give people a chance to realistically assess the absolute order of
magnitude. At a minimum, both pieces of information should be
provided (Gigerenzer & Edwards, 2003; Sackett, 1996).

Mammography Screening Pamphlets: Actual Representation in the Environment

In the previous section, we reviewed the literature on format and
representation effects and made recommendations for how statisti-
cal information should be represented in order to foster statistical
insight. But what representations are actually used in information
materials about mammography screening?
There are various sources of information about mammography
screening, such as physicians, pamphlets, relatives, TV, magazines,
the Internet, and colleagues, to name several. Here, we will focus
on pamphlets, which are—after physicians and the popular media—
the third most important source of information on the early detec-
tion of breast cancer and mammography screening for women of all
age groups, both in Germany (Paepke et al., 2001) and in the United
States (Metsch et al., 1998). Because pamphlets are relatively inex-
pensive to produce and easy to distribute, they are particularly
suitable for communicating information about mass screenings
such as mammography screening (Drossaert, Boer, & Seydel, 1996).
When designing a pamphlet that allows its readers to make informed
decisions, the goal should be twofold: The pamphlet should con-
tain all the information necessary to the reader, and at the same
time, the information should be presented in a way that is as com-
prehensible as possible (Marshall, 1996; see also Dobias, Moyer,
McAchran, Katz, & Sonnad, 2001, for mammography messages in
popular magazines).
Yet, many organizations that publish pamphlets seem to have a
different priority: to increase participation rates per se, rather than
informing the public in a transparent way about the advantages
and disadvantages of screening. Indeed, an analysis of 58 Australian
mammography pamphlets showed that information about the
accuracy of mammography screening was only provided occasion-
ally and in a very general way, for instance, stating that mammo-
grams “are not 100% accurate (or foolproof)” (Slaytor & Ward, 1998,
p. 263; see also Gigerenzer, Mata, et al., 2008). While the sensitivity
was mentioned in a quarter of the pamphlets, none of them gave
information about the specificity or the positive predictive value.
Another finding of the Australian pamphlet analysis was an empha-
sis on incidence rather than mortality to communicate the risk of
breast cancer to women; that is, the lifetime risk of developing
breast cancer was stated in 60% (35 of 58) of the pamphlets, whereas
only 2% (1 of 58) mentioned the lifetime risk of dying from breast
cancer. This emphasis is potentially misleading, because the goal of
mammography screening is to reduce mortality. It cannot reduce
incidence.
An analysis of 27 German pamphlets on mammography screen-
ing (Kurzenhäuser, 2003; see also Gigerenzer, Mata, et al., 2008)
identified similar problems. An ideal mammography pamphlet
should present all the facts that are relevant for women considering
participation. However, the analysis of German mammography
pamphlets showed, just like in the Australian study, that the pre-
sentation of information in the pamphlets is not balanced. On the
one hand, a majority of the pamphlets did provide information
about the incidence of breast cancer (70%), the benefit of reduced
mortality rates through mammography screening (70%), or the rec-
ommended screening interval (85%). On the other hand, only a
minority of the 27 pamphlets informed women about the frequency
of false-positive results (22%), the risk of psychological and physi-
cal strain due to such results (11%), or the predictive value of posi-
tive and negative mammograms (15% and 4%, respectively). Also
similar to the Australian pamphlets, the number of pamphlets that
mentioned the lifetime risk of developing breast cancer was higher
(37%) than the number of pamphlets that mentioned the lifetime
risk of dying from breast cancer (4%).
Another problem was the way in which information about
mortality reduction through mammography screening was
communicated. About half of these statements were ambiguous,
such as, “Mammography screening reduces breast cancer mortality
by 25%.” This formulation leaves open the question of to which
group of women the reduction of 25% refers and can thus easily
be misinterpreted. The size of the figure “25%” points to the rela-
tive risk reduction, but while experts might recognize this immedi-
ately, many laypeople may not—they may not even be aware of
the distinction between absolute and relative risk reduction, and
may mistakenly interpret this 25% as the figure that captures how
much they can reduce their own, individual risk by participating in
the screening. Such a misunderstanding can even be found in the
popular press; for instance, in a major German newspaper it was
claimed that “if a woman goes to the screening on a regular basis,
then her risk of dying of breast cancer is reduced by 35%” (Schmitt,
2008).
Finally, it should be noted that even when a risk is mentioned
in the pamphlets it is often not accompanied by precise risk fig-
ures. Frequently, the pamphlets use verbal expressions, such as
“Mammography detects most breast tumors,” rather than specific
numbers, such as “Mammography detects more than 90% of breast
tumors.”4 A similar observation was made during counseling ses-
sions on HIV testing in Germany (Gigerenzer, Hoffrage, & Ebert, 1998).

4. In fact, only about half of the information that could be backed up by
statistical data from the literature was actually expressed numerically; the
other expressions were verbal. About two thirds of the numerical expres-
sions were stated as absolute frequencies, one third as percentages.

The debate about what role numbers should play in informing
patients is an old one and we will turn to this issue in the following
section.
Let us summarize the findings of the two pamphlet studies
and relate them to the three types of statistical information men-
tioned above. Single-event probabilities were not found in the
pamphlets, a welcome result, given the potential confusion elicited
by using this format if the reference class is not specified. However,
the representation of risk reduction was problematic: The almost
exclusive use of relative risk reduction and the ambiguous mode
of presentation are likely to foster misunderstandings (here, over-
estimation) of the benefits of mammography screening. For the pre-
sentation of error rates (and thus also of the risks of false-positive
and false-negative results) and predictive values, the choice of the
statistical format in the pamphlets was not the problem, but the
qualitative rather than quantitative form in which the information
was given (or, in many cases, the fact that it was omitted entirely;
see also Zapka et al., 2006). As a consequence, women would
receive at best a vague idea of the error rates and predictive values
of mammography screening.
Our findings in Germany regarding pamphlets are consistent
with the literature on typical misperceptions of screenings in other
countries. As mentioned above, most people overestimate the ben-
efit of cancer screening (Black et al., 1995; Gigerenzer, Mata, et al.,
2008; Schwartz, Woloshin, Black, & Welch, 1997; Woloshin et al.,
2000). For instance, in one study with 287 women who returned
completed questionnaires, about 17% accurately estimated both
the absolute and the relative risk reduction through mammography
screening, while 14% underestimated and 49% overestimated the
benefit (and 20% did not respond)—even though all these women
had read one of the two risk reduction rates just before estimation
(Schwartz et al., 1997). In addition, a majority of women are not
informed about the risks of mammography screening: 61% of the
women in an Australian study (Cockburn, Pit, & Redman, 1999)
and 92% in an American study (Schwartz et al., 2000) said that the
mammography procedure has no potential negative effects for a
woman without breast cancer.
How accurate is women’s knowledge concerning the errors in
mammography screening? Most women seem to know that false
negatives and false positives can occur (Schwartz et al., 2000), but
they are not well informed about how often they do. An Australian
study found that about a third of women had unrealistically high
expectations of the sensitivity of mammography screening (Barratt
et al., 1999), while another study, also with Australian women,
found the opposite, namely, that many overestimated the false-
negative rate (Cockburn, Redman, Hill, & Henry, 1995). Hamm and
Smith (1998) went one step further and asked patients also to
estimate the predictive values of diagnostic tests (however, not of
mammography screening). They found that patients assumed simi-
lar error rates and positive predictive values for five different diag-
nostic tests, independent of the actual numbers. The patients
expected rather low error rates (false negatives were perceived to be
more likely than false positives) and very high positive predictive
values. If women applied this rationale to the test efficiency of
mammography screening, then one could expect that they would
also overestimate the test’s positive predictive value.

Factors That Hamper Transparent Risk Communication

The research on misperceptions related to mammography screen-
ing shows that, despite the popularity of breast cancer as a media
topic and the widespread use of mammograms, risk communi-
cation about mammography screening is often not transparent.
Transparency, however, is necessary to enable women to make
informed participation decisions about mammography screening.
One obstacle to this ideal is that the information is often communi-
cated in a way that is difficult to understand or misleading. As we
already mentioned, the match between our cognitive system and
the way information is represented in the environment is one aspect
of ecological rationality (the present chapter focuses on numeric
presentation formats of risks; for an overview of suggested best
practices for verbal and visual formats, see Lipkus, 2007). Another
aspect comes into play when considering why a risk communicator
chooses a particular representation. In the following, we discuss
some obstacles to transparent risk communication that emerge
either from the physicians’ environments or from their assump-
tions about the patients with whom they have to interact (for more
comprehensive reviews of obstacles and solutions, see Gigerenzer,
Mata, et al., 2008; Skubisz, Reimer, & Hoffrage, 2009).

Institutional Constraints in the Physician's Environment: Lack of Time,
Lack of Training, Lack of Feedback, Lack of Legal Security
Lack of time was the reason mentioned most frequently by American
physicians for not discussing risks and benefits of cancer screening
tests with their patients (Dunn, Shridharani, Lou, Bernstein, &
Horowitz, 2001). This may not necessarily be simply the physi-
cians’ fault—it can also be a consequence of economic pressures or
other structural shortcomings, such as patient-to-physician ratios
that are too high.
But even if lack of time was not a problem, there are other obsta-
cles: Many physicians are not trained in the communication
skills required for discussing risks and benefits with their patients
(Gigerenzer, 2002; Towle, Godolphin, Grams, & Lamarre, 2006).
Between a quarter and a third of the American physicians in the
previously mentioned study (Dunn et al., 2001) said that the com-
plexity of the topic and a language barrier between themselves and
their patients would keep them from discussing the benefits and
risks of mammography screening with their patients (some even
indicated their own lack of knowledge as a reason).
The evidence on the facilitating effect of intuitive representa-
tions such as natural frequencies presented earlier in this chapter
can also be applied to training programs. Note that the evidence
came from studies in which the format had been experimen-
tally manipulated: The positive effect of natural frequencies was
established without having to provide more knowledge through
training or instruction—just by replacing probabilities and percent-
ages with natural frequencies. But people can also be explicitly
trained to translate conditional probabilities into this format and
thus gain insight even if the information is originally presented
in terms of probabilities. Especially doctors and other health
professionals could benefit from such training, not only for improv-
ing their risk communication skills, but also for improving
their own diagnostic inferences (because they will frequently
encounter statistical information in terms of probabilities and nor-
malized frequencies in medical textbooks). In fact, teaching people
to change representations turns out to be much more effective in
improving diagnostic inferences than training them to apply math-
ematical formulas such as Bayes’s rule (Kurzenhäuser & Hoffrage,
2002; Sedlmeier & Gigerenzer, 2001; see also Gigerenzer, Mata,
et al., 2008).
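As an illustration of the kind of translation such training would teach, the following Python sketch (our addition; the base rate, sensitivity, and false-positive rate used here are purely illustrative values, not figures from this chapter) converts conditional probabilities into natural frequencies for a reference population of 1,000 women and then reads off the positive predictive value as a comparison of two counts.

# Sketch: translating conditional probabilities into natural frequencies.
# The three input probabilities are illustrative, not figures from the chapter.

base_rate = 0.01            # proportion of women with breast cancer
sensitivity = 0.90          # P(positive mammogram | cancer)
false_positive_rate = 0.09  # P(positive mammogram | no cancer)

population = 1000
with_cancer = round(population * base_rate)                    # 10 women
without_cancer = population - with_cancer                      # 990 women
true_positives = round(with_cancer * sensitivity)              # 9 women
false_positives = round(without_cancer * false_positive_rate)  # about 89 women

# The positive predictive value is now a simple comparison of two counts:
ppv = true_positives / (true_positives + false_positives)
print(f"Of {true_positives + false_positives} women with a positive mammogram, "
      f"about {true_positives} actually have breast cancer (PPV = {ppv:.0%}).")

The same result could be obtained by applying Bayes's rule to the probabilities directly, but the counts make the reference classes explicit, which is exactly what the training described above aims at.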
Given the facilitating effect of natural frequency representations,
it is straightforward to teach risk communicators such an informa-
tion format and to provide them with the necessary data. Textbooks
and training programs, as discussed in the previous paragraph,
are one way to achieve this goal. Another way would be to let the
environment do the work. If physicians lived in an environment in
which they got accurate, timely, and complete feedback, they would
be able to construct natural frequency representations themselves,
based on their own experience. This, however, is often not the case.
Radiologists, for instance, who perform screening mammograms,
usually refer women with a positive result to a pathologist, who in
turn may order a biopsy. If the radiologist is not notified about the
result of the biopsy, he or she cannot build the experience required
to estimate the predictive value of a positive mammogram. An easy
solution to this problem would be to change information flow
among physicians such that the construction of natural frequency
representations would be possible.
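A minimal sketch of what such a feedback loop could look like in practice: if a radiologist's record system simply logged, for each positive mammogram referred on, whether the subsequent biopsy confirmed cancer, a natural frequency estimate of the positive predictive value would accumulate with experience. The function and field names below are our own illustration, not an existing system.

# Sketch: building a natural frequency estimate of the positive predictive
# value from feedback on referred cases (illustrative data structure).

referred_cases = []  # one entry per positive mammogram referred for biopsy

def record_feedback(case_id, biopsy_confirmed_cancer):
    """Store the pathologist's feedback for a previously referred case."""
    referred_cases.append((case_id, biopsy_confirmed_cancer))

def experienced_ppv():
    """Positive predictive value expressed as experienced counts."""
    confirmed = sum(1 for _, cancer in referred_cases if cancer)
    return confirmed, len(referred_cases)

# Example: after feedback on a handful of referrals...
for i, outcome in enumerate([False, False, True, False, False]):
    record_feedback(case_id=i, biopsy_confirmed_cancer=outcome)

hits, total = experienced_ppv()
print(f"{hits} out of {total} positive mammograms were confirmed by biopsy so far.")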
Besides lack of time and training, the literature on informed con-
sent points to still another serious institutional constraint that
might hamper transparent risk communication: Physicians might
be hesitant to discuss the risks of cancer screening tests with their
patients, because they fear they could be sued and found negligent
if the patient suffers negative consequences afterward. This hap-
pened, for instance, to Daniel Merenstein, while he was a resident
in a training program for family doctors in Virginia in 1999. In con-
formity with the national, evidence-based clinical guidelines,
Merenstein informed a 53-year-old man about the risks and benefits
of PSA (prostate specific antigen) estimation in the context of pros-
tate cancer screening, after which the patient elected not to apply
the test. When this patient soon afterward developed prostate
cancer, he sued for malpractice. Even though Merenstein himself
was exonerated, the clinic where he spent his residency was found
liable for 1 million U.S. dollars. The prosecution successfully
argued that Merenstein should not have acted according to the
current national guidelines, but rather according to customary prac-
tice, namely, ordering a PSA test automatically for men over 50
(Merenstein, 2004; see also Hurwitz, 2004). It is ironic that trans-
parent risk communication based on the current best available
evidence—intended to reach informed consent in the context of
shared decision making—may be risky itself. This lack of legal
security urgently needs to be resolved to allow physicians to use
best available practice when communicating with their patients.
These institutional barriers located in the physicians’ environ-
ments impede transparent risk communication. We next discuss
barriers related to (mis)perceptions physicians have about their
patients, which, in turn, lead to suboptimal risk communication on
the part of the physicians.

Physicians’ Beliefs About Patients’ Competencies, Desires, and Utilities


One reason why risk communicators (physicians and other health
professionals) are reluctant to specify the risks of tests and treat-
ments may be their belief that patients simply cannot deal with
statistical information (Marteau, 1995) and that the topic of risks
and benefits of screening tests is too complex (Dunn et al., 2001).
In light of the difficulties in basic understanding of statistical infor-
mation, some physicians ask themselves whether informed consent
is a “contemporary myth” (Lemaire, 2006) and whether they should
not “recognize the utopian nature of the goal of properly informed
consent and return to the more honest and realistic paternalism of
the past” (Doyal, 2001, p. 29).
There is indeed no scarcity of studies showing that patients have
problems understanding—or, more precisely, accurately remember-
ing—clinical communication containing statistical information
(e.g., Doyal, 2001; Lemaire, 2006; Lloyd, 2001; Schwartz et al.,
1997; Weinstein, 1999). For example, in a sample of 56 patients
who were counseled on their risk of having a stroke with or without
a certain operation (the operation lowered the stroke risk, but oper-
ation-induced stroke could occur in rare cases as a complication),
only one patient was able to recall the two risk figures 1 month
later. The risk estimates of the others showed a wide range: For the
majority of patients, they were much too high, while in contrast
some had even forgotten that there was a stroke risk associated with
the operation (Lloyd, Hayes, London, Bell, & Naylor, 1999). In
another study, only 56% of 633 women were able to correctly
answer the question of which is greater, a risk of 1 in 112 or 1 in 384
(Grimes & Snively, 1999; see also Yamagishi, 1997). Research shows
that people indeed differ in numeracy, that is, in their ability to
process basic numerical and probability concepts (Lipkus, Samsa,
& Rimer, 2001; Peters et al., 2006). Low numeracy has been dis-
cussed as an explanation for women’s overestimation of the benefit
of mammography screening (Schwartz et al., 1997; Woloshin et al.,
2000), and an individual’s numeracy skills qualify to some extent
the effects of different communication formats on health risk per-
ception (Galesic et al., 2009; Keller & Siegrist, 2009). Thus, on the
one hand, the belief that some patients do not have the ability to
deal with quantitative risk information is partly valid.
On the other hand, this belief should not be used as a justifica-
tion for omitting precise statistical information about the risks and
benefits of medical tests such as mammography screening from the
communication process. First, the overestimation of the benefit of
screening was also found in numerate women (Black et al., 1995).
Second, withholding such information because it is anticipated
that patients lack the capacity to understand it may indeed lead to
a self-fulfilling prophecy. Third, there is promising evidence that
negative effects of low numeracy can be overcome or at least
reduced by smart information representations that visualize out-
comes and simplify information (Hibbard & Peters, 2003; Peters,
Dieckmann, Dixon, Hibbard, & Mertz, 2007). Natural frequencies,
for instance, also helped patients with lower numeracy skills to
better understand positive predictive values of medical screening
tests (Galesic et al., 2009).
Another pair of related conceptions that could well affect
physicians’ information policies are the beliefs that patients do
not want to be informed in detail about the risks and benefits
of medical tests, or—even if they wanted to be informed—that
patients prefer verbal descriptions (e.g., “very accurate”) over pre-
cise numerical information (see Heilbrun, Philipson, Berman, &
Warren, 1999; Marteau, 1995). It is clear that information demand
will differ between individuals (e.g., Chamot, Charvet, & Perneger,
2005), but overall, research suggests that such assumptions about
desiring little information are wrong. In fact, a large majority of
patients want to be informed about risks and benefits of a medical
procedure or treatment before they commit to it (e.g., Bottorff,
Ratner, Johnson, Lovato, & Joab, 1998; Marteau & Dormandy, 2001),
and this is especially true for women undergoing mammography
screening (Cockburn et al., 1999). Even most of those women who
agree that physicians should actively encourage women to partici-
pate in mammography screening also indicate that women have to
be informed about all the advantages and disadvantages of screen-
ing before making a decision to attend (Cockburn et al., 1999).
But in which form should this information be given? Admittedly,
enabling people to thoroughly understand numerical expressions
of risk is not a trivial task (e.g., Renner, 2004; Weinstein, 1999), and
representations that foster understanding will not invariably suc-
ceed. However, numbers appear to be better suited than words for
communicating risk. Verbal quantifiers such as “high” or “moder-
ate” are less precise than numbers, thus inviting more varied inter-
pretations and achieving an even less accurate understanding
(Burkell, 2004; Marteau et al., 2000; but see Marteau & Dormandy,
2001, for an exception). For example, what seems a “moderate” risk
from the physician’s perspective might well seem a “high” risk from
the patient’s viewpoint (Burkell, 2004). Also, even though most
people like to provide information in categorical terms, they prefer
to receive information numerically when they have to base a deci-
sion on it (Wallsten, Budescu, Zwick, & Kemp, 1993). For instance,
in genetic counseling for breast and ovarian cancer, 73% of those
counseled expressed a preference for the risk to be described in
quantitative formats (Hallowell, Statham, Murton, Green, & Richards,
1997). Additionally, a numerical statement of risk can increase
trust in and comfort with the risk information, compared to a purely
verbal statement (Gurmankin, Baron, & Armstrong, 2004). In sum,
even though people often translate numerical into categorical risk
information during their decision-making process (Bottorff et al.,
1998), they expect numbers at the outset, and they appear to benefit
more from numbers than from words—as long as those numbers are
represented in the right way.
Finally, risk communicators might discourage a transparent dis-
cussion of the risks of mammography screening because they are
afraid that such transparency would keep women away from the
screening and that therefore lives would be lost that could other-
wise be saved (Dunn et al., 2001; Napoli, 1997). Framing can indeed
influence participation rates: Detection behaviors such as under-
taking screening and other diagnostic tests are more effectively
promoted by using a loss frame (i.e., by emphasizing the risk of not
undertaking an action: “failing to detect breast cancer early can cost
you your life”) than by using a gain frame (i.e., by emphasizing the
benefit of undertaking the action: “detecting breast cancer early
can save your life”; Banks et al., 1995; Rothman & Salovey, 1997;
Wilson, Purdon, & Wallston, 1988). An explanation for this finding
is that people’s attitude toward such detection behaviors tends to
be similar to their attitude toward risks, simply because such behav-
iors can reveal a threatening health status. As loss frames can induce
risk-seeking behavior (Rothman & Salovey, 1997; Rothman, Bartels,
Wlaschin, & Salovey, 2006), they may, so the rationale goes, also
induce detection behaviors. Prevention behaviors (e.g., using sun-
screen to avoid skin cancer), on the other hand, cannot reveal bad
or even threatening news and are therefore promoted more effec-
tively with gain-framed messages (Detweiler, Bedell, Salovey,
Pronin, & Rothman, 1999).
The finding that presenting risk information in different ways
can influence patients’ decisions is a challenge to the ideal of
informed consent: Health professionals should be aware of the risk
of manipulating patient decisions with information formats
(Edwards, Elwyn, Covey, Matthews, & Pill, 2001; Morgan & Lave,
1990). This raises ethical issues about the goal of risk communica-
tion: Health professionals can present the information either in such
a way as to reduce framing effects and enhance informed choice—
for example, by expressing the benefits in a variety of forms, or by
using both gain and loss frames. Or they can—in order to enhance
participation rates—frame the benefits of screening in the most
positive light (Edwards et al., 2001; Gigerenzer & Edwards, 2003).
Napoli (1997) suggested that the latter position is responsible for
the widespread use of the relative risk reduction in information
materials about mammography screening, and Phillips, Glendon,
and Knight (1999) suggested a similar motivation for the frequent
use of the “1-in-9 figure” (cumulative lifetime risk), the most dra-
matic way to express a woman’s risk of developing breast cancer.5

5. For most women this is not the best estimate—particularly not for
those women who have not yet developed this disease. The cumulative life-
time risk is a fictitious probability that is attached to a newborn female, com-
puted based on the assumption that today's probabilities of developing breast
cancer within a specific age group remain constant until this newborn dies
at the age of 85. But because the probability of getting breast cancer between
the ages of, say, 60 and 85 is necessarily smaller than the probability of get-
ting the disease between birth and the age of 85, any woman who has not
yet had breast cancer has a lower probability than 1-in-9 of getting it before
the age of 85.

This strategy seems to be effective (see the previously mentioned
results on loss framing), but it can also have adverse effects on
mammography screening participation: One of the main reasons
for women not participating is fear of diagnosis, which is often
based on an overestimation of their personal breast cancer risk (Aro,
de Koning, Absetz, & Schreck, 1999).
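To make the reasoning in note 5 concrete, the following Python sketch (our addition, with made-up age-band probabilities used purely for illustration, not actual epidemiological figures) computes a cumulative lifetime risk from birth and the corresponding remaining risk for a woman who has reached age 60 without breast cancer; the latter is necessarily smaller.

# Sketch: why the cumulative lifetime risk overstates the remaining risk
# for a woman who has already reached a given age without breast cancer.
# The per-decade probabilities below are invented for illustration only.

risk_by_decade = {  # P(developing breast cancer in this band | cancer-free at its start)
    "0-39": 0.005, "40-49": 0.015, "50-59": 0.025,
    "60-69": 0.035, "70-85": 0.045,
}

def cumulative_risk(decades):
    """1 minus the probability of staying cancer-free through all given bands."""
    stay_free = 1.0
    for band in decades:
        stay_free *= 1.0 - risk_by_decade[band]
    return 1.0 - stay_free

lifetime = cumulative_risk(risk_by_decade.keys())  # risk attached to a newborn
from_60 = cumulative_risk(["60-69", "70-85"])      # risk remaining at age 60
print(f"Lifetime risk from birth: {lifetime:.1%}")
print(f"Remaining risk at age 60 (cancer-free): {from_60:.1%}")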
In a review of six studies that focused on the effect of informa-
tion on participation in prostate cancer and prenatal birth defect
screening, Jepson, Forbes, Sowden, and Lewis (2001) found mixed
results and concluded that it is not clear whether informed choice
affects uptake of screening. For mammography screening, the
results are also inconsistent. Matter-Walstra and Hoffrage (2001)
reported that a substantial proportion of women changed their
mind and decided not to participate in the screening after they had
been informed about risks and benefits. On the other hand, Rimer
et al. (2002) found that women who are more knowledgeable about
mammography are more likely to participate in screening, and a
recent review found that personalized risk communication has,
overall, a small but positive effect on uptake of cancer screening
(Edwards et al., 2006).
From an ethical point of view, the goal of enhancing participa-
tion rates instead of informed choice is problematic (Raffle, 2001;
see also Morgan & Lave, 1990), in particular, when participation
can have side effects. This is, in fact, the case for mammography
screening. On the one hand, as Gøtzsche and Nielsen (2006) con-
clude in their meta-analysis, such screening “likely reduces breast
cancer mortality” (p. 5), but on the other hand, it “also leads to over-
diagnosis and overtreatment” (p. 5), and “it is thus not clear whether
screening does more good than harm” (p. 13). Therefore, each
woman should be helped to understand the pros and cons of screen-
ing, to clarify her own values, and to consider, with or without her
physician, what decision would be best for her personally (see
Mullen et al., 2006, for a recent review of consistency between deci-
sions and values and other measures of informed decision making).

Conclusion

Patients, or laypeople in general, often misunderstand risk informa-


tion that they receive from physicians or other health professionals,
and physicians often fare little better. This lack of statistical insight
is often attributed to internal problems of the human mind. This
view, however, overlooks the fundamental fact that the mind
receives and processes information through external representa-
tions, which further implies that the selection of representations
can improve or impair our understanding of statistical informa-
tion considerably. In this chapter, we have seen how conditional
probabilities, single-event probabilities, and relative risks are rep-
resentations that hinder understanding. These obscuring represen-
tations abound in actual instances of medical communication such
as mammography pamphlets. Institutional constraints in the physi-
cians’ environment as well as beliefs about patients’ competencies
and information needs contribute to their widespread use.
In emphasizing the importance of external factors, we do not
deny that internal factors such as numeracy skills and knowledge
might also impact the task of fostering informed choices. What we
need to explore further is how aspects of the structure of the exter-
nal information environment interact with the decision mecha-
nisms that patients bring to bear on the medical choices they must
make. What is even more important than new research, though, is
that implications from past research be implemented. Better infor-
mation representations, better patient and physician education,
and more legal security for physicians who practice evidence-based
medicine, to name only a few, are measures that are relatively easy
and cost-efficient to achieve. Efforts in this direction may not only
help to prevent major incidents like the U.K. pill scare in the mid-
1990s, but may also enhance truly informed decision making in
this crucial domain.
18
Car Parking as a Game Between
Simple Heuristics
John M. C. Hutchinson
Carola Fanselow
Peter M. Todd

The road to success is lined with many tempting parking spaces.
Anonymous

You are driving into town looking for somewhere to park. There
seem not to be many parking spaces available at this time of day,
and the closer you get to your destination, the fewer vacancies there
are. After encountering a long stretch without a single vacancy, you
fear that you have left it too late and are pleased to take the next
place available—but then somewhat annoyed when completing
the journey on foot to find many vacancies right next to your desti-
nation. Evidently everyone else had also assumed that the best
spots must have been taken and had parked before checking them.
Something to remember for next time: Given the pessimistic habits
of others, maybe it would be better to try a different strategy by driv-
ing straight to the destination and then searching outward.
For many of us, looking for a good parking space is a very famil-
iar problem, and we probably expend some mental effort not to be
too inefficient at it, especially in the rain or when we expect to
carry a load back to our car. However, finding the best parking space
can never be guaranteed because we lack full information about the
spaces and competitors ahead. Moreover, even if our ambition is
merely to make the best decision on average from the information
available, the quantity and diversity of this information (parking
patterns already observed, time of day, number of other drivers,
etc.) suggest that processing it optimally is too complex for our
cognitive capabilities. Various authors have come to a similar con-
clusion. As van der Goot (1982, p. 109) put it, “There is every reason
to doubt whether the choice of a parking place is (always) preceded
by a conscious and rational process of weighing the various possi-
bilities.” In his book Traffic, Vanderbilt (2008, p. 146) noted that
with regard to foraging for food or for parking, “neither animals
nor humans always follow optimal strategies,” owing to cognitive
limitations.
Instead, we envisage that drivers typically use fairly simple
heuristics (rules of thumb) to make good decisions, if not the best
possible ones, about where and when to park. An example could
be, “If I have not found a space in the last 5 minutes, take the next
one I encounter.” As we have seen throughout this book, there
are many other decision domains in which simple heuristics have
been found that can perform about as well as more complex solu-
tions, by taking advantage of the available structures of information
in the decision environment. Furthermore, these heuristics often
generalize to new situations better than do complex strategies,
because they avoid overfitting (see chapter 2). Could simple rules
also be successful at the task of finding a good parking space? And
what features of the parking environment, itself shaped by the
decisions of drivers seeking a space, might such rules exploit to
guide us to better choices? These are the questions we explore in
this chapter.
Selecting a parking space belongs to the class of sequential search
problems, for which some successful heuristics have already
been explored in the literature. These problems crop up in many
different domains, whenever choices must be made between
opportunities that arise more-or-less one at a time; in particularly
challenging (and realistic) cases the qualities of future opportuni-
ties are unpredictable and returning to an opportunity that arose
earlier is costly or impossible. Thus, decisions about each opportu-
nity must be made on the spot: Should I accept this option, or reject
it and keep searching for a better one? This decision can depend on
the memory of the qualities of past opportunities, for instance
by using those qualities to set an aspiration level to guide further
search (Herbert Simon’s notion of satisficing search—see Simon,
1955a). One familiar, and well investigated, example of sequential
search is mate search (e.g., Hutchinson & Halupka, 2004; Todd &
Miller, 1999; Wiegmann & Morris, 2005). Another example, which
also might have been familiar to our ancestors, is deciding at which
potential campsite to stop for the night when on a journey through
unfamiliar territory. Simple heuristics that work well, and that
people use, have been studied in a number of sequential search set-
tings (Dudey & Todd, 2002; Hey, 1982; Martin & Moon, 1992; Seale
& Rapoport, 1997, 2000). The search for a parking space is a version
of such sequential decision making: Parking spaces are encoun-
tered one at a time and must be decided upon when they are found
in ignorance of whether better spaces lie ahead; moreover, they are
often unavailable to return to later because other drivers may have
filled them up.
It seems plausible that heuristics that work well in one sequen-
tial-search domain will work well in another. If evolution has
adapted our heuristics in domains such as mate choice, we might
tend to apply similar heuristics in novel sequential-search contexts
such as selecting a house or parking a car. We might also have the
ability to invent new heuristics for novel situations, but those that
prove satisfactory may be the same ones as used in other sequen-
tial-choice problems. In either case, good candidates for parking
heuristics might be those already proposed for other sequential
search problems, so we will begin our exploration with a set of
such strategies.
There are several reasons why car parking provides a particu-
larly tractable example of sequential search to model and test
empirically. One advantage is that it seems reasonable in many
cases to quantify parking-site quality simply as the distance from
one’s destination, whereas in other domains, such as mate search,
quality is often multidimensional, difficult to measure, and not
ranked consistently by different individuals. Another advantage is
that once a car is parked the decision making is over, avoiding the
complications of multiple and reversible decisions that occur in
some other domains such as mate choice. Furthermore, parking
decisions take place over an easily observable time scale. And
because car parking is a problem that many people encounter
repeatedly, they have the possibility to adapt the parameters of any
heuristic they use to the environment encountered. This improves
the chance that empirical observations will match predictions made
assuming individuals maximize their performance—the predic-
tions from the computer simulations in this chapter are based on
this assumption.
The world of parking has another aspect that motivated us to
work on this problem—the pattern of available and filled parking
places is not generated by randomly sprinkling cars across a park-
ing lot but rather is created through the decisions of the drivers
who parked earlier. It thus provides a familiar simple example
of an important class of problems in which critical aspects of the
environment are constructed by the other players. Our goal is to
find which heuristics work well for choosing parking places in the
environment (pattern of vacant spaces) created by the heuristics
used by others. Because we expect the heuristics used by others
also to have been chosen to work well against their competitors,
we arrive in the world of game theory. In game theory the usual
approach is to search for equilibria. At equilibrium the distribution
of strategies in the population is such that no driver can
improve performance by choosing a different strategy, so there is no
incentive to change strategy. Consequently, populations that reach
these equilibria should remain on them and thus such equilibria
are what we generally expect to observe.1 Note that we are not
envisaging that drivers use introspection to calculate which strate-
gies will lead to equilibria, but rather that through trial and error
and simple learning rules they discard poorly performing strategies
and come to use those that work well.2

1. However, such equilibria need not exist, and environmental fluctua-
tions can lead to them not being attained. Also, if several theoretical equi-
libria exist, it can be difficult to predict which of them will be occupied in
a real setting.
2. Both these processes of introspection and learning normally lead to
the same equilibria (e.g., Kreps, 1990, chapter 6), but we believe that econo-
mists and psychologists have tended to overemphasize the use of intro-
spection in everyday (as opposed to novel experimental) situations.

In this chapter we describe our search for decision strategy
equilibria in an agent-based model that simulates drivers making
parking decisions along an idealized road. By investigating equilib-
ria, we sought strategies that work well against each other in the
social environments that they themselves create. We begin by con-
sidering past work on parking and other forms of sequential search,
then describe our model and the equilibria that emerge both when
all drivers must use the same strategy and in the more realistic set-
ting when we allow drivers to differ in the ways they search. Given
the similarities between parking search and other forms of search
mentioned earlier, results from the parking domain may be infor-
mative about other sequential-search domains, but here we concen-
trate on just the single domain, demonstrating an approach to
exploring ecological rationality in situations where agents shape
their own environments.

Previous Work on Parking Strategies

Curiously, the strategic game-theoretic aspect of the parking prob-
lem seems to have been largely neglected in earlier studies of
car parking. Most have assumed a randomly produced pattern of
available spaces at some constant density, patently different from
the situations real drivers encounter. In one of the few analyses to
consider the patterns created by drivers parking, Anderson and de
Palma (2004) explored the equilibrium occupancy of parking places
in a situation similar to ours, with the aim of devising pricing
for parking that would alleviate congestion near the destination.
However, they assumed that parking search “can be described by a
stochastic process with replacement” (p. 5) in which drivers check
the availability of spots at random and forget if they have checked
one before, which is very different from the plausible search pro-
cess that we consider. An earlier model of the effects of pricing by
Arnott and Rowse (1999) had drivers use a decision rule based on
their distance from the destination (the fixed-distance heuristic
that we describe later) and assumed independence of neighboring
parking spaces (which they note is an approximation)—but the
nonindependent pattern of spaces created by other drivers is exactly
the kind of structure that we want to investigate.
The problem of determining good strategies for finding a parking
place has also been addressed within the more abstract mathemati-
cal framework of optimal stopping problems (DeGroot, 1970). In
the original formulation of what was called the “Parking Problem”
(MacQueen & Miller, 1960), drivers proceed down an endless street
toward a destination somewhere along that street, passing parking
places that are occupied with some constant probability p, and
they must choose a space that minimizes the distance from the
destination (either before or after it). The optimal strategy in this
case is a threshold rule that takes the first vacancy encountered
after coming within r parking places of the destination, where r
depends on the density p of occupied parking places. For instance,
if p = .9, then r = 6, while if p ≤ .5, then r = 0 (i.e., you should drive
all the way to the destination and start looking for a space beyond;
Ferguson, n.d., p. 2.11). This optimal solution provides a useful
comparison for our simulations. However, in this original Parking
Problem parking places are filled randomly so that the probability
of one being occupied is independent of its location or of whether
neighboring places are occupied. Besides not assuming such inde-
pendence, our scenario also differs in that we mostly use a dead-
end rather than infinite road, and we consider performance criteria
other than just parking distance from the destination. Other math-
ematical analyses of extensions of the parking problem have con-
sidered various complications (e.g., allowing drivers to turn around
at any point—Tamaki, 1988—and varying the probability of occu-
pancy as a function of distance from the destination—Tamaki,
1985), but not in ways that address the game-theoretic aspects that
we focus on.
Experimental investigations of heuristics used by people for
more abstract sequential search problems have been carried out by
Seale and Rapoport (1997, 2000). They studied the classic secretary
problem, in which people see a sequence of numbers (or their ranks)
one at a time and try to select the highest value, without being able
to return to any previously seen value. They investigated three
types of rule: cutoff rules, which check a fixed proportion of the
available options and then take the first option thereafter that is
better than all those previously seen; candidate count rules, which
stop on the nth candidate seen, where a candidate is an option that
is better than all options previously encountered; and successive
non-candidate count rules, which count up the number of values
seen since the previous candidate and stop at the next candidate
after that count has exceeded some threshold. By testing these rules
with different parameter values in simulation and experiments,
Seale and Rapoport found that cutoff rules perform best (they are
optimal under some assumptions) and are most often used by
people in experiments. Successive non-candidate count rules came
close in performance, but candidate count rules fared poorly. Dudey
and Todd (2002) considered how these rules performed in the
task of maximizing expected quality (rather than maximizing the
chance of finding the highest quality individual from a set) and
found the same relative performances. In addition, when environ-
ments changed by getting better over time (e.g., when the distribu-
tion from which encountered options are drawn shifts upward with
successive options), cutoff rules continued to perform best; this
situation corresponds roughly to the parking situation we consider
here, where drivers encounter a sequence of spaces that by defini-
tion improve the closer they get to their destination. (See Bearden
& Connolly, 2007, and Lee, 2006, for empirical and theoretical
extensions of the sequential search problem.)
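As a concrete example of the first of these rule families, here is a short Python sketch of a cutoff rule applied to the classic secretary problem (our own illustration; the cutoff proportion is a free parameter, with roughly 1/e, about 0.37, being the classically optimal choice when only finding the single best option counts as success).

import random

def cutoff_rule(options, cutoff_proportion=0.37):
    """Observe the first cutoff_proportion of options without choosing,
    then take the first option better than everything seen so far
    (or the last option if no such candidate appears)."""
    n = len(options)
    cutoff = int(n * cutoff_proportion)
    best_seen = max(options[:cutoff]) if cutoff > 0 else float("-inf")
    for value in options[cutoff:]:
        if value > best_seen:
            return value
    return options[-1]  # forced to take the final option

# Rough check by simulation: how often does the rule find the single best option?
random.seed(1)
successes = 0
trials = 10_000
for _ in range(trials):
    sequence = [random.random() for _ in range(100)]
    successes += cutoff_rule(sequence) == max(sequence)
print(f"Best option chosen in {successes / trials:.0%} of sequences.")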
Hutchinson and Halupka (2004) compared the performance of
various heuristics in a somewhat different sequential choice sce-
nario based on mate choice. The cutoff and candidate count rules
performed much worse than heuristics in which males choose the
first female who exceeds a fixed quality threshold (or one from a
sequence of declining thresholds). The values of these thresholds
were envisaged to have evolved in response to the distributions of
mate qualities encountered by the population in earlier years.
Likewise with parking, drivers may well know from experience the
likely distribution of “qualities” available (certainly this is the
assumption of our game-theoretic analysis), so it could again be
true that fixed thresholds perform well.

Modeling the Interaction of Parking Strategies

To investigate the performance and equilibria of parking heuristics
in different environments, we set up an agent-based model in which
drivers follow various heuristics as they drive along a road search-
ing for a good parking space. In this section we describe the fixed
aspects of the environment in which parking takes place, including
its physical layout and some social factors such as the flux of
arriving cars.
Many real-life parking decisions are complicated by the intricate
topology of streets and parking lots and the idiosyncratic variation
in their likelihood of having vacancies. Indeed, most empirical
work on parking has focused on these higher level structures and
how drivers deal with them, for instance, how they decide which
streets to drive down or parking lots to check to find a good
spot (Salomon, 1986; Thompson & Richardson, 1998). To turn the
spotlight instead on how drivers decide between individual park-
ing places, we constructed our model around a very simple and
constant topology: a long dead-end street, with an approach lane
leading to the destination and a return lane leading away from it,
and a parking strip (central reservation) between the two lanes, one
car wide, where cars going in either direction can park (Figure 18-1).
All drivers have the same destination at the end of this street,
and all pass a common starting point that is far enough away to
be clear of parked cars. There are 150 parking places up to the des-
tination, which is sufficient, given the other conditions, for drivers
always to find somewhere to park. If cars fail to select a parking
space as they approach the destination, they turn around and take
the first vacancy they come to on their way out. Turning around
anywhere other than at the destination is not allowed. Once parked,
drivers walk to the destination, spend a variable but predeter-
mined time there, walk back, and then drive away in the return lane.
We explain later the various rules by which we allow drivers
to decide whether to park in a vacant parking place. All the rules
assume that drivers cannot see whether parking places in front of
them are occupied, with the consistent exception that on their
way to the destination drivers never take a space if the next place
beyond it is also empty. Just occasionally this catches drivers
out when a car in the return lane takes the space in front before
they get to it.3

Figure 18-1: The structure of the parking situation, showing the
approach lane along the top, return lane along the bottom, park-
ing strip in between, and the destination and turn-around point.
The car in the approach lane will be able to park in the sixth place
from the destination, which is just being vacated. The car that has
just turned round at the destination will be able to park at the fourth
place from the destination.

We model time as discrete steps of 0.75 seconds, the time taken
to drive past one parking place (if it is 5 meters long, and speed is
22.5 kilometers per hour). Turning around at the destination is
instantaneous. We assume that walking is one-fifth the speed of
driving. The time a driver spends at the destination is randomly
drawn from a gamma distribution with a mean of 30 minutes, with
shape parameter 2 (i.e., a skewed bell shape with mode = 15 min-
utes), and an upper limit of 3 hours. Observed parking time
distributions are indeed skewed like this or even more so (Young,
1986). Each day, the parking strip starts empty and 1,080 cars
arrive at the end of the street over a period of 9 hours (averaging
two per minute). Arrival times within this period are randomly
drawn from a uniform distribution, except that if two cars draw the
same 0.75-second time step, one randomly draws another time.
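The fixed aspects of this simulated environment can be summarized in a few lines of Python (a simplified sketch of the setup just described, not the authors' actual simulation code; the names and structure are ours).

import random

TIME_STEP = 0.75            # seconds to drive past one parking place
WALK_FACTOR = 5             # walking takes five times as long as driving
N_PLACES = 150              # parking places between start and destination
N_CARS = 1080               # cars arriving per simulated day
DAY_LENGTH = 9 * 60 * 60    # arrival window of 9 hours, in seconds

def parking_duration():
    """Time spent at the destination: gamma with shape 2 and mean 30 minutes,
    capped at 3 hours (values in seconds)."""
    shape, mean = 2, 30 * 60
    duration = random.gammavariate(shape, mean / shape)
    return min(duration, 3 * 60 * 60)

def arrival_steps():
    """Arrival times drawn uniformly over the day, at most one car per
    0.75-second time step (sampling without replacement enforces this)."""
    n_steps = int(DAY_LENGTH / TIME_STEP)
    return sorted(random.sample(range(n_steps), N_CARS))

# Example: sample one day's arrivals and one visit duration.
random.seed(0)
print(len(arrival_steps()), "arrival times;",
      round(parking_duration() / 60), "minutes at the destination")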
In our first investigations we make the simplifying assumption
that the population is composed of drivers all using the same heu-
ristic, and we assess the performance of a single “mutant” driver
using a modified heuristic in such a social environment. (Later we
relax this assumption and develop an evolutionary algorithm in
which there can be many coexisting strategies in the population
competing against each other.) To make comparisons between dif-
ferent strategies efficient, we compare what would happen to the
same car if it went back to its original starting position and tried
another strategy (cf. a repeated-measures design) in the following
way: Each day a car is selected at random from those arriving,
and the simulation proceeds until it is time for this car to enter the
street. The state of all cars is then stored, and the simulation
proceeds with the focal car’s driver using one particular strategy.
Once the driver selects a parking space and the strategy’s perfor-
mance has been assessed, the original state of the street at the car’s
arrival time is restored. Then the simulation restarts with the focal
car using another strategy, but with all other drivers arriving at the
same times, spending the same times at the destination and using
the same strategies as before. Our comparisons of strategies were
typically based on means of 100,000 focal cars.4

3. For each time step we work backward from the destination allowing
cars in the incoming lane to move toward the destination or park in an
adjacent space if empty, then we work back down the return lane, from the
exit toward the destination, moving each car one space forward or letting
it park, and then again in the same direction allowing parked cars to leave
if the owner has returned to the car and there is an adjacent empty gap in
the return lane.
4. A different procedure was used to compare situations in which every
individual in the population uses the same strategy. For each day we
recorded the performance of every car and took the average. We then aver-
aged this average over 100,000 days of independent simulations.

A Nash Equilibrium for a Simple Satisficing Strategy

Our main aim is to understand which parking strategies are eco-
logically rational. This requires specifying the environment, which
is strongly shaped by the strategies used by other parkers. In this
section we investigate the dependence of a driver’s parking per-
formance on the strategies used by that driver and by other drivers
and use these results to calculate how the population strategy
would evolve if all drivers select strategies that increase their
performance. For ease of illustration, we will consider in this sec-
tion only the very simple fixed-distance heuristic that ignores all
spaces until the car reaches D places from the destination and then
takes the first vacancy (unless, as always, there is another vacancy
immediately ahead). This is a form of satisficing with parameter
D defining the aspiration level.
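In code, the decision rule of this heuristic is essentially a one-line comparison; the sketch below (our own rendering, with the state of the current and next place passed in as arguments) also includes the standing exception that a space is skipped whenever the adjacent place toward the destination is visibly free.

def fixed_distance_accepts(distance_to_destination, D, next_place_free):
    """Fixed-distance heuristic: take the first vacancy once within D places
    of the destination, unless the adjacent place ahead is also free."""
    return distance_to_destination <= D and not next_place_free

# Example: a driver with aspiration level D = 45 passing free spaces.
D = 45
for distance, next_free in [(60, False), (45, True), (44, False)]:
    decision = "park" if fixed_distance_accepts(distance, D, next_free) else "drive on"
    print(f"{distance} places from destination: {decision}")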
We first ask how well one driver does by changing strategy while
the rest of the population uses the fixed-distance strategy with an
aspiration level fixed at DP (the population parameter). Each simu-
lation day we calculated the performance of a driver who uses
the same fixed-distance strategy but with different mutant values
of the parameter (Dm). The open circles in Figure 18-2 show how
the mutant strategies performed as we changed Dm when the popu-
lation was using DP = 45. Performance is here assessed in terms of
total travel time, including time to drive to and from the parking
place and walk to and from the destination, so lower values indi-
cate better performance.5

5. More precisely, the vertical axis measures the time from arriving at
a starting position 150 spaces from the destination until returning to this
starting position on the way back, but omitting the time spent at the des-
tination itself.

Figure 18-2: Performance of the fixed-distance parking heuristic
depends on the aspiration levels of both the focal driver and other
drivers in the population. The plot shows travel time in seconds as a
function of Dm, the mutant aspiration level in places from the desti-
nation, with separate curves for DP = 15, DP = 31, and DP = 45.
Performance is taken as total travel time to and from the destination
(the mean over 100,000 individuals each randomly selected from a
different day). Each point shows the performance of a single mutant
using an aspiration level of Dm when the rest of the population uses
DP. Different symbols are used for three different levels of DP; in each
case the minimum in the curve is the best Dm for an individual to
select. The diamond symbol highlights cases of the mutant using the
same aspiration level as the population; this only minimizes travel
time when DP = 31, the Nash equilibrium.

One conspicuous feature of the graph is the sudden deterioration
in performance if the mutant driver accepts a place farther from
the destination than everybody else does. The reason is that there
is often a vacancy 46 places from the destination that everybody
else (using DP = 45) has ignored; a mutant using Dm = 46 is
thus quite likely to end up there and perform much worse than the
population average. (The same is true of mutants using larger values
of Dm.) There is thus a considerable advantage in holding out as
long as everybody else, but it matters less how much closer to the
destination one’s threshold is (i.e., how much lower Dm is than DP).
If the mutant instead uses Dm = 44, there probably will not be
another vacancy for some distance (because spaces in this region
would have been taken by other members of the population). In

recorded the performance of every car and took the average. We then aver-
aged this average over 100,000 days of independent simulations.
5. More precisely, the vertical axis measures the time from arriving at
a starting position 150 spaces from the destination until returning to this
starting position on the way back, but omitting the time spent at the des-
tination itself.

[Figure 18-2 appears here: travel time in seconds plotted against the mutant aspiration level Dm, in places from the destination, for populations using DP = 15, 31, and 45.]

Figure 18-2: Performance of the fixed-distance parking heuristic depends on the aspiration levels of both the focal driver and other
drivers in the population. Performance is taken as total travel time
to and from the destination (the mean over 100,000 individuals
each randomly selected from a different day). Each point shows
the performance of a single mutant using an aspiration level of Dm
when the rest of the population uses DP. Different symbols are used
for three different levels of DP; in each case the minimum in the
curve is the best Dm for an individual to select. The diamond symbol
highlights cases of the mutant using the same aspiration level as the
population; this only minimizes travel time when DP = 31, the Nash
equilibrium.

In fact, the next available space, say at position K, would also be the one taken by mutants with values of Dm between DP and K, so those mutant strategies will have similar levels of performance to Dm = 44 (as shown by the flattening of the line of open circles). If
we change the population’s value of DP by a few places the position
of the kink in the graph shifts correspondingly.
The line of crosses in Figure 18-2 shows the outcome when the
population strategy shifts more dramatically to DP = 15. The kink
has disappeared and the mutant driver now does better to accept a
space farther from the destination than would the rest of the popu-
lation. This is because if it seeks only a closer space (Dm < 15), it
will probably not find one on the way to the destination and will
thus waste time driving there and back before taking one farther
than 15 parking places from the destination; this probably was
already available on the inward journey. In this social environment
(DP = 15) it is better for the mutant to take Dm = 45 than Dm = 15,
whereas in the other social environment (DP = 45) the converse was
true. Is there a stable equilibrium strategy between these two points
where it pays to be exactly as picky as the rest of the population?
To find out, we proceeded through a succession of steps in which
the population strategy always shifts to the strategy of the most suc-
cessful mutant tested in the previous step. So for instance, in a
population using DP = 15 a mutant with Dm = 35 would be found to
be the best strategy, and thus the population as a whole would next
shift to DP = 35.6
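The logic of this search can be sketched as follows (an illustration only; mean_travel_time stands in for the parking simulation, which is assumed rather than implemented here, and the restriction to nearby mutant values described in footnote 6 is omitted for simplicity):

def best_response_iteration(mean_travel_time, D_start, D_range, max_steps=100):
    """Iterated best-response search for a pure Nash equilibrium.
    mean_travel_time(Dm, DP) is assumed to return the simulated mean travel
    time of a rare mutant using aspiration level Dm when the rest of the
    population uses DP."""
    DP = D_start
    for _ in range(max_steps):
        # best reply of a rare mutant to the current population strategy
        best_Dm = min(D_range, key=lambda Dm: mean_travel_time(Dm, DP))
        if best_Dm == DP:
            return DP        # no mutant does better: a pure Nash equilibrium
        DP = best_Dm         # the whole population shifts to the best mutant
    return DP                # no convergence within max_steps (the search may cycle)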
Whatever population strategy we started with, this algorithm
eventually settled on strategy DP = 31. At that point no mutant
strategy does better (Figure 18-2, line of solid dots), so we have
reached what in game theory is termed a Nash equilibrium (e.g.,
Fudenberg & Tirole, 1991). Theoreticians are particularly interested
in finding Nash equilibria, because once a population reaches an
equilibrium there is no incentive for an individual to use a different
strategy. Thus, given sufficient consistency in the environmental
conditions, we might expect to observe populations occupying
such equilibria.
In practice, in a population using DP = 31 real drivers would
be unlikely to experience enough trials to distinguish the per-
formances of strategies with slightly higher or lower parameter
settings, because the performance differences are quite small in
that range. Nevertheless, someone trying to park each working
day along the same street at the same time could probably gain
enough feedback to learn to avoid extreme deviations from the
equilibrium parameter value. So, a population of such drivers might
occupy a loose equilibrium somewhere around this parameter
value. (In the next section we will see how much the value of DP
changes when aspects of the environment change, such as car
arrival rates. In the real world drivers experience a range of parking
environments and it is questionable to what extent feedback gained
in one environment may usefully be applied in another.)
In theory, more than one Nash equilibrium may exist. We looked
for other pure Nash equilibria by starting the search algorithm
from different parts of the parameter space and by allowing DP to
change only a small step each time. We did not find another equi-
librium, but these methods are not infallible.

6. For reasons of computational efficiency, our search algorithm allows only more gradual change than this: We compare just a few mutant strategies
near to the current population strategy and change the population strategy
to the best of these. This process is iterated until we get to a situation where
no nearby mutant strategy outperforms the population strategy, at which
point we check whether a wider range of mutant strategies would do better.

Another limitation of the algorithm used is that it can find only pure Nash equilibria, in
which every individual adopts the same value of D. But it is also
possible for the population to reach mixed equilibria, in which dif-
ferent values of D would be used by different drivers according to a
particular probability distribution that results in an equal mean
payoff for all drivers. Later we describe an evolutionary algorithm
we used to search for such mixed equilibria.
In the search process just presented, the population’s overall
change in strategy toward the Nash equilibrium is driven by the
selfish behavior of individuals adopting the best-performing strat-
egy. But the mean performance of individuals in the population
need not improve as the population approaches this equilibrium
and may get worse (related to the Tragedy of the Commons, where
all individuals seeking to maximize their own benefit make things
worse for everyone; in real life people being picky about parking
spaces further reduces overall performance because of the increased
traffic generated; Vanderbilt, 2008, pp. 149 ff.). Here, when DP = 62
we find the social optimum that minimizes mean total travel time,
to 462 seconds, which is 15 seconds less than the mean travel time
for everyone at the Nash equilibrium. Thus, the population as a
whole suffers at equilibrium from everyone’s attempts to find better
parking spots.

A Brief Sensitivity Analysis


In the previous section we allowed the pattern of parking spaces to
change as the population strategy evolved but kept constant the
underlying environment, such as the topology of the street. Real-
life parking situations vary widely in such respects and most driv-
ers will face this variety regularly. How robust are our results when
the environment varies?
One aspect of the underlying environment is the rate at which
drivers arrive. Halving the rate considerably changes the equilib-
rium aspiration level from DP = 31 to DP = 11 places from the desti-
nation (i.e., drivers are more ambitious if there is less competition
for spaces). Another situation of reduced competition is at the
beginning of the day before the street has had a chance to fill up. If
drivers know that they are among the first 150 parkers of the day,
they should change their aspiration level, but it turns out that there
is no pure Nash equilibrium. If the 150 cars in this population play
a value of DP around 20, the best response is to use a value of Dm in
the low 30s, but the converse is true, too. And if DP lies between
20 and 30, the best response is also either around 20 or in the low
30s. (In theory the population might cycle in the parameters that it
uses, but there may well be a mixed equilibrium involving different
parkers using different strategies; we have not investigated this.)

There is also no pure Nash equilibrium if we change the topology of the environment so that the destination lies halfway along a
one-way street (as in the original mathematical formulation of the
Parking Problem of MacQueen & Miller, 1960). If DP is 24 or less,
the best response of a rare mutant is to take a higher value of D. But
if everybody else then applies an aspiration level of 25 (or any
single higher value), the best response of a rare mutant is to take a
lower value (e.g., if DP = 25, the best Dm = 18).
In sum, the result that we found in the previous section is not all
that robust. Real drivers who encounter a variety of parking situa-
tions might try to adjust the parameters of their heuristics appropri-
ately, but knowing the right adjustment for many situations seems
an impossible task even were the driver fully informed about the
density of arriving drivers and so on. Thus, it seems rather that a
robust sort of heuristic that performs reasonably well in a variety of
situations without the need for fine tuning would be more useful.
We do go on to investigate other sorts of heuristics, but it was
beyond the scope of our project to decide which is the most robust
in this sense, partly because the answer would depend on how and
how much we allow the environments to vary, which is either an
arbitrary choice or would require extensive empirical analysis of
real drivers’ experiences. Rather, we restrict the rest of this chapter
to consideration of the same underlying environment as considered
earlier; there are several further lessons to be learned from this
model system.

Alternative Measures of Performance


Definitions of ecological rationality stress that it is necessary to
specify the currency by which performance is assessed. In a game-
theoretic situation, the currency is doubly important because it
affects which strategies are selected by others and thereby the
(social) environment that they create. Would the population adopt
similar equilibrium parameters for the fixed-distance strategy if
other aspects of performance matter more than total travel time? A
number of criteria have been identified as important to drivers
when selecting a parking location, including cost, parking time
limits, accessibility (e.g., parallel parking or not), and legality of a
spot (van der Goot, 1982).
When travel time is the measure of performance, cars that find
a space before reaching the destination perform better than those
that find the same space but only on the way back after having to
turn around. In the real world, the hassle of turning around may
make it appropriate to decrease the performance score even further.
The opposite extreme is to ignore any time spent in the comfort of
the car and to focus just on the distance from the parking space
to the destination. This distance should be easier to judge than total travel time, and time to walk to the destination is known to
matter greatly (Vanderbilt, 2008; van der Goot, 1982). Using this
performance measure changes the Nash equilibrium from DP = 31 to
DP = 23. The only cost of being more picky in this case is that you
might pass a vacancy that another car will take before you get back
to it after turning around, should you fail to find a closer space.
This rarely happens if the acceptance threshold D is close to the
destination, and consequently small changes in the value of D make
little difference to this measure of performance.
Another possible performance measure is for drivers to count the
number of free spaces they pass as they walk to the destination.
Minimizing this measure leads to the population not attempting to
park until about eight places from the destination, although the fit-
ness landscape is so flat around this value that we could not resolve
whether a pure Nash equilibrium truly exists.
Many other variations on these performance criteria could be
employed by drivers. We have considered only mean times and
distances, but drivers may have a disproportionate dislike of par-
ticularly long walks or delays, especially if they have an appoint-
ment. Suppose that you aim to reduce your chance of missing an
appointment to 5% and are willing to start your journey as early as
necessary to achieve this. But you seek to minimize how much
earlier you must leave by choosing an appropriate parking heuris-
tic. In that case the performance currency is the 95th percentile
of time taken to arrive at the destination. This again can lead to a
different equilibrium strategy.
In these analyses we have assumed that drivers try a local range
of different values of D and select the one that works best. But in
reality we think that drivers would take into account their experi-
ence when using a particular value of D to direct which other values
they try later. For instance, if after parking you walk past lots of
closer free spaces, you might try a lower value of D next time.
Conversely if you have to turn around at the destination and end up
finding a space farther away than the value of D you used, a reason-
able learning rule might increase D toward the distance of the
actual parking place you took. Such learning rules will not neces-
sarily lead to the equilibria described earlier.
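Such a rule might, for instance, look like the following sketch (the step sizes and functional form are our own illustrative assumptions, not an empirically validated model of drivers' learning):

def update_aspiration(D, closer_spaces_passed_on_walk, parked_distance,
                      turned_around, step_down=1, rate_up=0.5):
    """A possible experience-based adjustment of the aspiration level D
    of the fixed-distance heuristic after one parking episode."""
    if closer_spaces_passed_on_walk > 0:
        D = max(1, D - step_down)    # walked past free spaces: be pickier next time
    if turned_around and parked_distance > D:
        # ended up farther out than the threshold: relax D toward that distance
        D = D + rate_up * (parked_distance - D)
    return D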

Other Ways to Park

A Selection of Simple Parking Heuristics


So far we have considered just one kind of parking heuristic (the
fixed-distance heuristic), but drivers could use all sorts of others.
Now we consider a set of seven simple heuristics, each with just one or two parameters. All were inspired by related rules for search
that have been suggested in other domains of psychology and eco-
nomics, and all operate by setting some threshold and parking
when it is met (in contrast to, for instance, looking for specific pat-
terns of parked cars and spaces—cf. Corbin, Olson, & Abbondanza,
1975). The thresholds are applied to more-or-less easily comput-
able aspects of the parking environment, such as current distance
to the destination, counts of the number of empty or occupied park-
ing places that the car has passed, and relations between these
values that measure the observed density of available spaces.7
The fixed-distance heuristic that we have analyzed in the previ-
ous sections takes the first vacancy encountered within a fixed dis-
tance (number of parking places) D of the destination, ignoring
all information provided by the pattern of occupancy encountered
en route. This heuristic, while simple, requires knowledge of how
far away the destination is—not always easy to judge accurately,
especially in a novel environment.
The proportional-distance heuristic takes the first vacancy after
driving a proportion P of the distance between the first occupied
place encountered and the destination. For instance, if P = .3 and
the first parked car passed was 60 parking places from the destina-
tion, then this strategy will take the first empty space encountered
60 × .3 = 18 or more parking places farther on. Again knowledge of
the distance to the destination is required, but this heuristic
also responds to the parked cars encountered. This has similarities
to Seale and Rapoport’s (1997) cutoff rule for sequential search in
the secretary problem, in that an aspiration level (e.g., “within 42
places of the destination”) is set using information from a fixed
number of items encountered initially (in our case the position of
the first parked car).
The car-count heuristic parks in the first vacancy after passing
C parked cars (without considering how many free spaces have
been passed). This would be equivalent to a non-candidate count
rule (where occupied places are non-candidates) in Seale and
Rapoport’s (1997) scheme, something that they did not assess.
The space-count heuristic selects the first space after reaching
the first parked car and then passing S available spaces (without con-
sidering how many parked cars have been passed).

7. No strategy will park in an empty space if the next parking place closer to the destination is also empty; instead, the car moves one place forward, reevaluates the available information, and makes a decision again. All strategies take the first free place after turning around at the destination.

This heuristic is equivalent to Seale and Rapoport's (1997) candidate-count rule, where candidates here are spaces.
The block-count heuristic chooses the first space after passing a
block of at least B parked cars without a space. This mirrors Seale
and Rapoport’s (1997) successive non-candidate count rule.
The x-out-of-y heuristic takes a space only if x or more parking
places were occupied out of the last y (or fewer) places passed
(excluding the one currently alongside). (When y = the total number
of possible parking spaces, this rule is the same as the car-count
heuristic with C = x, and when x = y this rule is equivalent to the
block-count heuristic with B = x.)
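As a minimal illustration (with argument names of our own choosing), the threshold tests of the proportional-distance, car-count, space-count, block-count, and x-out-of-y heuristics can be written as simple predicates, each checked whenever the car is alongside a vacancy (subject, as always, to the proviso in footnote 7 that a vacancy is never taken if the next place ahead is also free):

def proportional_distance_met(dist_first_car, dist_now, P):
    # accept once a proportion P of the distance from the first parked
    # car to the destination has been covered
    return dist_now <= dist_first_car * (1 - P)

def car_count_met(cars_passed, C):
    return cars_passed >= C                      # C parked cars passed so far

def space_count_met(spaces_passed_since_first_car, S):
    return spaces_passed_since_first_car >= S    # S vacancies passed so far

def block_count_met(current_run_of_parked_cars, B):
    return current_run_of_parked_cars >= B       # an unbroken block of B cars just passed

def x_out_of_y_met(last_y_places_occupied, x):
    # last_y_places_occupied: occupancy (True = occupied) of up to the last
    # y places passed, excluding the place currently alongside
    return sum(last_y_places_occupied) >= x

# Example for the proportional-distance heuristic: first parked car 60 places
# out and P = .3, so vacancies are ignored until 60 * .7 = 42 places out.
print(proportional_distance_met(60, 45, P=.3))   # False: still too far from the destination
print(proportional_distance_met(60, 42, P=.3))   # True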
The linear-operator heuristic keeps a moving average of the pro-
portion of occupied places passed, using an exponentially fading
memory (zi = a·zi−1 + bi, where zi is the average at i places after the start, a < 1 is a constant controlling how rapidly the memory of past occupancy fades, and bi = 1 if the ith place is occupied or 0 if vacant; z0 = 0). The driver parks in a space only if the updated current
average is above a threshold value zT. For ease of comparison when
the value of a differs, we report this threshold value as a proportion
zpT of the maximum attainable value of zi, which is 1/(1 − a), so that
zpT = zT(1 − a). (As a approaches 1, this heuristic approaches the car-
count heuristic.)
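In code, the heuristic's bookkeeping might be sketched as follows (an illustration only, using the equilibrium parameter values a = .84 and zpT = .974 reported below merely as an example):

def linear_operator_accept(occupancy_passed, current_vacant, next_vacant, a, zpT):
    """Sketch of the linear-operator heuristic: keep an exponentially fading
    average z of the occupancy of the places passed so far (zi = a*z(i-1) + bi,
    with bi = 1 for an occupied place and 0 for a vacant one) and take a
    vacancy only once z exceeds the threshold, expressed as the proportion
    zpT of its maximum attainable value 1/(1 - a)."""
    z = 0.0
    for occupied in occupancy_passed:        # places passed so far, in order
        z = a * z + (1 if occupied else 0)
    threshold = zpT / (1 - a)                # zT = zpT * 1/(1 - a)
    return current_vacant and not next_vacant and z > threshold

# With a = .84 and zpT = .974, a vacancy reached after 20 consecutive parked
# cars is still refused, in line with the pickiness described in the text.
print(linear_operator_accept([True] * 20, True, False, a=.84, zpT=.974))   # False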
The last two strategies are related in that they require no knowl-
edge of the position of the destination and respond only to a locally
high density of parked cars. (The block-count heuristic can be
thought of similarly.) Both approaches have been used to model
moving memory windows (e.g., Groß et al., 2008; Hutchinson,
McNamara, & Cuthill, 1993; Roitberg, Reid, & Li, 1993). Keeping
tallies in the x-out-of-y heuristic may seem cognitively simpler
than the multiplication required by the linear operator. But as we
will see, large values of y are favored in the equilibria that we have
found for the x-out-of-y heuristic, requiring the driver to keep in
memory a running window of the exact occupancy pattern of the
last 20 spaces or more. Such exact memory seems less plausible
than the linear operator’s multiplicative mechanism for biasing the
estimate of occupancy toward the most recent experience.
Other possible heuristics could analyze the pattern of parking
occupancy in more sophisticated ways, for instance by computing
the rate at which occupancy increases, or by combining the infor-
mation about distance from the destination and occupancy with a
more elaborate function than that used by the proportional-distance
heuristic. But for the moment we avoid going down the avenue of
more complex heuristics and instead examine how our seven
simple heuristics behave when at a pure equilibrium and how
well they perform when competing against each other in a mixed
population. You might try to predict which strategies will outcompete the others before reading on.

The Heuristics at Pure Nash Equilibria: Their Parameters, the Environments They Create, and Their Ecological Rationality
One obvious approach to comparing the ecological rationality of
the heuristics is to let them compete directly: We describe such
tournaments in the next section. But first, in this section, we allow
each heuristic to compete just against versions of itself differing in
parameter values, that is, in environments created when all drivers
use the same parking heuristic. The heuristics will still be compet-
ing with themselves when we later allow them also to compete
with other heuristics, so some of our understanding of these single-
heuristic equilibria will carry over. Table 18-1 lists the parameter
values that achieve pure Nash equilibria for each of the above heu-
ristics, along with performance measures at equilibrium.
The proportional-distance heuristic at equilibrium takes spaces
at least 61% of the way from the first parked car to the destination.
The mean distance of the first parked car from the destination in
our canonical setup is 74 parking places, so this heuristic will, on
average, ignore vacancies farther than 29 places from the destina-
tion while driving toward it (which is about the same as the D = 31
of the fixed-distance heuristic at equilibrium).
The x-out-of-y heuristic at equilibrium has parameter values 28
out of 29. But the minimum of the performance surface is rather flat
and the special cases where x = y perform almost as well when
values of x are in the 20s. For the block-count heuristic (equivalent
to restricting the x-out-of-y strategy set to x = y), the equilibrium
value of x is 23. For the car-count heuristic (equivalent to the x-out-
of-y strategy if y is very large), the equilibrium value of x is 37
parked cars to pass. The space-count heuristic only achieves an
equilibrium when S is high enough that cars never park before turn-
ing around.
The equilibrium parameters of the linear-operator heuristic
(a = .84, zpT = .974) are harder to interpret intuitively, but two exam-
ples illustrate its behavior. Starting from the first parked car encoun-
tered, it would not allow parking for at least 21 places after that,
even if every place passed were full. Or if a space occurred just
before the heuristic would have accepted it, then it would take
a further solid block of 12 cars before another space would be
acceptable. Thus, at their equilibria both the x-out-of-y and the lin-
ear-operator heuristics require a long stream of densely packed cars
before parking is triggered. Nevertheless, they still sometimes
accept parking places well before the fixed-distance, or even the
proportional-distance, heuristics do at their equilibria.
Table 18-1: Parameter Values Leading to Nash Equilibria of Various Heuristics for Travel Time, and Measures of Performance at These Equilibria

                                  Fixed     Propor-   Car     Space   Block   x-out-   Linear     Typical
Heuristic                         distance  tional    count   count   count   of-y     operator   SE
                                            distance
Equilibrium parameter values      31        .61       37      Large   23      28/29    .84, .974  —
Mean total travel time (s)        478       479       478     487     479     479      476        0.06
95th percentile of time to
  arrive (s)                      441       441       440     442     442     441      440        2
Mean number of places from
  destination                     34.6      34.6      34.7    34.7    34.7    34.7     34.8       0.01
Mean number of spaces on walk
  to destination                  0.75      0.72      0.80    0.65    0.72    0.74     0.84       0.0007
Proportion of cars that turn      0.60      0.65      0.63    1       0.73    0.69     0.62       0.0002

Note. Environmental parameters are the baseline values given in the text. Equilibria are based on performance of at least 100,000 cars, each on an independent day. Estimates of performance are based on all 1,080 cars in each of 10,000 whole-day simulations, except that the 95th percentile of arrival time was based on a sample of 10,000 cars, each on an independent day. The last column shows typical standard errors of the performance measures.

The various heuristics at equilibrium produce distinct behaviors: Figure 18-3 shows some clear differences in whether a spot
tends to be occupied before or after turning around. However, the
different behaviors still produce environments that are remarkably
similar, in terms of the distribution of occupied spaces averaged
over the day (Figure 18-3, stepped curves). We think that this is
because if a heuristic were not adequately exploiting regions near
the destination, in all cases there is the option of the parameter
values shifting so that more cars turn around and occupy these
spaces on their return. All equilibria involve at least 60% of cars
finding spaces after turning around (with the space-count heuristic
evolving parameters that lead all cars to drive to the destination
and turn around before looking for a spot—this go-to-end strategy
provides a benchmark for comparison with the other more sophis-
ticated strategies). Table 18-1 shows various other measures of
performance for the different heuristics at equilibrium. There is
again surprisingly little difference in these measures between the
heuristics—indeed the differences mostly seem too small to be
noticeable by real drivers. This similarity in performance does not
imply that the different classes of heuristic are equal in competitive
ability if the population contains a mixture of heuristics, as we will
see later.

Finding Mixed Equilibria With an Evolutionary Algorithm

So far, we have attempted to find only pure equilibria, in which every individual uses exactly the same heuristic. Now we reinves-
tigate the same situations to look for mixed equilibria, in which a
mixture of heuristics (different types of heuristics or the same type
with different parameter values) are used. A mixed equilibrium
can be achieved by each individual in the population consistently
using one heuristic, but with the heuristic differing between indi-
viduals according to some probability distribution. Alternatively
all individuals could use this probability distribution to select a
heuristic afresh on each occasion. For a mixture of heuristics to
form a Nash equilibrium, no mutant using a different heuristic can
on average do better, again providing no incentive for a driver to try
a new heuristic. Furthermore, all the heuristics composing the
mixed equilibrium must perform equally well when the population
uses them in the equilibrium proportions. This means that there
need be no immediate disincentive for a mutant to switch from
one of the component heuristics to another. What would stop any
consequent drift away from the equilibrium proportions is if fur-
ther slight increases amongst the population in the rate of use of
any heuristic result in reduced performance for that heuristic.
[Figure 18-3 appears here: eight panels (fixed-distance, proportional-distance, car-count, space-count, block-count, x-out-of-y, linear-operator, and all heuristics) plotting the proportion of the day each spot is occupied (left axis) and cars parking per 100,000 (right axis) against distance from the destination.]

Figure 18-3: The distribution of parking positions at each of the equilibria in Table 18-1. Each black histogram (scale on the right)
differentiates between cars parking before reaching the destination
(to the left of the midline) and those parking after reaching it (to
the right): Consequently the histogram also indicates the shape of
the distribution of time to park. Each histogram is based on the
parking positions from randomly selected single cars from 100,000
independent days. The stepped curve (scale on the left) shows
the occupancy of each spot averaged over the entire day (on each
of 10,000 independent days, we randomly sampled one moment
within the range of times when cars could arrive). The panel at the
bottom right superimposes these distributions of occupancy for all
seven equilibria, showing how similar they are.


In this case, small disturbances to the equilibrium tend to be self-correcting. This extra property is the condition for a mixed Nash equilibrium to be an evolutionarily stable strategy (ESS—see Riechert & Hammerstein, 1983).

The Evolutionary Algorithm


To search for mixed equilibria we used an evolutionary algorithm
(Ruxton & Beauchamp, 2008). (Technically, ours is an evolutionary
programming approach—see Bäck, Rudolph, & Schwefel, 1993.)
The general operation of the evolutionary algorithm is to let a mixed
population of strategies compete at parking, measure their mean
performances, and then select a new population from the most suc-
cessful, but adding some extra strategies modified from these winners.
This process is repeated over many generations, leading to strategy
change in the population in a manner akin to natural evolution.
Our evolutionary algorithm uses the same baseline parking envi-
ronment as before, but now the 1,080 individuals parking on one
day can each differ in the type of heuristic they use and the param-
eter values of their heuristic. Within each generation, the same
1,080 individuals compete over a large number (R) of independent
days (i.e., each day the order of their arrival and their parking dura-
tions differ, but each individual uses the same strategy every day).
The measure of the performance of an individual is the mean of its
total travel times (parking search, walking, and driving away) over
these R days.
Following this tournament, the best 10% of the individuals from
the generation are selected and each of these is copied into the
next generation with the same strategy and parameters. Each also
replicates to form nine individuals with slightly mutated strategy
parameters. The magnitude of mutation is randomly sampled from
a normal distribution (or a discretized version in the case of integer
parameters).8 In addition, there is a further round of more extensive
mutations (hopeful monsters): Ten of the 1,080 individuals in
the new generation are picked at random, their heuristic type is
reallocated at random from those under consideration, and their
parameter values are assigned anew from a uniform distribution
over a plausible parameter range (e.g., for the fixed-distance heuris-
tic, between 1 and 64). The performance of this new set of 1,080
individuals is then evaluated as before.
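One generation of this procedure can be sketched as follows (a simplified illustration with names of our own; the evaluation of mean travel time over R days is assumed to be supplied by the parking simulation, and the mutation step is a simplification of the scheme given in footnote 8):

import random

def next_generation(population, mean_travel_time, parameter_ranges, n_monsters=10):
    """One generation of the evolutionary algorithm (sketch).
    population: list of (heuristic_name, parameter_list) pairs, e.g. 1,080 of them.
    mean_travel_time(individual, population): mean travel time over R days,
    assumed to come from the parking simulation (not implemented here).
    parameter_ranges: maps heuristic names to a list of (low, high) bounds."""
    ranked = sorted(population, key=lambda ind: mean_travel_time(ind, population))
    elite = ranked[:len(population) // 10]         # best 10% survive

    new_pop = []
    for name, params in elite:
        new_pop.append((name, list(params)))       # one exact copy
        for _ in range(9):                         # nine slightly mutated copies
            # simplified mutation magnitude, not the footnote-8 formula
            mutated = [p + random.gauss(0, 0.1 * (abs(p) + 1)) for p in params]
            new_pop.append((name, mutated))

    # "hopeful monsters": a few individuals get a random heuristic and parameters
    for i in random.sample(range(len(new_pop)), n_monsters):
        name = random.choice(list(parameter_ranges))
        new_pop[i] = (name, [random.uniform(lo, hi) for lo, hi in parameter_ranges[name]])
    return new_pop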

8. The standard deviation of this distribution depends on the absolute value of the parameter, so that mutation is less in the case of a small integer or a proportion near 0 or 1 (for an integer parameter of value d, SD = V√(d + 25), or, in the case of a proportion p, SD = V√(p(1 − p))/5, where V is a constant).

In the first generation, we start at R = 400 replicate days, which means that assessment of performance is inaccurate enough to
allow some less good strategies to survive by chance initially (i.e.,
selection is weak). In this way we do not immediately shift the
entire population onto heuristics that just happen to have good
parameter values in the initial generations but give the less good
heuristics (or regions of parameter space) a chance to improve. As
the evolutionary algorithm proceeds, within a few generations there
is typically some stability, but the distributions of parameter values
of the survivors have rather broad peaks. This might be because
there is disruptive selection favoring a diversity of parameter
values, or it could be artifactual if the ratio of selection to mutation
is insufficient for stabilizing selection to generate sharper peaks.
The breadth of the peaks may in turn affect their mean value,
because of the game-theoretic aspect of the parking situation. This
may be realistic in that real drivers make mistakes both in their
choices of good strategies and in enacting these strategies, and thus
everyone’s strategies should be adapted to the likelihood of others
also making such mistakes. However, to avoid the extra issue of
deciding what level of driver error is realistic, in later generations
we increase the selection versus mutation ratio as far as is practical,
which typically reduces the breadth of the parameter distribution
peaks.9

The Mixed Equilibria That Emerge


We used the evolutionary algorithm first to investigate the stability
of the pure Nash equilibria that we found earlier, by retaining the
constraint that all individuals in a population must use the same
type of heuristic but now allowing all individuals to differ in their
parameter values. We again will start by considering the fixed-dis-
tance strategy, for which we earlier found a pure Nash equilibrium
strategy that accepted spaces that were 31 places or less from the
destination. This was also an ESS because mutant strategies all did
worse, never equally well. Nevertheless, this is not quite the equilib-
rium that arises in the evolutionary algorithm, even if the starting
population is set at this pure equilibrium. We showed earlier only
that in a population for which all but one individual used D = 31,
any mutant individual performed worse.

9. To do this, we typically double R every 10 generations, so that selection becomes more discriminating. The extent of mutation V is also
decreased from 0.1 to 0.05 after 10 generations and to 0.035 after 20. Further
reductions are not appropriate because otherwise integer parameter values
mutate too rarely for evolution to occur in the time available for running
the program. After 40 generations we also remove the hopeful-monster
mutations.

But the population can evolve away from this ESS if several individuals change their behavior at the same time. If more than about 5 of the 1,080 drivers
per day switch from D = 31 to D = 30, then the best value of D for
any other individual to use is no longer 31, but 30. Thus, unless the
mutation rate in the evolutionary algorithm is extremely low, the
population evolves away from D = 31 and eventually consists of
97% of the population using D = 30 and the other 3% using D = 29.
This is a second ESS, which is a mixed one. It is possible that
there are further ESSs, but all of the mixtures of values of D with
which we started our evolutionary algorithm evolved toward this
second equilibrium. When we specify less harsh selection that
allows some slightly suboptimal values of D to persist (i.e., by
making R small), the mean value of D in the population still remains
close to 30.
We repeated this analysis with the other six heuristics, in each
case allowing only one type of heuristic in the population but per-
mitting individuals to differ in their parameter values. In general
the evolutionary algorithm converged to a mixed equilibrium
with a greater range of parameter values than for the fixed-distance
heuristic, but mostly their distributions were unimodal with a
mean close to the heuristic’s pure Nash equilibrium found earlier
(Table 18-2). This may reflect either a mixed ESS or the presence
of a flat fitness maximum (with the consequence that selection
fails to distinguish between neighboring parameter values). The
linear operator was somewhat of an exception in that the parameter
values at the equilibrium were spread bimodally and generated
more picky behavior than in the pure equilibrium: Starting from
the first parked car encountered, it would not allow parking for
about 31 places even if every place passed were full.

Table 18-2: The Equilibria Found by the Evolutionary Algorithm Compared With the Pure Nash Equilibria

Equilibrium          Fixed     Propor-   Car     Space   Block   x-out-    Linear
parameters           distance  tional    count   count   count   of-y      operator
                               distance
Pure Nash            31        .61       37      Large   23      28/29     .84, .97
Evolved:
  Mode               30        .60       38      >39     23      30/31     .92, .91
  5–95% quantile     30–30     .58–.62   36–39           21–26   30/31–    [.9, .95]–
  range                                                          32/33     [.93, .9]

Note. The space-count heuristic evolves to require sufficient spaces before parking so that cars always turn at the destination; therefore parameter values above about 39 spaces are selectively neutral.

Finally, we allowed the different heuristics to compete against each other. The results were consistent irrespective of the initial
mixtures of heuristics that we tried. Within a few generations only
two types of heuristics remained and when the other heuristics
arose again by mutation, they rarely persisted for more than a gen-
eration or two. The two survivors were the fixed-distance and lin-
ear-operator heuristics. About 75% of the population used the
fixed-distance heuristic, with the distribution of parameter values
virtually indistinguishable from the mixed equilibrium when
only this heuristic was allowed. Likewise, the linear-operator heu-
ristic evolved parameter values within or close to the range when
only that heuristic was allowed; slightly different equilibrium
values evolved in different runs of the evolutionary algorithm. With
moderate values of R, which allow some suboptimal parameter
values to survive selection, the proportions and parameter values
of these two heuristics remain stable indefinitely. But with higher
selection pressure from large values of R, the proportions start to
oscillate and then diverge, which can eventually drive the linear-
operator heuristic to extinction. However, this extinction would
only be temporary whenever mutants are introduced, because each
of these heuristics can invade a population composed entirely of
the other heuristic. Furthermore, such harsh selection pressure is
unrepresentative of the real world, so an equilibrium with both
heuristics persisting is more relevant.
It remains possible that some particular combinations of the
other heuristics are also ESSs, since the evolutionary algorithm
does not systematically check all combinations of parameter values.
But because all starting conditions that we investigated converged
to the identified two-heuristic equilibrium, we claim that it is the
most likely ESS to arise among these strategies in this environment.
What we cannot yet say is whether this mixture of two heuristics is
stable against invasion by heuristics that we have not considered.
Given that the ESS consists of some individuals that respond to
local density and some that use a fixed distance from the destina-
tion (and/or individuals that sometimes do each), a plausible can-
didate for a heuristic that would outcompete these would be one
that combined these two approaches. Moreover, it seems intuitively
reasonable that drivers might somehow combine such obviously
relevant cues as position and density, rather than using only one
piece of information. We investigate such a heuristic at the end of
the next section.

Explaining Heuristic Competitiveness via Environment Structure

To understand why some types of heuristics outcompete others in the search for good parking spaces, it is necessary to consider the
structure of the environment in which they operate—this is the central tenet of ecological rationality. As we have emphasized, the
environment structure for this parking task is created by the heuris-
tics that the population is using. Figure 18-3 demonstrates that
some broad structural features of the patterns of occupancy are
fairly consistent across environments, at least among those pro-
duced by well-adapted heuristics (i.e., with near-equilibrium
parameter values). Here we focus on another consistent environ-
mental feature and its implications for strategy performance.
The feature can be seen in Figure 18-4, which shows the results
of a simulation in which all drivers decide whether to accept
each parking space using the fixed-distance heuristic with the
pure ESS aspiration level D = 31. The vertical axis represents the
position along the parking strip, with the destination at the bottom;
the distribution of cars in the parking strip is plotted against time
on the horizontal axis. Thus one “column” in the figure corresponds
to the presence or absence of parked cars at all distances from the
destination at one particular time step. Clear structure is apparent:
The distribution of parked cars over time is characterized by a
striking pattern of peaks occurring fairly regularly in time, although

[Figure 18-4 appears here: parking places from the destination (vertical axis) plotted against time in hours (horizontal axis), with DP = 31.]

Figure 18-4: Distribution of parked cars over time on one simulated day. The vertical axis represents the position along the parking
strip and thus the distance from the destination, and the horizontal
axis represents time. All cars follow the fixed-distance heuristic
with DP = 31. Places are marked as black when unoccupied. A fairly
regular pattern of peaks emerges.

varying in height. This is typical also of our simulations of the other parking heuristics. We did not expect such a pattern and we are
not aware of others having reported it either in computer simula-
tions or in the field. Its consequence is that drivers arriving at
different times can encounter the first (farthest from destination)
parked car at very different positions, after which large blocks of
spaces may appear.
A likely prerequisite for this pattern to occur is that spaces are
not chosen randomly; rather, parking spaces closer to the destina-
tion must have a higher probability of being chosen. This is nor-
mally the case with well-adapted heuristics, since they are
sufficiently ambitious that quite often cars have to turn around at
the destination and then will take the closest available space on
their return. To demonstrate how this behavior leads to the observed
structure, we ran simplified simulations in which each car instan-
taneously occupied the space closest to the destination. First, we
kept the parking duration and the interval between the arrival of
cars constant, which resulted in an overlapping-staircase-like pat-
tern of occupancy over time (Figure 18-5a). The first staircase
arises because later arrivals have to take parking places progres-
sively farther from the destination. Then, because the order in
which cars depart follows the same sequence as their arrival, it is
cars closest to the destination (i.e., those that formed the base of the
staircase) that start leaving first; their places are then the ones taken
by new arrivals, and so the old staircase stops growing and a new
staircase starts to build in parallel. There is no pattern of peaks.
When we introduce stochasticity in arrival intervals, the stair-
cases separate (Figure 18-5b) because there are periods when no
cars happen to arrive to be added to the new staircase. Conversely,
if several cars arrive closely spaced, insufficient cars may have
left the previous staircase so that there are no spaces at the top of
the new staircase; instead, cars are added to the top of the previous
staircase, thus extending it farther out away from the destination.
Because some cars join old staircases, the new staircases tend to
grow progressively more slowly than those before, meaning that
they are less steep.
Adding stochasticity in parking durations (Figure 18-5c) gives
rise to pulses in the availability of spaces rather than of cars; a pat-
tern of peaks still arises (especially in conjunction with stochastic-
ity in arrival interval—Figure 18-5d), but it is less pronounced.
Also, the right edge of each staircase is more ragged, so that a driver
would encounter more but shorter blocks of spaces.
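The essence of these simplified simulations can be sketched as follows (an illustration with arbitrary parameter values of our own, not the settings of the canonical environment described earlier):

import random

def closest_space_simulation(n_places=40, n_steps=5000, arrival_interval=12,
                             duration=400, stochastic_arrivals=False,
                             stochastic_durations=False):
    """Simplified model: each arriving car instantly takes the free place
    closest to the destination (place 0) and leaves after its parking
    duration. Returns the occupancy of every place at every time step."""
    leave_at = [None] * n_places                 # departure time of the car in each place
    history = []
    next_arrival = 0
    for t in range(n_steps):
        for i, when in enumerate(leave_at):      # departures
            if when is not None and when <= t:
                leave_at[i] = None
        if t >= next_arrival:                    # one arrival takes the closest free place
            for i in range(n_places):
                if leave_at[i] is None:
                    stay = duration * (random.uniform(0.2, 1.8) if stochastic_durations else 1)
                    leave_at[i] = t + stay
                    break
            gap = arrival_interval * (random.uniform(0.2, 1.8) if stochastic_arrivals else 1)
            next_arrival = t + gap
        history.append([when is not None for when in leave_at])
    return history

# Constant intervals and durations give overlapping staircases (Figure 18-5a);
# adding stochasticity to arrivals and/or durations produces the pattern of peaks.
occupancy = closest_space_simulation(stochastic_arrivals=True)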
The pattern of peaks seen in Figures 18-4 and 18-5 demonstrates
that the distribution of parked cars along a street can have con-
siderable structure. Even though in the simulations shown in Figure
18-5 cars always take the closest space available, and therefore
[Figure 18-5 appears here: four panels, (a) through (d), plotting parking places from the destination against time steps.]

Figure 18-5: Distribution of parked cars over time (plotted as in Figure 18-4) under different conditions: (a) Interval between arriv-
ing cars and parking duration held constant, creating overlapping
staircases of occupied parking places. (b) Stochasticity added to
arrival times, causing staircases to separate. (c) Stochasticity added
to parking durations, generating uneven peaks. (d) Stochasticity in
both arrival times and parking durations (as in Figure 18-4).


car density is on average higher nearer the destination, nevertheless at some times of day a driver who has passed a dense block of
parked cars may still then encounter a region with many spaces.
The reason is that nearby cars tend to have arrived one after the
other (generating the staircases), and thus are liable to all leave at a
similar time, lowering car density locally. Another consequence
of nearby cars leaving at similar times is that finding one place
unoccupied informs drivers that nearby places also are likely to be
unoccupied. For instance, consider the day illustrated in Figure
18-4, and ignore the spaces passed prior to encountering the first
parked car; even though overall only one in six parking places
is unoccupied, a space is more likely to be immediately followed
by another space than by a parked car. The existence of this auto-
correlation invalidates the assumptions of previous analytic models
(e.g., MacQueen & Miller, 1960; Tamaki, 1988) that the probability
of a parking place being occupied is independent of the occupancy
of other places. We will now illustrate ways that the structure in
Figures 18-4 and 18-5 can help us understand why (and when)
some heuristics function better than others.
First, we can see why the space-count heuristic was outcom-
peted by the policy of only parking after passing the destination.
The autocorrelation in the occurrence of spaces means that encoun-
tering more spaces than usual is a reason to expect further spaces
to occur ahead, so search should continue; instead, the space-count
heuristic is triggered to accept a space.
The proportional-distance heuristic might a priori seem a good
means to spot times of day when there are gaps between the stair-
cases in Figure 18-4, and to search nearer the destination at such
times. But in fact it is poor at this job because there can be one car
near the top of a staircase that remains parked for much longer than
the mean (the gamma distribution of parking durations is right
skewed) so that the heuristic becomes blind to any low density
beyond. Another problem is that when spaces are appearing near
the destination the current staircase stops growing, but cars just
arrived near the top of the staircase remain for some time; when
they disappear the desirable spaces near the destination have long
been occupied. So the proportional-distance heuristic’s emphasis
on the very first car encountered may be misguided. The space-
count and car-count heuristics may similarly be poorly designed in
their partial dependence on occupancy far from the destination as
a means to predict occupancy near the destination.
The linear-operator, x-out-of-y, and block-count heuristics all
respond to local density and thus can utilize the local positive auto-
correlation in vacancies. When they detect a high density of spaces
they do not park, gambling on there being more spaces ahead. We
analyzed how the linear-operator heuristic performs in the mixed
ESS with the fixed-distance heuristic. The linear-operator heuristic is adaptive in that at times when the peaks are growing because
some cars are encountering no spaces, it accepts spaces before the
fixed-distance heuristic would (and therefore has to turn around
less often), whereas at times of lower density when many parked
cars are leaving, it tends to be more ambitious than the fixed-
distance heuristic. We are not sure why the linear operator is the
most successful of the density-dependent heuristics, but it may be
important that it both avoids being triggered until well after the first
car is encountered and yet is not overly influenced by the occa-
sional gap in an otherwise high-density sequence. Potentially these
density-dependent heuristics have to be very picky to avoid being
triggered in areas of high density a long way from the destination,
but this is too picky once they get close. Thus, a superior heuristic
might wait until within a certain distance of the destination to
invoke a less picky density-dependent trigger.
Accordingly, we tested such a distance-and-density heuristic.
This requires the conditions for both the fixed-distance and the
block-count heuristics to be simultaneously satisfied. We chose the
block-count heuristic to monitor density because it requires only
one parameter. The distance-and-density heuristic indeed invades
the mixed fixed-distance and linear-operator ESS, driving both to
extinction. It is also the only surviving strategy when pitted against
all seven previous heuristics. At equilibrium the parameters of
the distance-and-density heuristic are somewhat broadly spread:
The parameter value of the fixed-distance component averages 36,
but values from 32 to 41 may also persist; the parameter value
of the block-count component averages 12, but values from 9 to 17
may also persist. Larger values of one parameter are associated with
larger values of the other, so successful versions that are less picky
about how far away to accept a space are more picky about how
long the block of cars must be to trigger acceptance. There is no
pure Nash equilibrium.
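As a sketch (using the average evolved parameter values reported above purely as an example), the acceptance test simply conjoins the two conditions:

def distance_and_density_accept(distance_to_destination, current_run_of_parked_cars,
                                current_vacant, next_vacant, D, B):
    """Distance-and-density heuristic (sketch): take a vacancy only if it is
    within D places of the destination AND the last B or more places passed
    were an unbroken block of parked cars."""
    if not current_vacant or next_vacant:
        return False
    return distance_to_destination <= D and current_run_of_parked_cars >= B

# With parameters near the evolved averages (D about 36, B about 12):
print(distance_and_density_accept(30, 14, True, False, D=36, B=12))   # True
print(distance_and_density_accept(30, 5, True, False, D=36, B=12))    # False: block too short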

Conclusions: The Game of Parking

A common starting point for people thinking about how they search for a parking space, and for researchers developing models
to try to understand the process of searching for parking, is to
suppose that drivers proceed some way toward their destination
until they are “close enough” and then take the next available
space. And indeed this fixed-distance heuristic is optimal in an
idealized world where parking spaces are distributed indepen-
dently. But in the real world, as people park in spaces and exit them
later, they create structure in the environment of available spaces
that other drivers are searching through. We investigated the consequences in a simple model system and demonstrated how the
aspiration level of the fixed-distance heuristic should adjust to an
environment created by other drivers using the same heuristic.
In theory there was a stable equilibrium in which everybody using
one particular aspiration level would create an environment
where no other aspiration level could do better. However, every-
body in real life using similar aspiration levels seems unlikely,
especially given our demonstrations that very different parameter
values are to be expected among drivers if they are influenced by
their experiences of parking in different underlying environments
(e.g., with different arrival rates of competing parkers, or different
street layouts) or if they differ in which performance criteria most
matter to them.
Nevertheless, our idealized model system proved illuminating
when assessing other plausible sorts of simple parking heuristics.
All could give rise to pure equilibria yielding a similar mean per-
formance and distribution of parked cars; but when we allowed
different heuristics to compete, a mixture of the fixed-distance and
linear-operator heuristics consistently prevailed. A more detailed
examination of the environments created by these heuristics helped
to explain why. In particular, there was considerable clumping in
the distribution of parking spaces, but where these clumps occurred
moved during the day, which the linear-operator heuristic could
exploit. This led us to design a superior two-criterion heuristic
that allowed parking only if both the destination was sufficiently
close and there had been a lack of vacancies in the last few places
passed. Because the autocorrelation in the occurrence of spaces is
likely to be a common phenomenon in many different parking
situations and should occur regardless of the performance crite-
rion, it may be a widely applicable conclusion that such distance-
and-density heuristics are a good choice. We used the simplest
block-count component mechanism to assess the local density of
vacancies, so somewhat more complex heuristics (nevertheless
well within our cognitive abilities) could well be even more com-
petitive. However, there remains the problem that the appropriate
parameter values of such heuristics may be difficult for drivers to
select; as with the fixed-distance heuristic, these are likely to
depend considerably on underlying aspects of the environment
such as arrival rates and street topology.
We have emphasized the need to consider dynamic and game-
theoretic, strategic interactions in our analysis of what are good
parking strategies. But do drivers really take these aspects into
account when looking for parking spaces? Of course, based on their
experience with what has worked previously, drivers could be
blindly applying rules adapted to the game-theoretic situation
without thinking about why they work. Alternatively, although drivers are clearly not carrying out in their heads the kind of
computationally intensive analysis presented in this chapter, they
may be applying rules that their intuition suggests are adaptive in a
competitive game-theoretic context. This intuition might be as
simple as a justified expectation that there is some autocorrelation
in the occurrence of parking spaces, or instead our ever-active brains
might be using much more complex calculations even though
our underlying theories may be misguided. Where we get our park-
ing heuristics from is a hard problem to solve, but empirical inves-
tigations can at least start by determining which heuristics people
actually apply in particular parking contexts, and how these may fit
to the perceived structure of the environment.
Part VII
AFTERWORD
19
Ecological Rationality
The Normative Study of Heuristics

Gerd Gigerenzer
Peter M. Todd

It simply wasn’t true that a world with almost perfect


information was very similar to one in which there was
perfect information.
Joseph Stiglitz, on the financial crash of 2008

How do we make decisions? Three major answers have been proposed: The mind applies logic, statistics, or heuristics. Yet, these
mental tools have not been treated as equals, each suited to a par-
ticular kind of problem, as we believe they should be. Rather, rules
of logic and statistics have been linked to rational reasoning, and
heuristics to error-prone intuitions or even irrationality. Logic and
statistics have been given normative status in psychological theo-
ries, and also sometimes treated as descriptive models of the pro-
cess of reasoning. Heuristics have virtually never been treated as
normative, only as descriptive models. This division has a conse-
quence that may be appealing to some: If people became rational,
psychology departments and journals devoted to the study of rea-
soning and decision making could be closed down because behav-
ior would be fully described by the laws of logic and statistics.
According to this descriptive/normative schism, psychology is
restricted to dealing with the clinical or pathological only. In the
words of a well-known economist, “either reasoning is rational or
it’s psychological” (see Gigerenzer, 2000, p. vii).
Can the study of heuristics tell us how we ought to make deci-
sions? To ask this question seems naïve, even ludicrous. Normative
questions about “ought” have been carefully kept apart from
descriptive questions about “is,” by prominent advocates such as
Kant, Frege, Popper, the logical empiricists, and current-day text-
books on decision making. This is/ought schism is also reflected in a division of labor between disciplines. Logic, statistics, and philosophy are considered normative disciplines that prescribe how
we ought to reason, while experimental psychology is relegated
to do the empirical work on how people reason, and then compare
it with how they ought to reason. But note that unlike in Kant’s
thinking and moral philosophy, the term “normative” is used in the
cognitive sciences for the best means, not the best ends (such as
virtues). Normative reasoning, sometimes called optimal reason-
ing, is defined by principles such as Bayesian probability updating,
consistency of beliefs, or utility maximization—definitions vary.
When we use the term “normative” here, we refer to this means–
ends connotation: How ought one go about reaching a given goal?
We are not making the stronger claim that the study of heuristics by
itself could specify that goal or provide the ends (except insofar as
making speedy, frugal, or transparent decisions could be goals
themselves). Thus, we can rephrase our question this way: Can the
study of heuristics tell us what strategies we ought to use to reach a
given goal?
The answer from the psychological literature appears to be a reso-
lute “no,” both among those who emphasize human irrationality and
those who insist on the rationality of cognition. In the literature that
emphasizes our irrationality, deviations from logical or statistical
principles have been routinely interpreted as cognitive fallacies
(Tversky & Kahneman, 1974) and attributed to heuristics such as “rep-
resentativeness” or to an intuitive “System 1” used by the mind to
make quick judgments (Evans, 2008). According to this view, people
often rely on heuristics but ought not to—we would be better off if
we reasoned rationally, as defined by the rules of logic and statistics.
In the literature that emphasizes human rationality, the laws of
statistics (such as Bayes’s rule) are again proposed as the normative
means toward given ends. These normative computations are
sometimes said to describe the “computational level” of cognition
(a term borrowed from Marr, 1982), while “at the algorithmic level,
the relevant cognitive processes operate via a set of heuristic tricks
. . . rather than with explicit probabilistic calculations” (Chater &
Oaksford, 2008, p. 8). In this rational view of the mind, a heuristic
is a quick-and-dirty cognitive shortcut or approximation to an opti-
mization process that is too difficult for the mind to execute. Both
programs, although in different ways, maintain the conventional
split between the normative and the descriptive, with heuristics
being merely descriptive and hence unsuitable for telling us how to
best reach particular goals.
In this chapter, we will argue against this schism and for the pos-
sibility of a normative study of heuristics. The normative part comes
from exploring the ecological rationality of heuristics; it com-
plements the descriptive part, the study of the adaptive toolbox.
The term “ecological” signals that the yardstick for rationality is
some measure of success in the external world, instead of some
measure of internal consistency, as in most traditional theories of
rationality. To make our point, we introduce in the next section two
distinctions: process models versus as-if models; and problems for
which optimization is feasible or not. In short, our argument is that
the study of heuristics can be normative through answering ques-
tions of what particular heuristic process one should use to succeed
in a given environment, and through considering the ecological ratio-
nality of heuristics in situations where optimization is not feasible.

Simon’s Question

Herbert Simon (1979b) stressed in his Nobel Memorial Lecture that
the classical model of rationality requires knowledge of all the rel-
evant alternatives in any decision situation along with their conse-
quences and probabilities, all occurring in a predictable world
without surprises. These conditions, however, are rarely met when
individuals and organizations need to make decisions. Therefore,
Simon later (1989) called for a research program that poses and
answers a fundamentally new question:

Simon’s question: “How do human beings reason when the
conditions for rationality postulated by the model of neoclas-
sical economic theory are not met?” (p. 377)

Note that this question is descriptive, not normative. An answer
to Simon’s question is provided in the empirical study of the adap-
tive toolbox of heuristic decision mechanisms introduced in this
book’s predecessor (Gigerenzer, Todd, & the ABC Research Group,
1999) and developed further in several of the earlier chapters in
this book. To see how to extend Simon’s question to a normative
consideration of heuristics, we must first consider the two distinc-
tions that underlie his query.

Process Models Versus As-If Models


In asking how human beings reason, Simon was seeking process
models that describe the specific steps in a cognitive process, such
as the search rules, stopping rules, and decision rules a heuristic
employs. The goal of as-if models, in contrast, is not to specify the
processes by which people reason but just to predict the resulting
behavior. Both types of models exist in other sciences as well.
For instance, Ptolemy’s theory in which planets move around
the earth in circles and epicycles served as an as-if model that
predicted the positions of planets over time well—provided enough
epicycles were included in the model. In contrast, Kepler’s theory
in which planets move in ellipses around the sun was meant as a
process model, describing the actual motions of the planets.
Following Milton Friedman (1953), neo-classical economists
have taken a decisive stand for as-if models and against the psycho-
logical process models Simon championed. This attitude has
shaped even behavioral economics, which claims to build more
psychologically realistic theories of human decision making (see
Berg & Gigerenzer, 2010). In psychology, the rejection of cognitive
process models has been most forcefully articulated by behavior-
ists, but the tradition continues today in a proliferation of as-if
models in modern cognitive science, such as many Bayesian models
of cognition. One key reason for the construction of these as-if
models of decision making is a methodological preference for opti-
mization models. Even though they typically involve parameter
estimations and computations that few researchers would argue are
realistically occurring in cognition, these optimization models are
often proposed because they embody the ideals of traditional ratio-
nality that researchers still feel must hold at the computational
level of good cognition. The corresponding algorithmic-level
models, such as heuristics, are given relatively little attention, again
in part because they cannot easily be modeled with standard opti-
mization techniques. Thus, Simon’s call for process models was at
the same time a call against unrealistic optimization models, which
brings us to the second distinction implied by his question.

Problems That Are Feasible Versus Not Feasible for Optimization


As indicated above, Simon emphasized that the conditions for
rationality are not met when the world is uncertain and not all of
the relevant options, consequences, and probabilities are known. In
other words, he wanted a theory of human cognition in those
common situations when optimization approaches are not feasible.
We use the term “optimization” here in its mathematical sense: to
compute the maximum or minimum of a function. For instance,
signal detection theory is an optimization theory (derived from
Neyman–Pearson decision theory in statistics) and so are expected
utility theory, prospect theory, and many sequential models of deci-
sion making (derived from Abraham Wald’s 1947 extension of
Neyman–Pearson theory). Note, though, that an optimization model
is not the same as an optimal outcome. An optimization model can
be expected to lead to the optimal (best) outcome if its set of condi-
tions is met. But if, as Simon posed, one or more of the conditions
are not met, anything is possible in principle as far as what mecha-
nisms will produce the best outcomes (Lipsey, 1956). For instance,
when the parameter values of the optimizing multiple regression
model are not known but need to be estimated from samples, this
optimizing model can be outperformed by heuristics such as take-
the-best and tallying in terms of more accurate predictions
(Czerlinski, Gigerenzer, & Goldstein, 1999).
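To make the comparison concrete, here is a minimal sketch of the two heuristics just named, written for binary cue profiles; the cue names, cue order, and example data are invented for illustration and are not taken from the cited analyses.

```python
# Minimal sketch of two simple heuristics for paired comparison
# (illustrative only; cue order and data are made up for this example).

def take_the_best(a, b, cue_order):
    """Look up cues in order of validity; decide by the first cue
    that discriminates between objects a and b."""
    for cue in cue_order:
        if a[cue] != b[cue]:
            return "a" if a[cue] > b[cue] else "b"
    return "guess"  # no cue discriminates

def tallying(a, b, cues):
    """Ignore cue weights: count positive cues for each object
    and pick the object with the higher count."""
    score_a = sum(a[c] for c in cues)
    score_b = sum(b[c] for c in cues)
    if score_a == score_b:
        return "guess"
    return "a" if score_a > score_b else "b"

# Hypothetical comparison: which of two cities is larger?
cues = ["capital", "exposition_site", "soccer_team"]  # ordered by assumed validity
city_a = {"capital": 1, "exposition_site": 0, "soccer_team": 1}
city_b = {"capital": 0, "exposition_site": 1, "soccer_team": 1}

print(take_the_best(city_a, city_b, cues))  # "a": decided by the first cue alone
print(tallying(city_a, city_b, cues))       # "guess": both cities have two positive cues
```

Take-the-best bets everything on the first discriminating cue, whereas tallying gives up cue order and weighting altogether; which policy predicts better depends on the structure of the environment, which is exactly the question the study of ecological rationality asks.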
The types of situations Simon was interested in where optimiza-
tion is not feasible can be caused by the following factors:

1. Intractability. The problem is well specified but computa-
tionally intractable, such as probabilistic inference in
Bayesian networks, the games of chess and Go, and the
traveling salesman problem (Dagum & Luby, 1993; Reddy,
1988). Chess, for instance, is a well-specified game in which
all alternatives are known and an optimal sequence of
moves exists. Yet chess is computationally intractable: The
optimal sequence cannot be found with certainty either by
chess masters or by chess computers such as Deep Blue.
2. Estimation error. The problem is well specified and tracta-
ble, but the parameter values need to be estimated from
limited samples. As illustrated by the mean–variance port-
folio (chapter 1) and analytically explained by the bias–
variance dilemma (chapter 2; see the decomposition
sketched after this list), the “variance” introduced by
estimation error can lead to greater error than the “bias”
of a simple heuristic does. In this situation, it is ecologi-
cally rational to rely on a simple heuristic rather than an
optimization method or other complex strategy.
3. Imprecise specification. The problem is ill specified; that
is, not all alternatives, consequences, and probabilities are
or can be known. This appears to be the case for most prob-
lems individuals and organizations face, from choosing a
mate, to selecting a job, to picking stocks.
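For reference, the decomposition behind the estimation-error argument in factor (2) can be written out. This is the standard statistical statement of the dilemma, not a formula reproduced from chapter 2: for a model \(\hat{f}\) fit to limited samples and evaluated at a point \(x\) with true value \(y = f(x) + \varepsilon\),

\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \sigma^2 ,
\]

where \(\sigma^2\) is irreducible noise. A heuristic with few or no estimated parameters accepts a larger bias term in exchange for a much smaller variance term, which is why it can win overall when samples are small.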

These three factors indicate the range of problems that Simon
was interested in understanding: those for which optimization is
inapplicable, impossible, or undesirable because it leads to inferior
results. In situations characterized by factor (1), an optimal strategy
exists but cannot be determined; in (2), an optimal strategy may
lead to suboptimal outcomes if the parameter values are not per-
fectly known or cannot be estimated without error; and in (3), opti-
mization is not applicable.

How to Respond to Simon’s Question?


Faced by Simon’s challenge, one approach would be to sidestep it:
refuse to study behavior in worlds where optimization is out of
reach, and instead change the problems of interest into ones that
allow optimization. This is commonly done by assuming that all
the necessary information is perfectly known. The father of modern
Bayesian decision theory, Leonard Jimmy Savage (1954), called
such problems “small worlds.” A lottery in which all alternatives,
outcomes, and probability distributions are known with certainty
is a prototypical small world. The big question here is whether
the small world is a true microcosm of the real (“large”) world, that
is, whether the optimal strategy determined in the small world is
actually also optimal in the large world. Savage was very clear
about the importance of this issue of generalization of small-world
results to the large world. Such generalization often does not hold.
For instance, a rational theory of investment such as Markowitz’s
(1952) Nobel-Prize-winning mean–variance portfolio is optimal in
the small world where its conditions hold and the parameter values
can be estimated without error, but this is not the case in the
real large world of investment. Instead, a simple diversification
heuristic called 1/N (invest equally in all N assets) can outperform
the mean–variance portfolio (DeMiguel, Garlappi, & Uppal, 2009;
see also chapter 1). Thus, if the small world is not a true micro-
cosm, then ignoring Simon’s question and applying small-world
rationality to the large world may be futile. Joseph Stiglitz (2010),
quoted at the beginning of this chapter, made this point in attribut-
ing the financial crash of 2008 in part to the application of financial
theories that assume perfect information to the real world of invest-
ment in which this condition was not perfectly met.
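The contrast can be simulated in a few lines. The sketch below is only illustrative: the return distribution, sample size, and the plug-in weighting rule are assumptions made for this example (and the toy world is deliberately symmetric, so 1/N happens to be the truly optimal allocation); it does not reproduce the analyses of DeMiguel et al. (2009) or chapter 1.

```python
import numpy as np

# Toy simulation in the spirit of the 1/N comparison described above.
rng = np.random.default_rng(0)
n_assets, sample_size, n_trials = 10, 60, 500

# "True" world: similar assets with noisy, correlated returns (all values invented).
true_mean = np.full(n_assets, 0.05)
true_cov = 0.04 * (0.5 * np.eye(n_assets) + 0.5 * np.ones((n_assets, n_assets)))

def sharpe(weights, mean, cov):
    """Mean return per unit of standard deviation under the true parameters."""
    return weights @ mean / np.sqrt(weights @ cov @ weights)

one_over_n = np.full(n_assets, 1.0 / n_assets)
results = {"1/N": [], "plug-in mean-variance": []}

for _ in range(n_trials):
    sample = rng.multivariate_normal(true_mean, true_cov, size=sample_size)
    est_mean, est_cov = sample.mean(axis=0), np.cov(sample, rowvar=False)
    # Plug-in tangency-style weights computed from the estimated parameters.
    raw = np.linalg.solve(est_cov, est_mean)
    plug_in = raw / raw.sum()
    results["1/N"].append(sharpe(one_over_n, true_mean, true_cov))
    results["plug-in mean-variance"].append(sharpe(plug_in, true_mean, true_cov))

for name, values in results.items():
    print(f"{name}: mean true Sharpe ratio = {np.mean(values):.3f}")
```

In this deliberately simple world the plug-in rule can only lose, because every deviation from equal weights reflects estimation error rather than information; the empirical question studied in chapter 1 is how far this logic carries over to real investment environments.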
A second response to Simon’s question is to take it seriously and
study how people make decisions in situations where optimization
is out of reach (without changing the problem into one that allows
optimization in a small world). Using the investment problem, this
approach would first ask how individuals and firms allocate their
money when the future is uncertain (providing a descriptive answer
in terms of the tools used from the adaptive toolbox). If the answer
then is that many people rely on the 1/N heuristic, researchers fol-
lowing this approach would not conclude that using this heuristic
is a fallacy due to cognitive limitations. Rather, they would apply
the study of ecological rationality to move beyond Simon’s original
question and ask the subsequent normative question: Which envi-
ronments (varying in, for instance, predictability, sample size, etc.)
will allow the 1/N heuristic to outperform the mean–variance port-
folio, and vice versa?

Ecological Rationality: The Normative Extension of Simon’s Question

Ecological rationality concerns the match between cognition and envi-
ronment. The study of ecological rationality enables researchers to
make comparative statements about what is best: Given a problem
(task) in an environment, which of a set of heuristics (strategies)
will perform best? Or, given a heuristic for a task, in which of a set
of environments will it perform best? Consequently, it allows us to
ask, and answer, the normative extension of Simon’s question:

Ecological rationality’s question: Given a problem in an envi-
ronment, which strategies should humans rely on when opti-
mization is not feasible?

The answers to this question map the set of strategies (including
heuristics) in the adaptive toolbox of an individual or cultural
group onto a set of environmental structures. For instance, when
sample size is small and cues are moderately to highly redundant,
a person should use take-the-best rather than multiple regression to
get higher predictive accuracy (chapter 2; Czerlinski et al., 1999).
Assessing the match between strategies humans use and the prob-
lems and environments they face requires that we have process
models of those strategies, not as-if models. This close connection
between the descriptive (how people can actually make decisions)
and the normative (how they should make decisions in particular
settings) is atypical for optimization approaches (such as Bayesian
probability updating), which do not require a study of cognition for
their as-if modeling.

Methodological Approaches to Ecological Rationality


Three methodological approaches have been pursued in studying
the ecological rationality of heuristics: analyses with full informa-
tion, limited information, and erroneous information. The first
approach uses analytical methods applied to an environment that
is fully known (e.g., Katsikopoulos & Martignon, 2006). In this anal-
ysis, the term “environment” refers to an n × (m+1) matrix with the
values of n objects on m cues plus the criterion. This has led to a
number of theorems. For instance, in an environment with binary
cues whose weights decrease exponentially (such as 1, 1/2, 1/4,
1/8, and so on), the accuracy of take-the-best equals that of any
linear model with the same order of weights (Martignon & Hoffrage,
1999, 2002). The second approach is the study of the ecological
rationality in situations where the environment is not fully known,
that is, where inferences must be based on samples, and the param-
eters need to be estimated from these samples. An important result
here is that the theorems derived from fully known environments
do not generally hold with limited information (chapter 2). This
mirrors the difference between the perfectly known small worlds of
optimization and the large worlds with only partial information.
The third approach to ecological rationality investigates the role
of errors in the given information (Hogarth & Karelaia, 2007;
chapter 3). It can be combined with both of the other two approaches.
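The first of these analytical results can be checked directly for the example weights given above. Because each weight exceeds the sum of all smaller ones, the first discriminating cue already settles every paired comparison, so a lexicographic rule and the weighted sum order every pair of binary cue profiles identically. The brute-force check below is a sketch of that fact under this reading, not code from the cited papers.

```python
from itertools import product

# Noncompensatory weights: each weight exceeds the sum of all later weights.
weights = [1, 1/2, 1/4, 1/8]

def lexicographic(a, b):
    """Take-the-best on fully known binary cues: decide by the first cue
    on which the two profiles differ."""
    for x, y in zip(a, b):
        if x != y:
            return 1 if x > y else -1
    return 0

def linear_score(profile):
    return sum(w * c for w, c in zip(weights, profile))

profiles = list(product((0, 1), repeat=len(weights)))
for a in profiles:
    for b in profiles:
        lex = lexicographic(a, b)
        diff = linear_score(a) - linear_score(b)
        lin = (diff > 0) - (diff < 0)  # sign of the weighted-sum difference
        assert lex == lin, (a, b)

print("Lexicographic and weighted-sum orderings agree on all",
      len(profiles) ** 2, "pairs.")
```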

Three Illustrations
To repeat, ecological rationality is a normative discipline that
requires descriptive knowledge about the processes underlying
decision making. Normative statements about decision making
involve both psychological and environmental structures, and to
know what is best, we must know what structures go into the deci-
sion process. Despite long-standing admonitions to avoid the so-
called naturalistic fallacy—never derive ought from is—in this case
the ought, how people should make decisions, is not independent
from the is, how people are able to make decisions. To illustrate
the importance of understanding cognitive processes for determin-
ing what one ought to do, we revisit three problems in health care
discussed earlier in this book.
What organ donation policy should a government implement? If
we want to save some of the lives of those 5,000 Americans who die
every year waiting in vain for a donation, then we need to know
first how people make decisions (chapter 16). More specifically, we
need to know why the great majority of Americans do not sign up
to be a potential organ donor despite most saying they are in favor
of donation. If people do not sign up because they are not informed
about the problem, then country-wide information campaigns are
what we ought to do. Yet millions of dollars and euros have been
spent on such campaigns with little success, because they are
derived from the wrong psychology, based on the belief that more
information will always help. If the behavior of most people is
instead driven by using the default heuristic in the local legal envi-
ronment concerning organ donation, then we ought to do some-
thing different to save the lives of those waiting: Change the opt-in
default on donation to an opt-out default. To debate what is the
right thing to do without analyzing the interaction between mind
and environment may prove futile and cost lives.
Next, consider another key problem in health care: A majority of
physicians do not understand health statistics, such as how to esti-
mate the probability that a patient has cancer after a positive screen-
ing test (chapter 17). The normative recommendation made for
decades is that physicians should learn how to derive this proba-
bility using Bayes’s rule, given the sensitivity and specificity of
the test, and the prior probability of the disease. Yet, this proposal
has had as little success as the organ donor publicity campaigns.
An efficient solution to this problem starts once again with an anal-
ysis of the cognitive processes of physicians and the structure of
information in their environment. The resulting recommendation
is to change the representation of information from conditional
probabilities to natural frequencies. This strategy has helped hun-
dreds of physicians to understand the outcomes of their tests
(Gigerenzer, Gaissmaier, Kurz-Milcke, Schwartz, & Woloshin, 2007).
As in the donor problem, determining what ought to be done—
changing the environment—follows from understanding how cog-
nition is influenced by that environment.
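To see why the change of representation helps, compare the two formats on the same hypothetical screening numbers; the prevalence, sensitivity, and false-positive rate below are invented for this sketch, not figures from chapter 17.

```python
# Hypothetical screening numbers, chosen only to illustrate the two formats.
prevalence = 0.01           # 1% of the screened population has the disease
sensitivity = 0.90          # P(positive test | disease)
false_positive_rate = 0.09  # P(positive test | no disease)

# Conditional-probability format: Bayes's rule applied to the probabilities.
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate
posterior = prevalence * sensitivity / p_positive

# Natural-frequency format: the same information expressed as counts.
population = 1000
with_disease = prevalence * population                                # 10 people
true_positives = sensitivity * with_disease                           # 9 people
false_positives = false_positive_rate * (population - with_disease)   # ~89 people
posterior_from_counts = true_positives / (true_positives + false_positives)

print(f"Bayes's rule on probabilities: {posterior:.3f}")
print(f"Counting natural frequencies:  {posterior_from_counts:.3f}")
# Both give roughly 0.09: only about 9 of every 98 positive tests
# come from someone who actually has the disease.
```

The two computations are mathematically equivalent; the point of the recommendation is that the counting version matches how the information is encountered and is far easier for physicians and patients to carry out.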
Coronary care unit allocation provides a third illustration of how
ought depends on is (chapter 14). How should patients with severe
chest pain be assessed so that those with likely heart attacks are
assigned to the intensive care unit (ICU) and the rest are not? At one
Michigan hospital, physicians’ defensive decision making waste-
fully sent some 90% of all patients to the ICU. The first reaction
was to introduce a statistical software program that allowed physi-
cians to compute the probability of the patient having a heart attack
using a pocket calculator and a chart full of numbers. Again, this
solution disrespects physicians’ natural heuristic decision making,
and studies have documented that physicians tend to discard this
approach as soon as the researchers leave the hospital. A more effi-
cient solution can be found by providing a heuristic strategy that
fits the sequential thinking of physicians, the fast and frugal tree
described in chapter 14. This approach, matching strategy and
information to mental structures, has reportedly led to more accu-
rate patient allocations, and just as important, physicians like it
and have been using it for years.
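To convey the sequential structure of such a tree, here is a schematic sketch; the cue names and their order are placeholders standing in for the actual cues of the chapter 14 tree, not a reproduction of it.

```python
# Schematic fast-and-frugal tree for an allocation decision:
# at most three yes/no questions, and every question except the last
# can trigger an immediate decision. The cue names below are placeholders.

def allocate(patient):
    if patient["first_cue"]:        # e.g., a single high-priority indicator
        return "intensive care unit"
    if not patient["second_cue"]:   # e.g., is the presenting complaint absent?
        return "regular nursing bed"
    if patient["any_further_cue"]:  # e.g., any one of several remaining indicators
        return "intensive care unit"
    return "regular nursing bed"

print(allocate({"first_cue": False, "second_cue": True, "any_further_cue": False}))
# -> "regular nursing bed": reached after three questions, with no calculations.
```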
Each of these three examples answers the question of ought by
the study of is. How we should make decisions can be informed by
how we actually achieve ecological rationality.

Ecological Rationality and Bounded Rationality


Ecological rationality is an extension of the study of bounded ratio-
nality as Herbert Simon proposed it (1955a, 1956, 1989). Simon’s
question makes it clear that he took bounded rationality to be the
descriptive study of how humans make decisions when optimiza-
tion is not feasible. Yet, the common understanding of bounded
rationality has shifted over the years away from Simon’s notion.
Economists and psychologists have proposed two different inter-
pretations, each of which assumes the possibility of optimization.
The first reinterpretation of bounded rationality, embraced by virtu-
ally all economists, is that “boundedly rational procedures are in
fact fully optimal procedures when one takes account of the cost of
computation in addition to the benefits and costs inherent in the
problem as originally posed” (Arrow, 2004, p. 48). In other words,
bounded rationality is nothing but optimization under constraints
in disguise. The second reinterpretation, embraced by most psy-
chologists, is that of deviations from optimality. The goal is
“to obtain a map of bounded rationality, by exploring the system-
atic biases that separate the beliefs that people have and the
choices they make from the optimal beliefs and choices assumed in
rational-agent models” (Kahneman, 2003, p. 1449). The two inter-
pretations of bounded rationality appear diametrically opposed,
emphasizing rationality and irrationality, respectively. Nevertheless,
both refer to optimal beliefs or procedures. Yet Simon’s bounded
rationality is neither the study of optimization under constraints,
nor that of deviations from optimization. It is the study of heuristic
decisions, as explored in our previous book (Gigerenzer et al.,
1999), while its normative extension, the study of ecological ratio-
nality, is what we have laid out in this volume. The application of
these ideas to decisions in social environments, yielding social
rationality, is the subject of the next volume in our series (Hertwig,
Hoffrage, and the ABC Research Group, in press).

Rationality for Mortals

We began this chapter with the schism between “is” and “ought,”
institutionalized in the division of labor between disciplines. Until
recently, the study of cognitive heuristics has been seen as a solely
descriptive enterprise, explaining how people actually make deci-
sions. The study of logic and probability, in contrast, has been
seen as answering the normative question of how one should make
decisions. This split has traditionally elevated logic and probabil-
ity above heuristics—contrasting the pure and rational way people
should reason with the dirty and irrational way people in fact do
reason. Yet logic, statistics, and heuristics finally need to be treated
as equals, each suited to its particular kind of problem.
The study of ecological rationality widens the domain of the
analysis of rational behavior from situations with perfect knowl-
edge to those with imperfect knowledge. It is a more modest kind of
rationality that is not built on what is the best strategy overall, but
what is best among the available alternatives. To strive for the abso-
lute best—optimization—is an appealing but often unrealistic goal,
a rational fiction possibly anchored in our Western religions.
According to many traditions, God or the Creator is omnipotent,
or almighty, with unlimited power to do anything. He (sometimes
she) is also omniscient, knowing everything about his creation.
Furthermore, some theologians proposed that God has created
every animal and plant so perfectly that it could not fit better into
its environment, a concept that we might call optimization today.
These three O’s, omnipotence, omniscience, and optimization, have
sparked generations of discussion and debate and have led to unex-
pected paradoxes: Can God create a rock so heavy that even he
cannot lift it? If he can create such a rock, then he cannot lift it
and is not omnipotent; but if he cannot create it, again he is not
omnipotent. How can we mortals have free will if God is omni-
scient? And if we do not have free will, why would we be punished
for sinning?
In secularized societies, we tend to smile about these heavenly
paradoxes—surely we have moved beyond consideration of such
questions. Yet the same three O’s continue to appear in modern
times in the way we conceive of ourselves through the social sci-
ences. Mortal beings figuring out how to act in the world are rou-
tinely modeled as if they have unlimited computational power,
possess complete information about their situation, and compute
the optimal plan of action to take. These assumptions can be found
in optimal foraging theories, models of cognition, and economic
theories of market behavior, among other modern scientific notions
of human (and animal) behavior.
The study of ecological rationality dispenses with the three
ideals of godlike psychology. It does not require optimization,
finding the absolute best solution, but asks which heuristics are
better than others and good enough to solve a problem in real time
with real resources. Optimizing is thus replaced by satisficing.
Ecologically rational beings are not omniscient but rather must
search for information and at some point (after not too much time)
must also stop that search. Omniscience is thus replaced by limited
information search. Finally, the fiction of omnipotence is replaced
by a more realistic vision of a mind that exploits the structure of
the environment and the benefits of simplicity and relies on heuris-
tics that are tractable and robust.
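A minimal sketch of this combination of satisficing and limited search follows; the options, aspiration level, and search bound are invented for illustration.

```python
# Satisficing with limited search: examine options one at a time, take the
# first one that clears an aspiration level, and stop searching after a
# fixed number of looks even if nothing has cleared it.
# All numbers below are invented for this illustration.

def satisfice(options, aspiration_level, max_search):
    best_so_far = None
    for looked_at, option in enumerate(options, start=1):
        if best_so_far is None or option > best_so_far:
            best_so_far = option
        if option >= aspiration_level:
            return option, looked_at          # good enough: stop immediately
        if looked_at >= max_search:
            return best_so_far, looked_at     # out of search time: take the best seen
    return best_so_far, len(options)

offers = [62, 55, 71, 90, 68]  # e.g., qualities of sequentially encountered offers
print(satisfice(offers, aspiration_level=70, max_search=4))  # -> (71, 3)
```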
Norms based on the three O’s can create a modern version of
natural theology, rather than guidelines for humans. The normative
question to ask is emphatically not: If people were omniscient,
how should they behave? Rather, we must start with how people
actually think, and ask: Given limited knowledge and an uncertain
future, how should people behave?
In reaction to these three O’s and the lack of realism in the study
of human thinking, Herbert Simon called for sanity in theories of
rationality. The concept of ecological rationality laid out in this
book is our answer to his call: how real people make decisions with
limited time, information, and computation. We do this every day
in a world with pervasive uncertainty, but also rich and reliable
structure for our minds to exploit.
References

Abelson, R. P. & Levi, A. (1985). Decision making and decision theory.
In G. Lindzey & E. Aronson (Eds.), Handbook of social psychol-
ogy. Vol. I. Theory and method (3rd ed., pp. 231–309). New York:
Random House.
Adamowicz, W. A., Hanemann, M., Swait, J., Johnson, R., Layton, D.,
Regenwetter, M., et al. (2005). Decision strategy and structure in
households: A “groups” perspective. Marketing Letters, 16, 387–399.
Albers, W. (2001). Prominence theory as a tool to model boundedly ratio-
nal decisions. In G. Gigerenzer & R. Selten (Eds.), Bounded rational-
ity: The adaptive toolbox (pp. 297–317). Cambridge, MA: MIT Press.
Allan, L. G. (1993). Human contingency judgments: Rule based or asso-
ciative? Psychological Bulletin, 114, 435–448.
Allen, C. (2000). The evolution of rational demons. Behavioral and
Brain Sciences, 23, 742.
Allison, R. I. & Uhl, K. P. (1964). Influence of beer brand identification
on taste perception. Journal of Marketing Research, 1, 36–39.
Allison, T. & Cicchetti, D. (1976). Sleep in mammals: Ecological and
constitutional correlates. Science, 194, 732–734.
Alloy, L. B. & Tabachnik, N. (1984). Assessment of covariation by
humans and animals: The joint influence of prior expectations
and current situational information. Psychological Review, 91,
112–149.

Altmann, E. M. & Gray, W. D. (2002). Forgetting to remember: The
functional relationship of decay and interference. Psychological
Science, 13, 27–33.
American Gaming Association. (2008). 2008 State of the states: The
AGA survey of casino entertainment. Washington, DC: Author.
Anderson, C. (2006). The long tail: Why the future of business is selling
less of more. New York: Hyperion.
Anderson, J. R. (1974). Retrieval of propositional information from
long-term memory. Cognitive Psychology, 5, 451–474.
Anderson, J. R. (1990). The adaptive character of thought. Hillsdale,
NJ: Erlbaum.
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., &
Qin, Y. (2004). An integrated theory of the mind. Psychological
Review, 111, 1036–1060.
Anderson, J. R., Bothell, D., Lebiere, C., & Matessa, M. (1998). An inte-
grated theory of list memory. Journal of Memory and Language, 38,
341–380.
Anderson, J. R. & Lebiere, C. (1998). The atomic components of thought.
Mahwah, NJ: Erlbaum.
Anderson, J. R. & Milson, R. (1989). Human memory: An adaptive per-
spective. Psychological Review, 96, 703–719.
Anderson, J. R. & Schooler, L. J. (1991). Reflections of the environment
in memory. Psychological Science, 2, 396–408.
Anderson, J. R. & Schooler, L. J. (2000). The adaptive nature of memory.
In E. Tulving & F. I. M. Craik (Eds.), Oxford handbook of memory
(pp. 557–570). Oxford: Oxford University Press.
Anderson, S. P. & de Palma, A. (2004). The economics of pricing park-
ing. Journal of Urban Economics, 55, 1–20.
Andersson, P., Edman, J., & Ekman, M. (2005). Predicting the World
Cup 2002 in soccer: Performance and confidence of experts and
non-experts. International Journal of Forecasting, 21, 565–576.
Ariely, D. & Levav, J. (2000). Sequential choice in group settings: Taking
the road less traveled and less enjoyed. Journal of Consumer
Research, 27, 279–290.
Armelius, B. & Armelius, K. (1974). The use of redundancy in multi-
ple-cue judgments: Data from a suppressor-variable task. American
Journal of Psychology, 87, 385–392.
Armor, D. A. & Taylor, S. E. (2002). When predictions fail: The dilemma
of unrealistic optimism. In T. Gilovich, D. Griffin, & D. Kahneman
(Eds.), Heuristics and biases: The psychology of intuitive judgment
(pp. 334–347). Cambridge: Cambridge University Press.
Arnott, R. & Rowse, J. (1999). Modeling parking. Journal of Urban
Economics, 45, 97–124.
Aro, A. R., de Koning, H. J., Absetz, P., & Schreck, M. (1999). Psychosocial
predictors of first attendance for organised mammography screen-
ing. Journal of Medical Screening, 6, 82–88.
Arrow, K. J. (2004). Is bounded rationality unboundedly rational? Some
ruminations. In M. Augier & J. G. March (Eds.), Models of a man:
Essays in memory of Herbert A. Simon (pp. 47–55). Cambridge,
MA: MIT Press.
Ashby, F. G. (Ed.). (1992). Multidimensional models of categorization.
Hillsdale, NJ: Erlbaum.
Asuncion, A. & Newman, D. J. (2007). UCI machine learning reposi-
tory [http://www.ics.uci.edu/∼mlearn/MLRepository.html]. Irvine,
CA: University of California, School of Information and Computer
Science.
Axelrod, R. (1984). The evolution of cooperation. New York: Basic
Books.
Ayton, P. & Önkal, D. (2004). Effects of ignorance and information on
judgmental forecasting. Unpublished manuscript, City University,
London, England.
Babler, T. G. & Dannemiller, J. L. (1993). Role of image acceleration
in judging landing location of free-falling projectiles. Journal of
Experimental Psychology: Human Perception and Performance,
19, 15–31.
Bachmann, L. M., Gutzwiller, F. S., Puhan, M. A., Steurer, J., Steurer-
Stey, C., & Gigerenzer, G. (2007). Do citizens have minimum medi-
cal knowledge? A survey. BioMed Central Medicine, 5, 14.
Bäck, T., Rudolph, G., & Schwefel, H.-P. (1993). Evolutionary program-
ming and evolution strategies: Similarities and differences. In
D. B. Fogel & W. Atmars (Eds.), Proceedings of the Second Annual
Conference on Evolutionary Programming (pp. 11–22). San Diego,
CA: Evolutionary Programming Society.
Bak, P. (1997). How nature works: The science of self-organized criti-
cality. New York: Oxford University Press.
Banks, S. M., Salovey, P., Greener, S., Rothman, A. J., Moyer, A.,
Beauvais, J., et al. (1995). The effects of message framing on mam-
mography utilization. Health Psychology, 14, 178–184.
Baranski, J. V. & Petrusic, W. M. (1994). The calibration and resolution of
confidence in perceptual judgment. Perception and Psychophysics,
55, 412–428.
Baratgin, J. & Noveck, I. A. (2000). Not only base rates are neglected
in the engineer-lawyer problem: An investigation of reasoners’
underutilization of complementarity. Memory & Cognition, 28,
79–91.
Barber, B. (1961). Resistance by scientists to scientific discovery.
Science, 134, 596–602.
Barbey, A. K. & Sloman, S. A. (2007). Base-rate respect: From ecologi-
cal rationality to dual processes. Behavioral and Brain Sciences,
30, 241–254.
Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments.
Acta Psychologica, 44, 211–233.
Barlow, H. (2001). The exploitation of regularities in the environment
by the brain. Behavioral and Brain Sciences, 24, 602–607.
Baron, J. (1985). Rationality and intelligence. Cambridge: Cambridge
University Press.
Baron, R. S., Kerr, N. L., & Miller, N. (1992). Group process, group deci-
sion, group action. Buckingham, England: Open University Press.
Barratt, A., Cockburn, C., Furnival, A., McBride, A., & Mallon, L. (1999).
Perceived sensitivity of mammographic screening: Women’s views
on test accuracy and financial compensation for missed cancers.
Journal for Epidemiology and Community Health, 53, 716–720.
Baucells, M., Carasco, J. A., & Hogarth, R. M. (2008). Cumulative domi-
nance and heuristic performance in binary multi-attribute choice.
Operations Research, 56, 1289–1304.
Baumann, M. R. & Bonner, B. L. (2004). The effects of variability and
expectations on utilization of member expertise and group perfor-
mance. Organizational Behavior and Human Decision Processes,
93, 89–101.
Beach, L. R. & Mitchell, T. R. (1978). A contingency model for the selec-
tion of decision strategies. Academy of Management Review, 3,
439–449.
Bearden, J. N. & Connolly, T. (2007). Multi-attribute sequential search.
Organizational Behavior and Human Decision Processes, 103,
147–158.
Becker, G. S. (1978). The economic approach to human behavior.
Chicago: University of Chicago Press.
Begg, I. M., Anas, A., & Farinacci, S. (1992). Dissociation of processes
in belief: Source recollection, statement familiarity, and the illu-
sion of truth. Journal of Experimental Psychology: General, 121,
446–458.
Bennis, W. (2004). Experience, values, beliefs, and the sociocultural
context in gambling decision making: A field study of casino black-
jack (Dissertation). Ann Arbor, MI: UMI Dissertations Publishing.
Bentley, J. L. & McGeoch, C. C. (1985). Amortized analyses of self-
organizing sequential search heuristics. Communications of the
ACM, 28, 404–411.
Berg, N. (2006). A simple Bayesian procedure for sample size deter-
mination in an audit of property value appraisals. Real Estate
Economics 34, 133–155.
Berg, N., Biele, G., & Gigerenzer, G. (2010). Logical consistency and
accuracy of beliefs: Survey evidence on health decision-making
among economists. Unpublished manuscript.
Berg, N. & Gigerenzer, G. (2010). As-if behavioral economics: Neo-
classical economics in disguise? History of Economic Ideas, 18,
133–165.
Bergert, F. B. & Nosofsky, R. M. (2007). A response-time approach
to comparing generalized rational and take-the-best models of
decision making. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 33, 107–129.
Berndt, E. R. & Wood, D. O. (1975). Technology, prices, and the derived
demand for energy. Review of Economics and Statistics, 57, 259–268.
Berretty, P. M., Todd, P. M., & Martignon, L. (1999). Categorization
by elimination: Using few cues to choose. In G. Gigerenzer, P. M.
Todd, & the ABC Research Group, Simple heuristics that make us
smart (pp. 235–254). New York: Oxford University Press.
Berwick, D. M., Fineberg, H. V., & Weinstein, M. C. (1981). When doc-
tors meet numbers. American Journal of Medicine, 71, 991–998.
Betsch, T. & Haberstroh, S. (Eds.). (2005). The routines of decision
making. Mahwah, NJ: Erlbaum.
Bettman, J. R., Johnson, E. J., Luce, M. F., & Payne, J. W. (1993).
Correlation, conflict, and choice. Journal of Experimental Psychol-
ogy: Learning, Memory, and Cognition, 19, 931–951.
Bettman, J. R., Johnson, E. J., & Payne, J. W. (1990). A componential
analysis of cognitive effort in choice. Organizational Behavior and
Human Decision Processes, 45, 111–139.
Bjork, E. L. & Bjork, R. A. (1988). On the adaptive aspects of retrieval
failure in autobiographical memory. In M. M. Gruneberg, P. E.
Morris, & R. N. Sykes (Eds.), Practical aspects of memory II
(pp. 283–288). London: Academic Press.
Björkman, M. (1994). Internal cue theory: Calibration and resolution
of confidence in general knowledge. Organizational Behavior and
Human Decision Processes, 58, 386–405.
Black, W. C., Nease, R. F., Jr., & Tosteson, A. N. (1995). Perceptions of breast
cancer risk and screening effectiveness in women younger than 50
years of age. Journal of the National Cancer Institute, 87, 720–731.
Bless, H., Wänke, M., Bohner, G., Fellhauer, R. F., & Schwarz, N. (1994).
Need for cognition: Eine Skala zur Erfassung von Engagement und
Freude bei Denkaufgaben. Zeitschrift für Sozialpsychologie, 25,
147–154.
Bookstaber, R. & Langsam, J. (1985). On the optimality of coarse behav-
ior rules. Journal of Theoretical Biology, 116, 161–193.
Borges, B., Goldstein, D. G., Ortmann, A., & Gigerenzer, G. (1999). Can
ignorance beat the stock market? In G. Gigerenzer, P. M. Todd, &
the ABC Research Group, Simple heuristics that make us smart
(pp. 59–72). New York: Oxford University Press.
Borkenau, P. & Ostendorf, F. (1993). NEO-Fünf-Faktoren-Inventar
(NEO-FFI). Göttingen, Germany: Hogrefe.
Both, C., Bouwhuis, S., Lessells, C. M., & Visser, M. E. (2006). Climate
change and population declines in a long-distance migratory bird.
Nature, 441, 81–83.
Bottorff, J. L., Ratner, P. A., Johnson, J. L., Lovato, C. Y., & Joab, S. A.
(1998). Communicating cancer risk information: The challenges of
uncertainty. Patient Education and Counseling, 33, 67–81.
Box, G. E. P. & Jenkins, G. M. (1976). Time series analysis, forecasting,
and control. San Francisco: Holden-Day.
Boyd, M. (2001). On ignorance, intuition, and investing: A bear
market test of the recognition heuristic. Journal of Psychology and
Financial Markets, 2, 150–156.
Boyd, R. & Richerson, P. J. (1985). Culture and evolutionary processes.
Chicago: University of Chicago Press.
Boyd, R. & Richerson, P. J. (2005). The origin and evolution of cultures.
New York: Oxford University Press.
Boyle, P. & Ferlay, J. (2005). Cancer incidence and mortality in Europe,
2004. Annals of Oncology, 16, 481–488.
Brakman, S., Garretsen, H., Van Marrewijk, C., & Berg, M. van den.
(1999). The return of Zipf: Towards a further understanding of
the rank–size distribution. Journal of Regional Science, 39,
183–213.
Brand, S., Reimer, T., & Opwis, K. (2003). Effects of metacognitive
thinking and knowledge acquisition in dyads on individual prob-
lem solving and transfer performance. Swiss Journal of Psychology,
62, 251–261.
Brandstätter, E., Gigerenzer, G., & Hertwig, R. (2006). The priority heu-
ristic: A process model of risky choice. Psychological Review, 113,
409–432.
Brandstätter, E., Gigerenzer, G., & Hertwig, R. (2008). Risky choice
with heuristics: Reply to Birnbaum (2008), Johnson, Schulte-
Mecklenbeck, and Willemsen (2008) and Rieger and Wang (2008).
Psychological Review, 115, 281–290.
Brase, G. L. (2002). Which statistical formats facilitate what decisions?
The perception and influence of different statistical information
formats. Journal of Behavioral Decision Making, 15, 381–401.
Brase, G. L. (2008). Frequency interpretation of ambiguous statistical
information facilitates Bayesian reasoning. Psychonomic Bulletin
& Review, 15, 284–289.
Brehmer, B. (1973). Note on the relation between single-cue probabil-
ity learning and multiple-cue probability learning. Organizational
Behavior and Human Performance, 9, 246–252.
Brehmer, B. (1994). The psychology of linear judgement models. Acta
Psychologica, 87, 137–154.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1993).
Classification and regression trees. New York: Chapman & Hall.
Brighton, H. & Gigerenzer, G. (2008). Bayesian brains and cognitive
mechanisms: Harmony or dissonance? In N. Chater & M. Oaksford
(Eds.), The probabilistic mind: Prospects for Bayesian cognitive
science (pp. 189–208). New York: Oxford University Press.
Bröder, A. (2000a). Assessing the empirical validity of the “take-the-
best” heuristic as a model of human probabilistic inference. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 26,
1332–1346.
Bröder, A. (2000b). A methodological comment on behavioral decision
research. Psychologische Beiträge, 42, 645–662.
Bröder, A. (2000c). “Take the best—ignore the rest.” Wann entscheiden
Menschen begrenzt rational? [When do people decide boundedly
rationally?] Lengerich, Germany: Pabst Science Publishers.
Bröder, A. (2002). Take the best, Dawes’ rule, and compensatory deci-
sion strategies: A regression-based classification method. Quality
& Quantity, 36, 219–238.
Bröder, A. (2003). Decision making with the “adaptive toolbox”:
Influence of environmental structure, intelligence, and work-
ing memory load. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 29, 611–625.
Bröder, A. (2005). Entscheiden mit der “adaptiven Werkzeugkiste”:
Ein empirisches Forschungsprogramm. [Decision making with the
“adaptive toolbox”: An empirical research program]. Lengerich,
Germany: Pabst Science.
Bröder, A. & Eichler, A. (2001). Individuelle Unterschiede in bevor-
zugten Entscheidungsstrategien. [Individual differences in preferred
decision strategies]. Poster presented at the 43rd “Tagung experi-
mentell arbeitender Psychologen,” April 9–11, 2001, Regensburg,
Germany.
Bröder, A. & Eichler, A. (2006). The use of recognition information and
additional cues in inferences from memory. Acta Psychologica,
121, 275–284.
Bröder, A. & Gaissmaier, W. (2007). Sequential processing of cues in
memory-based multi-attribute decisions. Psychonomic Bulletin
and Review, 14, 895–900.
Bröder, A. & Newell, B. R. (2008). Challenging some common beliefs
about cognitive costs: Empirical work within the adaptive toolbox
metaphor. Judgment and Decision Making, 3, 195–204.
Bröder, A. & Schiffer, S. (2003a). Bayesian strategy assessment in multi-
attribute decision research. Journal of Behavioral Decision Making,
16, 193–213.
Bröder, A. & Schiffer, S. (2003b). “Take the best” versus simultaneous
feature matching: Probabilistic inferences from memory and effects
of representation format. Journal of Experimental Psychology:
General, 132, 277–293.
Bröder, A. & Schiffer, S. (2006a). Adaptive flexibility and maladaptive
routines in selecting fast and frugal decision strategies. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 32,
904–918.
Bröder, A. & Schiffer, S. (2006b). Stimulus format and working memory
in fast and frugal strategy selection. Journal of Behavioral Decision
Making, 19, 361–380.
Brown, N. R. (2002). Real-world estimation: Estimation modes and
seeding effects. Psychology of Learning and Motivation, 41,
321–359.
Bruner, J. S., Goodnow, J. J., & Austin, A. A. (1956). A study of thinking.
New York: Wiley.
Brunswik, E. (1943). Organismic achievement and environmental prob-
ability. Psychological Review, 50, 255–272.
Brunswik, E. (1955). Representative design and probabilistic theory in
a functional psychology. Psychological Review, 62, 193–217.
Bruss, F. T. (2000). Der Ungewissheit ein Schnippchen schlagen. Spektrum
der Wissenschaft, 6, 106.
Buchanan, M. (1997). One law to rule them all. New Scientist, 2107,
30–35.
Bucher, H. C., Weinbacher, M., & Gyr, K. (1994). Influence of method
of reporting study results on decision of physicians to prescribe
drugs to lower cholesterol concentration. British Medical Journal,
309, 761–764.
Budescu, D. V., Wallsten, T. S., & Au, W. T. (1997). On the impor-
tance of random error in the study of probability judgment. Part
II: Applying the stochastic judgment model to detect systematic
trends. Journal of Behavioral Decision Making, 10, 172–188.
Bullock, S. & Todd, P. M. (1999). Made to measure: Ecological rational-
ity in structured environments. Minds and Machines, 9, 497–541.
Burkell, J. (2004). What are the chances? Evaluating risk and benefit
information in consumer health materials. Journal of the Medical
Library Association, 92, 200–208.
Busemeyer, J. R. & Johnson, J. G. (2004). Computational models of deci-
sion making. In D. J. Koehler & N. Harvey (Eds.), Blackwell hand-
book of judgment and decision making (pp. 133–154). Oxford:
Blackwell.
Busemeyer, J. R. & Rapoport, A. (1988). Psychological models of deferred
decision making. Journal of Mathematical Psychology, 32, 1–44.
Camerer, C. F. & Johnson, E. J. (1991). The process-performance paradox
in expert judgment: How can the experts know so much and pre-
dict so badly? In J. Smith (Ed.), Towards a general theory of exper-
tise: Prospects and limits (pp. 195–217). Cambridge: Cambridge
University Press.
Carbone, C. & Gittleman, J. L. (2002). A common rule for the scaling of
carnivore diversity. Science, 295, 2273–2276.
Cardoza, A. (1998). Secrets of winning slots. New York: Author.
Carnap, R. (1947). On the application of inductive logic. Philosophy
and Phenomenological Research, 8, 133–148.
Castellan, N. J. (1973). Multiple-cue probability learning with irrele-
vant cues. Organizational Behavior and Human Performance, 9,
16–29.
Central Intelligence Agency. (2005). The world factbook. Dulles, VA:
Potomac Books.
Chamot, E., Charvet, A. I., & Perneger, T. V. (2005). Variability in
women’s desire for information about mammography screening:
Implications for informed consent. European Journal of Cancer
Prevention, 14, 413–418.
Chapman, G. B. (1991). Trial order affects cue interaction in contin-
gency judgment. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 17, 837–854.
Chapman, L. J. & Chapman, J. P. (1967). Genesis of popular but errone-
ous diagnostic observations. Journal of Abnormal Psychology, 72,
193–204.
Charles, C., Gafni, A., & Whelan, T. (1999). Decision-making in the
physician-patient encounter: Revisiting the shared treatment deci-
sion-making model. Social Science and Medicine, 49, 651–661.
Charniak, E. & McDermott, D. (1985). An introduction to artificial intel-
ligence. Reading, MA: Addison-Wesley.
Chase, V. M. (1999). Where to look to find out why: Rational infor-
mation search in causal hypothesis testing. Unpublished doctoral
dissertation.
Chater, N. (2000). How smart can simple heuristics be? Behavioral and
Brain Sciences, 23, 745–746.
Chater, N. & Brown, G. D. A. (1999). Scale-invariance as a unifying
psychological principle. Cognition, 69, 17–24.
Chater, N. & Oaksford, M. (Eds.). (2008). The probabilistic mind:
Prospects for Bayesian cognitive science. Oxford: Oxford University
Press.
Chater, N., Oaksford, M., Nakisa, R., & Redington, M. (2003). Fast, frugal,
and rational: How rational norms explain behavior. Organizational
Behavior and Human Decision Processes, 90, 63–86.
Cheng, P. W. (1997). From covariation to causation: A causal power
theory. Psychological Review, 104, 367–405.
Cheng, P. W. & Novick, L. R. (1990). A probabilistic contrast model of
causal induction. Journal of Personality and Social Psychology, 58,
545–567.
Cheng, P. W. & Novick, L. R. (1992). Covariation in natural causal
induction. Psychological Review, 99, 365–382.
Christensen, L. R. & Greene, W. H. (1976). Economies of scale in U.S.
electric power generation. Journal of Political Economy, 84,
655–676.
Christensen-Szalanski, J. J. J. (1978). Problem solving strategies: A selec-
tion mechanism, some implications and some data. Organizational
Behavior and Human Performance, 22, 307–323.
Christiansen, E. M. (2006). The gross annual wager of the United States.
Insight, 4, 1–9.
Chu, P. C. & Spires, E. E. (2003). Perceptions of accuracy and effort of
decision strategies. Organizational Behavior and Human Decision
Processes, 91, 203–214.
Claudy, J. G. (1972). A comparison of five variable weighting proce-
dures. Educational and Psychological Measurement, 32, 311–322.
Clutton-Brock, T. H. & Albon, S. D. (1979). The roaring of red deer and
the evolution of honest advertisement. Behaviour, 69, 145–170.
Cockburn, J., Pit, S., & Redman, S. (1999). Perceptions of screening
mammography among women aged 40–49. Australian and New
Zealand Journal of Public Health, 23, 318–321.
Cockburn, J., Redman, S., Hill, D., & Henry, E. (1995). Public under-
standing of medical screening. Journal of Medical Screening, 2,
224–227.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences.
Hillsdale, NJ: Erlbaum.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multi-
ple regression/correlation analysis for the behavioral sciences (3rd
ed.). Mahwah, NJ: Erlbaum.
Colinvaux, P. A. (1978). Why big fierce animals are rare: An ecologist’s
perspective. Princeton, NJ: Princeton University Press.
Collett, T. S. & Land, M. F. (1975). Visual control of flight behaviour in
the hoverfly, Syritta pipiens L. Journal of Comparative Physiology,
99, 1–66.
Condorcet, N. C. (1785). Essai sur l’application de l’analyse à la probabi-
lité des décisions rendues à la pluralité des voix. Paris: Imprimerie
Royale.
Cook, M. & Mineka, S. (1989). Observational conditioning of fear to
fear-relevant versus fear-irrelevant stimuli in rhesus monkeys.
Journal of Abnormal Psychology, 98, 448–459.
Cook, M. & Mineka, S. (1990). Selective associations in the observational
conditioning of fear in rhesus monkeys. Journal of Experimental
Psychology: Animal Behavior Processes, 16, 372–389.
Cooksey, R. W. (1996). Judgment analysis: Theory, methods, and appli-
cations. London: Academic Press.
Coombs, C. H. & Lehner, P. E. (1981). Evaluation of two alternative
models of a theory of risk: I. Are moments useful in assessing
risks? Journal of Experimental Psychology: Human Perception and
Performance 7, 1110–1123.
Cooper, G. F. (1990). The computational complexity of probabilistic
inference using Bayesian belief networks. Artificial Intelligence,
42, 393–405.
Cooper, R. (2000). Simple heuristics could make us smart; but which
heuristic do we apply when? Behavioral and Brain Sciences, 23,
746.
Corbin, R. M., Olson, C. L., & Abbondanza, M. (1975). Context effects in
optional stopping decisions. Organizational Behavior and Human
Performance, 14, 207–216.
Cosmides, L. & Tooby, J. (1996). Are humans good intuitive statisti-
cians after all? Rethinking some conclusions from the literature on
judgment under uncertainty. Cognition, 58, 1–73.
Costa, P. T. & McCrae, R. R. (1992). The NEO personality inventory
and NEO five factor inventory. Professional manual. Odessa, FL:
Psychological Assessment Resources.
Coulter, A. (1997). Partnerships with patients: The pros and cons
of shared clinical decision-making. Journal of Health Services
Research and Policy, 2, 112–121.
Cover, T. & Hart, P. (1967). Nearest neighbor pattern classification. IEEE
Transactions on Information Theory, 13, 21–27.
Cowan, N. (2001). The magical number 4 in short-term memory: A
reconsideration of mental storage capacity. Behavioral and Brain
Sciences, 24, 87–185.
Czerlinski, J., Gigerenzer, G., & Goldstein, D. G. (1999). How good are
simple heuristics? In G. Gigerenzer, P. M. Todd, & the ABC Research
Group, Simple heuristics that make us smart (pp. 97–118). New
York: Oxford University Press.
Dagum, P. & Luby, M. (1993). Approximating probabilistic inference
in Bayesian belief networks is NP-hard. Artificial Intelligence, 60,
141–153.
Dasarathy, B. (1991). Nearest neighbor (NN) norms: NN pattern classifica-
tion techniques. Los Alamitos, CA: IEEE Computer Society Press.
Daston, L. J. (1988). Classical probability in the Enlightenment.
Princeton, NJ: Princeton University Press.
Davis, J. H. (1973). Group decision and social interaction: A theory of
social decision schemes. Psychological Review, 80, 97–125.
Davis, J. H. (1992). Some compelling intuitions about group consen-
sus decisions, theoretical and empirical research, and interper-
sonal aggregation phenomena: Selected examples, 1950–1990.
Organizational Behavior and Human Decision Processes, 52,
3–38.
Dawes, R. M. (1979). The robust beauty of improper linear models in
decision making. American Psychologist, 34, 571–582.
Dawes, R. M. (1993). Prediction of the future versus an understanding
of the past: A basic asymmetry. American Journal of Psychology,
106, 1–24.
Dawes, R. M. & Corrigan, B. (1974). Linear models in decision making.
Psychological Bulletin, 81, 95–106.
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial
judgment. Science, 243, 1668–1674.
Dawes, R. M. & Mulford, M. (1996). The false consensus effect and
overconfidence: Flaws in judgment or flaws in how we study judg-
ment? Organizational Behavior and Human Decision Processes,
65, 201–211.
Dawkins, R. (1989). The selfish gene (2nd ed.). Oxford: Oxford University
Press.
DeGroot, M. H. (1970). Optimal statistical decisions. New York:
McGraw–Hill.
DeMiguel, V., Garlappi, L., & Uppal, R. (2009). Optimal versus naive
diversification: How inefficient is the 1/N portfolio strategy?
Review of Financial Studies, 22, 1915–1953.
Dennett, D. A. (1991). Consciousness explained. Boston: Little,
Brown.
Detweiler, J. B., Bedell, B. T., Salovey, P., Pronin, E., & Rothman, A. J.
(1999). Message framing and sunscreen use: Gain-framed messages
motivate beach-goers. Health Psychology, 18, 189–196.
Dhami, M. K. (2003). Psychological models of professional decision-
making. Psychological Science, 14, 175–180.
Dhami, M. K. & Ayton, P. (2001). Bailing and jailing the fast and frugal
way. Journal of Behavioral Decision Making, 14, 141–168.
Dhami, M. K., Hertwig, R., & Hoffrage, U. (2004). The role of represen-
tative design in an ecological approach to cognition. Psychological
Bulletin, 130, 959–988.
Dickerson, M. (1977). The role of the betting shop environment in the
training of compulsive gamblers. B. A. B. P. Bulletin, 5, 3–8.
Dieckmann, A. & Rieskamp, J. (2007). The influence of information
redundancy on probabilistic inferences. Memory & Cognition, 35,
1801–1813.
Dillner, L. (1996). Pill scare linked to rise in abortions. British Medical
Journal, 312, 996.
Dobias, K. S., Moyer, C. A., McAchran, S. E., Katz, S. J., & Sonnad,
S. S. (2001). Mammography messages in the popular media:
Implications for patient expectations and shared clinical decision-
making. Health Expectations, 4, 131–139.
Domenighetti, G., D’Avanzo, B., Egger, M., Berrino, F., Perneger, T.,
Mosconi, P., et al. (2003). Women’s perception of the benefits of
mammography screening: Population-based survey in four coun-
tries. International Journal of Epidemiology, 32, 816–821.
Domingos, P. & Pazzani, M. (1997). On the optimality of the simple
Bayesian classifier under zero-one loss. Machine Learning, 29,
103–130.
Dougherty, M. R. P., Gettys, C. F., & Ogden, E. E. (1999). MINERVA-DM:
A memory processes model for judgments of likelihood.
Psychological Review, 106, 180–209.
Douglas, M. (1992). Risk and blame: Essays in cultural theory. London:
Routledge.
Doya, K., Ishii, S., Pouget, A., & Rao, R. P. N. (Eds.). (2007). Bayesian
brain: Probabilistic approaches to neural coding. Cambridge, MA:
MIT Press.
Doyal, L. (2001). Informed consent: Moral necessity or illusion? Quality
in Health Care, 10(Suppl. 1), 29–33.
Drossaert, C. H. C., Boer, H., & Seydel, E. R. (1996). Health education
to improve repeat participation in the Dutch breast cancer screen-
ing programme: Evaluation of a leaflet tailored to previous partici-
pants. Patient Education and Counseling, 28, 121–131.
Dudey, T. & Todd, P. M. (2002). Making good decisions with mini-
mal information: Simultaneous and sequential choice. Journal of
Bioeconomics, 3, 195–215.
Dunn, A. S., Shridharani, K. V., Lou, W., Bernstein, J., & Horowitz, C. R.
(2001). Physician-patient discussion of controversial cancer screen-
ing tests. American Journal of Preventive Medicine, 20, 130–134.
Eadington, W. R. (1988). Economic perceptions of gambling behavior.
Journal of Gambling Behavior, 3, 264–273.
Ebbinghaus, H. (1966). Über das Gedächtnis. Untersuchungen zur
Experimentellen Psychologie [About memory. Investigations in
experimental psychology]. Amsterdam: Bonset. (Original work
published 1885)
Echterhoff, W. (1987). Eine neue Methode für Risikovergleiche,
dargestellt an zwei Unfallentwicklungen. In G. Kroj & E. Sporer
(Eds.), Wege der Verkehrspsychologie (pp. 26–38). Braunschweig,
Germany: Rot-Gelb-Grün.
Edgell, S. E. & Hennessey, J. E. (1980). Irrelevant information and utiliza-
tion of event base rates in nonmetric multiple-cue probability learn-
ing. Organizational Behavior and Human Performance, 26, 1–6.
Edwards, A., Elwyn, G., Covey, J., Matthews, E., & Pill, R. (2001).
Presenting risk information—A review of the effects of “framing’’
and other manipulations on patient outcomes. Journal of Health
Communication, 6, 61–82.
Edwards, A. G. K., Evans, R., Dundon, J., Haigh, S., Hood, K., & Elwyn,
G. J. (2006). Personalised risk communication for informed deci-
sion making about taking screening tests. Cochrane Database of
Systematic Reviews, 4, Art. No. CD001865.
Edwards, W. (1968). Conservatism in human information processing.
In B. Kleinmuntz (Ed.), Formal representation of human judgment
(pp. 17–52). New York: Wiley.
Einhorn, H. J. (1970). The use of nonlinear, noncompensatory models
in decision making. Psychological Bulletin, 73, 221–230.
Einhorn, H. J. (1972). Expert measurement and mechanical com-
bination. Organizational Behavior and Human Performance, 7,
86–106.
Einhorn, H. J. & Hogarth, R. M. (1975). Unit weighting schemes for deci-
sion making. Organizational Behavior and Human Performance,
13, 171–192.
Einhorn, H. J. & Hogarth, R. M. (1986). Judging probable cause.
Psychological Bulletin, 99, 3–19.
Elmore, J. G., Barton, M. B., Moceri, V. M., Polk, S., Arena, P. J., &
Fletcher, S. W. (1998). Ten-year risk of false positive screening
mammograms and clinical breast examinations. New England
Journal of Medicine, 338, 1089–1096.
Enquist, M. & Leimar, O. (1990). The evolution of fatal fighting. Animal
Behavior, 39, 1–9.
Epstein, R. A. (1995). Simple rules for a complex world. Cambridge,
MA: Harvard University Press.
Erev, I., Wallsten, T. S., & Budescu, D. V. (1994). Simultaneous over-
and underconfidence: The role of error in judgment processes.
Psychological Review, 101, 519–527.
Estes, W. K. (1976). The cognitive side of probability learning.
Psychological Review, 83, 37–64.
Ettenson, R., Shanteau, J., & Krogstad, J. (1987). Expert judgment: Is
more information better? Psychological Report, 60, 227–238.
Evans, J. S. B. T. (1989). Bias in human reasoning: Causes and conse-
quences. Hillsdale, NJ: Erlbaum.
Evans, J. S. B. T. (2008). Dual-processing accounts of reasoning, judgment
and social cognition. Annual Review of Psychology, 59, 255–278.
Evans, J. S. B. T. & Over, D. E. (1996). Rationality in the selection task:
Epistemic utility versus uncertainty reduction. Psychological
Review, 103, 356–363.
Ewald, P. W. (1994). Evolution of infectious diseases. Oxford: Oxford
University Press.
Fahrenberg, J., Hempel, R., & Selg, H. (1994). Das Freiburger
Persönlichkeits-Inventar FPI (6th rev. ed.). Göttingen, Germany:
Hogrefe.
Fair, R. C. (1986). Evaluating the predictive accuracy of models. In
Z. Griliches & M. D. Intriligator (Eds.), Handbook of econometrics
(pp. 1979–1995). Amsterdam: North-Holland.
Fasolo, B., McClelland, G. H., & Todd, P. M. (2007). Escaping the tyranny
of choice: When fewer attributes make choice easier. Marketing
Theory, 7, 13–26.
Ferguson, T. S. (n.d.). Optimal stopping and applications. Retrieved
from http://www.math.ucla.edu/~tom/Stopping/Contents.html
Fiedler, K. (1983). On the testability of the availability heuristic. In
R. W. Scholz (Ed.), Decision making under uncertainty (pp. 109–119).
Amsterdam: North-Holland.
Fiedler, K. (1991). The tricky nature of skewed frequency tables:
An information loss account of distinctiveness-based illusory
correlations. Journal of Personality and Social Psychology, 60,
24–36.
Fiedler, K. (1996). Explaining and simulating judgment biases as an
aggregation phenomenon in probabilistic, multiple-cue environ-
ments. Psychological Review, 103, 193–214.
Fiedler, K., Russer, S., & Gramm, K. (1993). Illusory correlations and
memory performance. Journal of Experimental Social Psychology,
29, 111–136.
Fiedler, K., Walther, E., & Nickel, S. (1999). The autoverification of
social hypotheses: Stereotyping and the power of sample size.
Journal of Personality and Social Psychology, 77, 5–18.
Fildes, R. & Makridakis, S. (1995). The impact of empirical accuracy
studies on time series analysis and forecasting. International
Statistical Review, 65, 289–308.
Finkelstein, M. O. & Levin, B. (2001). Statistics for lawyers (2nd ed.).
New York: Springer.
Fischer, J. E., Steiner, F., Zucol, F., Berger, C., Martignon, L., Bossart, W.,
et al. (2002). Using simple heuristics to target macrolide prescrip-
tion in children with community-acquired pneumonia. Archives
of Pediatric and Adolescent Medicine, 156, 1005–1008.
Fischhoff, B. (2002). Heuristics and biases in application. In T. Gilovich,
D. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The
psychology of intuitive judgment (pp. 730–748). Cambridge:
Cambridge University Press.
Fischhoff, B. & Beyth-Marom, R. (1983). Hypothesis testing from a
Bayesian perspective. Psychological Review, 90, 239–260.
Fischhoff, B. & MacGregor, D. (1982). Subjective confidence in fore-
casts. Journal of Forecasting, 1, 155–172.
Fischhoff, B., Watson, S. C., & Hope, C. (1984). Defining risk. Policy
Science, 17, 123–139.
Fishburn, P. (2001). Utility and subjective probability: Contemporary
theories. In N. J. Smelser & P. B. Baltes (Eds.), International
encyclopedia of the social and behavioral sciences (Vol. 24,
pp. 16113–16121). London: Elsevier.
Fishburn, P. C. (1974). Lexicographic orders, utilities and decision
rules: A survey. Management Science, 20, 1442–1471.
Flexser, A. J. & Bower, G. H. (1975). Further evidence regarding instruc-
tional effects on frequency judgments. Bulletin of the Psychonomic
Society, 6, 321–324.
Ford, J. K., Schmitt, N., Schechtman, S. L., Hults, B. H., & Doherty,
M. L. (1989). Process tracing methods: Contributions, problems,
and neglected research questions. Organizational Behavior and Human
Decision Processes, 43, 75–117.
Forster, M. R. (1994). Non-Bayesian foundations for statistical estima-
tion, prediction, and the Ravens example. Erkenntnis, 40, 357–376.
Fox, J. (1997). Applied regression analysis, linear models, and related
methods. Thousand Oaks, CA: Sage.
Foxall, G. R. & Goldsmith, R. E. (1988). Personality and consumer
research: Another look. Journal of the Market Research Society,
30, 111–125.
Franklin, B. (1907). Letter to Jonathan Williams (Passy, April 8, 1779).
In A. H. Smyth (Ed.), The writings of Benjamin Franklin (Vol. VII,
pp. 281–282). New York: Macmillan.
Friedman, J. H. (1997). On bias, variance, 0/1–loss, and the curse-of-
dimensionality. Data Mining and Knowledge Discovery, 1, 55–77.
Friedman, M. (1953). Essays in positive economics. Chicago: University
of Chicago Press.
Friedman, M. (1992). Do old fallacies ever die? Journal of Economic
Literature, 30, 2129–2132.
Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network
classifiers. Machine Learning, 29, 131–163.
Frosch, C., Beaman, C. P., & McCloy, R. (2007). A little learning is a
dangerous thing: An experimental demonstration of ignorance-
driven inference. Quarterly Journal of Experimental Psychology,
60, 1329–1336.
Fudenberg, D. & Tirole, J. (1991). Game theory. Cambridge, MA: MIT
Press.
Funder, D. C. (1987). Errors and mistakes: Evaluating the accuracy of
social judgment. Psychological Bulletin, 101, 75–90.
Furby, L. (1973). Interpreting regression toward the mean in develop-
mental research. Developmental Psychology, 8, 172–179.
Furedi, A. (1999). The public health implications of the 1995 “pill
scare.” Human Reproduction Update, 5, 621–626.
Gabaix, X. (1999). Zipf’s law for cities: An explanation. Quarterly
Journal of Economics, 114, 739–767.
Gaboury, A. & Ladouceur, R. (1988). Irrational thinking and gambling.
In W. R. Eadington (Ed.), Gambling research: Proceedings of the
Seventh International Conference on Gambling and Risk Taking
(Vol. 3, pp. 142–163). Reno: University of Nevada.
Gaboury, A. & Ladouceur, R. (1989). Erroneous perceptions and gam-
bling. Journal of Social Behavior and Personality, 4, 411–420.
Gaissmaier, W. (2008). The mnemonic decision maker: How search in mem-
ory shapes decision making. Doctoral dissertation, Free University
Berlin. http://www.diss.fu-berlin.de/diss/receive/FUDISS_thesis_
000000005913.
Galef, B. G., Jr., McQuoid, L. M., & Whiskin, E. E. (1990). Further evi-
dence that Norway rats do not socially transmit learned aversions
to toxic baits. Animal Learning and Behavior, 18, 199–205.
Galesic, M., Gigerenzer, G., & Straubinger, N. (2009). Natural fre-
quencies help older adults and people with low numeracy to
evaluate medical screening tests. Medical Decision Making, 29,
368–371.
Gallup Organization. (1993). The American public’s attitude toward
organ donation and transplantation. Princeton, NJ: Author.
Gambetta, D. & Hamill, H. (2005). Streetwise. How taxi drivers estab-
lish their customers’ trustworthiness. New York: Russell Sage.
Garb, H. N. (1998). Studying the clinician: Judgment research and psy-
chological assessment. Washington, DC: American Psychological
Association.
Garcia-Retamero, R. & Dhami, M. K. (2009). Take-the-best in expert-
novice decision strategies for residential burglary. Psychonomic
Bulletin & Review, 16, 163–169.
García-Retamero, R., Takezawa, M., & Gigerenzer, G. (2006). How to
learn good cue orders: When social learning benefits simple heu-
ristics. In R. Sun (Ed.), Proceedings of the 28th Annual Conference
of the Cognitive Science Society (pp. 1352–1357). Mahwah, NJ:
Erlbaum.
García-Retamero, R., Wallin, A., & Dieckmann, A. (2007). Does causal
knowledge help us be faster and more frugal in our decisions?
Memory & Cognition, 35, 1399–1409.
Gartner, B. (2004, July 22). Nach Ihnen, Konsul [After you, Consul]. Die
Zeit. Retrieved August 18, 2009, from http://www.zeit.de/2004/31/
A-Verkehr_in_Rom.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and
the bias/variance dilemma. Neural Computation, 4, 1–58.
General Medical Council. (1998, November). Seeking patients’ con-
sent: The ethical considerations. Retrieved September 17, 2001,
from http://www.gmc-uk.org/standards/consent.htm.
Ghiselli, E. E., Campbell, J. P., & Zedeck, S. (1981). Measurement theory
for the behavioral sciences. San Francisco, CA: Freeman.
Ghosh, A. K. & Ghosh, K. (2005). Translating evidence-based informa-
tion into effective risk communication: Current challenges and
opportunities. Journal of Laboratory and Clinical Medicine, 145,
171–180.
Gibson, J. J. (1979). The ecological approach to visual perception.
Boston: Houghton Mifflin.
Gigerenzer, G. (1991). From tools to theories: A heuristic of discovery
in cognitive psychology. Psychological Review, 98, 254–267.
Gigerenzer, G. (1996). On narrow norms and vague heuristics: A reply
to Kahneman and Tversky. Psychological Review, 103, 592–596.
Gigerenzer, G. (1998a). Ecological intelligence. In D. Cummins & C.
Allen (Eds.), The evolution of mind (pp. 9–29). New York: Oxford
University Press.
Gigerenzer, G. (1998b). We need statistical thinking, not statistical ritu-
als. Behavioral and Brain Sciences, 21, 199–200.
Gigerenzer, G. (2000). Adaptive thinking: Rationality in the real world.
New York: Oxford University Press.
Gigerenzer, G. (2002). Calculated risks: How to know when numbers
deceive you. New York: Simon & Schuster.
Gigerenzer, G. (2003). The adaptive toolbox and lifespan development:
Common questions? In U. M. Staudinger & U. Lindenberger (Eds.),
Understanding human development: Dialogues with lifespan psy-
chology (pp. 423–435). Boston: Kluwer.
Gigerenzer, G. (2004a). Mindless statistics. Journal of Socio-Economics,
33, 587–606.
Gigerenzer, G. (2004b). Striking a blow for sanity in theories of ratio-
nality. In M. Augier & J. G. March (Eds.), Models of a man: Essays
in memory of Herbert A. Simon (pp. 389–409). Cambridge, MA:
MIT Press.
Gigerenzer, G. (2005). I think therefore I err. Social Research, 72,
195–218.
Gigerenzer, G. (2007). Gut feelings: The intelligence of the unconscious.
New York: Viking Press.
Gigerenzer, G. & Brighton, H. (2009). Homo heuristicus: Why biased minds
make better inferences. Topics in Cognitive Science, 1, 107–143.
Gigerenzer, G., Czerlinski, J., & Martignon, L. (1999). How good are fast
and frugal heuristics? In J. Shanteau, B. A. Mellers, & D. A. Schum
(Eds.), Decision science and technology: Reflections on the contri-
butions of Ward Edwards (pp. 81–103). Norwell, MA: Kluwer.
Gigerenzer, G. & Edwards, A. (2003). Simple tools for understanding
risks: From innumeracy to insight. British Medical Journal, 327,
741–744.
Gigerenzer, G. & Engel, C. (Eds.). (2006). Heuristics and the law.
Cambridge, MA: MIT Press.
Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L. M., &
Woloshin, S. W. (2007). Helping doctors and patients make sense
of health statistics. Psychological Science in the Public Interest, 8,
53–96.
Gigerenzer, G. & Goldstein, D. G. (1996). Reasoning the fast and frugal
way: Models of bounded rationality. Psychological Review, 103,
650–669.
Gigerenzer, G. & Goldstein, D. G. (1999). Betting on one good reason:
The take the best heuristic. In G. Gigerenzer, P. M. Todd, & the ABC
Research Group, Simple heuristics that make us smart (pp. 75–95).
New York: Oxford University Press.
Gigerenzer, G., & Goldstein, D. G. (2011). The recognition heuristic: A
decade of research. Judgment and Decision Making, 6, 100–121.
Gigerenzer, G., & Gray, J. A. M. (Eds.). (2011). Better doctors, better
patients, better decisions: Envisioning health care 2020. Cambridge,
MA: MIT Press.
Gigerenzer, G., Hell, W., & Blank, H. (1988). Presentation and con-
tent: The use of base rates as a continuous variable. Journal of
Experimental Psychology: Human Perception and Performance,
14, 513–525.
Gigerenzer, G., Hertwig, R., Broek, E. van den, Fasolo, B., & Katsikopoulos,
K. V. (2005). “A 30% chance of rain tomorrow”: How does the
public understand probabilistic weather forecasts? Risk Analysis,
25, 623–629.
Gigerenzer, G., Hertwig, R., & Pachur, T. (2011). Heuristics: The
foundations of adaptive behavior. New York: Oxford University
Press.
Gigerenzer, G. & Hoffrage, U. (1995). How to improve Bayesian reason-
ing without instruction: Frequency formats. Psychological Review,
102, 684–704.
Gigerenzer, G. & Hoffrage, U. (1999). Overcoming difficulties in
Bayesian reasoning: A reply to Lewis and Keren (1999) and Mellers
and McGraw (1999). Psychological Review, 106, 425–430.
Gigerenzer, G. & Hoffrage, U. (2007). The role of representation in Bayesian
reasoning: Correcting common misconceptions [Commentary on
Barbey and Sloman]. Behavioral and Brain Sciences, 30, 264–267.
Gigerenzer, G., Hoffrage, U., & Ebert, A. (1998). AIDS counselling for
low-risk clients. AIDS Care, 10, 197–211.
Gigerenzer, G., Hoffrage, U., & Goldstein, D. G. (2008). Fast and frugal
heuristics are plausible models of cognition: Reply to Dougherty,
Franco-Watkins, and Thomas (2008). Psychological Review, 115,
230–239.
Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic
mental models: A Brunswikian theory of confidence. Psychological
Review, 98, 506–528.
Gigerenzer, G., Mata, J., & Frank, R. (2009). Public knowledge about
breast and prostate cancer screening: A representative survey of
nine European countries. Journal of the National Cancer Institute,
101, 1216–1220.
Gigerenzer, G. & Selten, R. (Eds.). (2001). Bounded rationality: The
adaptive toolbox. Cambridge, MA: MIT Press.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Krüger,
L. (1989). The empire of chance. How probability changed science
and everyday life. Cambridge: Cambridge University Press.
Gigerenzer, G. & Todd, P. M. (1999). Fast and frugal heuristics: The
adaptive toolbox. In G. Gigerenzer, P. M. Todd, & the ABC Research
Group, Simple heuristics that make us smart (pp. 3–34). New York:
Oxford University Press.
Gigerenzer, G., Todd, P. M., & the ABC Research Group. (1999). Simple
heuristics that make us smart. New York: Oxford University
Press.
Gigone, D. & Hastie, R. (1997). The impact of information on small
group choice. Journal of Personality and Social Psychology, 72,
132–140.
Gilbert, D. T. (1991). How mental systems believe. American Psychologist,
46, 107–119.
Gilbert, D. T., Krull, D. S., & Malone, P. S. (1990). Unbelieving the
unbelievable: Some problems in the rejection of false information.
Journal of Personality and Social Psychology, 59, 601–613.
Gilbert, D. T., Tafarodi, R. W., & Malone, P. S. (1993). You can’t not
believe everything you read. Journal of Personality and Social
Psychology, 65, 221–233.
Gimbel, R. W., Strosberg, M. A., Lehrman, S. E., Gefenas, E., & Taft, T.
(2003). Presumed consent and other predictors of cadaveric organ
donation in Europe. Progress in Transplantation, 13, 17–23.
Girotto, V. & Gonzalez, M. (2001). Solving probabilistic and statistical
problems: A matter of information structure and question form.
Cognition, 78, 247–276.
Gladwell, M. (2005). Blink: The power of thinking without thinking.
New York: Little, Brown.
Goldberg, L. R. (1970). Man versus model of man: A rationale, plus
some evidence of improving on clinical inferences. Psychological
Bulletin, 73, 422–432.
Goldberger, A. S. (1991). A course in econometrics. Cambridge, MA:
Harvard University Press.
Goldstein, D. G. & Gigerenzer, G. (1999). The recognition heuristic:
How ignorance makes us smart. In G. Gigerenzer, P. M. Todd, &
the ABC Research Group, Simple heuristics that make us smart
(pp. 37–58). New York: Oxford University Press.
Goldstein, D. G. & Gigerenzer, G. (2002). Models of ecological rationality:
The recognition heuristic. Psychological Review, 109, 75–90.
Goldstein, D. G., Johnson, E. J., Herrmann, A., & Heitmann, M. (2008).
Nudge your customers toward better choices. Harvard Business
Review, 86(12), 99–105.
Good, I. J. (1967). On the principle of total evidence. The British Journal
for the Philosophy of Science, 17, 319–321.
Good, I. J. (1983). Good thinking: The foundations of probability and
its applications. Minneapolis: University of Minnesota Press.
Gordon, K. (1924). Group judgments in the field of lifted weights.
Journal of Experimental Psychology, 3, 398–400.
Gøtzsche, P. C. & Nielsen, M. (2006). Screening for breast cancer with
mammography. Cochrane Database of Systematic Reviews 2006, 4,
Art. No. CD001877.
Green, D. & Over, D. E. (2000). Decision theoretic effects in testing a
causal conditional. Cahiers de Psychologie Cognitive, 19, 51–68.
Green, L. & Mehr, D. R. (1997). What alters physicians’ decisions to admit
to the coronary care unit? Journal of Family Practice, 45, 219–226.
Green, W. A. & Lazarus, H. (1991). Are today’s executives meeting with
success? Journal of Management Development, 10, 14–25.
Greene, W. H. (1991). Econometric analysis. New York: Macmillan.
Greene, W. H. (1992). A statistical model for credit scoring (Working
Paper No. EC-92-29). New York University, Stern School of
Business, Department of Economics.
Greene, W. H. (2003). Econometric analysis (5th ed.). Upper Saddle
River, NJ: Prentice Hall.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan
(Eds.), Syntax and semantics 3: Speech acts (pp. 41–58). New
York: Academic Press.
Griffin, D. & Tversky, A. (1992). The weighting of evidence and the
determinants of confidence. Cognitive Psychology, 24, 411–435.
Griffiths, M. D. & Parke, J. (2003a). The environmental psychology
of gambling. In G. Reith (Ed.), Gambling: Who wins? Who loses?
(pp. 277–292). Amherst, NY: Prometheus Books.
Griffiths, M. D. & Parke, J. (2003b). The psychology of the fruit machine.
Psychology Review, 9, 12–16.
Griffiths, T. L. & Tenenbaum, J. B. (2006). Optimal predictions in every-
day cognition. Psychological Science, 17, 767–773.
Griffiths, W. E., Hill, R. C., & Judge, G. G. (1993). Learning and practic-
ing econometrics. New York: Wiley.
Grimes, D. A. & Snively, G. R. (1999). Patients’ understanding of
medical risks: Implications for genetic counseling. Obstetrics and
Gynecology, 93, 910–914.
Grofman, B. & Owen, G. (1986). Condorcet models, avenues for future
research. In B. Grofman & G. Owen (Eds.), Information pooling and
group decision making (pp. 93–102). Greenwich, CT: JAI Press.
Groß, R., Houston, A. I., Collins, E. J., McNamara, J. M., Dechaume-
Moncharmont, F.-X., & Franks, N. R. (2008). Simple learning rules
to cope with changing environments. Journal of the Royal Society
Interface, 5, 1193–1202.
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C.
(2000). Clinical versus mechanical prediction: A meta-analysis.
Psychological Assessment, 12, 19–30.
Gurmankin, A. D., Baron, J., & Armstrong, K. (2004). The effect of numeri-
cal statements of risk on trust and comfort with hypothetical physi-
cian risk communication. Medical Decision Making, 24, 265–271.
Guttman, L. (1944). A basis for scaling qualitative data. American
Sociological Review, 9, 139–150.
Hacking, I. (2003). On drawing trees: Logical, genealogical, biologi-
cal,. . . Presentation at the Institute for the History & Philosophy of
Science & Technology, University of Toronto.
Hallowell, N., Statham, H., Murton, F., Green, J., & Richards, M. (1997).
“Talking about chance”: The presentation of risk information
during genetic counseling for breast and ovarian cancer. Journal of
Genetic Counseling, 6, 269–286.
Hamilton, D. L. & Gifford, R. K. (1976). Illusory correlation in inter-
personal perception: A cognitive basis of stereotypic judgments.
Journal of Experimental Social Psychology, 12, 392–407.
Hamilton, D. L. & Sherman, S. J. (1989). Illusory correlations: Implications
for stereotype theory and research. In D. Bar-Tal, C. F. Graumann,
A. W. Kruglanski, & W. Stroebe (Eds.), Stereotype and prejudice:
Changing conceptions (pp. 59–82). New York: Springer.
Hamm, R. M. & Smith, S. L. (1998). The accuracy of patients’ judge-
ments of disease probability and test sensitivity and specificity.
Journal of Family Practice, 47, 44–52.
Hammond, K. R. & Wascoe, N. E. (1980). Realizations of Brunswik’s
representative design. New Directions for Methodology of Social
and Behavioral Science, 3, 271–312.
Hann, A. (1999). Propaganda versus evidence based health promo-
tion: The case of breast screening. International Journal of Health
Planning and Management, 14, 329–334.
Hansell, M. (2005). Animal architecture. New York: Oxford University
Press.
Harrigan, K. A. (2007). Slot machine structural characteristics: Distorted
player views of payback percentages. Journal of Gambling Issues,
20, 215–234.
Harrigan, K. A. (2008). Slot machine structural characteristics: Creating
near misses using high award symbol ratios. International Journal
of Mental Health and Addiction, 6, 353–368.
Hasher, L., Goldstein, D., & Toppino, T. (1977). Frequency and the
conference of referential validity. Journal of Verbal Learning and
Verbal Behavior, 16, 107–112.
Hasher, L. & Zacks, R. T. (1984). Automatic processing of fundamen-
tal information: The case of frequency of occurrence. American
Psychologist, 39, 1372–1388.
Hasson, U., Simmons, J. P., & Todorov, A. (2005). Believe it or not:
On the possibility of suspending belief. Psychological Science, 16,
566–571.
Hastie, R. & Kameda, T. (2005). The robust beauty of majority rules in
group decisions. Psychological Review, 112, 494–508.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical
learning: Data mining, inference, and prediction. New York: Springer.
Hauser, M. D., Feigenson, L., Mastro, R. G., & Carey, S. (1999). Non-
linguistic number knowledge: Evidence of ordinal representations
in human infants and rhesus macaques. Poster presented at the
Society for Research in Child Development, Albuquerque, NM.
Hausmann, D. (2004). Informationssuche im Entscheidungsprozess
[Information search in the decision process]. Unpublished doc-
toral dissertation, University of Zürich, Switzerland.
Hausmann, D., Läge, D., Pohl, R., & Bröder, A. (2007). Testing the
QuickEst: No evidence for the quick-estimation heuristic. European
Journal of Cognitive Psychology, 19, 446–456.
Hayek, F. (1945). The use of knowledge in society. American Economic
Review, 35, 519–530.
Heilbrun, K., Philipson, J., Berman, L., & Warren, J. (1999). Risk communi-
cation: Clinicians’ reported approaches and perceived values. Journal
of the American Academy of Psychiatry and Law, 27, 397–406.
Heller, R. F., Sandars, J. E., Patterson, L., & McElduff, P. (2004). GP’s
and physicians’ interpretation of risks, benefits and diagnostic test
results. Family Practice, 21, 155–159.
Helversen, B. von, & Rieskamp, J. (2008). The mapping model: A cog-
nitive theory of quantitative estimation. Journal of Experimental
Psychology: General, 137, 73–79.
Henrich, J. & Gil-White, F. J. (2001). The evolution of prestige: Freely
conferred deference as a mechanism for enhancing the benefits
of cultural transmission. Evolution and Human Behavior, 22,
165–196.
Hertel, G., Kerr, N. L., & Messe, L. A. (2000). Motivation gains in per-
formance groups: Paradigmatic and theoretical developments on
the Koehler effect. Journal of Personality and Social Psychology,
79, 580–601.
Hertwig, R., Davis, J. R., & Sulloway, F. J. (2002). Parental investment:
How an equity motive can produce inequality. Psychological
Bulletin, 128, 728–745.
Hertwig, R. & Gigerenzer, G. (1999). The “conjunction fallacy” revis-
ited: How intelligent inferences look like reasoning errors. Journal
of Behavioral Decision Making, 12, 275–305.
Hertwig, R., Gigerenzer, G., & Hoffrage, U. (1997). The reiteration effect
in hindsight bias. Psychological Review, 104, 194–202.
Hertwig, R., Herzog, S. M., Schooler, L. J., & Reimer, T. (2008). Fluency
heuristic: A model of how the mind exploits a by-product of infor-
mation retrieval. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 34, 1191–1206.
Hertwig, R., Hoffrage, U., & the ABC Research Group. (in press). Simple
heuristics in a social world. New York: Oxford University Press.
Hertwig, R., Hoffrage, U., & Martignon, L. (1999). Quick estimation:
Letting the environment do some of the work. In G. Gigerenzer,
P. M. Todd, & the ABC Research Group, Simple heuristics that
make us smart (pp. 209–234). New York: Oxford University Press.
Hertwig, R., Pachur, T., & Kurzenhäuser, S. (2005). Judgments of risk
frequencies: Tests of possible cognitive mechanisms. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 31,
621–642.
Hertwig, R. & Todd, P. M. (2003). More is not always better: The benefits
of cognitive limits. In D. Hardman and L. Macchi (Eds.), Thinking:
Psychological perspectives on reasoning, judgment and decision
making (pp. 213–231). Chichester, UK: Wiley.
Herzog, S. M., & Hertwig, R. (in press). The ecological validity of flu-
ency. In C. Unkelbach & R. Greifeneder (Eds.), The experience of
thinking. London: Psychology Press.
Hey, J. D. (1982). Search for rules for search. Journal of Economic
Behavior and Organization, 3, 65–81.
Hibbard, J. H. & Peters, E. (2003). Supporting informed consumer
health care decisions: Data presentation approaches that facilitate
the use of information in choice. Annual Review of Public Health,
24, 413–433.
Hilgard, E. R. & Bower, G. H. (1975). Theories of learning (4th ed.).
Englewood Cliffs, NJ: Prentice-Hall.
Hinsz, V. B., Tindale, R. S., & Vollrath, D. A. (1997). The emerging con-
ceptualization of groups as information processors. Psychological
Bulletin, 121, 43–64.
Hintzman, D. L. (1990). Human learning and memory: Connections
and dissociations. Annual Review of Psychology, 41, 109–139.
Hintzman, D. L. & Curran, T. (1994). Retrieval dynamics of recogni-
tion and frequency judgments: Evidence for separate processes of
familiarity and recall. Journal of Memory and Language, 33, 1–18.
Hoffrage, U. (2008). Skewed information structures. Working paper,
University of Lausanne.
Hoffrage, U. (2011). Recognition judgments and the performance of
the recognition heuristic depend on the size of the reference class.
Judgment and Decision Making, 6, 43–57.
Hoffrage, U. & Gigerenzer, G. (1998). Using natural frequencies to
improve diagnostic inferences. Academic Medicine, 73, 538–540.
Hoffrage, U., Gigerenzer, G., Krauss, S., & Martignon, L. (2002).
Representation facilitates reasoning: What natural frequencies are
and what they are not. Cognition, 84, 343–352.
Hoffrage, U. & Hertwig, R. (2006). Which world should be repre-
sented in representative design? In K. Fiedler & P. Juslin (Eds.),
Information sampling and adaptive cognition (pp. 381–408). New
York: Cambridge University Press.
Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A
by-product of knowledge updating? Journal of Experimental
Psychology: Learning, Memory, and Cognition, 26, 566–581.
Hoffrage, U., Lindsey, S., Hertwig, R., & Gigerenzer, G. (2000).
Communicating statistical information. Science, 290, 2261–2262.
Hofstee, W. K. B. (1984). Methodological decision rules as research
policies: A betting reconstruction of empirical research. Acta
Psychologica, 56, 93–109.
Hogarth, R. M. (1974). Process tracing in clinical judgment. Behavioral
Science, 19, 298–313.
Hogarth, R. M. (1978). A note on aggregating opinions. Organizational
Behavior and Human Performance, 21, 40–46.
Hogarth, R. M. (1981). Beyond discrete biases: Functional and dysfunc-
tional aspects of judgmental heuristics. Psychological Bulletin, 90,
197–217.
Hogarth, R. M. (1987). Judgement and choice (2nd ed.). Chichester,
England: Wiley.
Hogarth, R. M. & Karelaia, N. (2005a). Ignoring information in binary
choice with continuous variables: When is less “more”? Journal of
Mathematical Psychology, 49, 115–124.
Hogarth, R. M. & Karelaia, N. (2005b). Simple models for multi-attri-
bute choice with many alternatives: When it does and does not pay
to face tradeoffs with binary attributes. Management Science, 51,
1860–1872.
Hogarth, R. M. & Karelaia, N. (2006a). Regions of rationality: Maps for
bounded agents. Decision Analysis, 3, 124–144.
Hogarth, R. M. & Karelaia, N. (2006b). Take-the-best and other simple
strategies: Why and when they work “well” in binary choice.
Theory and Decision, 61, 205–249.
Hogarth, R. M. & Karelaia, N. (2007). Heuristic and linear models
of judgment: Matching rules and environments. Psychological
Review, 114, 733–758.
Hollingshead, A. B. (1996). The rank-order effect in group decision
making. Organizational Behavior and Human Decision Processes,
68, 181–193.
Holt, R. R. (1958). Clinical and statistical prediction: A reformulation
and some new data. Journal of Abnormal and Social Psychology,
56, 1–12.
Holt, R. R. (1962). Individuality and generalization in the psychology
of personality: A theoretical rationale for personality assessment
and research. Journal of Personality, 30, 405–422.
Holt, R. R. (2004). A few dissents from a magnificent piece of work.
Applied & Preventive Psychology, 11, 43–44.
Holte, R. C. (1993). Very simple classification rules perform well on
most commonly used datasets. Machine Learning, 11, 63–90.
Holzworth, R. J. (2001). Multiple cue probability learning. In K. R. Hammond
& T. R. Stewart (Eds.), The essential Brunswik: Beginnings, explica-
tions, applications (pp. 348–350). New York: Oxford University Press.
Horwich, P. (1982). Probability and evidence. Cambridge: Cambridge
University Press.
Howe, C. Q. & Purves, D. (2005). Perceiving geometry: Geometrical illu-
sions explained by natural scene statistics. New York: Springer.
Howson, C. & Urbach, P. (1989). Scientific reasoning: The Bayesian
approach. La Salle, IL: Open Court.
Huberman, G. & Jiang, W. (2006). Offering vs. choice in 401(k) plans:
Equity exposure and number of funds. Journal of Finance, 61,
763–801.
Hurwitz, B. (2004). How does evidence based guidance influence
determinations of medical negligence? British Medical Journal,
329, 1024–1028.
Hutchinson, J. M. C. & Gigerenzer, G. (2005). Simple heuristics and
rules of thumb: Where psychologists and behavioural biologists
might meet. Behavioural Processes, 69, 97–124.
Hutchinson, J. M. C. & Halupka, K. (2004). Mate choice when males are
in patches: Optimal strategies and good rules of thumb. Journal of
Theoretical Biology, 231, 129–151.
Hutchinson, J. M. C., McNamara, J. M., & Cuthill, I. C. (1993). Song,
sexual selection, starvation and strategic handicaps. Animal
Behaviour, 45, 1153–1177.
Jacoby, L. L. & Brooks, L. R. (1984). Nonanalytic cognition: Memory,
perception and concept learning. In G. H. Bower (Ed.), Psychology
of learning and motivation (Vol. 18, pp. 1–47). New York: Academic
Press.
Jacoby, L. L., Kelley, C., Brown, J., & Jasechko, J. (1989). Becoming
famous overnight: Limits on the ability to avoid unconscious influ-
ences of the past. Journal of Personality and Social Psychology, 56,
326–338.
Jäger, A. O., Süß, H.-M., & Beauducel, A. (1997). Berliner Intelligenz-
Struktur-Test. Göttingen, Germany: Hogrefe.
Jain, B. P., McQuay, H., & Moore, A. (1998). Number needed to treat and
relative risk reduction. Annals of Internal Medicine, 128, 72–73.
James, W. (1890). The principles of psychology (Vol. 1). New York:
Holt.
Janis, I. L. (1982). Victims of groupthink. Boston, MA: Houghton Mifflin.
Jemal, A., Siegel, R., Ward, E., Murray, T., Xu, J., Smigal, C., et al.
(2006). Cancer statistics, 2006. CA Cancer Journal for Clinicians,
56, 106–130.
Jepson, R. G., Forbes, C. A., Sowden, A. J., & Lewis, R. A. (2001).
Increased informed uptake and non-uptake of screening: Evidence
from a systematic review. Health Expectations, 4, 116–130.
Johnson, E. J. & Goldstein, D. G. (2003). Do defaults save lives? Science,
302, 1338–1339.
Johnson, E. J. & Payne, J. W. (1985). Effort and accuracy in choice.
Management Science, 31, 395–414.
Johnson, E. J., Schulte-Mecklenbeck, M., & Willemsen, M. C. (2008).
Process models deserve process data: A comment on Brandstätter,
Gigerenzer, and Hertwig (2006). Psychological Review, 115,
263–273.
Johnson, J. & Raab, M. (2003). Take the first: Option generation and
resulting choices. Organizational Behavior and Human Decision
Processes, 91, 215–229.
Johnson, M. K., Hastroudi, S., & Lindsay, D. S. (1993). Source monitor-
ing. Psychological Bulletin, 114, 3–28.
Johnson, M. P. & Raven, P. H. (1973). Species number and endemism:
The Galapagos archipelago revisited. Science, 179, 893–895.
Johnston, J. (1991). Econometric methods (3rd ed.). New York: McGraw-
Hill.
Jorland, G. (1987). The Saint Petersburg paradox 1713–1937. In L.
Krüger, G. Gigerenzer, & M. S. Morgan (Eds.), The probabilistic
revolution, Vol. 1. Ideas in the sciences (pp. 157–190). Cambridge,
MA: MIT Press.
Juslin, P. (1994). The overconfidence phenomenon as a consequence
of informal experimenter-guided selection of almanac items.
Organizational Behavior and Human Decision Processes, 57,
226–246.
Juslin, P. & Olsson, H. (2005). Capacity limitations and the detection of
correlations: A comment on Kareev (2000). Psychological Review,
112, 256–267.
Juslin, P., Olsson, H., & Björkman, M. (1997). Brunswikian and
Thurstonian origins of bias in probability assessment: On the
origin and nature of stochastic components of judgment. Journal
of Behavioral Decision Making, 10, 189–209.
Juslin, P., Olsson, H., & Olsson, A.-C. (2003). Exemplar effects in cat-
egorization and multiple-cue judgment. Journal of Experimental
Psychology: General, 132, 133–156.
Juslin, P., Olsson, H., & Winman, A. (1998). The calibration issue:
Theoretical comments on Suantak, Bolger, and Ferrell (1996).
Organizational Behavior and Human Decision Processes, 73, 3–26.
Juslin, P. & Persson, M. (2002). PROBabilities from EXemplars
(PROBEX): A “lazy” algorithm for probabilistic inference from
generic knowledge. Cognitive Science, 26, 563–607.
Juslin, P., Winman, A., & Olsson, H. (2000). Naive empiricism and dog-
matism in confidence research: A critical examination of the hard-
easy effect. Psychological Review, 107, 384–396.
Jussim, L. (1991). Social perception and social reality: A reflection-
construction model. Psychological Review, 98, 54–73.
Kahneman, D. (2003). A perspective on judgement and choice: Mapping
bounded rationality. American Psychologist, 58, 697–720.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment
under uncertainty: Heuristics and biases. Cambridge: Cambridge
University Press.
Kahneman, D. & Tversky, A. (1973). On the psychology of prediction.
Psychological Review, 80, 237–251.
Kahneman, D. & Tversky, A. (1982). Subjective probability: A judgment
of representativeness. In D. Kahneman, P. Slovic, & A. Tversky
(Eds.), Judgment under uncertainty: Heuristics and biases (pp. 32–
47). Cambridge: Cambridge University Press.
Kahneman, D. & Tversky, A. (1996). On the reality of cognitive illusions.
Psychological Review, 103, 582–591.
Kao, S.-F. & Wasserman, E. A. (1993). Assessment of an information
integration account of contingency judgment with examination of
subjective cell importance and method of information presenta-
tion. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 19, 1363–1386.
Kareev, Y. (2000). Seven (indeed, plus or minus two) and the detection
of correlations. Psychological Review, 107, 397–402.
Karelaia, N. (2006). Thirst for confirmation in multi-attribute choice:
Does search for consistency impair decision performance? Orga-
nizational Behavior and Human Decision Processes, 100, 128–143.
Karelaia, N. & Hogarth, R. M. (2006). On predicting performance of
DEBA models in the presence of error. Barcelona: Universitat
Pompeu Fabra.
Katsikopoulos, K. V. & Fasolo, B. (2006). New tools for decision analysts.
IEEE Transactions on Systems, Man, and Cybernetics: Systems and
Humans, 36, 960–967.
Katsikopoulos, K. V. & Gigerenzer, G. (2008). One-reason decision-
making: Modeling violations of expected utility theory. Journal of
Risk and Uncertainty, 37, 35–56.
Katsikopoulos, K. V. & Martignon, L. (2006). Naive heuristics for paired
comparisons: Some results on their relative accuracy. Journal of
Mathematical Psychology, 50, 488–494.
Katsikopoulos, K. V., Pachur, T., Machery, E., & Wallin, A. (2008). From
Meehl to fast and frugal heuristics (and back): New insights on how
to bridge the clinical–actuarial divide. Theory and Psychology, 18,
443–464.
Keeney, R. L. & Raiffa, H. (1993). Decisions with multiple objectives:
Preferences and value tradeoffs. Cambridge: Cambridge University
Press.
Keller, C. & Siegrist, M. (2009). Effect of risk communication formats
on risk perception depending on numeracy. Medical Decision
Making, 29, 483–490.
Kelley, C. M. & Jacoby, L. L. (1998). Subjective reports and process dis-
sociation: Fluency, knowing, and feeling. Acta Psychologica, 98,
127–140.
Kelley, C. M. & Lindsay, D. S. (1993). Remembering mistaken for know-
ing: Ease of retrieval as a basis for confidence in answers to general
knowledge questions. Journal of Memory and Language, 32, 1–24.
Keppel, G. (1967). A reconsideration of the extinction-recovery theory.
Journal of Verbal Learning and Verbal Behavior, 6, 476–486.
Keren, G. (1997). On the calibration of probability judgments: Some crit-
ical comments and alternative perspectives. Journal of Behavioral
Decision Making, 10, 269–278.
Kerlikowske, K., Grady, D., Barclay, J., Sickles, E. A., & Ernster, V. (1996).
Effect of age, breast density, and family history on the sensitivity of
first screening mammography. Journal of the American Medical
Association, 276, 33–38.
Keykhah, M. (2002). Catastrophic risk and reinsurance: Financial
decision making for the catastrophe society. Unpublished manu-
script, School of Geography and the Environment, University of
Oxford.
Kiso, T. (2004). History of slot machines. Retrieved February 3, 2005,
from http://gaming.unlv.edu/research/subject/slot_history.html.
Klauer, K. C. & Meiser, T. (2000). A source-monitoring analysis of illu-
sory correlations. Personality and Social Psychology Bulletin, 26,
1074–1093.
Klayman, J. (1995). Varieties of confirmation bias. Psychology of
Learning and Motivation, 32, 385–418.
Klayman, J. & Ha, Y.-W. (1987). Confirmation, disconfirmation, and infor-
mation in hypothesis testing. Psychological Review, 94, 211–228.
Kleffner, D. A. & Ramachandran, V. S. (1992). On the perception of
shape from shading. Perception and Psychophysics, 52, 18–36.
Kleinmuntz, B. (1990). Why we still use our heads instead of formulas.
Psychological Bulletin, 107, 296–310.
Kleiter, G. D. (1994). Natural sampling: Rationality without base rates.
In G. H. Fischer & D. Laming (Eds.), Contributions to mathematical
psychology, psychometrics, and methodology (pp. 375–388). New
York: Springer.
Knowles, G., Sherony, K., & Haupert, M. (1992). The demand for major
league baseball: A test of the uncertainty of outcome hypothesis.
American Economist, 36, 72–80.
Koehler, J. J. (1996a). The base rate fallacy reconsidered: Descriptive,
normative, and methodological challenges. Behavioral and Brain
Sciences, 19, 1–53.
Koehler, J. J. (1996b). On conveying the probative value of DNA evi-
dence: Frequencies, likelihood ratios, and error rates. University of
Colorado Law Review, 67, 859–886.
Kohli, R. & Jedidi, K. (2007). Representation and inference of lexico-
graphic preference models and their variants. Marketing Science,
26, 380–399.
Koriat, A., Goldsmith, M., & Pansky, A. (2000). Toward a psychology of
memory accuracy. Annual Review of Psychology, 51, 481–537.
Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confi-
dence. Journal of Experimental Psychology: Human Learning and
Memory, 6, 107–118.
Koriat, A. & Ma’ayan, H. (2005). The effects of encoding fluency and
retrieval fluency on judgments of learning. Journal of Memory and
Language, 52, 478–492.
Krauss, S., Martignon, L., & Hoffrage, U. (1999). Simplifying Bayesian
inference: The general case. In L. Magnani, N. Nersessian, &
P. Thagard (Eds.), Model-based reasoning in scientific discovery
(pp. 165–179). New York: Plenum Press.
Kreps, D. M. (1990). Game theory and economic modelling. Oxford:
Clarendon Press.
Kroll, L. (2008, March 5). World’s billionaires [Electronic version].
Forbes. Retrieved from http://www.forbes.com/2008/03/05/richest-
billionaires-people-billionaires08-cx_lk_0305intro.html.
Krosnick, J. A. & Alwin, D. F. (1987). An evaluation of a cognitive theory
of response-order effects in survey measurement. Public Opinion
Quarterly, 51, 201–219.
Krueger, J. & Mueller, R. A. (2002). Unskilled, unaware, or both? The
better-than-average heuristic and statistical regression predict
errors in estimates of own performance. Journal of Personality and
Social Psychology, 82, 180–188.
Krugman, P. R. (1996). The self-organizing economy. Cambridge, MA:
Blackwell.
Kuhl, J. (1994). Action versus state orientation: Psychometric proper-
ties of the Action Control Scale (ACS-90). In J. Kuhl & J. Beckmann
(Eds.), Volition and personality. Action versus state orientation
(pp. 47–59). Göttingen, Germany: Hogrefe & Huber.
Kukla, A. (1993). The structure of self-fulfilling and self-negating
prophecies. Theory and Psychology, 4, 5–33.
Kurzenhäuser, S. (2003). Welche Informationen vermitteln deutsche
Gesundheitsbroschüren über die Screening-Mammographie? [What
information is provided in German health information pamphlets
on mammography screening?] Zeitschrift für ärztliche Fortbildung
und Qualitätssicherung, 97, 53–57.
Kurzenhäuser, S. & Hoffrage, U. (2002). Teaching Bayesian reasoning:
An evaluation of a classroom tutorial for medical students. Medical
Teacher, 24, 531–536.
Kurzenhäuser, S. & Lücking, A. (2004). Statistical formats in Bayesian
inference. In R. Pohl (Ed.), Cognitive illusions: A handbook on fal-
lacies and biases in thinking, judgment, and memory (pp. 61–77).
Hove, UK: Psychological Press.
Ladouceur, R. (1993). Causes of pathological gambling. In W. R.
Eadington & J. A. Cornelius (Eds.), Gambling behavior and prob-
lem gambling (pp. 333–336). Reno, NV: Institute for the Study of
Gambling and Commercial Gaming.
Ladouceur, R. & Dubé, D. (1997). Monetary incentive and erroneous
perceptions in American roulette. Psychology: A Journal of Human
Behavior, 34(3–4), 27–32.
Ladouceur, R., Dubé, D., Giroux, I., Legendre, N., & Gaudet, C. (1995).
Cognitive biases in gambling: American roulette and 6/49 lottery.
Journal of Social Behavior and Personality, 10, 473–479.
Läge, D., Hausmann, D., & Christen, S. (2005). Wie viel bezahlen für
eine valide Information? Suchkosten als limitierender Faktor
der Informationssuche. AKZ-Forschungsbericht Nr. 7. Zürich:
Angewandte Kognitionspsychologie.
Läge, D., Hausmann, D., Christen, S., & Daub, S. (2005). Was macht
einen “guten Cue” aus? Strategien der Informationssuche beim heu-
ristischen Entscheiden unter Unsicherheit. AKZ-Forschungsbericht
Nr. 5. Zürich: Angewandte Kognitionspsychologie.
Lakatos, I. (1978). The methodology of scientific research programmes.
Cambridge: Cambridge University Press.
Laland, K. (2001). Imitation, social learning, and preparedness as
mechanisms of bounded rationality. In G. Gigerenzer & R. Selten
(Eds.), Bounded rationality: The adaptive toolbox (pp. 233–247).
Cambridge, MA: MIT Press.
Lambos, C. & Delfabbro, P. (2007). Numerical reasoning ability and
irrational beliefs in problem gambling. International Gambling
Studies, 7, 157–171.
Landauer, T. K. (1986). How much do people remember? Some esti-
mates of the quantity of learned information in long-term memory.
Cognitive Science, 10, 477–493.
Langer, E. J. (1982). The illusion of control. In D. Kahneman, P. Slovic,
& A. Tversky (Eds.), Judgment under uncertainty: Heuristics and
biases (pp. 231–238). Cambridge: Cambridge University Press.
Langley, P. (1995). Order effects in incremental learning. In P. Reimann
& H. Spada (Eds.), Learning in humans and machines: Towards an
interdisciplinary learning science (pp. 154–165). Oxford: Elsevier.
Larrick, R. & Soll, J. B. (2006). Intuitions about combining opinions:
Misappreciation of the averaging rule. Management Science, 52,
111–127.
Larson, J. R., Foster-Fishman, P. G., & Keys, C. B. (1994). Discussion
of shared and unshared information in decision-making groups.
Journal of Personality and Social Psychology, 67, 446–461.
Laughlin, P. R. & Ellis, A. L. (1986). Demonstrability and social com-
bination processes on mathematical intellective tasks. Journal of
Experimental Social Psychology, 22, 177–189.
Lee, M. D. (2006). A hierarchical Bayesian model of human decision-
making on an optimal stopping problem. Cognitive Science, 30,
555–580.
Lee, M. D. & Cummins, T. D. R. (2004). Evidence accumulation in
decision making: Unifying the “take the best” and the “rational”
models. Psychonomic Bulletin and Review, 11, 343–352.
Lee, M. D., Loughlin, N., & Lundberg, I. B. (2002). Applying one
reason decision making: The prioritization of literature searches.
Australian Journal of Psychology, 54, 137–143.
Lee, P. J. & Brown, N. R. (2004). The role of guessing and boundar-
ies on date estimation biases. Psychonomic Bulletin & Review, 11,
748–754.
Legato, F. (2004, March). The 20 greatest slot innovations: Monumental
ideas in the history of slots that changed the way we play today.
Strictly Slots, 54–60.
Lehman, S., Jackson, A. D., & Lautrup, B. E. (2006). Measures for mea-
sures. Nature, 444, 1003–1004.
Lemaire, R. (2006). Informed consent—A contemporary myth? Journal
of Bone and Joint Surgery, 88, 2–7.
Lerman, C., Trock, B., Rimer, B. K., Jepson, C., Brody, D., & Boyce, A.
(1991). Psychological side effects of breast cancer screening. Health
Psychology, 10, 259–267.
Levin, I. P., Wasserman, E. A., & Kao, S.-F. (1993). Multiple methods
for examining biased information use in contingency judgments.
Organizational Behavior and Human Decision Processes, 55,
228–250.
Levitt, S. D. & Dubner, S. J. (2005). Freakonomics: A rogue econo-
mist explores the hidden side of everything. New York: Harper
Collins.
Levy, M. & Solomon, S. (1997). New evidence for the power-law distri-
bution of wealth. Physica A, 242, 90–94.
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of
probabilities: The state of the art to 1980. In D. Kahneman, P. Slovic,
& A. Tversky (Eds.), Judgment under uncertainty: Heuristics and
biases (pp. 306–334). Cambridge: Cambridge University Press.
Lichtenstein, S., Gregory, R., Slovic, P., & Wagenaar, W. A. (1990). When
lives are in your hands: Dilemmas of the societal decision maker.
In R. M. Hogarth (Ed.), Insights in decision making: A tribute to
Hillel J. Einhorn (pp. 91–106). Chicago: University of Chicago
Press.
Lichtenstein, S., Slovic, P., Fischhoff, B., Layman, M., & Combs, B.
(1978). Judged frequency of lethal events. Journal of Experimental
Psychology: Human Learning and Memory, 4, 551–578.
Lindsey, S., Hertwig, R., & Gigerenzer, G. (2003). Communicating sta-
tistical DNA evidence. Jurimetrics: The Journal of Law, Science,
and Technology, 43, 147–163.
Lipe, M. G. (1990). A lens-model analysis of covariation research.
Journal of Behavioral Decision Making, 3, 47–59.
Lipkus, I. M. (2007). Numeric, verbal, and visual formats of convey-
ing health risks: Suggested best practices and future recommenda-
tions. Medical Decision Making, 27, 696–713.
Lipkus, I. M., Samsa, G., & Rimer, B. K. (2001). General performance
on a numeracy scale among highly educated samples. Medical
Decision Making, 21, 37–44.
Lipsey, R. G. (1956). The general theory of the second best. Review of
Economic Studies, 24, 11–32.
Lipshitz, R. (2000). Two cheers for bounded rationality. Behavioral and
Brain Sciences, 23, 756–757.
Lloyd, A. J. (2001). The extent of patients’ understanding of the risk of
treatments. Quality in Health Care, 10(Suppl. 1), i14–i18.
Lloyd, A. J., Hayes, P. D., London, N. J. M., Bell, P. R. F., & Naylor, A.
R. (1999). Patients’ ability to recall risk associated with treatment
options. Lancet, 353, 645.
Locke, J. (1959). An essay concerning human understanding. (A. C.
Fraser, Ed.). New York: Dover. (Original work published 1690)
Logan, J. (1996). The critical mass. American Scientist, 84, 263–277.
Lopes, L. L. (1981). Decision making in the short run. Journal of Experi-
mental Psychology: Human Learning and Memory, 7, 377–385.
Lopes, L. L. (1984). Risk and distributional inequality. Journal of
Experimental Psychology: Human Perception and Performance,
10, 456–485.
Lopes, L. L. (1992). Risk perception and the perceived public. In D. W.
Bromley & K. Segerson (Eds.), The social response to environmen-
tal risk (pp. 57–73). Boston: Kluwer Academic.
Lopes, L. L. & Oden, G. D. (1991). The rationality of intelligence. In E.
Eels & T. Maruszewski (Eds.), Poznan studies in the philosophy of
the sciences and the humanities (Vol. 21, pp. 225–249). Amsterdam:
Rodopi.
Luce, R. D. (1980). Several possible measures of risk. Theory and
Decision, 12, 217–228.
Luce, R. D. (2000). Fast, frugal, and surprisingly accurate heuristics.
Behavioral and Brain Sciences, 23, 757–758.
Luchins, A. S. (1942). Mechanization in problem solving. Psychological
Monographs, 54, 1–95.
Luchins, A. S. & Luchins, E. H. (1959). Rigidity of behavior: A varia-
tional approach to the effect of Einstellung. Eugene, OR: University
of Oregon Books.
Luria, A. R. (1968). The mind of a mnemonist. New York: Basic Books.
Lyman, P. & Varian, H. R. (2003). How much information? 2003.
Retrieved July 8, 2008, from http://www.sims.berkeley.edu/how-
much-info-2003.
Mackie, J. L. (1963). The paradox of confirmation. British Journal for
the Philosophy of Science, 13, 265–277.
MacQueen, J. & Miller, R. G., Jr. (1960). Optimal persistence policies.
Operations Research, 8, 362–380.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M.,
Lewandowski, R., et al. (1982). The accuracy of extrapolation (time
series) methods: Results of a forecasting competition. Journal of
Forecasting, 1, 111–153.
Makridakis, S., Chatfield, C., Hibon, M., Lawrence, M., Mills, T., Ord, K.,
et al. (1993). The M-2 competition: A real-time judgmentally based
forecasting study. International Journal of Forecasting, 9, 5–23.
Makridakis, S. & Hibon, M. (1979). Accuracy of forecasting: An empiri-
cal investigation (with discussion). Journal of the Royal Statistical
Society, Series A, 142, 97–145.
Makridakis, S. & Hibon, M. (2000). The M3-competition: Results, con-
clusions and implications. International Journal of Forecasting,
16, 451–476.
Mandel, D. R. & Lehman, D. R. (1998). Integration of contingency infor-
mation in judgments of cause, covariation, and probability. Journal
of Experimental Psychology: General, 127, 269–285.
Marewski, J. N., Gaissmaier, W., Dieckmann, A., Schooler, L. J.,
& Gigerenzer, G. (2005, August). Ignorance-based reasoning?
Applying the recognition heuristic to elections. Paper presented at
the 20th Biennial Conference on Subjective Probability, Utility and
Decision Making, Stockholm.
Marewski, J. N., Gaissmaier, W., Schooler, L. J., Goldstein, D. G., &
Gigerenzer, G. (2010). From recognition to decisions: Extending
and testing recognition-based models for multi-alternative infer-
ence. Psychonomic Bulletin & Review, 17, 287–309.
Marewski, J. N., & Schooler, L. J. (2011). Cognitive niches: An ecological
model of strategy selection. Psychological Review, 118, 393–437.
Markowitz, H. M. (1952). Portfolio selection. Journal of Finance, 7,
77–91.
Marr, D. (1982). Vision: A computational investigation into the human
representation and processing of visual information. San Francisco:
Freeman.
Marshall, K. G. (1996). The ethics of informed consent for preventive
screening programs. Canadian Medical Association Journal, 155,
377–383.
Marteau, T. M. (1995). Towards informed decisions about prenatal test-
ing: A review. Prenatal Diagnosis, 15, 1215–1226.
Marteau, T. M. & Dormandy, E. (2001). Facilitating informed choice
in prenatal testing: How well are we doing? American Journal of
Medical Genetics, 106, 185–190.
Marteau, T. M., Saidi, G., Goodburn, S., Lawton, J., Michie, S., &
Bobrow, M. (2000). Numbers or words? A randomized controlled
trial of presenting screen negative results to pregnant women.
Prenatal Diagnosis, 20, 714–718.
Martignon, L. & Hoffrage, U. (1999). Why does one-reason deci-
sion making work? A case study in ecological rationality. In
G. Gigerenzer, P. M. Todd, & the ABC Research Group, Simple
heuristics that make us smart (pp. 119–140). New York: Oxford
University Press.
Martignon, L. & Hoffrage, U. (2002). Fast, frugal, and fit: Simple heuris-
tics for paired comparison. Theory and Decision, 52, 29–71.
Martignon, L., Katsikopoulos, K. V., & Woike, J. K. (2008). Categorization
with limited resources: A family of simple heuristics. Journal of
Mathematical Psychology, 52, 352–361.
Martignon, L. & Laskey, K. B. (1999). Bayesian benchmarks for fast and
frugal heuristics. In G. Gigerenzer, P. M. Todd, & the ABC Research
Group, Simple heuristics that make us smart (pp. 169–188). New
York: Oxford University Press.
Martignon, L., Vitouch, O., Takezawa, M., & Forster, M. (2003). Naïve and
yet enlightened: From natural frequencies to fast and frugal decision
trees. In D. Hardman & L. Macchi (Eds.), Thinking: Psychological
perspectives on reasoning, judgment, and decision making (pp.
189–211). Chichester, UK: Wiley.
Martin, A. & Moon, P. (1992). Purchasing decisions, partial knowledge,
and economic search: Experimental and simulation evidence.
Journal of Behavioral Decision Making, 5, 253–266.
Massachusetts Institute of Technology. (2003). The basics of designing
& facilitating meetings. Article and tools available from Department
of Human Resources, http://web.mit.edu/hr/oed/learn/meetings/
art_basics.html.
Mata, R., Schooler, L. J., & Rieskamp, J. (2007). The aging decision
maker: Cognitive aging and the adaptive selection of decision strat-
egies. Psychology & Aging, 22, 796–810.
Matter-Walstra, K. & Hoffrage, U. (2001). Individuelle Entschei-
dungsfindung am Beispiel der Brustkrebs-Früherkennung—
Erfahrungen aus Fokusgruppen in der Schweiz. [Individual
decision making concerning breast cancer screening—Observa-
tions with focus groups in Switzerland]. Schweizer Zeitschrift für
Managed Care und Care Management 3/01(5), 26–29.
Mayseless, O. & Kruglanski, A. W. (1987). What makes you so sure? Effects
of epistemic motivations on judgmental confidence. Organizational
Behavior and Human Decision Processes, 39, 162–183.
McBeath, M. K., Shaffer, D. M., & Kaiser, M. K. (1995). How baseball
outfielders determine where to run to catch fly balls. Science, 268,
569–573.
McCammon, I. & Hägeli, P. (2007). An evaluation of rule-based deci-
sion tools for travel in avalanche terrain. Cold Regions Science and
Technology, 47, 193–206.
McClelland, A. G. R. & Bolger, F. (1994). The calibration of subjec-
tive probability: Theories and models 1980–1994. In G. Wright &
P. Ayton (Eds.), Subjective probability (pp. 453–482). Chichester,
England: Wiley.
McCloy, R., Beaman, C. P., Frosch, C., & Goddard, K. (2010). Fast
and frugal framing effects? Journal of Experimental Psychology:
Learning, Memory and Cognition, 36, 1042–1052.
McKenzie, C. R. M. (1994). The accuracy of intuitive judgment strat-
egies: Covariation assessment and Bayesian inference. Cognitive
Psychology, 26, 209–239.
McKenzie, C. R. M. (1998). Taking into account the strength of an alter-
native hypothesis. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 24, 771–792.
McKenzie, C. R. M. (1999). (Non)Complementary updating of belief in
two hypotheses. Memory & Cognition, 27, 152–165.
McKenzie, C. R. M. (2004a). Framing effects in inference tasks—and why
they are normatively defensible. Memory & Cognition, 32, 874–885.
McKenzie, C. R. M. (2004b). Hypothesis testing and evaluation. In
D. J. Koehler & N. Harvey (Eds.), Blackwell handbook of judgment
and decision making (pp. 200–219). Oxford: Blackwell.
McKenzie, C. R. M. (2005). Judgment and decision making. In
K. Lamberts & R. L. Goldstone (Eds.), Handbook of cognition
(pp. 321–338). London: Sage.
McKenzie, C. R. M. (2006). Increased sensitivity to differentially diag-
nostic answers using familiar materials: Implications for confirma-
tion bias. Memory & Cognition, 34, 577–588.
McKenzie, C. R. M., Ferreira, V. S., Mikkelsen, L. A., McDermott, K.
J., & Skrable, R. P. (2001). Do conditional hypotheses target rare
events? Organizational Behavior and Human Decision Processes,
85, 291–309.
McKenzie, C. R. M. & Mikkelsen, L. A. (2000). The psychological side
of Hempel’s paradox of confirmation. Psychonomic Bulletin and
Review, 7, 360–366.
McKenzie, C. R. M. & Mikkelsen, L. A. (2007). A Bayesian view of cova-
riation assessment. Cognitive Psychology, 54, 33–61.
McKenzie, C. R. M., Wixted, J. T., Noelle, D. C., & Gyurjyan, G. (2001).
Relation between confidence in yes–no and forced-choice tasks.
Journal of Experimental Psychology: General, 130, 140–155.
McQueen, M. J. (2002). Some ethical and design challenges of screening
programs and screening tests. Clinica Chimica Acta, 315, 41–48.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical
analysis and a review of the evidence. Minneapolis: University of
Minnesota Press.
Mellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency rep-
resentations eliminate conjunction effects? Psychological Science,
12, 269–275.
Mellers, B. & McGraw, P. (1999). How to improve Bayesian reason-
ing without instruction: Comment on Gigerenzer and Hoffrage.
Psychological Review, 106, 417–424.
Menard, S. (2002). Applied logistic regression analysis (2nd ed.).
Thousand Oaks, CA: Sage.
Mennecke, B. E. (1997). Using group support systems to discover
hidden profiles: An examination of the influence of group size
and meeting structures on information sharing and decision
quality. International Journal of Human Computer Studies, 47,
387–405.
Merenstein, D. (2004). Winners and losers. JAMA: Journal of the
American Medical Association, 291, 15–16.
Metsch, L. R., McCoy, C. B., McCoy, H. V., Pereyra, M., Trapido, E., &
Miles, C. (1998). The role of the physician as an information source
of mammography. Cancer Practice, 6, 229–236.
Metzger, M. A. (1985). Biases in betting: An application of laboratory
findings. Psychological Reports, 56, 883–888.
Meyers, D. G. (1993). Social psychology (4th ed.). New York: McGraw-Hill.
Meyers-Levy, J. (1989). Gender differences in information process-
ing: A selectivity interpretation. In P. Cafferata & A. Tybout (Eds.),
Cognitive and affective responses to advertising (pp. 219–260).
Lexington, MA: Lexington Books.
Michalewicz, Z. & Fogel, D. (2000). How to solve it: Modern heuristics.
New York: Springer.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some
limits on our capacity for processing information. Psychological
Review, 63, 81–97.
Miller, G. F. & Todd, P. M. (1998). Mate choice turns cognitive. Trends
in Cognitive Sciences, 2, 190–198.
Miller, N. V. & Currie, S. R. (2008). A Canadian population level analy-
sis of the roles of irrational gambling cognitions and risky gam-
bling practices as correlates of gambling intensity and pathological
gambling. Journal of Gambling Studies, 24, 257–274.
Mittelhammer, R. C., Judge, G. G., & Miller, D. J. (2000). Econometric
foundations. New York: Cambridge University Press.
Mobil Oil AG. (1997). Erdöl und Erdgas: Suchen, Fördern, Verarbeiten
[Petroleum and natural gas: Exploring, extracting, processing]
[Brochure]. Hamburg: Mobil Oil AG, Abteilung für Öffentlichkeitsarbeit.
Monge, P. R., McSween, C., & Wyer, J. A. (1989). A profile of meet-
ings in corporate America: Results of the 3M meeting effectiveness
study. Los Angeles: University of Southern California.
Morgan, M. G. & Lave, L. (1990). Ethical considerations in risk commu-
nication practice and research. Risk Analysis, 10, 355–358.
Mosvick, R. K. & Nelson, R. (1987). We’ve got to start meeting like this!
A guide to successful business meeting management. Glenview, IL:
Scott, Foresman.
Mugford, S. T., Mallon, E. B., & Franks, N. R. (2001). The accuracy of
Buffon’s needle: A rule of thumb used by ants to estimate area.
Behavioral Ecology, 12, 655–658.
Mühlhauser, I. & Höldke, B. (1999). Übersicht: Mammographie-
Screening—Darstellung der wissenschaftlichen Evidenz-Grundlage
zur Kommunikation mit der Frau [Mammography screening—
Presentation of the scientific evidence base for communicating with
the woman]. Sonderbeilage arznei-telegramm, 10/99, 101–108.
Mullen, P. D., Allen, J. D., Glanz, K., Fernandez, M. E., Bowen, D. J.,
Pruitt, S. L., et al. (2006). Measures used in studies of informed
decision making about cancer screening: A systematic review.
Annals of Behavioral Medicine, 32, 188–201.
Musch, J., Brockhaus, R., & Bröder, A. (2002). Ein Inventar zur Erfassung
von zwei Faktoren sozialer Erwünschtheit [An inventory for assessing
two factors of social desirability]. Diagnostica, 48, 121–129.
Mushlin, A. I., Kouides, R. M., & Shapiro, D. E. (1998). Estimating the
accuracy of screening mammography: A meta-analysis. American
Journal of Preventive Medicine, 14, 143–153.
Mynatt, C. R., Doherty, M. E., & Tweney, R. D. (1977). Confirmation bias
in a simulated research environment: An experimental study of
scientific inference. Quarterly Journal of Experimental Psychology,
29, 85–95.
Myung, I. J. & Pitt, M. A. (1997). Applying Occam’s razor in model-
ing cognition: A Bayesian approach. Psychonomic Bulletin and
Review, 4, 79–95.
Napoli, M. (1997). What do women want to know? Journal of the
National Cancer Institute Monographs, 22, 11–13.
Narula, S. C. & Wellington, J. W. (1977). Prediction, linear regres-
sion, and minimum sum of relative errors. Technometrics, 19,
185–190.
National Cancer Institute. (2005). Cancer trends progress report—2005
update. Retrieved January 23, 2007, from http://progressreport.
cancer.gov.
Nelson, J. D. (2005). Finding useful questions: On Bayesian diagnos-
ticity, probability, impact, and information gain. Psychological
Review, 112, 979–999.
Nerlove, M. (1963). Returns to scale in electricity supply. In C. F. Christ
(Ed.), Measurement in economics (pp. 167–200). Stanford, CA:
Stanford University Press.
Nesselroade, J. R., Stigler, S. M., & Baltes, P. B. (1980). Regression
toward the mean and the study of change. Psychological Bulletin,
88, 622–637.
Nestor, B. (1999). The unofficial guide to casino gambling. New York:
Macmillan.
Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996).
Applied linear regression models. Chicago: Irwin.
Newbold, P. & Granger, C. W. J. (1974). Experience with forecasting
univariate time series and the combination of forecasts (with dis-
cussion). Journal of the Royal Statistical Society, Series A, 137,
131–165.
Newell, B. R. (2005). Re-visions of rationality. Trends in Cognitive
Sciences, 9, 11–15.
Newell, B. R. & Fernandez, D. (2006). On the binary quality of recogni-
tion and the inconsequentiality of further knowledge: Two critical
tests of the recognition heuristic. Journal of Behavioral Decision
Making, 19, 333–346.
Newell, B. R., Rakow, T., Weston, N. J., & Shanks, D. R. (2004). Search
strategies in decision making: The success of “success.” Journal of
Behavioral Decision Making, 17, 117–137.
Newell, B. R. & Shanks, D. R. (2003). Take the best or look at the rest?
Factors influencing “one-reason” decision-making. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 29,
53–65.
Newell, B. R. & Shanks, D. R. (2004). On the role of recognition in
decision making. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 30, 923–935.
Newell, B. R., Weston, N. J., & Shanks, D. R. (2003). Empirical tests
of a fast-and-frugal heuristic: Not everyone “takes-the-best.”
Organizational Behavior and Human Decision Processes, 91,
82–96.
Newman, M. E. J. (2005). Power laws, Pareto distributions and Zipf’s
law. Contemporary Physics, 46, 323–351.
Newstead, S. E. (2000). What is an ecologically rational heuristic?
Behavioral and Brain Sciences, 23, 759–760.
Nickerson, R. S. (1996). Hempel’s paradox and Wason’s selection task:
Logical and psychological puzzles of confirmation. Thinking and
Reasoning, 2, 1–31.
Nieder, A. & Dehaene, S. (2009). Representation of number in the brain.
Annual Review of Neuroscience, 32, 185–208.
Noble, J., Todd, P. M., & Tuci, E. (2001). Explaining social learning of
food preferences without aversions: An evolutionary simulation
model of Norway rats. Proceedings of the Royal Society of London
B: Biological Sciences, 268, 141–149.
Nosofsky, R. M. (1984). Choice, similarity, and the context theory of
classification. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 10, 104–114.
Nosofsky, R. M. & Bergert, F. B. (2007). Limitations of exemplar models
of multi-attribute probabilistic inference. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 33, 999–1019.
Nosofsky, R. M. & Palmeri, T. J. (1997). Comparing exemplar-retrieval
and decision-bound models of speeded perceptual classification.
Perception & Psychophysics, 59, 1027–1048.
Nosofsky, R. M., & Palmeri, T. J. (1998). A rule-plus-exception model for
classifying objects in continuous-dimension spaces. Psychonomic
Bulletin and Review, 5, 345–369.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-
exception model of classification learning. Psychological Review,
101, 53–79.
Nunnally, J. C. & Bernstein, I. H. (1994). Psychometric theory (3rd ed.).
New York: McGraw-Hill.
Nystroem, L., Larsson, L. G., Wall, S., Rutqvist, L. E., Andersson, I.,
Bjurstam, N., et al. (1996). An overview of the Swedish randomised
mammography trials: Total mortality pattern and the representiv-
ity of the study cohorts. Journal of Medical Screening, 3, 85–87.
Oaksford, M. (2000). Speed, frugality, and the empirical basis of take-
the-best. Behavioral and Brain Sciences, 23, 760–761.
Oaksford, M. & Chater, N. (1994). A rational analysis of the selection task
as optimal data selection. Psychological Review, 101, 608–631.
Oaksford, M. & Chater, N. (Eds.). (1998). Rational models of cognition.
Oxford: Oxford University Press.
O’Brien, D. P. (1993). Mental logic and human irrationality: We can put
a man on the moon, so why can’t we solve those logical-reason-
ing problems? In K. I. Manktelow & D. E. Over (Eds.), Rationality:
Psychological and philosophical perspectives (pp. 110–135).
London: Routledge.
Odling-Smee, F. J., Laland, K. N., & Feldman, M. W. (2003). Niche
construction: The neglected process in evolution. Princeton, NJ:
Princeton University Press.
Oliveira, M. (2005). Broken rationality: The ecological rationality of
simple inference heuristics. Unpublished doctoral dissertation,
University of Coimbra, Portugal.
Oppenheimer, D. M. (2003). Not so fast! (and not so frugal!): Rethinking
the recognition heuristic. Cognition, 90, B1–B9.
Oppenheimer, D. M. (2004). Spontaneous discounting of availability in
frequency judgment tasks. Psychological Science, 15, 100–105.
Ortmann, A., Gigerenzer, G., Borges, B., & Goldstein, D. G. (2008). The
recognition heuristic: A fast and frugal way to investment choice.
In C. R. Plott & V. L. Smith (Eds.), Handbook of experimental eco-
nomics results (Vol. 1, pp. 993–1003). Amsterdam, Netherlands:
Elsevier/North-Holland.
Over, D. E. & Jessop, A. (1998). Rational analysis of causal conditionals
and the selection task. In M. Oaksford & N. Chater (Eds.), Rational
models of cognition (pp. 399–414). Oxford: Oxford University Press.
Oz, M. C., Kherani, A. R., Rowe, A., Roels, L., Crandall, C., Tomatis,
L., et al. (2003). How to improve organ donation: Results of the
ISHLT/FACT Poll. Journal of Heart and Lung Transplantation, 22,
389–410.
Pachur, T. (2010). Recognition-based inference: When is less more in
the real world? Psychonomic Bulletin and Review, 17, 589–598.
Pachur, T. (2011). The limited value of precise tests of the recognition
heuristic. Judgment and Decision Making, 6, 413–422.
Pachur, T. & Biele, G. (2007). Forecasting from ignorance: The use and
usefulness of recognition in lay predictions of sports events. Acta
Psychologica, 125, 99–116.
Pachur, T., Bröder, A., & Marewski, J. N. (2008). The recognition heuris-
tic in memory-based inference: Is recognition a non-compensatory
cue? Journal of Behavioral Decision Making, 21, 183–210.
Pachur, T. & Hertwig, R. (2006). On the psychology of the recognition
heuristic: Retrieval primacy as a key determinant of its use. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 32,
983–1002.
Pachur, T., Hertwig, R., & Rieskamp, J. (in press). The mind as an
intuitive pollster: Frugal search in social spaces. In R. Hertwig U.
Hoffrage, & the ABC Research Group, Simple heuristics in a social
world. New York: Oxford University Press.
Pachur, T., Mata, R., & Schooler, L. J. (2009). Cognitive aging and the
use of recognition in decision making. Psychology and Aging, 24,
901–915.
Pachur, T., Todd, P. M., Gigerenzer, G., Schooler, L. J., & Goldstein, D.
G. (2011). The recognition heuristic: A review of theory and tests.
Frontiers in Cognitive Science, 2, 147.
Paepke, S., Schwarz-Boeger, U., Minckwitz, G. von, Kaufmann, M.,
Schultz-Zehden, B., Beck, H., et al. (2001). Brustkrebsfrüherkennung—
Kenntnisstand und Akzeptanz in der weiblichen Bevölkerung. [Early
detection of breast cancer—Knowledge and acceptance in the female
population]. Deutsches Ärzteblatt, 98, 2178–2186.
Parducci, A. (1968). The relativism of absolute judgment. Scientific
American, 219, 84–90.
Pareto, V. (1897). Cours d’économie politique. Lausanne: F. Rouge
& Cie.
Parke, J. & Griffiths, M. D. (2006). The psychology of the fruit machine:
The role of structural characteristics (revisited). International
Journal of Mental Health and Addiction, 4, 151–179.
Paulhus, D. L. (1984). Two-component models of socially desirable
responding. Journal of Personality and Social Psychology, 46,
598–609.
Paulus, P. B., Dugosh, K. L., Dzindolet, M. T., Coskun, H., & Putman,
V. L. (2002). Social and cognitive influences in group brainstorm-
ing: Predicting production gains and losses. In W. Stroebe &
M. Hewstone (Eds.), European review of social psychology (Vol.
12, pp. 299–325). London: Wiley.
Payne, J. W. (1976). Task complexity and contingent processing in
decision making: An information search and protocol analysis.
Organizational Behavior and Human Performance, 16, 366–387.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1988). Adaptive strategy
selection in decision making. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 14, 534–552.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1993). The adaptive deci-
sion maker. Cambridge: Cambridge University Press.
Pearl, J. (2000). Causality: Models, reasoning and inference. Cambridge:
Cambridge University Press.
Perlich, C., Provost, F., & Simonoff, J. S. (2003). Tree-induction vs.
logistic regression: A learning curve analysis. Journal of Machine
Learning Research, 4, 211–255.
Persson, M. (2003). Decision strategies as adaptations to cue structures.
Unpublished manuscript, Uppsala University, Sweden.
Peters, E., Dieckmann, N., Dixon, A., Hibbard, J. H., & Mertz, C. K.
(2007). Less is more in presenting quality information to consum-
ers. Medical Care Research and Review, 64, 169–190.
Peters, E., Västfjäll, D., Slovic, P., Mertz, C. K., Mazzocco, K., & Dickert,
S. (2006). Numeracy and decision making. Psychological Science,
17, 407–413.
Peterson, C. R. & Beach, L. R. (1967). Man as an intuitive statistician.
Psychological Bulletin, 68, 29–46.
Petrie, M. & Halliday, T. (1994). Experimental and natural changes
in the peacock’s (Pavo cristatus) train can affect mating success.
Behavioral Ecology and Sociobiology, 35, 213–217.
Pfeifer, P. E. (1994). Are we overconfident in the belief that probabil-
ity forecasters are overconfident? Organizational Behavior and
Human Decision Processes, 58, 203–213.
Phillips, K. A., Glendon, G., & Knight, J. A. (1999). Putting the risk of
breast cancer in perspective. New England Journal of Medicine,
340, 141–144.
Pichert, D. & Katsikopoulos, K. V. (2008). Green defaults: Information
presentation and pro-environmental behaviour. Journal of
Environmental Psychology, 28, 63–73.
Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of select-
ing among computational models of cognition. Psychological
Review, 109, 472–491.
Place, S. S., Todd, P. M., Penke, L., & Asendorpf, J. B. (2010). Humans
show mate copying after observing real mate choices. Evolution
and Human Behavior, 31, 320–325.
Pohl, R. (2006). Empirical tests of the recognition heuristic. Journal of
Behavioral Decision Making, 19, 251–271.
Poletiek, F. (2001). Hypothesis-testing behaviour. East Sussex, UK:
Psychology Press.
Popper, K. R. (1959). The logic of scientific discovery. London: Hutchinson.
Pozen, M. W., D’Agostino, R. B., Selker, H. P., Sytkowski, P. A., &
Hood, W. B., Jr. (1984). A predictive instrument to improve
coronary-care-unit admission practices in acute ischemic heart
disease. New England Journal of Medicine, 310, 1273–1278.
Priestley, M. B. (1979). Discussion of the paper by Professor Makridakis
and Dr. Hibon. Journal of the Royal Statistical Society, Series A,
142, 127–128.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo,
CA: Morgan Kaufmann.
Raaij, W. F. van. (1983). Techniques for process tracing in decision
making. In L. Sjöberg, T. Tyszka, & J. Wise (Eds.), Human decision
making (pp. 179–196). Bodafors, Sweden: Doxa.
Raffle, A. E. (2001). Information about screening—Is it to achieve
high uptake or to ensure informed choice? Health Expectations, 4,
92–98.
Rakow, T., Hinvest, N., Jackson, E., & Palmer, M. (2004). Simple heuris-
tics from the adaptive toolbox: Can we perform the requisite learn-
ing? Thinking and Reasoning, 10, 1–29.
Ratcliff, R. & Smith, P. L. (2004). A comparison of sequential sampling
models for two-choice reaction time. Psychological Review, 111,
333–367.
Real, L. & Caraco, T. (1986). Risk and foraging in stochastic environ-
ments. Annual Review of Ecology and Systematics, 17, 371–390.
Reber, R., Schwarz, N., & Winkielman, P. (2004). Processing fluency and
aesthetic pleasure: Is beauty in the perceiver’s processing experi-
ence? Personality and Social Psychology Review, 8, 364–382.
Reddy, R. (1988). Foundations and grand challenges of Artificial
Intelligence: AAAI presidential address. AI Magazine, 9, 9–21.
Regulations 2010 FIFA World Cup South Africa. (2010). Retrieved June
25, 2010, from http://www.fifa.com/mm/document/tournament/
competition/56/42/69/fifawcsouthafrica2010inhalt_e.pdf.
Reichert, S. E. & Hammerstein, P. (1983). Game theory in the ecological
context. Annual Review of Ecology and Systematics,
14, 377–409.
Reimer, T. (1999). Argumentieren und Problemlösen [Arguing and
problem solving]. Lengerich: Pabst Science.
Reimer, T., Bornstein, A.-L., & Opwis, K. (2005). Positive and nega-
tive transfer effects in groups. In T. Betsch & S. Haberstroh (Eds.),
Routine decision making (pp. 175–192). Mahwah, NJ: Erlbaum.
Reimer, T. & Hoffrage, U. (2005). Can simple group heuristics detect
hidden profiles in randomly generated environments? Swiss
Journal of Psychology, 64, 21–37.
Reimer, T. & Hoffrage, U. (2006). The ecological rationality of simple
group heuristics: Effects of group member strategies on decision
accuracy. Theory and Decision, 60, 403–438.
Reimer, T., Hoffrage, U., & Katsikopoulos, K. (2007). Entschei-
dungsheuristiken in Gruppen [Heuristics in group decision making].
NeuroPsychoEconomics, 2, 7–29.
Reimer, T. & Katsikopoulos, K. (2004). The use of recognition in group
decision-making. Cognitive Science, 28, 1009–1029.
Reimer, T., Kuendig, S., Hoffrage, U., Park, E., & Hinsz, V. (2007). Effects
1–28.
Reimer, T., Reimer, A., & Hinsz, V. (2010). Naïve groups can solve the
hidden-profile problem. Human Communication Research, 36,
443–467.
Renner, B. (2004). Biased reasoning: Adaptive responses to health risk
feedback. Personality and Social Psychology Bulletin, 30, 384–
396.
Rice, J. A. (1995). Mathematical statistics and data analysis. Belmont,
CA: Duxbury Press.
Richter, T. & Späth, P. (2006). Recognition is used as one cue among
others in judgment and decision making. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 32, 150–162.
Rieskamp, J. (1997). Die Verwendung von Entscheidungsstrategien
unter verschiedenen Bedingungen: Der Einfluß von Zeitdruck und
Rechtfertigung. [The use of decision strategies in different condi-
tions: Influence of time pressure and accountability]. Unpublished
diploma thesis, Technical University of Berlin.
Rieskamp, J. (2006). Perspectives of probabilistic inferences:
Reinforcement learning and an adaptive network compared. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 32,
1355–1370.
Rieskamp, J. (2008). The importance of learning when making infer-
ences. Judgment and Decision Making, 3, 261–277.
Rieskamp, J. & Hoffrage, U. (1999). When do people use simple heu-
ristics, and how can we tell? In G. Gigerenzer, P. M. Todd, &
the ABC Research Group, Simple heuristics that make us smart
(pp. 141–167). New York: Oxford University Press.
Rieskamp, J. & Hoffrage, U. (2008). Inferences under time pressure:
How opportunity costs affect strategy selection. Acta Psychologica,
127, 258–276.
Rieskamp, J. & Otto, P. E. (2006). SSL: A theory of how people learn
to select strategies. Journal of Experimental Psychology: General,
135, 207–236.
Rilling, M. & McDiarmid, C. (1965). Signal detection in fixed-ratio
schedules. Science, 148, 526–527.
Rimer, B. K., Halabi, S., Skinner, C. S., Lipkus, I., Strigo, T. S., Kaplan,
E. B., et al. (2002). Effects of mammography decision-making
intervention at 12 and 24 months. American Journal of Preventive
Medicine, 22, 247–257.
Rivest, R. (1976). On self-organizing sequential search heuristics.
Communications of the ACM, 19, 63–67.
Roberts, S. & Pashler, H. (2000). How persuasive is a good fit? A com-
ment on theory testing. Psychological Review, 107, 358–367.
Rodkin, D. (1995, February). 10 keys for creating top high schools.
Chicago, 78–85.
Roitberg, B. D., Reid, M. L., & Li, C. (1993). Choosing hosts and mates:
The value of learning. In D. R. Papaj & A. C. Lewis (Eds.), Insect
learning: Ecological and evolutionary perspectives (pp 174–194).
New York: Chapman & Hall.
Romano, N. C., Jr. & Nunamaker, J. F., Jr. (2001). Meeting analysis: Findings
from research and practice. In R. H. Sprague (Ed.), Proceedings
of the 34th Hawaii International Conference on System Sciences
(Vol. 1, p. 1072). Los Alamitos, CA: IEEE Computer Society.
Rose, D. A. (2009, July 11). A better way to get a kidney. New York Times,
p. A19. (Also online at http://www.nytimes.com/2009/07/11/
opinion/11rose.html).
Rosenberg, R. D., Yankasas, B. C., Abraham, L. A., Sickles, E. A.,
Lehman, C. D., Geller, B. M., et al. (2006). Performance benchmarks
for screening mammography. Radiology, 241, 55–66.
Ross, L. (1977). The intuitive psychologist and his shortcomings:
Distortions in the attribution process. In L. Berkowitz (Ed.), Advances
in experimental social psychology (Vol. 10, pp. 173–220). New York:
Academic Press.
Rothman, A. J., Bartels, R. D., Wlaschin, J., & Salovey, P. (2006).
The strategic use of gain- and loss-framed messages to promote
healthy behavior: How theory can inform practice. Journal of
Communication, 56, S202–S220.
Rothman, A. J. & Salovey, P. (1997). Shaping perceptions to motivate
healthy behavior: The role of message framing. Psychological
Bulletin, 121, 3–19.
Rubinstein, A. (1980). Ranking the participants in a tournament. SIAM
Journal on Applied Mathematics, 38, 108–111.
Russo, J. E. & Dosher, B. A. (1983). Strategies for multiattribute binary
choice. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 9, 676–696.
Ruxton, G. D., & Beauchamp, G. (2008). The application of genetic
algorithms in behavioural ecology, illustrated with a model of
anti-predator vigilance. Journal of Theoretical Biology, 250,
435–448.
Saad, G., Eba, A. & Sejean, R. (2009). Sex differences when search-
ing for a mate: A process-tracing approach. Journal of Behavioral
Decision Making, 22, 171–190.
Sackett, D. L. (1996). On some clinically useful measures of the effects
of treatment. Evidence-Based Medicine, 1, 37–38.
Salomon, I. (1986). Towards a behavioural approach to city centre park-
ing: The case of Jerusalem’s CBD. Cities, 3, 200–208.
Sarfati, D., Howden-Chapman, P., Woodward, A., & Salmond, C. (1998).
Does the frame affect the picture? A study into how attitudes to
screening for cancer are affected by the way benefits are expressed.
Journal of Medical Screening, 5, 137–140.
Sargent, T. J. (1993). Bounded rationality in macroeconomics. New
York: Oxford University Press.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
Savage, L. J. (1972). The foundations of statistics (2nd rev. ed.). New
York: Dover.
Sawyer, J. (1966). Measurement and prediction: Clinical and statisti-
cal. Psychological Bulletin, 66, 178–200.
Saxberg, B. V. H. (1987). Projected free fall trajectories: I. Theory and
simulation. Biological Cybernetics, 56, 159–175.
Scaf-Klomp, W., Sandermann, R., Weil, H. B. M. van de, Otter, R., &
Heuvel, W. J. A. van den. (1997). Distressed or relieved? Psychological
side effects of breast cancer screening in the Netherlands. Journal of
Epidemiology and Community Health, 51, 705–710.
Schacter, D. L. (1999). The seven sins of memory: Insights from psy-
chology and cognitive neuroscience. American Psychologist, 54,
182–203.
Scheibehenne, B. & Bröder, A. (2007). Predicting Wimbledon 2005
tennis results by mere player name recognition. International
Journal of Forecasting, 23, 415–426.
Scheibehenne, B., Greifeneder, R., & Todd, P. M. (2010). Can there ever
be too many options? A meta-analytic review of choice overload.
Journal of Consumer Research, 37, 409–425.
Schittekatte, M. & Hiel, A. van. (1996). Effects of partially shared infor-
mation and awareness of unshared information on information
sampling. Small Group Research, 27, 431–449.
Schmidt, F. L. (1971). The relative efficiency of regression and simple
unit weighting predictor weights in applied differential psychol-
ogy. Educational and Psychological Measurement, 31, 699–714.
Schmitt, C. (2008, April 9). Auf Tour gegen Brustkrebs [On tour against
breast cancer]. Die Tageszeitung, p. 7.
Schmitt, M. & Martignon, L. (2006). On the complexity of learning
lexicographic strategies. Journal of Machine Learning Research, 7,
55–83.
Schooler, L. J. & Anderson, J. R. (1997). The role of process in the ratio-
nal analysis of memory. Cognitive Psychology, 32, 219–250.
Schooler, L. J. & Hertwig, R. (2005). How forgetting aids heuristic infer-
ence. Psychological Review, 112, 610–628.
Schroeder, M. (1991). Fractals, chaos, power laws: Minutes from an
infinite paradise. New York: Freeman.
Schumpeter, J. A. (1942). Capitalism, socialism, and democracy. New
York: Harper & Row.
Schustack, M. W. & Sternberg, R. J. (1981). Evaluation of evidence in
causal inference. Journal of Experimental Psychology: General,
110, 101–120.
Schwartz, L. M., Woloshin, S., Black, W. C., & Welch, G. (1997). The
role of numeracy in understanding the benefit of screening mam-
mography. Annals of Internal Medicine, 127, 966–972.
Schwartz, L. M., Woloshin, S., Sox, H. C., Fischhoff, B., & Welch, G.
(2000). US women’s attitudes to false positive mammography
results and detection of ductal carcinoma in situ: Cross-sectional
survey. British Medical Journal, 320, 1636–1640.
Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka,
H., & Simons, A. (1991). Ease of retrieval as information: Another
look at the availability heuristic. Journal of Personality and Social
Psychology, 61, 195–202.
Schwarz, N. & Vaughn, L. A. (2002). The availability heuristic revis-
ited: Ease of recall and content of recall as distinct sources of infor-
mation. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics
and biases: The psychology of intuitive judgment (pp. 103–119).
New York: Cambridge University Press.
Schwarzer, R. & Jerusalem, M. (Eds.). (1999). Skalen zur Erfassung
von Lehrer- und Schülermerkmalen [Scales for assessing teacher and
student characteristics]. Berlin: Free University Berlin.
Schwing, R. C. & Kamerud, D. B. (1988). The distribution of risks: Vehicle
occupant fatalities and time of week. Risk Analysis, 8, 127–133.
Seale, D. A. & Rapoport, A. (1997). Sequential decision making with
relative ranks: An experimental investigation of the “secretary prob-
lem.” Organizational Behavior and Human Decision Processes, 69,
221–236.
Seale, D. A. & Rapoport, A. (2000). Optimal stopping behavior with
relative ranks: The secretary problem with unknown population
size. Journal of Behavioral Decision Making, 13, 391–411.
Sedlmeier, P. & Betsch, T. (Eds.). (2002). Etc. Frequency processing and
cognition. Oxford: Oxford University Press.
Sedlmeier, P. & Gigerenzer, G. (2001). Teaching Bayesian reasoning in
less than two hours. Journal of Experimental Psychology: General,
130, 380–400.
Sedlmeier, P., Hertwig, R., & Gigerenzer, G. (1998). Are judgments of the
positional frequencies of letters systematically biased due to avail-
ability? Journal of Experimental Psychology: Learning, Memory,
and Cognition, 24, 754–770.
Selten, R. (2001). What is bounded rationality? In G. Gigerenzer & R.
Selten (Eds.), Bounded rationality: The adaptive toolbox (pp. 13–36).
Cambridge, MA: MIT Press.
Serwe, S. & Frings, C. (2006). Who will win Wimbledon? The recogni-
tion heuristic in predicting sports events. Journal of Behavioral
Decision Making, 19, 321–332.
Shaffer, D. M., Krauchunas, S. M., Eddy, M., & McBeath, M. K. (2004).
How dogs navigate to catch Frisbees. Psychological Science, 15,
437–441.
Shaffer, D. M. & McBeath, M. K. (2005). Naive beliefs in baseball:
Systematic distortion in perceived time of apex for fly balls. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 31,
1492–1501.
Shah, A. K. & Oppenheimer, D. M. (2008). Heuristics made easy: An
effort-reduction framework. Psychological Bulletin, 134, 207–222.
Shanks, D. R. & Lagnado, D. (2000). Sub-optimal reasons for rejecting
optimality. Behavioral and Brain Sciences, 23, 761–762.
Shannon, C. (1948). A mathematical theory of communication. Bell
System Technical Journal, 27, 379–423, 623–656.
Shanteau, J. (1978). When does a response error become a judgmental
bias? Commentary on “Judged frequency of lethal events.” Journal
of Experimental Psychology: Human Learning and Memory, 4,
579–581.
Shanteau, J. (1992). How much information does an expert use? Is it
relevant? Acta Psychologica, 81, 75–86.
Shanteau, J. & Thomas, R. P. (2000). Fast and frugal heuristics: What
about unfriendly environments? Behavioral and Brain Sciences,
23, 762–763.
Shepard, R. N. (1987a). Evolution of mesh between principles of the
mind and regularities of the world. In J. Dupré (Ed.), The latest
on the best: Essays on evolution and optimality (pp. 251–275).
Cambridge, MA: MIT Press.
Shepard, R. N. (1987b). Toward a universal law of generalization for
psychological science. Science, 237, 1317–1323.
Shepard, R. N. (2001). Perceptual–cognitive universals as reflections of
the world. Behavioral and Brain Sciences, 24, 581–601. (Reprinted
from Psychonomic Bulletin and Review, 1, 2–28). (Original work
published 1994)
Shiller, R. J. (2000). Irrational exuberance. Princeton, NJ: Princeton
University Press.
Shiloh, S., Koren, S., & Zakay, D. (2001). Individual differences in com-
pensatory decision-making style and need for closure as correlates
of subjective decision complexity and difficulty. Personality and
Individual Differences, 30, 699–710.
Showers, J. L. & Chakrin, L. M. (1981). Reducing uncollectible revenues
from residential telephone customers. Interfaces, 11, 21–31.
Siegel-Jacobs, K. & Yates, J. F. (1996). Effects of procedural and out-
come accountability on judgment quality. Organizational Behavior
and Human Decision Processes, 65, 1–17.
Simon, H. A. (1955a). A behavioral model of rational choice. Quarterly
Journal of Economics, 69, 99–118.
Simon, H. A. (1955b). On a class of skew distribution functions.
Biometrika, 42, 425–440.
Simon, H. A. (1956). Rational choice and the structure of environments.
Psychological Review, 63, 129–138.
Simon, H. A. (1978). Rationality as process and as product of thought.
American Economic Review, 68, 1–16.
Simon, H. A. (1979a). Information processing models of cognition.
Annual Review of Psychology, 30, 363–396.
Simon, H. A. (1979b). Rational decision making in business organiza-
tions. American Economic Review, 69, 493–513.
Simon, H. A. (1989). The scientist as problem solver. In D. Klahr & K.
Kotovsky (Eds.), Complex information processing: The impact of
Herbert A. Simon (pp. 375–397). Hillsdale, NJ: Erlbaum.
Simon, H. A. (1990). Invariants of human behavior. Annual Review of
Psychology, 41, 1–19.
Sivak, M., Soler, J., & Tränkle, U. (1989). Cross-cultural differences
in driver self-assessment. Accident Analysis & Prevention, 21,
371–375.
Skubisz, C., Reimer, T., & Hoffrage, U. (2009). Communicating sta-
tistical risk information. Communication Yearbook, 33, 176–
211. (Published annually for the International Communication
Association, C. S. Beck, Ed., Vol. 33. New York: Routledge)
Slavin, R. E. (1995). Cooperative learning. Boston: Allyn & Bacon.
Slaytor, E. K. & Ward, J. E. (1998). How risks of breast cancer and ben-
efits of screening are communicated to women—Analysis of 58
pamphlets. British Medical Journal, 317, 263–264.
Sloman, S. A., Over, D., Slovak, L., & Stibel, J. M. (2003). Frequency
illusions and other fallacies. Organizational Behavior and Human
Decision Processes, 91, 296–309.
Slovic, P. (1987). Perception of risk. Science, 236, 280–285.
Slovic, P., Finucane, M., Peters, E., & MacGregor, D. G. (2002). The
affect heuristic. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.),
Heuristics and biases: The psychology of intuitive judgment
(pp. 397–420). Cambridge: Cambridge University Press.
Slovic, P., Fischhoff, B., & Lichtenstein, S. (1982). Facts versus fears:
Understanding perceived risk. In D. Kahneman, P. Slovic, & A.
Tversky (Eds.), Judgment under uncertainty: Heuristics and biases
(pp. 463–489). Cambridge: Cambridge University Press.
Slovic, P. & Lichtenstein, S. (1971). Comparison of Bayesian and
regression approaches to the study of information processing in
judgment. Organizational Behavior and Human Performance, 6,
649–744.
Slovic, P., Monahan, J., & MacGregor, D. G. (2000). Violence risk assess-
ment and risk communication: The effects of using actual cases,
providing instruction, and employing probability versus frequency
formats. Law and Human Behavior, 24, 271–296.
Smith, E. E. & Medin, D. L. (1981). Categories and concepts. Cambridge,
MA: Harvard University Press.
Smith, E. R. (1991). Illusory correlation in a simulated exemplar-
based memory. Journal of Experimental Social Psychology, 27,
107–123.
Smith, R. W. & Preston, F. W. (1984). Vocabularies of motives for gam-
bling behavior. Sociological Perspectives, 27, 325–348.
Smith, V. L. (2003). Constructivist and ecological rationality in eco-
nomics. American Economic Review, 93, 465–508.
Sniezek, J. A., Paese, P. W., & Switzer, F. S. (1990). The effect of choos-
ing on confidence in choice. Organizational Behavior and Human
Decision Processes, 46, 264–282.
Snook, B. & Cullen, R. M. (2006). Recognizing national hockey league
greatness with an ignorance-based heuristic. Canadian Journal of
Experimental Psychology, 60, 33–43.
Snyder, M. (1984). When belief creates reality. In L. Berkowitz (Ed.),
Advances in experimental social psychology (Vol. 18, pp. 247–305).
New York: Academic Press.
Soll, J. B. (1996). Determinants of overconfidence and miscalibration:
The roles of random error and ecological structure. Organizational
Behavior and Human Decision Processes, 65, 117–137.
Sorkin, R. D., Hays C. J., & West, R. (2001). Signal-detection analysis of
group decision making. Psychological Review, 108, 183–203.
Sorkin, R. D., West, R., & Robinson, D. E. (1998). Group performance
depends on the majority rule. Psychological Science, 9, 456–463.
Spector, L. C. & Mazzeo, M. (1980). Probit analysis and economic edu-
cation. Journal of Economic Education, 11, 37–44.
Squire, L. R. (1989). On the course of forgetting in very long-term
memory. Journal of Experimental Psychology: Learning, Memory
and Cognition, 15, 241–245.
Stanovich, K. E. & West, R. F. (2000). Individual differences in reason-
ing: Implications for the rationality debate? Behavioral and Brain
Sciences, 23, 645–665.
Stasser, G. (1992). Information salience and the discovery of hidden profiles
by decision-making groups: A “thought experiment.” Organizational
Behavior and Human Decision Processes, 52, 156–181.
Stasser, G. & Birchmeier, Z. (2003). Group creativity and collective
choice. In P. B. Paulus & B. A. Nijstad (Eds.), Group creativity
(pp. 85–109). New York: Oxford University Press.
Stasser, G., Stewart, D. D., & Wittenbaum, G. M. (1995). Expert roles and
information exchange during discussion: The importance of know-
ing who knows what. Journal of Experimental Social Psychology,
31, 244–265.
Stasser, G., Taylor, L. A., & Hanna, C. (1989). Information sampling in
structured and unstructured discussions of three- and six-person
groups. Journal of Personality and Social Psychology, 57, 67–78.
Stasser, G. & Titus, W. (1985). Pooling of unshared information in group
decision making: Biased information sampling during discussion.
Journal of Personality and Social Psychology, 48, 1467–1478.
Staudinger, U. M. & Lindenberger, U. E. R. (Eds.). (2003). Understanding
human development: Lifespan psychology in exchange with other
disciplines. Dordrecht: Kluwer.
Steenbergh, T. A., Meyers, A. W., May, R. K., & Whelan, J. P. (2002).
Development and validation of the Gamblers’ Beliefs Questionnaire.
Psychology of Addictive Behaviors, 16, 143–149.
Steiner, I. D. (1972). Group process and productivity. New York:
Academic Press.
Stewart, D. D., Billings, R. S., & Stasser, G. (1998). Accountability and
the discussion of unshared, critical information in decision-mak-
ing groups. Group Dynamics: Theory, Research, and Practice, 2,
18–23.
Stigler, G. J. (1961). The economics of information. Journal of Political
Economy, 69, 213–225.
Stigler, S. M. (1990). A Galtonian perspective on shrinkage estimators.
Statistical Science, 5, 147–155.
Stigler, S. M. (1999). Statistics on the table: The history of statistical con-
cepts and methods. Cambridge, MA: Harvard University Press.
Stiglitz, J. E. (2010). Freefall: America, free markets, and the sinking of
the world economy. New York: Norton.
Stroebe, W. & Diehl, M. (1994). Why groups are less effective than their
members: On productivity losses in idea-generating groups. In
W. Stroebe & M. Hewstone (Eds.), European review of social psy-
chology (Vol. 5, pp. 271–303). London: Wiley.
Studdert, D. M., Mello, M. M., Sage, W. M., DesRoches, C. M., Peugh, J.,
Zappert, K., et al. (2005). Defensive medicine among high-risk spe-
cialist physicians in a volatile malpractice environment. Journal of
the American Medical Association, 293, 2609–2617.
Stumpf, H., Angleitner, A., Wieck, T., Jackson, D. N., & Beloch-Till,
H. (1984). Deutsche Personality Research Form (PRF). Göttingen,
Germany: Hogrefe.
Suantak, L., Bolger, F., & Ferrell, W. R. (1996). The hard-easy effect
in subjective probability calibration. Organizational Behavior and
Human Decision Processes, 67, 201–221.
Sundali, J. & Croson, R. (2006). Biases in casino betting: The hot hand and
the gambler’s fallacy. Judgment and Decision Making, 1, 1–12.
Suppes, P. (1984). Conflicting intuitions about causality. Midwest
Studies in Philosophy, 9, 151–168.
Surowiecki, J. (2005). The wisdom of crowds. New York: Anchor.
Svenson, O., Fischhoff, B., & MacGregor, D. (1985). Perceived driving
safety and seatbelt usage. Accident Analysis and Prevention, 17,
119–133.
Takezawa, M., Gummerum, M., & Keller, M. (2006). A stage for the
rational tail of the emotional dog: Roles of moral reasoning in group
decision making. Journal of Economic Psychology, 27, 117–139.
Taleb, N. N. (2007). The black swan: The impact of the highly improb-
able. New York: Random House.
Tamaki, M. (1985). Adaptive approach to some stopping problems.
Journal of Applied Probability, 22, 644–652.
Tamaki, M. (1988). Optimal stopping in the parking problem with
U-turn. Journal of Applied Probability, 25, 363–374.
Tatsuoka, M. M. (1988). Multivariate analysis: Techniques for educa-
tional and psychological research. New York: Macmillan.
Taylor, S. E. (1991). Asymmetrical effects of positive and negative
events: The mobilization-minimization hypothesis. Psychological
Bulletin, 110, 67–85.
Taylor, S. E. & Brown, J. D. (1988). Illusion and well-being: A social-
psychological perspective on mental health. Psychological Bulletin,
103, 193–210.
Tenenbaum, J. B., Griffiths, T. L., & Kemp, C. (2006). Theory-based
Bayesian models of inductive learning and reasoning. Trends in
Cognitive Sciences, 10, 309–318.
Tetlock, P. E. & Boettger, R. (1989). Accountability: A social magnifier
of the dilution effect. Journal of Personality and Social Psychology,
57, 388–398.
Thaler, R. H. & Benartzi, S. (2004). Save more tomorrow: Using behav-
ioral economics to increase employee saving. Journal of Political
Economy, 112, 164–187.
Thaler, R. H. & Sunstein, C. R. (2008). Nudge: Improving decisions about
health, wealth, and happiness. New Haven, CT: Yale University
Press.
Thompson, R. G. & Richardson, A. J. (1998). A parking search
model. Transportation Research Part A: Policy and Practice, 32,
159–170.
Thorngate, W. (1980). Efficient decision heuristics. Behavioral Science,
25, 219–225.
Todd, J. T. (1981). Visual information about moving objects. Journal of
Experimental Psychology: Human Perception and Performance, 7,
795–810.
Todd, P. M. (2001). Fast and frugal heuristics for environmentally
bounded minds. In G. Gigerenzer & R. Selten (Eds.), Bounded
rationality: The adaptive toolbox (pp. 51–70). Cambridge, MA:
MIT Press.
Todd, P. M. & Dieckmann, A. (2005). Heuristics for ordering cue search
in decision making. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.),
Advances in neural information processing systems (Vol. 17,
pp. 1393–1400). Cambridge, MA: MIT Press.
Todd, P. M., & Gigerenzer, G. (1999). What we have learned (so far).
In G. Gigerenzer, P. M. Todd, & the ABC Research Group, Simple
heuristics that make us smart (pp. 357–365). New York: Oxford
University Press.
Todd, P. M. & Gigerenzer, G. (2000). Précis of Simple heuristics that
make us smart. Behavioral and Brain Sciences, 23, 727–741.
Todd, P. M. & Gigerenzer, G. (2001). Shepard’s mirrors or Simon’s scis-
sors? Commentary on R. N. Shepard, Perceptual-cognitive univer-
sals as reflections of the world. Behavioral and Brain Sciences, 24,
704–705.
Todd, P. M. & Goodie, A. S. (2002). Testing the ecological rationality of base
rate neglect. In B. Hallam, D. Floreano, J. Hallam, G. Hayes, and J.-A.
Meyer (Eds.), From animals to animats 7: Proceedings of the Seventh
International Conference on Simulation of Adaptive Behavior (pp.
215–223). Cambridge, MA: MIT Press/Bradford Books.
Todd, P. M. & Heuvelink, A. (2007). Shaping social environments with
simple recognition heuristics. In P. Carruthers, S. Laurence, & S.
Stich (Eds.), The innate mind, Vol. 2: Culture and cognition (pp.
165–180). Oxford: Oxford University Press.
Todd, P. M. & Kirby, S. (2001). I like what I know: How recognition-
based decisions can structure the environment. In J. Kelemen & P.
Sosík (Eds.), Advances in artificial life: 6th European Conference
Proceedings (ECAL 2001) (pp. 166–175). Berlin: Springer.
Todd, P. M. & Miller, G. F. (1999). From pride and prejudice to persua-
sion: Satisficing in mate search. In G. Gigerenzer, P. M. Todd, & the
ABC Research Group, Simple heuristics that make us smart (pp.
287–308). New York: Oxford University Press.
Todd, P. M. & Schooler, L. J. (2007). From disintegrated architectures
of cognition to an integrated heuristic toolbox. In W. D. Gray (Ed.),
Integrated models of cognitive systems (pp. 151–164). New York:
Oxford University Press.
Toth, J. P. & Daniels, K. A. (2002). Effects of prior experience on judg-
ments of normative word frequency: Automatic bias and correc-
tion. Journal of Memory and Language, 46, 845–874.
Towle, A., Godolphin, W., Grams, G., & Lamarre, A. (2006). Putting
informed and shared decision making into practice. Health
Expectations, 9, 321–332.
Tucker, W. (1987). Where do the homeless come from? National Review,
39, 34–44.
Tuddenham, R. D. & Snyder, M. M. (1954). Physical growth of California
boys and girls from birth to eighteen years. Berkeley: University of
California Press.
Turner, N. E. & Horbay, R. (2004). How do slot machines and other
electronic gambling machines actually work? Journal of Gambling
Issues, 11. Retrieved August 18, 2009, from http://www.camh.net/
egambling/issue11/index.html.
Tversky, A. (1972). Elimination by aspects: A theory of choice.
Psychological Review, 79, 281–299.
Tversky, A. & Kahneman, D. (1973). Availability: A heuristic for judg-
ing frequency and probability. Cognitive Psychology, 5, 207–232.
Tversky, A. & Kahneman, D. (1974). Judgment under uncertainty:
Heuristics and biases. Science, 185, 1124–1131.
Ubel, P. A. & Loewenstein, G. (1997). The role of decision analysis in
informed consent: Choosing between intuition and systematicity.
Social Science and Medicine, 44, 647–656.
Uexküll, J. von. (1957). A stroll through the worlds of animals and
men: A picture book of invisible worlds. In C. H. Schiller (Ed. &
Trans.), Instinctive behavior: The development of a modern concept
(pp. 5–80). New York: International Universities Press.
Underwood, B. J., Zimmerman, J., & Freund, J. S. (1971). Retention of
frequency information with observations on recognition and recall.
Journal of Experimental Psychology, 87, 149–162.
Vanderbilt, T. (2008). Traffic: Why we drive the way we do (and what it
says about us). New York: Knopf.
Van der Goot, D. (1982). A model to describe the choice of parking
places. Transportation Research Part A: General, 16, 109–115.
Volz, K. G., Schooler, L. J., Schubotz, R. I., Raab, M., Gigerenzer, G.,
& Cramon, D. Y. von. (2006). Why you think Milan is larger than
Modena: Neural correlates of the recognition heuristic. Journal of
Cognitive Neuroscience, 18, 1924–1936.
von Neumann, J. & Morgenstern, O. (1947). Theory of games and
economic behavior. Princeton, NJ: Princeton University Press.
Vroom, V. H. (1969). Industrial social psychology. In G. Lindzey & E.
Aronson (Eds.), Handbook of social psychology (pp. 196–268).
Reading, MA: Addison-Wesley.
Wagenaar, W. A. (1988). Paradoxes of gambling behavior. Hillsdale, NJ:
Erlbaum.
Wagenaar, W. A., Keren, G. B., & Pleit-Kuiper, A. (1984). The multiple
objectives of gamblers. Acta Psychologica, 56, 167–178.
Wainer, H. (1976). Estimating coefficients in linear models: It don’t
make no nevermind. Psychological Bulletin, 83, 213–217.
Wald, A. (1947). Sequential analysis. New York: Wiley.
Waldmann, M. R., Holyoak, K. J., & Fratianne, A. (1995). Causal models
and the acquisition of category structure. Journal of Experimental
Psychology: General, 124, 181–206.
Waldmann, M. R. & Martignon, L. (1998). A Bayesian network model
of causal learning. In M. A. Gernsbacher & S. J. Derry (Eds.),
Proceedings of the Twentieth Annual Conference of the Cognitive
Science Society (pp. 1102–1107). Mahwah, NJ: Erlbaum.
Walker, M. B. (1990). The presence of irrational thinking among poker
machine players. In M. G. Dickerson (Ed.), 200-UP. Canberra:
National Association for Gambling Studies.
Walker, M. B. (1992a). Irrational thinking among slot machine players.
Journal of Gambling Studies, 8, 245–261.
Walker, M. B. (1992b). The psychology of gambling. Oxford: Pergamon.
Wallin, A. & Gärdenfors, P. (2000). Smart people who make simple heu-
ristics work. Behavioral and Brain Sciences, 23, 765.
Wallsten, T. S., Budescu, D. V., Zwick, R., & Kemp, S. M. (1993).
Preference and reasons for communicating probabilistic informa-
tion in numerical or verbal terms. Bulletin of the Psychonomic
Society, 31, 135–138.
Wang, X. T. (1996). Domain-specific rationality in human choices:
Violations of utility axioms and social contexts. Cognition, 60,
31–63.
Wason, P. C. (1960). On the failure to eliminate hypotheses in a con-
ceptual task. Quarterly Journal of Experimental Psychology, 12,
129–140.
Wason, P. C. (1968). Reasoning about a rule. Quarterly Journal of
Experimental Psychology, 20, 273–281.
Wasserman, E. A., Dorner, W. W., & Kao, S.-F. (1990). Contributions of
specific cell information to judgments of interevent contingency.
Journal of Experimental Psychology: Learning, Memory, and
Cognition, 16, 509–521.
Weber, E. U., Siebenmorgen, N., & Weber, M. (2005). Communicating
asset risk: How name recognition and the format of historic volatil-
ity information affect risk perception and investment decisions.
Risk Analysis, 25, 597–609.
Weinstein, N. D. (1999). What does it mean to understand a risk?
Evaluating risk comprehension. Journal of the National Cancer
Institute Monographs, 25, 15–20.
Weisberg, S. (1985). Applied linear regression. New York: Wiley.
Weiss, H. & Bradley, R. S. (2001). What drives societal collapse?
Science, 291, 609–610.
Whittlesea, B. W. A. (1993). Illusions of familiarity. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 19, 1235–1253.
Whittlesea, B. W. A. & Leboe, J. P. (2003). Two fluency heuristics (and how
to tell them apart). Journal of Memory and Language, 49, 62–79.
Widrow, B. & Hoff, M. E. (1960). Adaptive switching circuits. IRE
WESCON Convention Record, 4, 96–104.
Wiegmann, D. D. & Morris, M. R. (2005). Search behavior and mate
choice. Recent Research Developments in Experimental &
Theoretical Biology, 1, 201–216.
Wilks, S. S. (1938). Weighting schemes for linear functions of correlated
variables when there is no dependent variable. Psychometrika, 3,
23–40.
Williams, T. M., Estes, J. A., Doak, D. F., & Springer, A. M. (2004). Killer
appetites: Assessing the role of predators in ecological communi-
ties. Ecology, 85, 3373–3384.
Wilson, D. K., Purdon, S. E., & Wallston, K. A. (1988). Compliance to
health recommendations: A theoretical overview of message fram-
ing. Health Education Research, 3, 161–171.
Winkielman, P. & Cacioppo, J. T. (2001). Mind at ease puts a smile
on the face: Psychophysiological evidence that processing facili-
tation leads to positive affect. Journal of Personality and Social
Psychology, 81, 989–1000.
Winkielman, P., Schwarz, N., Fazendeiro, T. A., & Reber, R. (2003). The
hedonic marking of processing fluency: Implications for evalu-
ative judgment. In J. Musch & K. C. Klauer (Eds.), The psychol-
ogy of evaluation: Affective processes in cognition and emotion
(pp. 189–217). Mahwah, NJ: Erlbaum.
Wittenbaum, G. M. & Stasser, G. (1996). Management of information
in small groups. In J. L. Nye & A. B. Brower (Eds.), What’s social
about social cognition? (pp. 3–28). London: Sage.
Woike, J., Hertwig, R., & Hoffrage, U. (2009). Estimating the world.
Manuscript in preparation.
Woloshin, S., Schwartz, L. M., Byram, S. J., Sox, H. C., Fischhoff, B.,
& Welch, G. (2000). Women’s understanding of the mammography
screening debate. Archives of Internal Medicine, 160, 1434–1440.
Woodley, W. L., Simpson, J., Biondini, R., & Berkeley, J. (1977). Rainfall
results 1970–75: Florida area cumulus experiment. Science, 195,
735–742.
Wottawa, H. & Hossiep, R. (1987). Grundlagen psychologischer
Diagnostik: Eine Einführung [Foundations of psychological diagnos-
tics: An introduction]. Göttingen, Germany: Hogrefe Verlag.
Wübben, M. & Wangenheim, F. V. (2008). Instant customer base analy-
sis: Managerial heuristics often “get it right.” Journal of Marketing,
72 (May), 82–93.
Yamagishi, K. (1997). When a 12.86% mortality is more dangerous than
24.14%: Implications for risk communication. Applied Cognitive
Psychology, 11, 495–506.
Yaniv, I. & Hogarth, R. M. (1993). Judgmental versus statistical predic-
tion: Information asymmetry and combination rules. Psychological
Science, 4, 58–62.
Yee, M., Hauser, J., Orlin, J., & Dahan, E. (2007). Greedoid-based non-
compensatory two-stage consideration-then-choice inference.
Marketing Science, 26, 532–549.
Young, W. (1986). A model of vehicles movements in parking facilities.
Mathematics and Computers in Simulation, 28, 305–309.
Zacks, R. T., Hasher, L., & Sanft, H. (1982). Automatic encoding of event
frequency: Further findings. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 8, 106–116.
Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of
Personality and Social Psychology, 9, 1–27.
Zajonc, R. B. (1980). Feeling and thinking: Preferences need no infer-
ences. American Psychologist, 35, 151–175.
Zakay, D. (1990). The role of personal tendencies in the selection of
decision-making strategies. Psychological Record, 40, 207–213.
Zapka, J. G., Geller, B. M., Bulliard, J. L., Jacques, F., Helene, S. G., &
Ballard-Barbash, R. (2006). Print information to inform decisions
about mammography screening participation in 16 countries with
population-based programs. Patient Education and Counseling,
63, 126–137.
Zellner, A. & Revankar, N. (1970). Generalized production function.
Review of Economic Studies, 36, 241–250.
Zerssen, D. (1994). Persönlichkeitszüge als Vulnerabilitätsindikatoren—
Probleme ihrer Erfassung [Personality traits as vulnerability
indicators: Problems of their assessment]. Fortschritte der
Neurologie, Psychiatrie und ihrer Grenzgebiete, 62, 1–13.
Zipf, G. K. (1949). Human behavior and the principle of least effort.
Cambridge, MA: Addison-Wesley.
Zola, I. (1963). Observations on gambling in lower-class settings. Social
Problems, 10, 353–361.
Name Index

Abbondanza, M., 468 Andersson, P., 129
Abelson, R. P., 219, 221 Ariely, D., 413
Absetz, P., 452 Aristotle, 360
Adamowicz, W. A., 339–340 Armelius, B., 258
Aiken, L. S., 191 Armelius, K., 258
Albers, W., 385n Armor, D. A., 86
Albon, S. D., 241 Armstrong, J. S., 61
Allan, L. G., 316 Armstrong, K., 450
Allen, C., 217 Arnott, R., 458
Allen, F. W., 428 Aro, A. R., 452
Allison, R. I., 117 Arrow, K. J., 495
Allison, T., 204 Asendorpf, J. B., 13
Alloy, L. B., 305 Ashby, F. G., 361
Altmann, E. M., 145 Asuncion, A., 206, 376
Alwin, D. E., 248 Au, W. T., 97
Anas, A., 153 Austin, A. A., 263
Anderson, C., 383 Axelrod, R., 10
Anderson, J. R., 26, 33, 83, 133, Ayton, P., 24, 128, 305
146, 147–150, 151, 155, 156,
248, 249, 258, 259, 279, 320, Babler, T. G., 6
333, 404, 406 Bachmann, L. M., 430
Anderson, S. P., 457 Bäck, T., 474

Bak, P., 383, 404 Billings, R. S., 338
Baltes, P. B., 100 Biondini, R., 204
Banks, S. M., 451 Birchmeier, Z., 338
Baranski, J. V., 97 Björk, E. L., 145
Baratgin, J., 93 Björk, R. A., 145
Barber, B., 62 Björkman, M., 83, 97
Barbey, A. K., 438 Black, W. C., 430, 432, 442, 445, 449
Barclay, J., 432 Blank, H., 93
Bar-Hillel, M., 434 Bless, H., 228
Barlow, H., 81, 82 Blymmides, N., 360
Baron, J., 450 Boer, H., 443
Baron, R. S., 330, 339 Boethius, A. M. S., 360
Barratt, A., 432, 445 Boettger, R., 305
Bartels, R. D., 451 Bohr, N., 33
Baucells, R. D., 76–77, 378 Bolger, F., 97, 216
Baumann, M. R., 340 Bonner, B. L., 340
Beach, L. R., 33, 100, 230, 231, 239 Bookstaber, R., 25, 40
Beaman, C. P., 125, 141 Borges, B., 122, 125
Bearden, J. N., 459 Borkenau, P., 228
Beauchamp, G., 474 Bornstein, A.-L., 336
Beauducel, A., 229 Both, C., 40
Becker, B., 124 Bothell, D., 156
Becker, G. S., 414, 420 Bottorff, J. L., 450
Bedell, B. T., 451 Bouwhuis, S., 40
Begg, I. M., 153 Bower, G. H., 288, 316
Bell, P. R. F., 449 Boyd, M., 125
Benartzi, S., 412 Boyd, R., 10, 258, 339, 423
Bennis, W., 421, 422 Boyle, P., 430
Bentley, J. L., 279, 280, 284, 302 Bradley, R. S., 40
Berg, M. van den, 380 Brakman, S., 380, 405
Berg, N., 13, 418, 490 Brand, S., 336
Bergert, F. B., 29, 43, 138, 267 Brandstätter, E., 20, 264
Berkeley, J., 204 Brase, G. L., 435, 438
Berman, L., 449 Brehmer, B., 226, 305
Berndt, E. R., 205 Breiman, L., 42, 75, 360, 376, 386
Bernstein, I. H., 71 Brighton, H., 33, 60, 258, 262, 268
Bernstein, J., 446 Bröder, A., 21, 123, 124, 128, 133,
Berretty, P. M., 243, 251, 361 137n, 138, 189, 193, 215, 217,
Berwick, D. M., 434 218, 219, 220, 221, 225, 229,
Bettman, J. R., 21, 33, 73, 83, 155, 231, 233–234, 236, 238, 252,
189, 193, 226, 238, 247, 274, 391 253, 255, 267, 268, 270, 271,
Betsch, T., 225, 288 274, 289, 290, 403
Beyth-Marom, R., 324 Broek, E. van den, 439
Biele, G., 13, 115, 122, 124, 128, Brown, G. D. A., 406
129, 131, 132n, 138 Brown, J., 132, 153
Bienenstock, E., 35 Brown, J. D., 94, 105

Brown, N. R., 141–142 Cohen, P., 191


Brunswik, E., 34, 72, 82, 95, 119, Colinvaux, P. A., 404, 405
187, 379 Combs, B., 84
Bruner, J. S., 263 Condorcet, N. C., 176, 183
Bruss, F. T., 9 Connolly, T., 459
Buchanan, M., 383 Cook, M., 304
Bucher, H. C., 442 Cooksey, R. W., 243
Budescu, D. V., 87, 97, 450 Coombs, C. H., 91
Buffett, W., 379 Cooper, G. F., 361
Bullock, S., 304 Cooper, R., 217
Burkell, J., 450 Corbin, R. M., 468
Busemeyer, J. R., 245, 247, 249 Corrigan, B., 59, 69–71, 258
Coskun, H., 337
Camerer, C. F., 23 Cosmides, L., 437
Campbell, J. P., 71 Costa, P. T., 227
Caraco, T., 90 Coulter, A., 431
Carbone, C., 405 Covello, V. T., 428
Cardoza, A., 424 Cover, T., 42
Carey, S., 287 Covey, J., 451
Carnap, R., 3 Cowan, N., 108
Carasco, J. A., 76, 378 Croson, R., 421
Castellan, N. J., 305 Cullen, R. M., 124, 128, 131
Cayley, A., 366 Cummins, T. D., 192, 217, 220,
Chakrin, L. M., 65 221, 226, 238, 240
Chamot, E., 450 Curran, T., 121
Chapman, G. B., 248 Currie, S. R., 421
Chapman, J. P., 108 Cuthill, I. C., 469
Chapman, L. J., 108 Czerlinski, J., 9, 33, 41, 43, 45,
Charles, C., 431 195, 197, 203, 219, 222, 257,
Charniak, E., 334 258, 269, 275, 344, 345, 388,
Charvet, A. L., 450 393, 398, 399, 405, 491, 493
Chase, V. M., 331, 332, 333, 334
Chater, N., 41–43, 59, 83, 103, D’Agostino, R. B., 362
217, 256, 258, 311, 316, 330, Dagum, P., 334, 491
333, 406, 488 Dahan, E., 20
Cheng, P. W., 316 Dannemiller, J. L., 6
Christen, S., 271, 272, 290 Darwin, C., 13
Christensen, L. R., 205 Daub, S., 271
Christensen-Szalanski, J. J. J., 230, Dasarathy, B., 42
231, 239 Daston, L. J., 12
Christiansen, E. M., 420 Davis, J. H., 172, 336, 337,
Chu, P. C., 230, 231, 239 341, 355n
Claudy, J. G., 70 Davis, J. R., 8
Clutton-Brock, T. H., 241 Dawes, R. M., 9, 59, 61, 64, 65,
Cockburn, C., 432 69–71, 99, 252, 258, 311, 398
Cockburn, J., 445, 450 Dawkins, R., 5–6
Collett, T. S., 8 DeGroot, M. H., 458
Cohen, J., 191, 192, 208 Dehaene, S., 288

de Koning, H. J., 452 Eichler, A., 133, 137n, 218


Delfabbro, P., 421 Einhorn, H. J., 59, 64–65, 70, 75,
DeMiguel, V., 4, 5, 10, 492 226, 263, 311, 316, 372, 398
Dennett, D. A., 243 Ekman, M., 129
Detweiler, J. B., 451 Ellis, A. L., 172, 339, 340
Dhami, M. K., 23–24, 71, 83, Elmore, J. G., 440
305, 371 Elwyn, G., 451
Dickerson, M., 421 Engel, C., viii, 362, 427
Dieckmann, A., 21, 125, 194, 209, Enquist, M., 241
211, 213, 254, 257, 270, 271, Epstein, R. A., 426
281, 285, 305, 373, 402 Erev, I., 87, 97, 99
Dieckmann, N., 449 Ernster, V., 432
Diehl, M., 337 Estes, J. A., 405
Dillner, L., 428 Estes, W. K., 288, 301
Dixon, A., 449 Ettenson, R., 24
Doak, D. F., 405 Evans, J. S. B. T., 313, 315, 324, 488
Dobias, K. S., 443 Ewald, P. W., 125
Doherty, M. E., 324
Doherty, M. L., 134 Fahrenberg, J., 228
Domenighetti, G., 430, 442 Fair, R. C., 71
Domingos, P., 59, 373 Farinacci, S., 153
Dormandy, E., 450 Fasolo, B., 19, 188, 214, 215, 270,
Dorner, W. W., 317 371, 439
Dosher, B. A., 247 Faust, D., 64
Dougherty, M. R. P., 121 Fazendeiro, T. A., 153
Douglas, M., 91 Feigenson, L., 287
Doursat, R., 35 Feldman, M. W., 427
Doya, K., 27 Ferguson, T. S., 458
Doyal, L., 431, 448, 449 Ferlay, J., 430
Doyle, A. C., 144 Fermat, P., 12
Drossaert, C. H. C., 443 Fermi, E., 383–384
Dubé, D., 421 Fernandez, D., 135, 137
Dubner, S. J., 383 Ferreira, V. S., 326, 332, 334
Dudey, T., 455, 459 Ferrell, W. R., 97
Dugosh, K. L., 337 Fey, M., 424
Dunn, A. S., 446, 447, 448, 450 Fiedler, K., 101, 103, 104, 165, 259
Dzindolet, M. T., 337 Fildes, R., 68
Fineberg, H. V., 434
Eadington, W. R., 420 Finkelstein, M. O., 105
Eba, A., 246 Finucane, M.,86
Ebbinghaus, H., 240 Fischer, J. E., 365–366, 371
Ebert, A., 444 Fischhoff, B., 84, 85, 86, 90, 93,
Echterhoff, W., 105 94, 105, 324, 430
Edgell, S. E., 305 Fishburn, P. C., 243, 274
Edman, J., 129 Fisher, R. A., 244
Eddy, M., 7 Flexser, A. J., 288
Edwards, A. G. K., 440, 442, 451, 452 Fogel, D., 17, 249
Edwards, W., 27 Forbes, C. A., 452

Ford, J. K., 134, 138 Ghosh, A. K., 442


Forster, M. R., 314, 367 Ghosh, K., 442
Foster-Fishman, P. G., 338 Gibson, J. J., 82
Fox, J., 40, 59 Gifford, R. K., 104
Foxall, G. R., 228 Gigerenzer, G., viii, 9, 10, 11, 12,
Franklin, B., 12–13, 14 13, 14, 15, 20, 25, 29, 33, 34,
Frank, R., 429 41, 60, 71, 74, 81, 82, 92, 93,
Franks, N. R., 12 95, 106, 114, 115, 116–117,
Fratianne, A., 373 118, 119, 120–121, 122, 123,
Frege, G., 487 125, 127, 128, 129, 131, 132,
Freund, J. S., 288 135, 138, 151, 152–153,
Friedman, J. H., 42, 59, 75, 155–156, 164, 165, 167,
360, 386 168–169, 171, 178, 183, 188,
Friedman, M., 490, 100 190, 192, 193, 195, 197, 203,
Friedman, N., 192 217, 219, 222, 232, 233, 238,
Frings, C., 122, 124, 128, 244, 246, 248, 252, 253, 254,
131, 152 257, 258, 259, 261, 262, 264,
Frosch, C., 125, 141 266, 268, 275, 276, 277, 278,
Fudenberg, D., 464 283, 284, 287, 294, 306, 338,
Fuller, B., 409 341, 344, 348, 361, 362, 363,
Funder, D. C., 311 367, 370, 384, 388, 399, 405,
Furby, L., 86, 100 414, 417, 422, 427, 429, 430,
Furedi, A., 428 431, 433, 434, 435, 436, 437,
Furnival, A., 432 438, 439, 440, 442, 443, 444,
445, 446, 447, 451, 487, 489,
Gabaix, X., 382 490, 491, 495, 496
Gaboury, A., 421 Gigone, D., 172, 338
Gafni, A., 431 Gilbert, D. T., 140–141
Gaissmaier, W., 14, 125, 138, 218, Gil-White, F. J., 339
236, 253, 259, 260, 267, 403, Gimbel, R. W., 409
429, 495 Girotto, V., 435, 438
Galef, B. G., Jr., 118 Giroux, I., 421
Galesic, M., 429, 449 Gittleman, J. L., 405
Galileo, 310 Gladwell, M., 75
Galton, F., 72, 89 Glendon, G., 451
Gambetta, D., 242 Goddard, K., 141
Garcia-Retamero, R., 23, Godolphin, W., 447
276, 305 Goldberg, L. R., 63
Gärdenfors, P., 276 Goldberger, A. S., 71
Garlappi, L., 4, 10, 492 Goldstein, D. G., 9, 10, 33, 41, 74,
Garretsen, H., 380 114, 115, 116–117, 118, 119,
Gartner, B., 416 120–121, 122, 123, 125, 127,
Gates, B., 379 128, 129, 132, 135, 138, 140,
Gefenas, E., 409 151, 152–153, 155–156, 165,
Geiger, D., 192 167, 168–171, 178, 183, 188,
Geman, S., 35, 46, 59 192, 193, 194, 195, 197, 203,
Gettys, C. F., 121 219, 222, 233, 244, 246, 252,
Ghiselli, E. E., 71 253, 254, 257, 258, 261, 266,

268, 275, 276, 277, 278, 283, Hägeli, P., 264


284, 287, 294, 306, 344, 361, Halliday, T., 8
363, 367, 384, 388, 409–411, Hallowell, N., 450
413, 417, 422, 491 Halupka, K., 455, 459
Goldsmith, M., 145 Hamill, H., 242
Goldsmith, R. E., 228 Hamilton, D. L., 104
Goldszmidt, M., 192 Hamm, R. M., 432, 445
Gonzalez, M., 435, 438 Hammond, K. R., 83
Good, I. J., 3, 313 Hann, A., 430
Goodie, A. S., 239 Hanna, C., 338
Goodnow, J. J., 263 Hansell, M., 427
Gordon, K., 72 Harrigan, K. A., 421, 425
Gorman, P., 424 Hart, P., 42
Gorman, S., 424 Hasher, L., 140, 288, 301
Gøtzsche, P. C., 441, 452 Hasson, U., 140
Grady, D., 432 Hastie, R., 172, 338, 340, 354, 356
Gramm, K., 104 Hastie, T., 42, 59
Grams, G., 447 Hastroudi, S., 133
Granger, C. W. J., 66 Haupert, M., 417
Grant, M., 416 Hauser, J., 20
Gray, J. A. M., 430 Hauser, M. D., 287
Gray, W. D., 145 Hausmann, D., 217, 240, 271,
Green, D., 332, 334 272, 290, 403, 404
Green, J., 450 Hayek, F., 167, 413, 414
Green, L., 75, 362–365, 367–368, Hayes, P. D., 449
370, 371, 372, 375 Hays, C. J., 336
Green, W. A., 335 Heilbrun, K., 449
Greene, W. H., 71, 205 Heitmann, M., 413
Gregory, R., 90 Hell, W., 93
Greifeneder, R., 17 Heller, R. F., 442
Grice, H. P., 334 Helversen, B. von, 398, 400, 401,
Griffin, D., 94–95, 96, 97 403n, 404
Griffiths, M. D., 421, 425 Hennessey, J. E., 305
Griffiths, T. L., 27, 59, 71, Henrich, J., 339
Grimes, D. A., 449 Henry, E., 445
Groffman, B., 176, 336 Herrmann, A., 413
Groß, R., 69 Hertel, G., 336
Grove, W. M., 64 Hertwig, R., viii, 8, 9, 20, 29, 71,
Gummerum, M., 8 79, 83, 87, 88, 89n, 92, 109,
Gurmankin, A. D., 450 115, 117, 121, 122, 125, 126,
Guttman, L., 53 130, 131–132, 139, 140, 142n,
Gyr, K., 442 151, 153, 154, 156, 157–164,
Gyurjyan, G., 324 166, 217, 251, 259, 264, 344,
384, 385, 386, 387, 388, 398,
Ha, Y.-W., 102, 256, 311, 313, 402, 404, 406, 435, 438,
324, 330, 333 439, 496
Haberstroh, S., 225 Herzog, S. M., 126, 151, 161,
Hacking, I., 360, 366 259, 404

Heuvel, W. J. A. van den, 441 Hurwitz, B., 448


Heuvelink, A., 122, 427 Hutchinson, J. M. C., 12, 20, 455,
Hey, J. D., 9, 455 459, 469
Hibbard, J. H., 449
Hibon, M., 66–69, 72 Ishii, S., 27
Hiel, A. van, 338, 358 Jackson, E., 256
Hilgard, E. R., 316 Jackson, A. D., 383
Hill, D., 71, 445 Jacoby, L. L., 132, 153
Hinvest, N., 256 Jäger, A. O., 229
Hinsz, V. B., 172, 337, 355 Jain, B. P., 428
Hintzman, D. L., 121, 165 James, W., 145
Hoff, M. E., 283 Janis, I. L., 337
Hoffrage, U., viii, 17, 21, 51, 53, Jasechko, J., 132, 153
57, 71, 76, 83, 92, 95, 114, 153, Jedidi, K., 20
189, 193, 217, 219, 222, Jepson, R. G., 452
223–224, 232, 235, 238, Jerusalem, M., 228
249–250, 251, 254, 256, 261, Jessop, A., 331
262, 266, 272, 278, 289, 290, Jiang, W., 4
305, 335, 339, 340, 341, 343, Joab, S. A., 450
344, 345, 347, 348, 349, 353, Johnson, E. J., 10, 20, 21, 23, 33,
354, 355, 371, 372, 378, 384, 73, 83, 155, 189, 193, 222,
391n, 402, 433, 434, 435, 436, 226, 238, 247, 274, 391,
437, 438, 439, 444, 446, 447, 409–411, 413
452, 493, 496 Johnson, J., 242, 259
Hogarth, R. M., 9, 18, 59, 64, 65, Johnson, J. G., 245, 247
70, 72, 75–78, 138, 189, 190, Johnson, J. L., 450
191, 192, 254, 262, 265, 269, Johnson, M. K., 133
311, 316, 353–354, 371, 378, Johnson, M. P., 205
393, 398, 402, 494 Johnston, J., 71
Hofstee, W. K. B., 79 Jorland, G., 107
Höldke, B., 431, 432, Judge, G. G., 71
438, 440 Juslin, P., 83, 95–96, 97, 99, 100,
Hollingshead, A. B., 358 109, 247, 277, 281, 302, 401,
Holt, R. R., 63 403, 404
Holte, R. C., 264 Jussim, L., 103
Holzworth, R. J., 243
Holyoak, K. J., 373 Kahneman, D., 27, 79, 81, 92,
Hood, W. B., Jr., 362 121, 164, 248, 259, 488, 496
Hope, C., 90 Kaiser, M. K., 7, 10
Horbay, R., 425 Kameda, T., 172, 340, 354, 356
Horowitz, C. R., 446 Kamerud, D. B., 105
Horwich, P., 309, 314 Kant, E., 487, 488
Hossiep, R., 366 Kao, S.-F., 317, 319
Howden-Chapman, P., 442 Kareev, Y., 108, 165
Howe, C. Q., 80 Karelaia, N., 9, 18, 76–78, 138,
Howson, C., 309 189, 190, 191, 192, 254, 262,
Huberman, G., 4 263, 265, 269, 353–354, 371,
Hults, B. H., 134 378, 393, 402, 494

Katsikopoulos, K. V., 20, 127, Kuhl, J., 228


128, 168, 171, 174–175, Kukla, A., 103
177–179, 182, 349–350, 361, Kurzenhäuser, S., 87, 88, 121,
363, 371, 378, 412, 439, 493 164, 435, 437, 443, 447
Katz, S. J., 443 Kurz-Milcke, E., 14, 429, 495
Keeney, R. L., 73 Kutner, M. H., 191
Keillor, G., 105
Keller, C., 449 Ladouceur, R., 421
Keller, M., 8 Läge, D., 271, 272, 290, 403
Kelley, C. M., 132, 153 Lagnado, D., 217
Kemp, C., 27 Lakatos, I., 310
Kemp, S. M., 450 Laland, K. N., 251, 427
Kepler, J., 490 Lamarre, A., 447
Keppel, G., 149–150 Lambos, C., 421
Keren, G. B., 97, 421 Land, M. F., 8
Kerlikowske, K., 432 Landauer, T. K., 391
Kerr, N. L., 336, 339 Langer, E. J., 91
Keykhah, M., 90 Langley, P., 287
Keynes, J. M., 79 Langsam, J., 25, 40
Keys, C. B., 338 Larrick, R., 72
Kirby, S., 163 Larson, J. R., 338, 358
Kiso, T., 424 Laskey, K. B., 59, 192
Klauer, K. C., 104 Laughlin, P. R., 172, 339, 340
Klayman, J., 102, 256, 311, 313, Lautrup, B. E., 383
322, 324, 330, 333 Lave, L., 451, 452
Kleffner, D. A., 80 Layman, M., 84
Kleinbolting, H., 95, 114, 217, Lazarus, H., 335
266, 438 Lebiere, C., 151, 155, 156, 404
Kleinmuntz, B., 64 Leboe, J. P., 153
Kleiter, G. D., 437 Lebow, B. S., 64
Knight, J. A., 451 Lee, M. D., 27, 192, 217, 220, 221,
Knowles, G., 417 226, 238, 240, 459
Kohli, R., 20 Lee, P. J., 142
Koehler, J. J., 92, 434, 440 Legato, F., 424n
Koren, S., 226 Legendre, N., 421
Koriat, A., 93–94, 145, 153 Lehman, D. R., 319
Kouides, R. M., 432 Lehman, S., 383
Krauchunas, S. M., 7 Lehner, P. E., 91
Krauss, S., 438 Lehrman, S. E., 409
Kreps, D. M., 457n Leibniz, G. W., 24
Krogstad, J., 24 Leimar, O., 241
Kroll, L., 379, 380, 381 Lemaire, R., 448, 449
Krosnick, J. A., 248 Lennon, J., 113
Krueger, J., 100 Lerman, C., 441
Kruglanski, A. W., 94 Lessells, C. M., 40
Krugman, P. R., 382, 383, 405 Levav, J., 413
Krull, D. S., 140 Levi, A., 219, 221
Kuendig, S., 355, 359 Levin, B., 105

Levin, I. P., 317 223, 224, 235, 243, 249, 250,


Levitt, S. D., 383 251, 254, 256, 261, 262, 278,
Levy, M., 380–382, 393 289, 305, 343, 344, 348, 353,
Lewis, R. A., 452 354, 361, 367, 368, 370, 371,
Li, C., 469 372, 373, 375, 378, 384, 391n,
Lichtenstein, S., 84–85, 87–88, 399, 402, 438, 493
90, 93, 226 Martin, A., 455
Lindenberger, U. E. R., 108 Marr, D., 488
Lindsay, D. S., 133, 153 Mastro, R. G., 287
Lindsey, S., 92, 435 Mata, J., 429, 430, 431, 442, 443,
Lipe, M. G., 317 445, 446, 447
Lipkus, I. M., 429, 446, 449 Mata, R., 21, 139
Lipsey, R. G., 490 Matessa, M., 156
Lipshitz, R., 217, 276 Matter-Walstra, K., 452
Lloyd, A. J., 449 Matthews, E., 451
Locke, J., 109 May, R. K., 421
Loewenstein, G., 431 Mayseless, O., 94
Logan, J., 384 Mazzeo, M., 206
London, N. J. M., 449 McAchran, S. E., 443
Lopes, L. L., 90, 91, 105, 107, 165 McBeath, M. K., 7, 10, 29
Lou, W., 446 McBride, A., 432
Loughlin, N., 27 McCammon, I., 264
Lovato, C. Y., 450 McCartney, P., 113
Luby, M., 334, 491 McClelland, A. G. R., 216
Luce, R. D., 91, 217, 247 McClelland, G. H., 19, 188,
Luchins, A. S., 225, 259 214, 270
Luchins, E. H., 225 McCloy, R., 125, 141
Lücking, A., 435, 437 McCrae, R. R., 227
Lundberg, I. B., 27 McDermott, K. J., 326, 334
Luria, A. R., 144–145 McDiarmid, C., 288
Lyman, P., 391 McElduff, P., 442
McGeoch, C. C., 279, 280,
Ma’ayan, H., 153 284, 302
MacGregor, D. G., 86, 94, 105, 440 McGraw, P., 435
Machery, E., 363 McKenzie, C. R. M., 83, 103, 311,
MacQueen, J., 458, 466, 481 314, 316, 317–318, 320–323,
Makridakis, S., 66–69, 72 324, 326, 327, 332, 333, 334
Mallon, E. B., 12 McKinley, S. C., 386
Mallon, L., 432 McNamara, J. M., 469
Malone, P. S., 140 McQuay, H., 428
Mandel, D. R., 319 McQueen, M. J., 431
Marewski, J. N., 21, 123, 125, 128, McQuoid, L. M., 118
138, 154, 162 McSween, C., 335
Markowitz, H. M., 4, 13, 492 Meehl, P. E., 63–64
Marshall, K. G., 431, 443 Mehr, D. R., 75, 362–365,
Marteau, T. M., 438, 448, 450 367–368, 370, 371, 372, 375
Martignon, L. F., 18, 41–43, 51, Meiser, T., 104
53, 57, 59, 76, 192, 193, 222, Mellers, B., 79, 435

Menard, S., 191, 192 Napoli, M., 450, 451


Mennecke, B. E., 338, 358 Narula, S. C., 204
Merenstein, D., 448 Naylor, A., 449
Mertz, C. K., 449 Nease, R. F., Jr., 430
Messe, L. A., 336 Nelson, C., 64
Metsch, L. R., 443 Nelson, J. D., 330
Metzger, M. A., 421 Nelson, R., 335
Meyers, D. G., 94 Nerlove, M., 206
Meyers, A. W., 421 Nero, 416
Meyers-Levy, J., 263 Nesselroade, J. R., 100
Michalewicz, Z., 17, 249 Nestor, B., 424
Mikkelsen, L. A., 103, 314, 317, Neter, J., 191
320–323, 324, 326, 332, 333 Newbold, P., 66
Miller, D. J., 71 Newell, B. R., 133, 134, 135,
Miller, G. A., 108 137, 189, 217, 221, 225,
Miller, G. F., 9, 19, 246, 226, 231, 240, 271, 272,
260, 455 289, 290
Miller, N., 339 Newman, D. J., 206, 376
Miller, N. V., 421 Newman, M. E. J., 382n, 383,
Miller, R. G., Jr., 458, 466, 481 388, 404
Milson, R., 146 Newstead, S. E., 217
Mineka, S., 304 Neyman, J., 244, 490
Mitchell, T. R., 33, 230, 231, 239 Nickel, S., 101
Mittelhammer, R. C., 71 Nickerson, R. S., 315
Monahan, J., 440 Nieder, A., 288
Monge, P. R., 335 Nielsen, M., 441, 452
Moon, P., 455 Noelle, D. C., 324
Moore, A., 428 Nosofsky, R. M., 29, 43, 138, 248,
Moore, M. T., 147 267, 361, 370n, 386
Morgan, M. G., 451, 452 Noveck, I. A., 93
Morgenstern, O., 414 Novick, L. R., 316
Morris, M. R., 455 Nunamaker, J. F., Jr., 335
Mosvick, R. K., 335 Nunnally, J. C., 71
Moyer, C. A., 443 Nystroem, L., 441
Mueller, R. A., 100
Mugford, S. T., 12 Oaksford, M., 41, 59, 83, 103,
Mühlhäuser, I., 431, 432, 217, 256, 258, 311, 315, 330,
438, 440 333, 488
Mulford, M., 99 O’Brien, D. P., 216
Mullen, P. D., 452 Oden, G. D., 165
Murton, F., 450 Odling-Smee, F. J., 427
Musch, J., 228 Ogden, E. E., 121
Mushlin, A. I., 432 Oliveira, M., 269
Mynatt, C. R., 324 Olshen, R. A., 42, 75, 360, 386
Myung, I. J., 37, 194 Olson, C. L., 468
Olsson, A.-C., 401
Nachtsheim, C. J., 192 Olsson, H., 83, 95, 97, 109,
Nakisa, R., 41, 258 247, 401

Önkal, D., 128 Peters, E., 86, 429, 449


Oppenheimer, D. M., 26, 117, Peterson, C. R., 100
128, 129, 133 Petrie, M., 8
Opwis, K., 336 Petrusic, W. M., 97
Orlin, J., 20 Pfeifer, P. E., 97
Ortmann, A., 122, 125 Philipson, J., 449
Ostendorf, F., 228 Phillips, K. A., 451
Otter, R., 441 Phillips, L. D., 93
Otto, P. E., 21, 22, 189, 193, 215, Pichert, D., 412
217, 224, 226, 239, 252, 255, 268 Pill, R., 451
Over, D. E., 313, 315, 331, 332, Pit, S., 445
334, 438 Pitt, M. A., 37, 194
Owen, G., 176, 336 Place, S. S., 13
Oz, M. C., 413 Planck, M., 62
Pleit-Kuiper, A., 421
Pachur, T., viii, 30, 87, 88, 115, Pohl, R., 21, 22, 123, 125, 127,
117, 119, 121, 122, 123, 124, 130, 131–132, 403
125, 127, 128, 129, 130, Poletiek, F., 322, 332
131–132, 133, 135, 137–138, Popper, K. R., 219, 309, 487
139, 140, 142n, 164, 363 Porphyry, 360
Paepke, S., 443 Pouget, A., 27
Paese, P. W., 94 Pozen, M. W., 362
Palmer, M., 256 Preston, F. W., 421
Palmeri, T. J., 248n, 370n, 386 Pronin, E., 451
Pansky, A., 145 Provost, F., 53
Parducci, A., 103 Ptolemy, C., 489
Pareto, V., 379, 382n Purdon, S. E., 451
Park, E., 355 Purves, D., 80
Parke, J., 421, 425 Putman, V. L., 337
Pascal, B., 12
Pashler, H., 37, 194, 221 Quinlan, J. R., 42
Pasteur, L., 75
Patterson, L., 442 Raab, M., 242, 259
Paulhus, D. L., 227 Raaij, W. F. van, 219
Paulus, P. B., 337 Raffle, A. E., 452
Payne, J. W., 21, 26, 33, 73, 83, Raiffa, H., 73
155, 189, 193, 219, 222, 225, Rakow, T., 217, 256, 290
226, 230, 231, 238, 239, 240, Ramachandran, V. S., 80
247, 257, 270, 272, 274, 391 Rao, R. P. N., 27
Pazzani, M., 59, 373 Rapoport, A., 246, 249, 455,
Pearl, J., 373 458–459, 468–469
Pearson, E. S., 196, 244, 319, Ratcliff, R., 260
397, 490 Ratner, P. A., 450
Penke, L., 13 Raven, P. H., 205
Perlich, C., 53 Real, L., 90
Perneger, T. V., 450 Reber, R., 153
Persson, M., 269, 277, 281, Reddy, R., 491
302, 403 Redington, M., 41, 258

Redman, S., 445 Sanft, H., 288


Reichert, S. E., 474 Sarfati, D., 442
Reid, M. L., 469 Sargent, T. J., 249
Reimer, T., 126, 127, 128, 151, Savage, L. J., 25, 414, 492
168, 171, 174–175, 177–178, Sawyer, J., 64
179n, 182, 259, 335, 336, 337, Saxberg, B. V. H., 6, 29
339, 340, 341, 347, 349, 350, Scaf-Klomp, W., 441
355, 359, 404, 446 Schacter, D. L., 113
Reimer, A., 355 Schechtman, S. L., 134
Renner, B., 450 Scheibehenne, B., 17, 124, 128
Revankar, N., 205 Schiffer, S., 193, 215, 218, 219,
Rice, J. A., 204 220, 221, 233, 234, 252, 289,
Richards, M., 450 290, 403n
Richardson, A. J., 460 Schmidt, F. L., 59, 70, 398
Richerson, P. J., 10, 258, 339, 423 Schmitt, C., 444
Richter, T., 126, 135–138 Schmitt, M., 41–43, 372
Rieskamp, J., 21, 22, 121, 139, Schmitt, N., 134
189, 193, 194, 209, 211, 213, Schittekatte, M., 338, 358
215, 217, 219, 223, 224, 226, Schooler, L. J., 9, 21, 83, 115, 122,
238, 239, 252, 255, 257, 268, 125, 126, 138, 139, 146,
270, 272, 289, 290, 343, 398, 147–150, 151, 154, 155, 156,
400, 401, 402, 403n, 404 157–158, 160, 162, 258, 259,
Rilling, M., 288 404, 406
Rimer, B. K., 449, 452 Schreck, M., 452
Rivest, R., 279, 302 Schroeder, M., 383
Robredo, T., 21 Schulte-Mecklenbeck, M., 20
Roberts, S., 37, 194, 221 Schumpeter, J. A., 413
Roddick, A., 21 Schustack, M. W., 317
Roitberg, B. D., 469 Schwartz, L. M., 14, 429, 430,
Romano, N. C., Jr., 335 445, 446, 449, 495
Rose, D. A., 413 Schwarz, N., 122n, 132, 133, 153
Rosenberg, R. D., 440 Schwarzer, R., 228
Ross, L., 91 Schwefel, H.-P., 474
Rothman, A. J., 451 Schwing, R. C., 105
Rowse, J., 458 Seale, D. A., 246, 455, 458–459,
Rubinstein, A., 417 468–469
Rudolph, G., 474 Sedlmeier, P., 121, 164, 288, 447
Russer, S., 104 Sejean, R., 246
Russo, J. E., 247, 335 Selker, H. P., 362
Selten, R., viii, 11, 25, 246, 253
Saad, G., 246 Serwe, S., 122, 124, 128, 131, 152
Sackett, D. L., 442 Seydel, E. R., 443
Salmond, C., 442 Shaffer, D. M., 7, 10, 29
Salomon, I., 460 Shah, A. K., 26
Salovey, P., 451 Shanks, D. R., 133, 134, 189, 217,
Samsa, G., 449 221, 226, 271, 289, 290
Sandars, J. E., 442 Shannon, C., 256
Sandermann, R., 441 Shanteau, J., 24, 85, 190, 270

Shapiro, D. E., 432 Sonnad, S. S., 443


Shepard, R. N., 15, 34, 80, 81, 82, Sophocles, 241
216–217, 239, 406 Sorkin, R. D., 168, 172,
Shereshevski, S. V., 144–145 336, 354
Sherman, S. J., 104 Sox, H. C., 430
Sherony, K., 417 Späth, P., 126, 135–138
Shiller, R. J., 94 Spector, L. C., 206
Siegel-Jacobs, K., 304 Spires, E. E., 230, 231, 239
Shiloh, S., 226 Springer, A. M., 405
Shoemaker, P., 335 Squire, L. R., 147
Showers, J. L., 65 Stanovich, K. E., 23
Shridharani, K. V., 446 Stasser, G., 337, 338, 349,
Sickles, E. A., 432 354, 358
Siebenmorgen, N., 128 Statham, H., 450
Siegrist, M., 449 Staudinger, U. M., 108
Simmons, J. P., 140 Steenbergh, T. A., 421
Simon, H. A., viii, 3, 9, 14–15, 30, Steiner, I. D., 337
33, 34, 61, 82, 114–115, 143, Sternberg, R. J., 317
243, 246, 261, 273, 404, 405, Stewart, M., 274
414, 455, 489–493, 495, Stewart, D. D., 338, 358
496, 497 Stibel, J. M., 438
Simonoff, J. S., 53 Stigler, G. J., 26, 33
Simpson, J., 204 Stigler, S. M., 86, 89n,
Sivak, M., 107 100, 249
Skrable, R. P., 326 Stiglitz, J. E., 487, 492
Skubisz, C., 446 Stone, C. J., 42, 75, 360, 386
Slavin, R. E., 336 Straubinger, N., 429
Slaytor, E. K., 438, 443 Stroebe, W., 337
Sloman, S. A., 438 Strosberg, M. A., 409
Slovak, L., 438 Studdert, D. M., 362
Slovic, P., 81, 84, 85, 86, 90, 164, Stumpf, H., 228
226, 440 Suantak, L., 97
Smith, E. E., 316 Sulloway, F. J., 8
Smith, E. R., 104 Sundali, J., 421
Smith, P. L., 260 Sunstein, C. R., 412, 415
Smith, R. W., 421 Suppes, P., 373n
Smith, S. L., 432, 446 Surowiecki, J., 72
Smith, V. L., 15 Süß, H.-M., 229
Sniezek, J. A., 94 Svenson, O., 105
Snitz, B. E., 64 Switzer, F. S., 94
Snively, G. R., 449 Sytkowski, P. A., 362
Snook, B., 124, 128
Snowden, A. J., 452 Tabachnik, N., 395
Snyder, M., 103 Taft, T., 409
Snyder, M. M., 205 Takezawa, M., 8, 276, 367
Soler, J., 107 Taleb, N. N., 90
Soll, J. B., 72, 96 Tamaki, M., 458, 481
Solomon, S., 380, 381, 382, 393 Tatsuoka, M. M., 192

Taylor, S. E., 86, 94, 103, 105 Varian, H. R., 391


Taylor, L. A., 338 Vaughn, L. A., 122n
Tenenbaum, J. B., 27, 59 Visser, M. E., 40
Tetlock, P. E., 304 Vitouch, O., 367
Thaler, R. H., 412, 415 Vollrath, D. A., 172, 337
Thomas, R. P., 190, 270 Volz, K. G., 22, 134
Thompson, D., 113 von Neumann, J., 414
Thompson, R. G., 460 Vroom, V. H., 336, 357
Thorngate, W., 73–74, 264
Tibshirani, R., 42 Wagenaar, W. A., 90, 420, 421
Tindale, R. S., 172, 337 Wainer, H., 70, 71
Tirole, J., 464 Wald, A., 244, 245, 490
Titus, W., 341, 349 Waldmann, M. R., 373
Todd, J. T., 6, 29 Walker, M. B., 420, 421
Todd, P. M., 9, 11, 13, 14, 15, 17, Wallin, A., 276, 305, 363
18, 19, 26, 34, 82, 83, 109, 114, Wallsten, T. S., 87, 97, 450
115, 118, 122, 125, 151, 155, Wallston, K. A., 451
163, 166, 188, 190, 201, 214, Walther, E., 101
222, 232, 233, 239, 243, 244, Wang, X. T., 91
246, 248, 254, 260, 266, 270, Wangenheim, F. V., 26
271, 275, 281, 285, 304, 338, Ward, J. E., 438, 443
341, 361, 370, 373, 414, 427, Warren, J., 450
455, 459, 489 Wascoe, N. E., 83
Todorov, A., 140 Wason, P. C., 324, 328
Tooby, J., 437 Wasserman, E. A., 317, 319
Toppino, T., 140 Wasserman, W., 192
Tosteson, A. N., 430 Watson, S. C., 90
Towle, A., 447 Weber, E. U., 128
Tränkle, U., 107 Weber, M., 128
Trump, D., 380, 385 Weil, H. B. M. van de, 441
Tucker, W., 205 Weinbacher, M., 442
Turner, N. E., 425 Weinstein, M. C., 434
Tversky, A., 20, 27, 81, 92, 94, 95, Weinstein, N. D., 449, 450
96, 97, 121, 138, 164, 246, 248, Weisberg, S., 204, 205
259, 488 Weiss, H., 40
Tweney, R. D., 324 Welch, G., 430, 445
Wellington, J. W., 204
Ubel, P. A., 431 West, P., 191
Uexküll, J. von, 18 West, R., 168, 336
Uhl, K. P., 117 West, R. F., 23
Underwood, B. J., 288 West, S. G., 191
Uppal, R., 4, 10, 492 Weston, N. J., 189, 217, 271,
Urbach, P., 309 289, 290
Whelan, J. P., 431
Vanderbilt, T., 455, 465, 467 Whelan, T., 431
Van der Goot, D., 454, Whiskin, E. E., 118
466, 467 Whittlesea, B. W. A., 153
Van Marrewijk, C., 380 Widrow, B., 283

Wiegmann, D. D., 455 Yamagishi, K., 449


Wilks, S. S., 70 Yaniv, I., 64
Willemsen, M. C., 20 Yates, J. F., 304
Williams, T. M., 405 Yee, M., 20
Wilson, D. K., 451
Winkielman, P., 153 Zacks, R. T., 288, 301
Winman, A., 95, 97, 247 Zajonc, R. B., 142
Wittenbaum, G. M., 337, 338, 358 Zakay, D., 226
Wixted, J. T., 324 Zald, D. H., 64
Wlaschin, J., 451 Zapka, J. G., 445
Woike, J. K., 361, 402, 403 Zedeck, S., 71
Woloshin, S. W., 14, 429, 430, Zellner, A., 205
445, 449, 495 Zerssen, D., 228
Wood, D. O., 205 Zhang, S., 37
Woodley, W. L., 204 Zimmerman, J., 288
Woodward, A., 442 Zipf, G. K., 382, 386, 388, 405
Wottawa, H., 366 Zola, I., 421
Wübben, M., 26 Zuckerberg, M., 379
Wyer, J. A., 335 Zwick, R., 450
Subject Index

1/N rule, 4–5, 15–16, 19, 25, 492. ACT-R. See Adaptive Control of
See also investment Thought–Rational (ACT-R)
definition, 4, 10 adaptive coin-flipping, 238
1R rule, 264 Adaptive Control of Thought–
Rational (ACT-R), 151,
abortion, 428 154–161, 259, 404. See also
absolute risk reduction, 442, 444. rational analysis of memory
See also relative risk adaptive decision maker, 21, 26,
reduction 33, 155, 226
accidents, 105–107, 204, 264, 388 adaptive function, 108, 141,
accountability, 304 145–146, 165. See also
accuracy, 33. See also evolution
generalization; overfitting; adaptive toolbox, 11, 20, 27, 46,
robustness 50, 59, 118, 164, 217,
cumulative (online), 284 221–222, 226, 238, 240, 245,
fitting, 37, 43, 250, 376–377 357, 402, 415, 488, 489, 493.
offline (batch learning), 284–285 See also heuristics
predictive, 43–45 adjustable power tool, 240
achievement motive, 227–228 admissions (college), 65, 326–327
action orientation, 227–228 adversarial collaboration, 79
activation (in memory, advertising, 263, 422
ACT-R), 154ff affect, 86, 142. See also emotions

age effects (on heuristic use), availability (of information), 126


20–21, 139 availability heuristic, 85, 121,
agent-based model, 457, 459–461 134, 164–165, 259
aggression, 101–103 avalanche, 264
agreeableness, 227–228. See also average, 72, 105. See also mean
personality traits, Big Five average, better than, 105
AIDS, 324–325
airline, 137–138 babies, 287
airport, 135–136 ball-catching, 5–7, 28–29
Algeria, 418 baseball, 6, 28–29
algorithmic level (Marr), 488, 490 base rate, 91–92, 294, 436
aliens, 220, 222 ignoring, 288, 295–296
allergy, 316 base-rate fallacy, 91–93
alphabetization, 415. See also Bayes’s rule, 27, 92, 100, 433,
lexicographic rule 434, 436, 447, 494
alternatives (number of). See Bayesian inference, 13,
environment structure, 434–435, 438
number of alternatives Bayesian models, 4, 24, 27, 309ff,
America. See United States 368–369, 425, 490, 492.
animal cognition, 8, 12, 13, See also Bayesian network;
287–288. See also pigeons; naïve Bayes
monkeys; rats Bayesian network, 361
anterior frontomedial cortex beer, 117
(aFMC), 134. See also brain benefit–cost theory, 409, 414.
antibiotics, 365–366, 371 See also cost–benefit analysis
ants, 12 best member rule, 355–357
approval vote, 357. See also bias
group decision making, cognitive, 81, 84–86, 316, 319,
social combination rule 320, 324, 421
area estimation, 12 primary, 84–85, 87–89
ARMA (auto-regressive moving secondary, 84–86
average) model, 67. See also statistical, 17, 46–51,
forecasting 53–58, 72
artificial environments (creating), bias–variance dilemma, 35,
195–197, 291, 341–342, 46–51, 58–60, 491
345–346 Big Five. See personality traits,
artificial intelligence, 360. See Big Five
also machine learning billionaires, 379–381, 385
as-if models, 6, 489–490, 493 binary environment. See
aspiration level, 246, 455, 462– environment structure,
471, 483. See also satisficing binary
associative learning. See learning, biodiversity. See data sets,
associative biodiversity
atomic bomb. See Manhattan biology, 371
Project biopsy, 440–441, 447
attention, 61 birds, 40. See also peahen; pigeons
Austria, 409–411, 418 block-count heuristic, 469–471,
autocorrelation. See environment 473, 476, 481–483
structure, autocorrelation blood clot. See stroke; thrombosis

books, 382, 391. See also textbooks causal selection task (Wason),


botulism, 84–85 328, 332
bounded rationality, 273, 310, causal theories, 75
495–496. See also heuristics, chance. See randomness
fast and frugal; satisficing change (adapting to), 79. See also
brain, 144. See also neuroimaging environment structure,
brainstorming, 337. See also dynamic
group decision making chess, 17, 491
breast cancer, 327–328, 373, Chicago, 384, 388
430ff. See also cancer; Chile, 418
mammography screening choice environment, 411. See
building blocks (of heuristics), also environment structure
8, 251–252, 275, 340, citations (publication), 382–383
363, 385 city population, 19, 22, 42–44,
decision rule, 8, 252, 275, 119, 123, 130–133, 135–138,
363, 385 168–171, 175, 178–183,
dependencies between, 265 249–250, 382, 384, 403, 405.
search rule, 8, 45, 251–252, See also German cities
275, 363, 385 classification, 251, 360ff. See also
stopping rule, 8, 43, 45, categorization
244–245, 251–252, 275, 363, classification and regression tree
385. See also stopping rule (CART), 42–45, 53–57, 360,
burglary, 23–24 376–378
business, 66–67, 335 classification tree, 261, 360ff. See
also classification and
C4.5, 42–45, 53–57 regression tree
cancer, 13–14, 64–65, 429ff, 494. climate change, 38–40, 124. See
See also breast cancer; also temperature; weather
prostate cancer screening; clinical prediction. See
screening prediction, clinical
candidate count rule, 459, 469 clinical psychology. See
car-count heuristic, 468–471, 473, psychology, clinical
476, 481 coefficient of variation (CV), 393,
carnivore, 405 396–397. See also
car parking. See parking environment structure,
CART. See classification and variance or variability
regression tree co-evolution, 15–16, 427.
casinos, 19, 419–426 See also evolution
catastrophe, 86, 90. See also risk coffee, 274
perception, dread risk cognitive capacity, 229–231, 238,
categorization, 29, 316, 360, 361, 448–449
400. See also classification cognitive effort, 73, 155, 239.
additive-weighting models, 29 See also effort–accuracy
exemplar models, 29 trade-off
caterpillars, 40 cognitive illusions, 81, 101, 438.
cats, 187, 316 See also bias; fallacies
causal attribution theory, 244 cognitive limitations. See
causal reasoning, 305, 316, constraints
327–332, 334, 373 cognitive load, 140, 230–231

coin, 422 confirmation bias, 93, 102–103, 324


college, 125, 326–327. See also confirmation model (CONF), 78,
admissions 263. See also take-two
collision avoidance, 8 conflict of interest, 426
committee. See group decision conjunction fallacy, 438
making conjunctive rule, 263
communication. See doctors, connectionist model. See neural
communication with network
patients; framing; group conscientiousness, 227–228. See
decision making, social also personality traits, Big Five
communication rule; risk consensus, 178–181, 183
communication conservatism (judgment), 100.
comparative testing. See model See also underconfidence
comparison constitutions, 418
comparison. See two-alternative constraints, 26, 60, 109–110, 248,
choice 310. See also information,
compensation index, 200–201 limited; memory, constraints;
compensatory strategy, 20, 138, optimization, under
200, 202, 224, 343, 358, 416. constraints
See also Dawes’s rule; benefits of, 166
Franklin’s rule; heuristics, limited computational capacity,
compensatory; naïve 454–455
Bayes limited memory, 108–109,
as default, 230–231, 238 165, 302
competition. See mate limited time, 446. See also time
competition pressure
complexity, 61–62 consumer behavior, 27, 263, 288,
of models, 37, 66–68 383. See also customer
compliance rate. See behavior; shopping
mammography screening, contingency, 101–103, 313
compliance rate contingency model, 230–231, 239
computation costs, 26, 302 contraceptives, 428–429
computational level (Marr), cooperation, 10
488, 490 coordination, 413, 416, 418
computational models, 28 coronary care unit (CCU),
computer science, 276, 278–280 362–365, 367, 495. See also
conditional probabilities, heart disease
432–434, 436–437. See also correlation. See also covariation
negative predictive value; assessment
positive predictive value; detection of, 108
sensitivity; specificity illusory, 108
conditioning event (in cost–benefit analysis, 226, 249.
informativeness), 332–333 See also benefit–cost theory
Condorcet jury theorem, costs. See information, costs;
176–177, 336 search, costs; switching costs
confidence, 93, 95, 140, 153, 181. count rule (ordering). See self-
See also overconfidence organizing sequential search,
(bias); underconfidence count (tally) rule

covariation assessment, 311, difference from sequential


316–321, 327, 333. See also search, 280
correlation experimental study, 289–302
cell frequencies in, 317–321 move-to-front, 282–286,
joint absence/presence, 298–302
317–319, 332, 333 random, 299–302
phi coefficient, 319 selective move-to-front,
cramming. See studying (school) 282–286, 298–302
credit scoring, 65 sensitivity to experience, 287
crime, 220, 232–237, 242 simple swap, 282–286,
criterion (knowledge of), 116, 298–302. See also self-
129–130 organizing sequential
cross-validation, 42–45, 66, 195, search, transpose rule
250, 366, 401. See also tally, 282–286, 288, 298–302,
accuracy, predictive; 303–304, 306
generalization tally-swap, 282–286, 298–302,
cue 303–304
binary, 76–77, 384, 388 for trees, 372–375
conditional dependency. validity, 281–286, 299–302,
See environment structure, 303–306
conditional dependency validity, poor performance of,
between cues 302–303
continuous, 77 cue ranking. See cue-ordering
correlation, 17–18, 55, 77, rules, for trees
187–188, 190, 192–193, 207, cultural transmission, 304.
262, 270, 305. See also See also learning, social
environment structure, culture, 409
redundancy cumulative dominance, 76–77
discriminating, 210–213, 252, cumulative lifetime risk
258, 261, 281–283. See also (breast cancer), 451
discrimination rate customer behavior, 26–27.
incorrect, 260 See also consumer behavior
misleading, 422–426 cutoff rule, 459, 468
profile, 367–370, 372
proximal, 34 data, 216
validity. See validity data fitting. See accuracy, fitting
weight, 41 data mining, 375
cue order, 41, 76, 249, 274ff data records (ordering).
current, 297 See self-organizing
learning, 254, 274ff sequential search
movement in, 297–298 data sets, 161, 203–206, 275, 344,
optimal, 249–250, 372 376–377. See also German
stability, 302 cities (data set)
unconditional, 295–296 20 real-world environments,
cue-ordering rules, 278, 281–288. 269, 344–345, 388–393
See also self-organizing athletes, 161–162
sequential search billionaires, 161–162, 379–381
delta rule, 282–286 biodiversity, 43, 45, 205

data sets (Cont.) discrimination rate, 254, 257,


German companies, 161–163 286–287, 289, 419. See also
house prices, 43–44, 204 cue, discriminating; search,
mammal life-span, 43, 45, 204 by discrimination
medical, 376–377 negative correlation with
music sales, 161–163 validity, 254–255, 287
oxygen in dairy waste, 393–396 discussion (in group), 181, 336,
U.S. cities, 161–163 337, 338, 358
U.S. fuel consumption, diseases, 115, 125–126, 130, 142,
393–396 206, 253, 332. See also breast
Dawes’s rule, 193, 197ff, 226, 229, cancer; cancer; heart disease;
233, 235–236, 252, 254, 258, pneumonia
262, 267. See also tallying disruptive selection, 475
death (causes of), 84–90, 414, distance-and-density heuristic,
441–444. See also accidents; 482–483
diseases distributed practice. See
DEBA. See elimination-by- environment structure,
aspects, deterministic massed vs. distributed
decision rule, 8, 244. See also distribution. See also
building blocks, decision environment structure;
rule J-shaped distribution; wealth,
decision speed, 257, 416. See also distribution
response time binomial, 174–175
decision time. See response time moments of, 83ff. See also
decision tree, 42, 57. See also environment structure,
C4.5; classification tree; mean; environment structure,
classification and regression skew; environment structure,
tree variance or variability; mean;
declarative memory. See memory, skew; variance
declarative skewed, 106–108. See also
Deep Blue (chess), 17, 491 environment structure, skew;
deer (red), 241, 243, 246, 250, 260 J-shaped distribution; skew
default, 409, 412–414, 425 symmetrical, 105–107
default heuristic, 410–413, 494 doctors (physicians), 14, 64–65,
definition, 10 75, 253, 361–366, 368–369,
defensive decision making, 14. 428ff, 494–495
See also lawsuit beliefs about patients, 448–452
delta rule. See cue-ordering rules, communication with patients,
delta rule 363–364, 428ff
depression, 439 domain, 123–126, 152, 382.
descriptive models, 487–489, See also data sets;
495. See also normative environment structure
models geographic, 123
design. See environment design; dots illusion, 80–81
institutions, design of drivers, 19, 105–107, 242, 246,
detection behavior, 450–451 247, 250, 411, 416–417,
development, 20–21 454ff. See also parking;
diagnosis, 251, 264, 429ff traffic, right-of-way rules

ease of retrieval, 133, 236, 259. environment structure, 5, 9–10,


See also fluency 16–19, 73, 80, 83, 189, 245,
ecological analysis, 82, 91, 497. See also artificial
122–126, 161 environments (creating);
ecological rationality, 3, 9–10, domain; payoff structure;
14–16, 22, 51, 344, 398, 425, social environment
429, 488 abundant information, 344
definition of, 3 autocorrelation, 481, 483–484
methodology for studying, 15, binary (noncompensatory),
27–30, 219–221, 273, 52–55, 57–58
493–494. See also co-adapting, 19
information matrix compensatory information,
normative perspective, 492–495 53–58, 76, 78, 223–225, 230,
ecological structure. See 269–270, 354
environment structure conditional dependency
economics, 66–67, 90, 94, 409, 417 between cues, 46, 52–58,
standard theory of choice, 410, 257, 361, 373
412, 413, 414, 420, 422, costly cues, 256–257,
425–427, 489–490 270–272, 373
effort–accuracy trade-off, 26–27, created by others, 456,
33–34, 41, 46, 60, 74, 230, 462–465, 472–473, 478–482
239, 391. See also cognitive degree of uncertainty, 5, 16–17
effort dispersion of cue
election, 125, 128. See also discrimination, 257
politics dispersion of cue validities,
electrocardiogram, 362, 363 190, 195–199, 202, 256–257,
electronic mail, 147 269–270, 344–354, 402
elementary information dynamic (changing), 239,
processes, 155. See also 258–259, 269, 280, 304, 459,
adaptive decision maker 478–482
elimination-by-aspects, 246 error, 76–77, 86–87, 97–98
deterministic (DEBA), 76–77, 262 friendly vs. unfriendly, 190, 215
El Paso, 181 Guttman (compensatory),
emergency, 369 53–58, 262
emotional stability, 227–228, in heuristic selection, 22
320–321. See also J-shaped distribution, 19,
personality traits 344–354, 357, 358, 380ff
emotions, 304. See also affect; fear massed vs. distributed, 149–151
employment agency, 366 mean (first moment), 83–86,
engineer–lawyer problem, 92 91–96
engineers, 91–92 memory and, 146–151
Enlightenment, 12 noncompensatory information,
environment design, 19, 27, 51–53, 76, 78, 223–225,
409ff, 428ff 229–230, 261–262,
environments, 203–208. See also 268–269, 354
domain; information number of alternatives, 5, 17
environment redundancy, 17–18, 58, 78,
natural (list of), 203–206 187ff, 256–257, 262, 270

environment structure (Cont.) disruptive selection; natural


scarce information (object-to- selection
cue ratio), 262, 344, 354, 393, evolutionarily stable strategy
397–398 (ESS), 474, 475–478
size of learning sample, 5, 17 evolutionary algorithm, 461,
skew (third moment), 83, 474–477
90–91, 105–109, 320, 379ff, evolutionary game theory, 413
396–399, 401–404 evolved capacities, 8, 11
sources of, 18 EW. See linear model,
uniformly distributed criterion, equal-weight
401–404 exemplar model, 29, 42–43, 386,
variance or variability (second 401, 404. See also
moment), 18, 83, 86–90, categorization; nearest
96–104, 393, 396–399 neighbor classifier
equality heuristics, 8 exit node (decision tree), 367
equal-weight model. See linear expected utility maximization,
model, equal-weight 12, 13, 24, 248, 424, 488
equilibria (game theory), 457 expected utility theory, 73, 107
mixed, 465, 472, 474–477, 483 experience-based sampling. See
Nash, 463–467, 470–473, sampling process,
475–476, 482–483 experience-based
error. See also environment experimental tests, 79, 216ff,
structure, error; estimation, 244–245, 273
error expertise, 78, 337, 339, 356–357
decision, 79, 216, 220–221, experts, 23–24, 29, 63–65, 72,
248, 303–304, 417, 475. 129, 304, 311, 351, 433, 440.
See also Type III error See also doctors
prediction, 46–47, 51, expert systems, 28
58–59, 72 exploration, 23–24, 29,
error rate (test), 432–433, 438, 445, 253–254, 305
446. See also false-negative extinction, 91, 126
rate; false-positive rate extraversion, 227–228. See also
ESS. See evolutionarily stable personality traits
strategy
estimation, 84–86, 141–142, 251, face, 145
379ff, 421. See also fallacies, 29, 80–81, 84, 248, 488.
frequency, estimation of; See also base-rate fallacy;
overestimation; QuickEst conjunction fallacy;
heuristic; underestimation naturalistic fallacy
error, 491 false fame effect, 153
estimation tree, 386–392, 395–399. false-negative rate, 432, 445–446
See also classification tree false-positive rate, 364–365,
Europe, 430, 439. See also entries 432–433, 436, 437, 440, 444,
for individual countries 445–446
evidence-based medicine, 453 fan effect, 133
evolution, 39, 109, 118, 141, 165, fast and frugal heuristics. See
250, 276, 304, 437, 456, 459. heuristics, fast and frugal
See also adaptive function; fast and frugal tree, 25, 360ff, 495

constructing, 372–375 format effects (statistical


frugality, 367 reasoning), 429
MaxVal rule for constructing, framing (communication),
374–377 450–451
Zig-Zag rule for constructing, Franklin’s rule, 192–193, 197ff,
375–377 224–226, 229, 233, 235–236.
speed, 367 See also weighting and adding
fast food industry, 420 frequency, 121–122, 151,
fear, 304, 452 154–155, 160, 164, 259, 440
feedback, 22, 223. See also absolute, 441
learning, by feedback estimation of, 84–89, 287–288,
FIFA, 124, 417–419. See also soccer 301, 380, 423–424
financial crash, 487, 492. See also natural. See natural frequencies
investment normalized, 436–437
fish, 8 relative, 436–437
fitting. See accuracy, fitting frequency-validity effect, 140
fixed-distance heuristic, 458, frugality (of information use),
462–471, 473, 475–478, 195, 201–202, 257, 261, 272,
482–483 274–275, 278–280, 284–287,
flat maximum, 23, 156, 476 361, 367, 419. See also fast
fluency (of judgments), 121–122, and frugal tree; heuristics,
126, 133, 153, 164–165, fast and frugal; stopping rule,
259–260. See also ease of one-reason
retrieval fundamental attribution error, 91
fluency heuristic, 21, 122, 142
definition, 9, 153 gain frame, 451. See also framing
retrieval time in, 159–163, 260 Galapagos biodiversity. See data
use of, 161–163 sets, biodiversity
fluency validity, 161–162 gambles (choices between), 20, 270.
fMRI. See neuroimaging See also priority heuristic; St.
food, 90, 118, 276, 304, 405 Petersburg paradox
fool, 144, 251 gambling, 19, 419–426. See also
football. See soccer slot machines
foraging, 90, 118, 423, 455. See game theory, 414, 456ff. See also
also information, foraging equilibria; evolutionarily
Forbes magazine, 379–382 stable strategy; evolutionary
forced-choice paired comparison. game theory; tit-for-tat;
See two-alternative choice ultimatum game
forecasting, 39, 66–69, 72, Gauss/Markov theorem, 59
128–129, 309–310, 312–315, gaze heuristic, 6–8, 28–29
439–440. See also prediction; definition, 6, 10
weather modified, 7
forgetting, 144ff, 247, 258. generalization, 24, 37, 52, 59,
See also memory 194–195, 250, 361, 369,
benefits of, 145, 156–160, 376–377, 401, 492. See also
163–164 accuracy, fitting; cross-
in word processors, 146 validation; overfitting;
fox, 405 robustness

general purpose mechanism, 24, headlines, 147–150. See also


46, 50, 59 newspaper
genetic counseling, 450 health, 84, 331–332, 428ff
genotype, 317–318, 322 health care, 13–14, 250, 362, 413,
germ theory (Pasteur), 75 428ff, 494–495. See also
German cities (data set), 42–44, diseases; doctors; heart
114, 115, 119, 127, 128, 135, disease; organ donation; pill
137, 156–158, 169–171, 203, scare; psychology, clinical
260, 277–279, 284, 386–387. heart disease, 25, 84, 361–365,
See also city population 367–370, 430, 495
Germany, 105, 409–411, 413, 416, heart disease predictive
418, 443 instrument (HDPI), 362–365
God, 3, 109, 496–497 height (of sons and fathers), 89, 99
goodness of fit, 35–37 heuristics, 4, 7, 9–10, 73–74,
go-to-end strategy, 472 487ff. See also entries for
green electricity, 412 individual heuristics
group decision making, 18, 127, compensatory, 20. See also
167ff, 335ff, 417–419. compensatory strategy
See also discussion fast and frugal, 17, 261, 338,
benefits and risks, 336–337 358, 414
distribution of knowledge in, in groups. See group decision
348–353, 357–358 making
incomplete knowledge in, noncompensatory, 20, 338.
348–350 See also noncompensatory
individual decisions in, 336, strategy
339–340, 342–343 normative study of, 487ff
resources in, 337–338 selection of, 21–22, 226. See
shared information in, 348, also strategy selection
350–353, 354, 358 selection of by environment, 22
simulation of, 171–177, selection of by learning, 22
341–353 selection of by memory, 21
social combination rule, hiatus heuristic, 27
339–340, 342, 355–357 hidden-profile effect, 337, 349,
social communication rule, 355. See also group decision
339–340, 355 making, shared information in
test of, 178–182 high school, 320–321, 388
group-think, 337, 339 history, 409
guessing, 172, 180, 235–236 hit rate, 364
Guttman environment. See HIV test, 444. See also screening
environment structure, hockey, 125, 128. See also sports
Guttman hopeful monster (mutation),
474, 475
handball, 242, 246, 259. See also house buying, 187. See also data
sports sets, house prices
hard–easy effect, 95–97, 100–101. Hurricane Katrina, 90
See also overconfidence hypothesis testing, 69, 101–103,
(bias) 310ff, 438. See also
head butting, 241. See also deer contingency

active, 327–332 costs, 189, 223, 238, 262–263,


passive (evaluation), 322–327 270–272, 289, 360. See also
rarity–sensitive heuristics, environment structure,
331–332 costly cues; relative
information cost; search,
if–then rules (in ACT-R), 155 costs
ignoring information, 3–4, 7, 17, foraging, 242, 248. See also
20, 33–35, 41, 46, 50, 58, 60, foraging; search, for cues
73, 118, 164, 167ff (information)
ill-defined problem, 491 limited, 61, 399, 415, 454.
illusion of control, 105 See also constraints
illusions, 80–81. See also overload, 19
cognitive illusions; illusion representation. See
of control representation of information
illusory correlation. See scarcity, 390–392. See also
correlation, illusory environment structure,
imbalance (environment). scarce information
See environment structure, search. See search, for cues
skew; see also J-shaped (information)
distribution; power law statistical, 431–446, 448.
imitation, 251. See also learning; See also conditional
mate choice, copying probabilities; relative risk
imitate the majority, 10, reduction; single-event
339, 413 probabilities
imitate the successful, 10, information environment, 429ff
339, 423 information matrix
impotence, 439 (methodology), 269
impression management, 227–228 information processing cube,
impulsivity, 227–228 338–342, 356
incentives, 79 information theory, 262, 309, 329
income distribution. See wealth, informativeness, 313–316, 326,
distribution 332–333
income tax, 414, 418 expected, 328, 330–332
individual differences (in rarity as cue to, 332
heuristic use), 22–24, 29, informavore, 243, 248. See also
135–137, 221, 226–231, information, foraging
300–301, 348, 475, 483 informed consent, 431, 448, 451,
infarction. See heart disease 452. See also organ donation,
inference, 21 explicit consent
from givens, 114, 134, 232, 243, innumeracy, 429–430, 448–449.
244–245, 266–268, 402, 404 See also numeracy
from memory, 114, 134, 232, institutions, 19, 276, 409ff
244, 267, 384, 402, 404 design of, 19, 409ff. See also
information environment design; health
conflict, 188, 215. See also cue, care; risk communication
correlation; environment insurance, 90
structure, friendly vs. intelligence, 30, 229, 238.
unfriendly See also cognitive capacity

intuition, 484, 488 Las Vegas, 422. See also gambling


intractability (of optimization), laws, 409–410, 426
249, 491. See also lawsuit, 14, 362, 448
optimization, infeasibility of lawyers, 14, 91–92, 366, 426. See
intransitivity, 194. See also also legal decision making
minimalist heuristic; take-two learning, 29, 316. See also
introspection, 457 reinforcement learning
invasion (game theory), 477. algorithm, 40–45, 50, 57
See also evolutionarily stable associative, 283. See also
strategy neural network
investment, 4, 15–16, 25, 90, 94, batch, 373. See also accuracy,
128, 492 offline (batch learning)
irrationality, 3, 46, 81, 311, 334, by feedback, 22, 281, 373, 447,
398, 421, 487, 488, 496. 464, 467
See also bias; fallacies individual, 250–251, 336
is/ought schism, 30, 487–488, lack of, 225–226. See also
494–496. See also routine (habit) effects
naturalistic fallacy multiple cue probability, 305
online, 374. See also accuracy,
jackpot, 423–425. See also cumulative (online)
gambling operant conditioning, 251
jelly beans, 72 social, 250–251, 258, 304. See
job applicant, 366. See also also imitation
search committee (hiring) trial-and-error, 23, 29, 457
J-shaped distribution, 19, 357, learning curve, 53
380ff. See also environment learning-while-doing, 281, 284,
structure, J-shaped 287, 303, 305
distribution; power law; least squares method, 40, 59, 69
skew legal decision making, 305, 371,
of cue validities, 344–354, 358 426–427. See also laws;
just noticeable difference lawyers
(JND), 160 lenses (Brunswik’s), 34
leprosy, 130. See also diseases
killer whale, 405 less-is-more effect, 9, 26, 41–43,
knowledge. See also criterion 57, 74, 109, 119–120, 128–
(knowledge of); group 129, 169–171, 176–178,
decision making, distribution 181–182
of knowledge in; rarity, between groups, 181–182
knowledge of; recognition, prevalence of, 170, 176–177
knowledge; source strong vs. weak, 170, 176–177
knowledge lexicographic classifier, 370–372
task-specific “maps,” 78 lexicographic rule (strategy),
knowledge validity, 118–120, 74, 78, 173, 223, 246, 249,
169–171, 177–178. See also 270, 272, 274, 281, 344, 357,
validity 402, 415–419. See also
take-the-best
language use, 320, 326–327 knowledge-first, 174ff
Laplace’s demon, 16 recognition-first, 173ff

Library of Congress, 391 definition, 172


library search. See search, library ecological rationality of,
light (source), 80 354–357
likelihood ratio, 313. See also log knowledge-based, 173ff
likelihood ratio recognition-based, 173ff
linear model, 51–52, 59, 219, malpractice, 448. See also lawsuit
371–372. See also Dawes’s mammal life-span. See data sets,
rule; Franklin’s rule; mammal life-span
multiple linear regression; mammogram. See mammography
tallying screening
equal-weight, 59, 70–74, 78 mammography screening, 373,
random-weight, 69–70 429ff
unit-weight, 9, 70–71, 258, compliance rate, 430
342–343 pamphlets, 443–446, 453
linear-operator heuristic, 469– risks and benefits, 430,
471, 473, 476–477, 481–483 446–447, 449–450, 452
linear regression. See multiple Manhattan Project, 383–384
linear regression mapping model (or heuristic),
list ordering. See cue order, 380, 400–402, 404
learning; self-organizing market (of ideas), 79
sequential search marriage, 13. See also mate choice
littering, 414 massed practice. See environment
loan decisions, 268–269 structure, massed vs.
local mental model, 116–117, distributed
130. See also probabilistic matching bias, 324
mental model mate choice, 13, 260, 456, 459
logic, 24, 487–488, 496 copying, 13
logical rationality, 15 mate competition, 241
logistic regression, 191–192, mate search, 455, 459
197ff, 362–366, 376–377 maximization of expected utility.
log likelihood ratio (LLR), See expected utility
313–315 maximization
expected, 330–331 maximum likelihood
London, 35–39, 46–47, 382 estimation, 220
long tail (distribution), 107, 383. maximum validity (MaxVal) rule.
See also J-shaped See fast and frugal tree,
distribution; skew MaxVal rule for constructing
loss frame, 451. See also framing mean, 72, 83–86, 91–96, 105–107
means–ends reasoning, 488
machine learning, 42–45, 264, mean–variance portfolio, 4–5,
360, 376–378 491, 492. See also
macrolides, 365–366. See also investment
health care media, 86, 115, 125–126, 329,
majority (in a group), 103 429, 430, 443, 446. See also
majority rule, 168, 172ff, 336, headlines; mammography
342–343, 350, 354–357. See screening, pamphlets;
also group decision making, movies; New York Times;
social combination rule newspaper; television

medial parietal cortex, 134
median, 106–107
medical decision making, 75, 250
medical treatments, 19. See also breast cancer; prostate cancer screening
medicine, 250, 304, 428–429. See also evidence-based medicine; health care; paternalistic medicine
memory, 21, 113ff, 266–267. See also forgetting; mnemonist; recall; recognition; spacing effects
  constraints, 21, 108–109. See also constraints
  content (capacity), 391
  decay, 154–158, 469
  declarative, 154
  perfect, 144–145
  procedural, 154–155
  retrieval, 232–237, 239, 383
  search. See search, in memory
  short-term, 108–109
  skewed structure, 406
  working, 108–109
memory-based heuristics. See fluency heuristic; recognition heuristic
mental model, 116–117. See also local mental model; probabilistic mental model
mere exposure effect, 142
metaphors, 240
methodology, 310–311. See also ecological rationality, methodology for studying
military, 356
mind–environment match, 5, 15, 30, 81, 109, 427, 446
minimalist heuristic, 253, 261, 275, 278–279, 284–286, 342–343, 347–352, 355–356
minimax heuristic, 264
minorities (moral judgments about), 103–104
minority (in a group), 173, 180
mirrors (Shepard’s), 34
miscalibration, 97–101. See also hard–easy effect; overconfidence
mismatch (behavior–environment), 40, 109, 399–400, 421, 426. See also mind–environment match
mistakes. See error, decision; see also fallacies
mixed equilibrium. See equilibria, mixed
mnemonist, 144–145
model comparison, 29–30
model selection, 41
models. See agent-based model; as-if models; Bayesian models; computational models; descriptive models; linear model; normative models; polynomial model; process models; see also entries for individual models
monkeys, 287, 304
moral algebra, 12–13. See also Franklin’s rule
moral philosophy, 488
more-is-better, 3, 12, 26, 33, 230, 337, 338, 358, 494
mortality. See death
motivation, 82, 83, 85, 91, 94, 95, 227, 304, 336
move-to-front rule. See cue-ordering rules, move-to-front; self-organizing sequential search, move-to-front rule
movies, 383, 420
multiattribute utility theory, 73
multiple linear regression, 33, 41, 59, 69–71, 192, 254, 258, 262, 386–392, 395–399, 401, 403, 404, 419, 491
murder case. See crime
mushrooms, 118
music, 382, 423
mutation, 474–477
naïve Bayes (model), 59, 192, 197ff, 373. See also Bayesian models
naïve fast and frugal tree. See fast and frugal tree
Nash equilibrium. See equilibria, Nash
natural frequencies, 434–438, 447–448, 495
natural frequency tree, 367–369, 374, 376. See also classification tree
natural selection, 34. See also evolution
natural theology, 497. See also omnipotence; omniscience
naturalistic fallacy, 494. See also is/ought schism
nearest neighbor classifier, 42–45, 53–57. See also exemplar model
need for cognition, 227–228. See also personality traits
negative predictive value, 432–433. See also positive predictive value
neoclassical economic theory. See economics, standard theory of choice
Netherlands, 412–413
neural network, 42–43, 283
neuroimaging, 22, 134
New York, 94, 439
New York Times, 147–150
newspaper, 156. See also headlines; media; New York Times
Neyman–Pearson decision theory, 244, 490
Nobel Prize, 255
node (decision tree), 367
noise, 46–47, 50, 58. See also error
noncompensatory information. See environment structure, noncompensatory information
noncompensatory strategy, 20, 117, 134, 138, 174, 202, 224, 272, 343, 372, 415–419. See also heuristics, noncompensatory; recognition, noncompensatory use of; take-the-best; take-two
normalization (frequency). See frequency, normalized
normative models, 73, 107, 487–488. See also descriptive models; heuristics, normative study of
novelty robustness. See robustness, novelty
novices (laypeople), 23–24, 129, 231, 430, 434–435, 452
NP-completeness, 249
NP-hard problem, 372
null hypothesis testing, 244
number needed to treat, 442
numeracy, 449, 453. See also innumeracy
oil drilling, 209–210, 290–292
omnipotence, 496–497
omniscience, 3, 46, 49, 249, 347–348, 496–497
one-bounce rule, 9
one-reason classification, 361, 370, 372. See also fast and frugal tree
one-reason decision making, 27, 252, 261, 274ff, 306, 361, 363, 370
openness, 227–228. See also personality traits
optimal asset allocation, 4. See also investment
optimal foraging theory, 90. See also foraging
optimal stopping problem, 458. See also parking problem
optimism, 86, 105. See also personality traits
optimization, 3, 12, 24–25, 249, 414, 490, 496–497
  infeasibility of, 490–491
  under constraints, 26, 33, 249, 495–496
opt-in/opt-out, 410–412, 494. See also default; organ donation
order effects, 248
organ donation, 409–415, 422, 425, 494
  explicit consent, 410–411
  presumed consent, 409–411
outcome measures, 219, 232–235
out-of-population. See prediction; robustness
out-of-sample. See prediction; robustness
overconfidence (bias), 93–101, 105, 438
overestimation, 84–86, 89, 421, 445, 446, 449, 452
overfitting, 50, 53, 194, 376, 455. See also robustness
pair-comparison task, 189–190
pamphlets. See mammography screening, pamphlets
parameters
  estimation, 38
  free, 37
Pareto law, 379–380, 382. See also power law
Pareto/negative binomial distribution model, 27
Paris, 38–39, 121, 382
parking, 19, 454ff. See also drivers; optimal stopping problem; search; secretary problem
  ecological rationality of strategies, 478–482
  emergent environment structure, 478–482
  lot, 460
  pricing, 457–458
  search performance measures, 466–467, 471
  strategies. See block-count heuristic; car-count heuristic; distance-and-density heuristic; fixed-distance heuristic; go-to-end strategy; linear-operator heuristic; proportional-distance heuristic; space-count heuristic; x-out-of-y heuristic
  strategy competition, 477–482
  travel time, 462–463, 465, 466–467, 470, 474
parking problem, 458, 466
participation rate. See screening, participation rate
paternalistic medicine, 431, 448
patients, 75, 250, 361–366, 368–369, 431ff. See also doctors; health care
payoff structure, 229, 238, 421. See also environment structure
  insensitivity to, 224
peahen (mate choice), 8
pectinate tree. See rake (tree)
perception, 34, 80–81, 361. See also risk perception
performance-contingent payoff, 291. See also payoff structure
personality traits, 226–229, 239, 317–318, 322
  Big Five, 227–228
pessimism, 86. See also personality traits
phi coefficient. See covariation assessment, phi coefficient
physicians. See doctors
physicists, 383–384
pictorial stimuli, 233–234. See also representation of information
pigeons, 288
pill scare (contraceptives), 428–429, 442, 453. See also health care
PIN (personal identification number), 151
plane crashes, 86
planetary motion, 490
plurality rule, 355–356
  weighted, 355–356
pneumonia, 365–366
Poisson process, 27
polar bear, 405
police, 24, 416
politics, 125, 335. See also election
polynomial model, 35–40, 47–51
population size
  animal species, 126, 220, 382
  city. See city population
  country, 403
portfolio theory, 90. See also investment
positive predictive value, 432–433, 435–437, 443, 444, 446, 447, 449
positive testing, 102–103, 324. See also search, active information
posterior probability, 315–316. See also Bayesian models
power law, 19, 380–383, 386, 405. See also J-shaped distribution; Pareto law
predator, 304, 404–405
predictable imbalance (environment). See environment structure, skew; see also J-shaped distribution; power law
  emergence of, 404–405
prediction, 16, 33, 37, 63–69. See also forecasting; positive predictive value
  clinical, 63–64, 363
  out-of-sample, 25, 66–68, 250. See also robustness, out-of-sample
  out-of-population, 25, 250. See also robustness, out-of-population
  statistical, 63–65
preference, 142
  stability assumption, 410
preferential choice, 21, 214–215, 251, 270
pregnancy, 428
prejudice, 104
prevention behavior, 450–451
primacy, 248. See also memory
principle of total evidence, 3
prior probability, 315–316. See also Bayesian models
priority heuristic, 20, 264
probabilistic mental model (PMM), 95, 116, 219. See also local mental model; mental model
probability theory, 437
Probex, 403
problem solving, 337
procedural memory. See memory, procedural
process (tracing) data, 138–139, 219, 232, 235, 372. See also response time
process losses (in groups), 337
process models, 28, 489–490, 493
professors, 107, 167, 204, 255
proportional-distance heuristic, 468–471, 473, 476, 481
proportionality (in groups), 355
prospect theory, 248
prostate cancer screening, 13, 430, 448, 452
  PSA test, 13, 448
proximal cue. See cue, proximal
proximate mechanism, 28
Prozac®, 439
psychiatry, 439, 440
PsychInfo, 27
psychology, 30, 82, 487–488, 490, 497
  clinical, 63, 440
  social, 103, 336
question node (decision tree), 367
QuickEst heuristic, 380, 384ff
  accuracy, 386–387
  definition, 385
  ecological rationality of, 393–398
  frugality, 387, 388–389
  robustness, 386–393
  use of, 398–400
radiologists, 432, 447
rain. See weather
rake (tree), 371–372, 375
random member rule, 355–356
randomness, 421, 424
random weights. See linear model, random-weight
ranking (cues). See cue-ordering rules, for trees
ranking (teams), 417–420. See also soccer
rank–size distribution, 380–382. See also environment structure, J-shaped distribution; power law
rape, 113
rarity, 309ff. See also environment structure, skew; skew
  assumption, 311
  Bayesian analysis of, 312–316
  of fierce animals, 404–405
  heuristic, 333
  knowledge of, 324–325
  in phrasing hypotheses, 326–327, 334
  prevalence of, 320
  sensitivity to, 310
rational analysis of memory, 146–148, 151, 248. See also Adaptive Control of Thought–Rational
rationality, 12, 30, 311, 425, 487–488, 490, 496. See also bounded rationality; ecological rationality; optimization, under constraints; social rationality; unbounded rationality
  differences in, 23
rats, 118. See also recognition, food
recall, 113, 246. See also memory
receiver operating characteristic (ROC), 365
recency, 151, 154, 248, 258–259. See also memory
recognition, 113ff, 147. See also memory
  brand, 117, 138, 422
  collective, 122, 124
  domain, 123–125
  ecological analysis of, 122–126
  episodic, 120–121
  failure of use of, 125–126
  food, 117, 118
  frequency, 169, 175
  in groups, 127, 167ff, 341
  knowledge, 74
  latency (speed). See fluency heuristic, retrieval time in
  name, 115, 117, 128
  noncompensatory use of, 116–118, 134–139, 174, 183
  not using, 129–134, 141
  retrieval primacy, 117
  semantic, 121
  versus frequency information, 121
recognition correlation, 123
recognition heuristic, 11, 19, 21, 22, 74, 114ff, 151ff, 167–168, 341
  adaptive use of, 118–119
  application rate, 157–159
  definition, 9, 114, 152
  individual differences, 135–137
  neural basis of, 134
  use of, 118–119, 127–129
recognition validity, 22, 114, 118–120, 123, 130–132, 157–158, 169–171, 177–178. See also validity
redundancy. See also environment structure, redundancy
  effect on strategy use, 209–213
  judgment of, 212
reference class, 119, 121, 439–440, 442, 445
  unknown, 130–131
regression. See classification and regression tree; logistic regression; multiple linear regression; ridge regression; true regression
regression toward the mean, 86–89, 98–100
reinforcement learning, 22, 226, 239
reiteration effect, 140, 153
relative information cost, 262–263, 265, 271–272. See also information, costs
relative risk reduction, 441–442, 444–445, 451. See also absolute risk reduction
representation of information, 422, 428ff, 495. See also pictorial stimuli
  verbal vs. numerical, 449–450
representative sample, 130–131. See also sampling process
representativeness heuristic, 92
reputational bets, 79
response time, 29, 138–139, 235–237, 267, 357, 403–404. See also process (tracing) data
retention. See also memory
  function (curve), 146–148
  interval, 147–149
retrieval ease. See ease of retrieval
retrieval speed (of cue values), 267, 403–404. See also response time
retrieval time (in fluency heuristic). See fluency heuristic, retrieval time in
ridge regression, 59
rigidity, 227–228. See also personality traits
right-of-way. See traffic, right-of-way rules
risk, 428ff
  absolute, 428
  aversion, 420
risk communication, 428ff
  obstacles to, 446–452
    institutional constraints, 446–448, 453
risk perception, 84–86, 89–90, 105, 429, 449
  dread risk, 86, 90
  unknown risk, 86
robustness, 24–25, 43–46, 60, 194–195, 249–250, 369, 386–393, 426, 465–466. See also generalization; overfitting; sensitivity analysis
  novelty, 39–40
  out-of-population, 38–39. See also prediction, out-of-population
  out-of-sample, 37–39. See also prediction, out-of-sample
Rome, 94, 416
routine (habit) effects, 225–226, 238
rules of thumb, 12, 455. See also heuristics
RULEX (rule-plus-exception model), 370
sample (information), 317–321, 357–358
sample size, 5, 17, 43–45, 53–58, 96, 101–103, 108–109, 197, 254, 303, 378. See also environment structure, size of learning sample
sampling process, 92–96, 437
  experience-based, 93–94, 96
  nonrepresentative, 94–95
  random, 95
satisficing, 8, 19, 245–246, 253, 455, 462, 497. See also aspiration level; fixed-distance heuristic; stopping rule; threshold rule
  definition, 9
SAT scores, 326–327. See also admissions
science
  history of, 62
  nomothetic vs. idiographic, 63
  research methods, 216
scientists, 79, 310, 382
scissors (Simon’s), 14–15, 30, 34–35, 58
screening, 430–431, 452, 494. See also mammography screening; prostate cancer screening
  participation rate, 450–452
sea otter, 405
search, 241ff, 454ff. See also building blocks; mate search; parking; satisficing; secretary problem; stopping rule
  active information, 101–102, 316, 328
  adaptive use of, 266
  alternative-wise, 247
  costs, 210–211, 223–224, 267, 270–272, 360, 403. See also information, costs
  by cue accessibility, 260
  for cues (information), 26, 246
  cue-wise, 247, 260, 340
  by discrimination, 254–255, 258
  Einstellung (mental set), 259
  exhaustive, 247–248, 252, 274, 414, 425
  external vs. internal, 246–247, 267–268
  by fluency, 259–260
  heuristic vs. optimal, 248–250
  Internet, 243, 304
  library, 27
  limited, 195, 241ff, 247–248, 252, 261, 273, 400, 497
  in memory, 222, 232–237, 242, 246–247, 266–267
  for objects (alternatives), 246
  ordered, 248, 253, 275, 400. See also cue order
  random, 253–254. See also minimalist heuristic
  by recency, 258–259
  rule, 8
  sequential, 247, 455ff. See also self-organizing sequential search
  stopping, 210–213, 249, 261–266. See also stopping rule
  by success, 256–257, 272, 289
  time, 280
  by usefulness, 255–257, 272, 289
  by validity, 255–257, 268–269, 272
search committee (hiring), 167, 173, 339–340, 341–342
secretary problem, 458–459. See also candidate count rule; cutoff rule; mate search; search; successive non-candidate count rule
selection (of heuristics or strategies). See heuristics, selection of; strategy selection
self-confirming prophecy, 103
self-deception, 227–228
self-efficacy, 227–228
self-organized criticality, 404
self-organizing sequential search, 278–280. See also cue-ordering rules; search, sequential
  count (tally) rule, 280
  move-to-front rule, 279–280. See also take-the-last
  transpose rule, 279–280. See also cue-ordering rules, simple swap
sensitivity (test), 432–433, 436, 437, 443, 445. See also conditional probabilities
sensitivity analysis, 465–466. See also robustness
sequential statistics (Wald), 244–245
Shame of Gijón (soccer), 418
shopping, 17, 19. See also consumer behavior
short-term memory. See memory, short-term
shrinkage (model fit), 69, 89
side effect, 428, 431, 438–441, 445, 449, 452. See also health care; medicine; screening
signal detection theory, 244, 490
simplicity, 57, 60, 61–62, 414, 419, 426. See also robustness
  resistance to, 61, 75, 78–79
single-event probabilities, 438–441, 445. See also representation of information
skew (skewness), 83, 90–91, 105–109, 461, 481. See also distribution, skewed; environment structure, skew
skiing, 130. See also sports
slot machines, 422–424. See also gambling
small sample. See sample size
small-world problem, 25, 249, 492, 493
soccer, 124, 128–129, 135, 137, 168–169, 417–420, 426. See also sports
social combination rule. See group decision making, social combination rule
social communication rule. See group decision making, social communication rule
social environment, 18–19, 457ff
social learning. See learning, social
social norms, 409, 413
social psychology. See psychology, social
sound, 422–423
source knowledge, 132–134, 181
space-count heuristic, 468–473, 476, 481
spacing effects, 148–152
species identification, 371
specificity (test), 432–433, 443. See also conditional probabilities
speech, 147
splitting profile (tree), 370–372. See also cue, profile
spontaneous numbers, 385
sports, 5–6, 28–29, 115, 122, 124–125, 128, 152, 242, 417–419. See also baseball; handball; hockey; skiing; soccer; tennis
statistical inference, 244, 429, 487–488, 496
statistical information. See information, statistical
stereotype, 103–104
Stevens’s law, 406
stock, 90, 122, 125, 128, 270, 290. See also investment
stock market, 4–5, 25, 125
  game, 220, 223–227, 231
stopping rule, 8, 210–213, 244–245, 338. See also aspiration level; building blocks, stopping rule; cue, discriminating; search, stopping
  asymmetric, 385
  fixed-number, 264
  one-reason, 261–263, 265, 267–272, 293. See also frugality; lexicographic rule; minimalist heuristic; take-the-best
  single-cue, 264–265, 271–272
  two-reason, 263, 270–271. See also confirmation model; take-two
stop sign, 416
St. Petersburg paradox, 107. See also expected utility theory; gambles
strategy selection, 22, 33, 226, 239, 399, 402
  intelligence and, 229–231
strategy selection theory, 22
street gangs, 383
stroke, 430, 449. See also heart disease
structure of environment. See environment structure
studying (school), 149, 151
success (of cues), 256. See also search, by success
successive non-candidate count rule, 459, 469
sun. See weather
surprise, 25
SV (single variable) model, 77–78
switching costs, 410, 414
Switzerland, 22
symptoms, 430, 431. See also diagnosis
take-the-best, 18, 21, 23–24, 29, 41–45, 51–60, 74–78, 117, 153, 193–194, 197ff, 216ff, 252, 262, 266–267, 269, 275–276, 344, 347–352, 355–356, 363, 402, 417
  definition, 9
  determining cue order, 276–278
  ecological rationality of, 50–60, 200, 261–263
  empirical tests of, 216ff, 289–290
  frugality, 201–202, 278–279
  greedy, 41–45, 53–57, 262
  in groups, 342–343
  mistakes, 76. See also error, decision
  personality and use, 226–229. See also individual differences
  robustness (generalization), 43–45, 57–58, 197–201, 207–208
  simplicity of, 57–58
  stopping consistent with, 210–213, 261. See also cue, discriminating
  universal use of, 220–221
take-the-first, 236–237
take-the-last, 258, 261, 283. See also search, by recency; search, Einstellung
take-two, 194, 197ff. See also confirmation model; stopping rule, two-reason
tallying, 193, 252, 262, 342–343, 347–352, 355–356, 371, 400, 402, 437, 469. See also Dawes’s rule
  definition, 9
tally rule (ordering). See cue-ordering rules, tally; self-organizing sequential search, count (tally) rule
task structure, 189. See also information, costs; search, costs; time pressure
taxi drivers. See drivers
team. See group decision making; sports
technology, 86, 89
television, 113, 147, 246, 417. See also media
temperature (weather), 35–39, 46–49, 94. See also climate change; weather
tennis, 21, 22, 124, 128. See also sports
test set, 386–387. See also accuracy, fitting; cross-validation; generalization; robustness; training set
testing-to-the-limits paradigm (memory), 108
Tetris, 17
textbooks, 71, 447
theory testing. See hypothesis testing
threshold rule, 458, 468. See also satisficing
thrombosis, 428
time pressure, 140, 189, 238, 272. See also constraints, limited time
time series, 66–69. See also forecasting
tit-for-tat, 10. See also game theory
trade-off, 26, 33, 50, 74, 364–365, 415. See also effort–accuracy trade-off
traffic, 465
  right-of-way rules, 276, 416–417, 426
Tragedy of the Commons, 465
training set, 42–43, 53, 197–198, 386–387. See also accuracy, fitting; cross-validation; generalization
trajectory computation, 5–6, 28–29
transparency, 364, 369, 378, 414, 415–419, 426, 443, 450
  lack of, 446–452
transpose rule. See cue-ordering rules, simple swap; self-organizing sequential search, transpose rule
travel time. See parking, travel time
tree (classification). See classification tree; fast and frugal tree; see also estimation tree; natural frequency tree
trial-and-error learning. See learning, trial-and-error
true regression, 387
trust, 242, 450
trust-your-doctor heuristic, 14
truth (of statements), 139–141, 153
TTB. See take-the-best
two-alternative choice, 251–252, 281, 371, 419
Type III error, 27, 82
ultimatum game, 8. See also game theory
Umwelt, 18. See also environment structure
uncertainty, 4, 16–17, 18, 25, 35, 46, 60, 426, 497. See also environment structure, degree of uncertainty
  extended, 39–40
underconfidence, 97–100. See also overconfidence
underestimation, 84–86, 89. See also overestimation
unit-weight linear model. See linear model, unit-weight; see also Dawes’s rule; tallying
United States, 105, 119, 413, 426, 430, 443
universal calculus, 24
urban growth, 404–405. See also city population
usefulness (of cues), 255–256. See also search, by usefulness
utility, 420–421. See also expected utility maximization; multiattribute utility theory; St. Petersburg paradox
vacuum cleaner, 269
validity, 9, 118, 255, 343. See also environment structure; fluency validity; knowledge validity; recognition validity; search, by validity; validity measures
  calculating from stored exemplars, 277–278
  conditions favoring ordering by, 303
  J-shaped distribution, 345–354, 358
  linear distribution, 345–354
  negative correlation with discrimination rate, 254–255, 287
  ordering cues by, 41, 236–237, 250, 252, 254, 261–262, 290, 303. See also cue-ordering rules, validity
validity measures
  conditional validity, 42, 52–53, 262
  ecological validity, 52, 278–279, 284–286, 295
  negative validity, 374–375
  positive validity, 374–375
  subjective, 294–295, 306
variability. See also environment structure, variance or variability
  of criterion values, 393, 396–397
  of cue importance, 18, 269–270. See also environment structure, dispersion of cue validities
variance, 17, 46–51, 53–59, 83, 86–90, 96–104, 238, 491. See also bias–variance dilemma; environment structure, variance or variability; mean–variance portfolio
vicarious functioning, 187
votes. See approval vote; Condorcet jury theorem; group decision making; majority rule
wealth, 125. See also data sets, billionaires
  distribution, 379–381, 383
weather, 35–39, 309–310, 312–315, 439–440. See also climate change; forecasting; temperature
Weber’s law, 406
weighted additive linear model (WADD), 269, 342–343, 344, 347–352, 355–356
weighting and adding, 12–13, 23–24, 29, 193, 235, 415, 418. See also Franklin’s rule
word frequency, 382
working memory. See memory, working
World Cup, 417–420. See also soccer
x-out-of-y heuristic, 469–471, 473, 476, 481
zero-sum game, 423
Zig-Zag rule. See fast and frugal tree, Zig-Zag rule for constructing
Zig-zag tree, 371–372, 375–378. See also fast and frugal tree
Zipf’s law, 382, 388, 405. See also power law
